Issue 24 PDF

12
page 3 news from the EGI community ISSUE 24 JULY 2016 TOP STORIES page 7 MORE Engage - Grow - Innovate www.egi.eu EMSODEV pilot on the FedCloud page 2 Inspired EISCAT-3D goes for DIRAC4EGI 01 Computing centres: Okeanos Engage Year 1: impacts and outcomes 06 DI4R 2016 - Registration is Open! page 5 Latest numbers: EGI beyond 650k CPUs 11 EDISON: Building the data science profession EGI's CONTRIBUTIONS TO PROJECTS 08 AARC and EGI: first year of work 09 EGI and INDIGO-DataCloud integration

Transcript of Issue 24 PDF

Page 1: Issue 24 PDF

page 3

news from the EGI communityISSUE 24JULY 2016

TOP STORIES

page 7

MOREEngage - Grow - Innovate

www.egi.eu

EMSODEV pilot on the FedCloudpage 2

Inspired

EISCAT-3D goes for DIRAC4EGI

01 Computing centres: Okeanos

Engage Year 1: impacts and outcomes

06 DI4R 2016 - Registration is Open!

page 5Latest numbers: EGI beyond 650k CPUs

11 EDISON: Building the data science profession

EGI's CONTRIBUTIONS TO PROJECTS

08 AARC and EGI: first year of work

09 EGI and INDIGO-DataCloud integration

Page 2: Issue 24 PDF

In this Summer edition of our newsletter, we cover theresults of the first years of work of many of the projectswith EGI involvement. Read all about it!

Your feedback and suggestions are always welcome!Send an email to Sara & Iulia at: [email protected]

Welcome to issue 24!

1Inspired // Issue #24, July 2016

GRNET offers cloud computingservices to all members of theGreek research & educationcommunity via the Okeanosinfrastructure.With Okeanos, users can createa multi-layer virtual infrastru-cture and virtual computingmachines, local networks tointerconnect them, and areliable storage space withinseconds.Thousands of academic usershave already used virtualmachines in the course of theirresearch, experimental,educational or other activities.The cloud computinginfrastructure and services ofGRNET S.A. have been madeavailable to the pan-EuropeanR&E community via the‘okeanos-global’ service.A part of Okeanos is availablethrough open standards such asOCCI and CMDI in the EGIFederated Cloud.

Okeanos makes it possible to:> Create virtual machines withcustom images

Computing centres: Okeanos

Kostas Koumantaros presents the GRNET cloud infrastructure

EGI's next event will be the DigitalInfrastructures for Research 2016 in KrakówPoland. The event is co-organised by EUDAT,

GÉANT, OpenAIRE and RDA Europe

The Okeanos computing cluster

More information

GRNEThttp://www.grnet.gr

Okeanoshttp://okeanos.grnet.gr

Lambdahttps://lambda.grnet.gr

> Use extra resources for limitedtime frames e.g. to createvirtual laboratories for a project> Use advanced and diversesecurity policies> Manage the virtual machinesand networks via a user-friendlyinterface> Synchronise user files throughthe storage capabilities offered(Dropbox-like functionality)> Share files among differentusers> Backup as a Service (BaaS)> Image as a Service (ICaaS)> ~ORKA! a cloud-basedplatform for big data analytics

> The Lambda on-demandframework for ingesting real-time streams with efficientstream and batch analytics

Page 3: Issue 24 PDF

2

The EMSODEV project started acollaboration with EGI to deve-lop cloud services and resourcesfor the design and implemen-tation of the EMSODEV DataManagement Platform (DMP).EMSODEV developed the DMPplatform on top of the EGIFederated Cloud with the EGIsupport on virtualisation,storage, networking andsecurity. The experimentalprototype of the DMP is nowfully integrated with the EGIFederated Cloud services and ishosted at the INFN Bari site.

The design of the DMP is almostcomplete. The platform enablesthe exploitation of capabilities ofe-Infrastructures to develop aflexible and scalable DataManagement Service for a long-term, high-resolution, (near)-real-time monitoring by provi-ding a coordinated approach fordata capture, archiving, manage-ment and delivery based onOGC standards.

The DMP is based on Hadoop

EMSODEV pilot running on the EGI Federated Cloud

The EMSODEV team writes about the first step towards a production DataManagement Platform

and includes a set of commonservices, compliant to the ENVRIReference Model.

Sensors data will be comingboth in asynchro-nous/batchand real-time mode, from twodifferent data sources: a SensorObservation Service (SOS)deployed at each EGIM nodeand the EMSO regional nodes.Both historical time series andreal time data will be accessibleand searchable via ApplicationProgramming Interfaces (APIs)

More information

EMSO: European Multi-disciplinary Seafloor andwater-column Observatory

http://www.emso-eu.org

This article was contributedby :Lucio Badiali (INGV),Pasquale Andriani andMassimiliano Nigrelli(Engineering) and DiegoScardaci (EGI Foundation)

built on top of NoSQL databases(e.g., HBase, OpenTSDB).

EGI and EMSODEV are movingtowards a production environ-ment for the DMP. The first stepis a Service Level Agreement toguarantee IT resources andservices to run the DMP for along period. In the meantime,the technical collaboration willcontinue to investigate inparticular, federated deploy-ment of DMP in multiple EGIsites.

Inspired // Issue #24, July 2016

EMSODEV & DMP

EMSODEV is a Horizon 2020 project set up to implement theEuropean Research Infrastructure EMSO, a partnership betweenFrance, Greece, Ireland, Italy, Portugal, Romania, Spain and theUnited Kingdom.EMSODEV focuses on the development, testing and deploymentof an EMSO Generic Instrument Module (EGIM) which willensure accurate long-term measurements of ocean parameters.Collecting and analysing ocean data on a day-to-day basis ischallenging, but crucial to monitor environmental processes.The Data Management Platform (DMP) will implement DataAcquisition and Transformation, Data Curation, Data Processingand Data Access phases collect and analyse parameters, toaddress the needs of a diverse user community.

Page 4: Issue 24 PDF

3

The EGI-Engage project was setup to expand the capabilities ofa European backbone offederated services for advancedscientific computing, storage,data management andknowledge transfer.

The first year concluded inFebruary 2016 and I am pleasedto report that we are steadilyprogressing towards our goals.At the end of the project we aimto have secured and number ofimpacts and here is a summaryof how the work of the first yearof EGI-Engage contributed tothis.

O1: Ensure the continued coor-dination of the EGI Community instrategy and policy development,engagement, technical usersupport and operations of thefederated infrastructure inEurope and worldwide.

> Strategy. The EGI Counciladopted the new EGI strategy inMay 2015 and its new vision,mission and strategic goals withobjectives organised in fivestrategic themes.

> Governance. The EGI Councilapproved of a new governancemodel and new statutes for theEGI Foundation. The new model

First year of EGI-Engage: impacts and outcomes

Tiziana Ferrari summarises the achievements of the first project year

is more flexible and morewelcoming to national andinternational organisations anduser communities.

> The Open Science Commonsvision was endorsed in May2015 by the European Counciland the vision was adopted bythe main European e-Infrastructures (EUDAT, GEANT,OpenAIRE) and LIBER as thefoundation of the new EuropeanOpen Science Cloud initiative(EOSC) of the EC.

> The EGI Council and all EGI-Engage RIs provided input to theEGI contribution to the EuropeanOpen Science Cloud consultation.

> Security. The Security IncidentHandling Procedure wasupdated and approved by theEGI OMB. This work exploitednew possibilities in incidentresponse required by the newlyintegrated technologies,primarily cloud IaaS.

O2: Evolve the EGI Solutions,related business models andaccess policies for differenttarget groups aiming at anincreased sustainability of theseoutside of project funding. Thesolutions will be offered to largeand medium size RIs, smallresearch communities, the long-tail of science, education,industry and SMEs.

> New Business EngagementProgramme.

> Thematic solutions. We startedto set up procedures for SLAnegotiation with providers ofthematic services and usercommunities.

> New service portfolioapproved by the EGI Council inNovember 2015.

> EGI marketplace: first conceptdefined. The first prototype ofan online marketplace platformis expected in late 2016.

> The EGI Federated Cloud nowsupports a larger number ofuser communities and the OCCIstandard was extended toprovide new capabilities. Cloudfederation via native OpenStackis now possible.

> The pilot for a Long Tail ofScience platform was created toprovide simplified accessprocedures to individual usersor small research groups

> Accelerated Computing isbeing developed as a newplatform to provide distributedaccess to local GPGPUcomputing facilities.

> The High-Throughput DataAnalysis solution saw eight newreleases of the UnifiedMiddleware Distribution (UMD).

> We established 27 collabora-tions with RIs/FETs, 11 projects/communities and numerousindividual researchers.

Inspired // Issue #24, July 2016

Page 5: Issue 24 PDF

4

Objective 3 (O3): Offer andexpand an e-InfrastructureCommons solution

> New AAI architecture andprototyping of technicalsolutions. Piloting activities withresearch communities, primarilyELIXIR, allowed the integratedtesting of the new services in thecontext of the AAI of a large RI.

> Service request managementand resource allocation. Thepay-for-use group produced abeta production version of the e-GRANT tool to support allocationto user groups.

> Technical roadmap for theaccounting and monitoringsystems (backend and portal).We launched a redesignedaccounting portal we defined amulti-tenant provisioning modelaiming at offering themonitoring system (ARGO) as aservice.

> The GOCDB was enhanced tosupport multiple projects andextend its data model. GOCDBwas integrated with the EBI IdPto demonstrate the toolcapabilities to the ELIXIRcommunity.

> The Operations coordinationdefined a new set of coreactivities and services to supportthe EGI federation, which wereassigned to the service providersthrough a bidding process.

Objective 4 (O4): Prototype anopen data platform and contri-bute to the implementation ofthe European Big Data Value.

> The EGI Foundation became amember of the Big Data ValueAssociation (BDVA), an industry-led consortium for theimplementation of the Big DataValue public-private partnership.

> The EGI Data Hub was definedas a new EGI service to thedevelopment of the DataCommons across differentscientific disciplines and supportthe big data value chain byfacilitating the commercialexploitation of research data.

> We defined use cases and thearchitecture for the Open DataPlatform and started thepreparation of the prototype.

> The EGI-EUDAT collaborationcontinued to provide end-userswith a seamless access to paireddata and HTC resources. Theroadmap is defined by a set ofuser communities who arealready collaborating with bothinfrastructures in the field ofEarth Science (EPOS and ICOS),Bioinformatics (BBMRI andELIXIR) and Space Physics(EISCAT-3D).

> A MoU was signed with thePRACE-4IP project.

Objective 5 (O5): Promote theadoption of the current EGIservices and extend them withnew capabilities through user co-development

> We established 8 CompetenceCentres (CCs) linked to 8 RIs andcommunities: BBMRI, DARIAH,EISCAT-3D, ELIXIR, EPOS,MoBrain/INSTRUCT, LifeWatch,and disaster mitigation.

More information

EGI-Engage

http://go.egi.eu/eng

Tiziana Ferrari is theTechnical Director of the EGIFoundation

> We developed Service LevelAgreements as a framework toformalise collaborationsbetween user communities(VOs) and EGI providers. So far,four communities havebenefitted from this framework:

>> MoBrain, BILS, DRIHM andTerredue

> We expanded the network ofcloud user support teams acrossthe NGIs. Now we have 18national user support teamsfrom 12 countries:

>> Czech Republic (CESNET),Croatia (SRCE), France (CNRS),Greece (GRNET, IASA), Hungary(MTA SZTAKI), Italy (INFN Bari,INFN Padova), Macedonia(UKIM), Poland (CYFRONET),Portugal (LIP), Slovakia (IISAS),Spain (BIFI, BSC, CESGA,CIEMAT), Sweden(KTH);

> 8 Virtual Organisations wereset up in EGI for gaining accessto cloud resources

Inspired // Issue #24, July 2016

Page 6: Issue 24 PDF

5

The EGI infrastructure builds on15 years' experience in design,development, and deploymentof distributed data analysisservices. Today, the productioninfrastructure federateshundreds of resource centresthat in turn serve hundreds ofresearch communities.

The federation membersEGI’s main resource providersare the 24 EGI Council Members:22 national e-Infrastructures,CERN and EMBL. The EGIinfrastructure integratesresources provided by anadditional 34 worldwideorganisations.

In addition, EGI collaborates withthe US's Open Science Grid andCompute Canada to supportcommon user communitiesthrough inter-operationalagreements.

EGI infrastructure expands beyond 650,000 CPU cores

Alessandro Paolini rounds up the latest EGI figures

The federated servicesThe services offered by the EGIresource centres can begrouped in two macro-categories: High ThroughputComputing (HTC) and CloudComputing.

The overall total capacityavailable for the researchcommunities supported by EGIhas more than doubled in the

Inspired // Issue #24, July 2016

past six years, surpassing thethreshold of 650,000 coresduring 2016.

In terms of storage, EGI federatesmore than 260 PBytes of diskstorage and nearly 240 PBytes oftape storage.

These numbers make EGI thelargest international e-infrastructure in Europe.

Storage and compute capacity (CPU cores) available in EGI 2011-2016

EGI Council members

Page 7: Issue 24 PDF

More information

Alessandro Paolini is amember of the OperationsTeam

6

The top-five contributors to thecomputing capacity of EGI arethree National Grid Initiatives(Germany, Italy and the UnitedKingdom), CERN and the NordicData Grid Facility (aninternational collaborationbetween Denmark, Finland,Norway and Sweden).

Use of the resourcesThe resource providersfederated in EGI support morethan 200 Virtual Organizations(VO). We estimate that EGI isserving a total of 46,000 users:users directly registered in VOsand users who access EGIresources through portals andother high-level services.

Users of the EGI FederatedCloud instantiated 343,200Virtual Machines from January2015 to January 2016. Thisrepresents 2.31 million hours of

Inspired // Issue #24, July 2016

More information

DI4R 2016

www.digitalinfrastructures.eu

Early bird registration:5 August

CPU wall time consumed and a79% yearly increase. Thefederated cloud, with its 21providers of which one iscommercial, is a new platformthat started its productionactivities in May 2014.

Together with 580 million HTCcomputing jobs this means that,last year, the EGI usersconsumed more than 4,500million CPU core hours.

The Digital Infrastructures forResearch 2016 (28-30 Sept,Kraków) is the first user forumjointly organised by Europe'sleading e-infrastructures andprojects: EGI, EUDAT, GEANT,OpenAIRE and RDA Europe.Registration for the event is nowopen and discounted early-birdrates will be available until 5August 2016.The keynote speakers of theconference will be:> Karlheinz Meier fromHeidelberg University, who willspeak about "The brain, theuniverse and the need forintegrated infrastructures”(Wednesday 28 September)

Digital Infrastructures for Research 2016: 28-30 Sept

Registration for the conference is now OPEN!

> Laurie Goodman from theopen-data journal GigaScience,who will speak about "DataCitation: Credit Where Credit isDue”The DI4R 2016 conference ishosting three co-located events(to take place on Tuesday, 27September):> 3rd WISE Workshop - WiseInformation Security forcollaborating E-infrastructures> Design your e-InfrastructureWorkshop – where usercommunities can get togetherwith experts to discussrequirements and design the e-infrastructure they need fortheir scientific use case.

> FitSM Foundation Training -aims to provide the basic ITservice management conceptsand termsThe DI4R 2016 is hosted by ACCCyfronet AGH between 28 and30 September in Krakow,Poland.

Page 8: Issue 24 PDF

7

Having understood therequirements of the EISCAT-3DCompetence Centre, we helpedthem to select DIRAC4EGI as thetechnology for they need for theupcoming implementationphase.

The EISCAT-3D CompetenceCentre (CC) was establishedwithin the EGI-Engage project inMarch 2015 to develop a web-based user interface for sear-ching, retrieving and re-proces-sing (visualisation, analysis) dataproducts generated by EISCAT-3D radar stations.

The EISCAT-3D radar is a nextgeneration incoherent scatterradar system under constructionby EISCAT association. Onceassembled, it will be a world-leading infrastructure using theincoherent scatter technique tostudy the atmosphere in theFenno-Scandinavian Arctic andto investigate how the Earth'satmosphere is coupled to space.

The challenge of the EISCAT-3Ddata management system is tofind an efficient way to handleexperimental data that will begenerated at great speeds andvolumes. During its firstoperation stage in 2018, EISCAT-3D will produce 5PB data peryear, and the total data volumewill rise up to 40PB per year inits full operations stage in 2023.

After analysing the functionaland performance requirementsof the data portal, we proposedto use the DIRAC4EGI frameworkfor the development.

DIRAC is a software frameworkfor building distributed

EISCAT-3D goes for the DIRAC4EGI Service

Yin Chen gives the latest updates on the EISCAT-3D Competence Centre

computing systems. The DIRACproject began in 2002 within theLHCb High-Energy Physics colla-boration at CERN to manageMonte Carlo simulations andwas later extended to cover alldistributed computing opera-tions including raw detector datadistribution, data managementand end-user analysis. TheDIRAC project introduced severalinnovations focusing on theneeds of large and distributedscientific collaborations. Today itis being used or evaluated by anumber of global researchinfrastructures, such as Belle II,BES III and CTA.

The DIRAC4EGI service wasdeveloped in 2014 to integrateDIRAC in the EGI FedCloudinfrastructure.

With the support of the DIRACtechnical team, an EISCAT dataportal is customised usingDIRAC4EGI and will be up andrunning within a couple ofmonths time. The live EISCATportal running on top of the EGIFedCloud was successfullyshown to the participants of theEISCAT_3D user meeting, 18th-19th May 2016 in Uppsala, and

the demonstration of metadata-based data searching, filedownloading and plotting ofdata works well and receivedgood feedback from thecommunity.

The data portal's File Catalogueallows researchers to searchmetadata, download and plotthe selected data files. Selecteddata files can be added toprocessing algorithms andsubmitted as a job to generatedata plots.

In the next step, new applicationfeatures will be added to thedata portal to support variousrequirements from EISCATresearchers.

Inspired // Issue #24, July 2016

More information

EISCAT-3D CompetenceCentre

https://wiki.egi.eu/wiki/CC-EISCAT_3D

Yin Chen is a member of theUser Community SupportTeam

Page 9: Issue 24 PDF

8

The EC-funded project AARCstarted in 2015 to promoteinteroperability of Authorisationand Authentication systemsacross e-Infrastructures andresearch collaborations. Theresults of the first AARC year ofwork concentrated in thefollowing objectives:

Design the AARCarchitectureThe AARC team has designed ahigh-level AAI architectureblueprint based onrequirements and expertise ofmany e-infrastructures, researchinfrastructures (RIs), researchcommunities and AAI architects.

The EGI Foundation led theprocess to gather requirementsfrom the communities, includingmany Research Infrastructuresin the ESFRI roadmap.

As a result of these discussions,AARC drafted an initial ‘AARCBlueprint Architecture’highlighting a list of buildingblock requirements commonamong all stakeholders andpresents variousimplementation possibilities.

Harmonise procedures andpoliciesOne of the hurdles on the way ofimproved collaboration andintegration is incident response.Although computer securityincident response proceduresoften exist at a local level or atfederation level, there is no bestpractice guidance for securityincidents involving severalfederations, for example an IdPfederation and an SP federation,

AARC and EGI: first year of workPeter Solagna writes about EGI's contribution to the AARC project

spreading across multipleadministrative domains.

The AARC project, together witha number of stakeholdersincluding the EGI federationdirect involvement, hascontributed to The SecurityIncident Response TrustFramework for FederatedIdentity (Sirtfi).

Sirtfi offers guidelines forincident behaviour to determinewhether an organisation can beeffective in incident response.

Develop training modules fordifferent user communitiesAARC organised a training at theUniversity of Manchester, from15 to 16 of March, dedicated toorganisations that provideresources and services for theELIXIR (life sciences) and DARIAH(art and humanities) RIs. Themain objective was to helpservice providers to make theirresources and services availableto the users of the RIs.

Pilot policy frameworks andcritical componentsThe pilots for the e-infrastructures and researchinfrastructures focused onenabling federated identity ondistributed service providers.This activity is led by the EGIFoundation is driven by the

requirements and use cases ofthe EGI federation. The ultimategoal of this pilot is todemonstrate how collaborationscan use their institutionalcredentials to both manage theirorganisation and accessdistributed services.

AARC and EGIThe most important result of thecollaboration is the architecturefor the new EGI AAI services,which are fully aligned with theAARC blueprint and in the bestposition to interoperate withnew and existing communitiesand infrastructures. The smoothintegration with the ELIXIR AAI,achieved in May 2016, is a goodexample.

EGI has now more informationabout the AAI requirements ofthe research communities,thanks to our involvement in thegathering process during theearly stages of AARC.

Inspired // Issue #24, July 2016

More information

AARC Projecthttps://aarc-project.eu/

Peter Solagna leads the EGIOperations team

Page 10: Issue 24 PDF

9

INDIGO-DataCloud is an EU-funded project set up to developa new cloud software platformfor the scientific community,deployable on multiple hardwareand provisioned over hybrid(public/private) e-infrastructure.The INDIGO team has recentlycompleted a deliverableoutlining the possible roadmapsfor technical integration withdifferent infrastructures.The full "Technological Methodsof Integration with other e-Infrastructures" deliverable isavailable online. One of thechapters is dedicated topotential routes of technicalintegration between EGI &INDIGO.

Authentication andauthorisationAA in EGI is commonly based onpersonal X.509 certificates andon the user’s membership in aVirtual Organisation: the com-munity attributes are managedby the Virtual OrganisationManagement System (VOMS)that issues an attributecertificate (AC) describing theuser membership. EGI AAI isevolving and currently alternateAAI technologies are availablefor users: username/passwordauthentication based on SAMLand OIDC protocols, support forthird parties attribute manage-ment, and credential translationservices.The INDIGO Identity and AccessManagement (IAM) will provide aset of advanced interoperablesolutions:

EGI and INDIGO-DataCloud integrationDavide Salomoni and Giacinto Donvito on the EGI & INDIGO integration roadmap

> a login service (based onOpenID-Connect) able tofederate different identityproviders> a group membership servicethat will allow to group users inorganisations> an authorisation service thatprovides the tools to define andenforce authorisation policiesover the protected resources> support for controlleddelegation of privileges acrossservices> a Token Translation Service,bridging the gap betweenAuthentication mechanisms thatdo not support OpenID-Connect.INDIGO IAM supports the legacyauthentication technologies aswell as the protocols recentlyadded to the EGI AAI offer.The flexibility of the INDIGOtools will allow an easierintegration and support forpotentially all the use cases byproviding end-users with aunified view of identities andprivileges, and the tools neededto enable a secure compositionof services from multipleproviders in support of scientificapplications.

ComputingGrid-job submission via theINDIGO platformIntegrating INDIGO capabilitieswith the EGI grid infrastructurewill allow users to submit jobsand manage virtual machines viaalready available portals usingthe advanced features providedby the INDIGO platforms. Users

communities could have an easyand straightforward way ofinstantiating a community portalin a cloud environment lettingthe INDIGO PaaS layer to dealwith high-availability and scalinge.g. to accommodate unpre-dictable workloads.Moreover, already availableportals could easily exploit thenew INDIGO IaaS and PaaSfeatures by implementing simpleAPIs provided by the INDIGOSaaS layer.

Application/service deploymenton cloud via the INDIGO platformThe scientific communities thatneed to run a long runningservice or an application on theEGI Federated Cloud will be ableto deploy them in a simple andtransparent way, using both APIsand web user interfaces, thanksto the functions provided by theINDIGO platform. The

Inspired // Issue #24, July 2016

Page 11: Issue 24 PDF

More information

INDIGO-DataCloudwww.indigo-datacloud.eu/

Davide Salomoni is theINDIGO-DataCloudcoordinatorGiacinto Donvito leads theproject's technicaldevelopment

integration in the EGI FederatedCloud will occur at both IaaS andPaaS levels and will allow endusers to dynamically specify thecomposition of the cluster ofservices requested for a givenuse-case or application such, asfor example, databases, loadbalancers, automatic scalabilityfeatures or proximity to certaindatasets.

The INDIGO PaaS will thencompose the requests andautomatically instantiate theneeded services taking care ofmonitoring the status of eachservice, automatically scaling thecomputational resourcesneeded, and restarting theservices in case of failure,exploiting for example the IaaSresources available via the EGIFed Cloud.

StorageEGI grid storage infrastructureThe storage management of theEGI grid infrastructure is basedon a specific class of servicescalled “storage elements”. TheINDIGO project extends thisconcept and makes it possible toaccess files on the EGI gridstorage elements in terms ofboth transfers and read access,exploiting the protocols suppor-ted by EGI and providing a uni-fied view of storage resources.End users exploiting the INDIGOplatform will be able to import/export data transparently from/to the EGI infrastructure andprocess it using application and/or services deployed throughthe INDIGO PaaS layer. TheINDIGO PaaS Layer will take careof doing the needed datatransfers on behalf of the users,before/after executing therequested application/service.

EGI cloud storage infrastructureThe storage layer of the EGIFederated Cloud provides Blockand Object Storage capabilities.Block Storage: for this type ofstorage, the INDIGO project willenable seamless use of Blockstorage resources available ateach site. Computing resourcesinstantiated through the INDIGOPaaS will be able to connect todistributed cloud block storage,presenting it to applications as ifit was local. Replication andcaching facilities are alsoprovided by the INDIGO unifiedstorage layer. Moreover theneed of local block storagefacilities could be easily exploi-ted by means of the automatismprovided by TOSCA Templatesand INDIGO PaaS layer. Thisway, the user can compose acluster of services requestingthe block storage as part of thecluster configuration. All theneeded preparatory steps will beautomatised by INDIGO services.

Object Storage: in order to accessthis storage in the EGI FederatedCloud, a CDMI interface issuggested as the standard thateach site should adopt,regardless of the technologicalimplementation defined at eachsite. The INDIGO project isbuilding on top of that astandard CDMI implementationto also support advancedfeatures such as QoS and DataLifecycle Management so that,for example, different types of

storage technologies (cold orinline storage) or time-dependent access policies todata can be required by users.Moreover, all the available CDMIendpoints could be easilyfederated into a single geogra-phical storage federation, that isalso able to provide seamlessaccess to the storage in order torun legacy applications.The INDIGO components willcontribute to the technicalrealisation of the EGI Open DataPlatform, which is a bigmilestone of the EGI strategy.

Network virtualisationRegarding networking,integration of INDIGO productsmay add a new set ofcapabilities for EGI FederatedCloud users. Contributions byINDIGO will allow them to setup, manage and use privatenetworks on demand inparticipating cloud sites - afeature that is currentlyunavailable.

10Inspired // Issue #24, July 2016

Page 12: Issue 24 PDF

11

A data scientist is an expert inextracting value from existingdata and managing the wholedata lifecycle, includingsupporting scientific data e-Infrastructures.The Horizon 2020 project EDISONwill put in place foundationmechanisms that will increasethe number of qualified datascientists in Europe, benefitingboth academia and industry.To achieve this, EDISON will:> define the skills and compe-tences of the new profession> construct Model Curricula tohelp universities implementcourses and programs for datascience> design a framework for re-skilling and certifying datascience skills for graduates andprofessionals> create an online communityportal for data science inEurope.

Year 1 achievements

1. Defining skills andcompetencesEDISON collected and analysedinformation to establish thecompetences and skills for datascientists and identified know-ledge domains to be included ineducation and trainingprogrammes. This work definesthe Competences Framework forData Science (CF-DS).

Building the data science profession

Themis Athanassiadou writes about the achievements of the EDISON project

The CF-DS describes five groupsof competences for data science:data analytics, data scienceengineering, domain knowledge(cf. NIST definition of datascience), and adds two newgroups: data management andscientific methods.The competence framework andits mapping into knowledgedomains are critical to theassessment and design of datascience curricula.

2. Building model curriculaEDISON produced an inventoryof current data science offerings(programs, courses, books) inacademia and industry fromacross the world. Analysis of theinventory revealed both regionalvariations and competenciesand knowledge domains gaps.The outcomes of this analysiswill lead to recommendationsfor further development of well-balanced curricula, which will betested next year in three pilotcases.

3. Building a communityplatformEDISON is building an onlinecommunity platform that willinclude a training marketplace, acommunity-building area, a jobcentre and a blog/news portal.A key feature of the portal willbe a benchmark to allow dyna-mic navigation based on user

More information

http://edison-project.eu/

The EDISON project is funderunder grant agreement No.675419.This article reflects only theauthor’s view and the EC is notresponsible for any use that maybe made of the information itcontains.

profiles. The benchmark willidentify knowledge gaps, andpresent training material to helpthe users achieve their learninggoals.For educators and trainingorganisations, the portal willprovide a space to share courseinformation, and allow forauthoring on-line material. Animportant resource will be thevirtual labs, and computationalresources provided by the EGIFedCloud. They will alloweducators to create hands-ontutorials and host datasets onthe cloud, which can beaccessed by everyone.

Inspired // Issue #24, July 2016

EGI in EDISON

EGI is part of the EDISONconsortium and will contributeto the data science trainingactivities with the EGI FederatedCloud as environment to hostapplications and data fortraining. EGI is also responsiblefor the certification schemeand for developing thesustainability model of theproject outcomes.