BeSTGRID OpenGridForum 29 GIN session

50
BeSTGRID overview and advances Developing NZ research infrastructure www.bestgrid. org twitter: @bestgrid Nick Jones Director, BeSTGRID Co-Director eResearch, Centre for eResearch University of Auckland [email protected]

description

An update on BeSTGRID activity and plans, in particular in preparation for the planned future developments of a unified approach to high performance and distributed computing in NZ.

Transcript of BeSTGRID OpenGridForum 29 GIN session

Page 1: BeSTGRID OpenGridForum 29 GIN session

BeSTGRID overview and advancesDeveloping NZ research infrastructure

www.bestgrid.orgtwitter: @bestgrid

Nick JonesDirector, BeSTGRID

Co-Director eResearch, Centre for eResearchUniversity of [email protected]

Page 2: BeSTGRID OpenGridForum 29 GIN session

eResearch: research “e”lectronic

Researchers need support for “e”, to:accessstoreshareanalysesimulaterepresentdiscoverdiscuss

(a.k.a. eScience, Cyber-Infrastructure, e-Infrastructure)

Page 3: BeSTGRID OpenGridForum 29 GIN session

Doesn’t the Internet do this?

Page 4: BeSTGRID OpenGridForum 29 GIN session

eResearch isn’t for everyone..

Collaboration, Sharing– across boundaries: institutions, geographies,

disciplines, sectors, technologies

– between parties: people, computer systems, machines, instruments

– with diversity: size and scope of problems, potential solutions, technical approaches

Page 5: BeSTGRID OpenGridForum 29 GIN session

“BeSTGRID aims to enhance e-Research capability in New Zealand by providing the skill base to help the various research disciplines engage with new eResearch services”

“BeSTGRID aims to kick start centralised infrastructure with some capital investment at key institutions”

BeSTGRID: since 2006

Page 6: BeSTGRID OpenGridForum 29 GIN session

KAREN

Page 7: BeSTGRID OpenGridForum 29 GIN session

CurrentPlanned

Identity provider

Virtual Applications

ComputationCluster

Storage

Page 8: BeSTGRID OpenGridForum 29 GIN session
Page 9: BeSTGRID OpenGridForum 29 GIN session

Who uses our grid?

Page 10: BeSTGRID OpenGridForum 29 GIN session

Successes: within NZNational grid infrastructure, primarily focused on Bioscience and Geoscience

applications & services, data storage and sharing, computation, along with many others

Established the foundations of a shared research infrastructure service delivery model (governance, management, services) across research sector institutions (Universities, CRIs)

Piloted first federated identity management service in NZ, now moving to production through the NZ Access Federation (nzfed.auckland.ac.nz)

Acting as the basis for sector wide HPC and eResearch programme:– National eScience Infrastructure proposal : 2010 – 2015

Page 11: BeSTGRID OpenGridForum 29 GIN session

“BeSTGRID could well be New Zealand’s earliest dedicated eResearch infrastructure providers – even predating KAREN. Since 2006, BeSTGRID has been delivering services and tools to support research and research collaboration on shared data sets, and in accessing computational resources.”

KAREN hyphenIssue 09September 2009

BeSTGRID: since 2006

Page 12: BeSTGRID OpenGridForum 29 GIN session

What did we learn?Stay Connected, Collaborate

• With Scientists, Administrators, Technologists, Policy makers, Funders

• Locally AND Internationally

Don’t reinvent the wheel• Others have been here before – Development

where necessary

• Work with communities to localise: technologies, approaches, strategies, configurations, programmes

Page 13: BeSTGRID OpenGridForum 29 GIN session

* VDT - OpenScienceGrid* Globus – Argonne* Shibboleth – Internet2* iRODS – DICE / RENCI* Sakai – Sakai Foundation* OCI - NSF * PRAGMA – UCSD* EVO – CalTech* AccessGrid - Argonne* GenePattern – MIT Broad* caBIG - NIH

* VDT - OpenScienceGrid* Globus – Argonne* Shibboleth – Internet2* iRODS – DICE / RENCI* Sakai – Sakai Foundation* OCI - NSF * PRAGMA – UCSD* EVO – CalTech* AccessGrid - Argonne* GenePattern – MIT Broad* caBIG - NIH

* UK eScience - JISC* eFramework – JISCShibboleth – JISC, SWITCHSLCS - SwitchSakai - JISC

* UK eScience - JISC* eFramework – JISCShibboleth – JISC, SWITCHSLCS - SwitchSakai - JISC

* Grisu – ARCS* Confluence - Atlassian* NCRIS - DIISR* QFAB - QCIF

* Grisu – ARCS* Confluence - Atlassian* NCRIS - DIISR* QFAB - QCIF

Page 14: BeSTGRID OpenGridForum 29 GIN session

What did we learn?Change difficult within Institutions

– Medium to Long term timeframes required with researchers, ITS, SMT, research support units– “Final Report: A Workshop on Effective Approaches to Campus Research Computing Cyberinfrastructure” Klingenstein,

Morooney, Olshansky. 2006

Collaboration seen as critical, yet complex and expensive– Working with user communities essential to mutual understand of needs and opportunities, and developing capabilities– Need significant capability before international collaborations (and related learning) possible– New forms of organisation and support required– “Beyond Being There: A Blueprint for Advancing the Design, Development, and Evaluation of Virtual Organization ”

Cummings, Finholt, Foster, Kesselman, Lawrence. 2008– “The Importance of Long-Term Science and Engineering Infrastructure for Digital Discovery” Wilkins-Diehr, Alameda,

Droegemeier, Gannon. 2008

Government and Institutional Investment well below international levels– Australia NCRIS + Super Science– EU FP7 e-Infrastructure call + National Grid Initiatives + Identity + HPC + Networks in member states– UK JISC + NGS + DCC + AAA– US Office of Cyberinfrastructure, NSF, NIH, DoE all funding major Cyberinfrastructure initiatives

Education programs lacking– Who is training the next generation?– Need to grow awareness of career paths in computational and data intensive sciences for computer science and software

engineering graduates

Page 15: BeSTGRID OpenGridForum 29 GIN session

MoRST eResearch programme

In Vote RS&T 2008/09, funding was appropriated to develop capability within the Kiwi Advanced Research and Education Network user group to make effective use of KAREN:

• Advanced Video Conferencing Collaboration and Support Centre

• Federated Identity and Access Management• Towards a Federated Approach – strategy development• Technical Support and Resources

• Semantic Data and Public Access to Research

• BeSTGRID Grid Middleware Initiative

• Hardship fund for remote site connection to KAREN

Page 16: BeSTGRID OpenGridForum 29 GIN session

BeSTGRID 2009 – 2011

• MoRST Dec 2008 funding request forMiddleware development for eResearch in NZ

• Provides ongoing support for BeSTGRID– Widened from founders through active participation of new

Universities and CRIs

• Aims:• Coordinate access to compute and data resources• Provide discipline specific services and applications• Build a sustainable resource administration, middleware

and applications & services development community

Page 17: BeSTGRID OpenGridForum 29 GIN session

Compute and storage platformsShared investmentHeterogeneous architectures

Compute and storage platformsShared investmentHeterogeneous architectures

Grid MiddlewareCommon across compute and storage platformsProvides simplified & secure access to job queues & data sharing

Grid MiddlewareCommon across compute and storage platformsProvides simplified & secure access to job queues & data sharing

Applications and toolsCollaboration, SharingManaged, secure, stable, reliable‘Science as a service’

Applications and toolsCollaboration, SharingManaged, secure, stable, reliable‘Science as a service’

Research communitiesDeep engagement to map problems onto HPC and GridResearch communitiesDeep engagement to map problems onto HPC and Grid

Governance / PrioritiesResponsive, focused, consultativeby member institutions

Governance / PrioritiesResponsive, focused, consultativeby member institutions

Page 18: BeSTGRID OpenGridForum 29 GIN session
Page 19: BeSTGRID OpenGridForum 29 GIN session
Page 20: BeSTGRID OpenGridForum 29 GIN session
Page 21: BeSTGRID OpenGridForum 29 GIN session
Page 22: BeSTGRID OpenGridForum 29 GIN session

22

Architecture Diagram

Virtual Gateway Machine 1

GT4, GRIDFTP,GridSphere etc….

Virtual Gateway Machine 1

GT4, GRIDFTP,GridSphere etc….

Virtual Gateway Machine 2

GT4, GRIDFTP,Grisu Server.

Virtual Gateway Machine 2

GT4, GRIDFTP,Grisu Server.

Virtual Gateway Machine 3GLite etc….

Virtual Gateway Machine 3GLite etc….

Geo-SciencesGeo-Sciences

Bio-SciencesBio-Sciences

PhysicsPhysics

Institutional Setup: UoA, UC, MU (VMs same)Institutional Setup: UoA, UC, MU (VMs same)

ClustersClusters

SupercomputersSupercomputers

SuperclustersSuperclusters

Certificate Authority (APAC CA)Or

Shibbolised MyProxy

Certificate Authority (APAC CA)Or

Shibbolised MyProxy

LRM

LRM

LRM

LRM

LRM

LRM

Page 23: BeSTGRID OpenGridForum 29 GIN session
Page 24: BeSTGRID OpenGridForum 29 GIN session

Impediments• Focused on end user tools, democratisation of resources

– Java GUIs ok, though much desire for web applications– Authentication & Authorisation

• Matching attributes between X.509 & shibboleth, Delegation of credentials, Ownership of resources

• Passing credentials upwards and downwards• Matching Groups to Virtual Organisations, Priviledges

• Maturity of Implementation, especially with an enthusiastic developer community– Adaptability and responsiveness – is this what we desire? Tradeoff sometimes

quality..• versus modularity, reuse, subsumption

– Tension forming at the application and user interface layers – lack of standards– Issue with narrow funnel at the middleware – richness of bottom layers

abstracted away

Page 25: BeSTGRID OpenGridForum 29 GIN session

Site admin

Site admin

Site admin

Site admin

Site admin

Site admin

Site admin

Site admin

Site admin

Site admin

Faculty ITInstitution IT

Faculty ITInstitution IT

Technical Working GroupTechnical Working Group

GeoscienceLeads

GeoscienceLeads

eResearch Centres

eResearch Centres

eResearch Centres

eResearch Centres

eResearch Centres

eResearch Centres

BioscienceLeads

BioscienceLeads

eResearch Centres

eResearch Centres

Steering CommitteeSteering Committee

Project ManagementProject Management

Page 26: BeSTGRID OpenGridForum 29 GIN session

Take lead from community

• Lead Users from each community– Define requirements– Evaluate developments– Promote services - joint promotion– Disseminate learning – seminars, case studies

• Technology group– “User needs” inform requirements– Scans undertaken for technologies to acquire– Incremental development iterations

Page 27: BeSTGRID OpenGridForum 29 GIN session

Case Study: Drug Discovery

Virtual High-Throughput Screening pipeline– Auckland Cancer Society Research Centre

• Dr. Jack Flanagan– via Institute Molecular Biology, Queensland– Limited IT support– Constrained to desktop scale of resources

Page 28: BeSTGRID OpenGridForum 29 GIN session

Auckland Cancer Society Research CentreJack Flanagan:Molecular Mechanics

Molecular Docking & Scoring

Molecular Docking & Scoring

MoleculesMolecules

Maurice Wilkins Centre

3D Structure, Mechanics

3D Structure, Mechanics

Shared Storage

X ray crystallography

Dunbar LabShaun Lott: Biological Structure

Virtual serversDatabases of Molecules3D Docking libraries

Virtual serversDatabases of Molecules3D Docking libraries

Lab specific modules• Auckland Cancer Society Research Centre• Dunbar Lab• Maurice Wilkins Centre labs

Lab specific modules• Auckland Cancer Society Research Centre• Dunbar Lab• Maurice Wilkins Centre labs

Data StorageComputation GridData StorageComputation Grid Gene

sequence DB

Page 29: BeSTGRID OpenGridForum 29 GIN session

Case Study: Genomics

Genomics– NZ Genomics Ltd

• National Genomics Infrastrucutre Service– Otago, AgResearch, Auckland– HPC partnership under development

Page 30: BeSTGRID OpenGridForum 29 GIN session

Mik Black, Otago

Genomics processing pipeline

Genomics processing pipeline

NZ Genomics

R, Java, custom codes

R, Java, custom codes

Cris Print, Auckland

Identity FederationVirtual serversDatabases

Identity FederationVirtual serversDatabases

Lab specific Modules• Otago & Auckland• Contributing globally, MIT Broad Institute

Lab specific Modules• Otago & Auckland• Contributing globally, MIT Broad Institute

Data StorageComputation GridData StorageComputation Grid

Shared StorageDatabasesData

Page 31: BeSTGRID OpenGridForum 29 GIN session

Bioscience projects• BioMirror

– Service: Database hosting, 5TB storage, FTP– Partner: Bioinformatics Institute– A public service in New Zealand for high-speed access to up-to-date DNA & protein biological

sequence databanks

• GenePattern – Service: Genomics processing portal– Partners: Integration Genomics, Otago– Reuse existing and deploy custom genomics processing codes– Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP (2006) GenePattern 2.0 Nature Genetics 38 no. 5 (2006): pp500-501

doi:10.1038/ng0506-500.

• CellML Simulator– Service: Run simulations of CellML models on computational cluster– Partner: Auckland Bioengineering Institute

• GOLD Protein Docking– Service: GOLD protein docking on computational cluster– Partner: Auckland Cancer Society Research Centre– Molecular Mechanics and Structural Biology

Page 32: BeSTGRID OpenGridForum 29 GIN session

AustraliaAustraliaNew ZealandNew Zealand

VLBI – 1. Trickle back

Data, Compute, Electronic

s

Data, Compute, Electronic

s

anzska

DataStreaming

Local Data

NZ Antennas & DSP

eVLBI – 2. Real timeTread-P

40 – 80 MHz

Data Transfer1.Trickle back – Grid FTP2.Real time – UDP

Data Transfer1.Trickle back – Grid FTP2.Real time – UDP

Antenna site connection < 1 Gb/sTrans-Tasman bandwidth1.Trickle back ~ 200 Mb/s2.Real time ~ 1 Gb/s dedicatedAntenna Storage ~ 10 – 20 TB

Antenna site connection < 1 Gb/sTrans-Tasman bandwidth1.Trickle back ~ 200 Mb/s2.Real time ~ 1 Gb/s dedicatedAntenna Storage ~ 10 – 20 TB

Post ProcessorPost Processor

CorrelationCorrelation

Central Processing

Facility

AustralianAntennas

1. Trickle back2. Real time1. Trickle back2. Real time

Page 33: BeSTGRID OpenGridForum 29 GIN session

Physics projects• NZ SKA related initiatives

– Service: Data transfer, storage, and compute– Partner: Slava Kitaeff, AUT, Tread project leader– Design and deploy data transfer, storage, and compute services

• CMS Experiment – Auckland Node – Service: Hosted compute cluster nodes– Partners: Philip Alfrey, FRST Doctoral Fellow– Deploy European middleware (gLite) to connect Philip into regional and global CMS experiment on

the LHC at CERN, integrating with the Worldwide LHC Computing Grid

Page 34: BeSTGRID OpenGridForum 29 GIN session

Data Sharing & Collaboration

Social Sciences Data Servicewww.nzssds.org.nz

Sakai– Wikis– File resources– Email listssakai.bestgrid.org

Page 35: BeSTGRID OpenGridForum 29 GIN session
Page 36: BeSTGRID OpenGridForum 29 GIN session
Page 37: BeSTGRID OpenGridForum 29 GIN session

Making use of all that noise

Each seismometer pair provides a seismic profile

Seismographic Information Service

Paul Grimwood

Page 38: BeSTGRID OpenGridForum 29 GIN session

Tomography

Sub-surface seismic velocity structure constructed from an analysis of the velocities at which waves of different pitches propagate

Page 39: BeSTGRID OpenGridForum 29 GIN session

WSData

Page 40: BeSTGRID OpenGridForum 29 GIN session

Collaboration – What have we found useful?

Must have shared and common understanding Regular meetings (fortnightly)

Video-conferencing: EVO. Briefly examined and theoretically suitable, but GNS Science and VUW are very close and we prefer contact in person

Communication/people skills (IT people and scientists tend to talk different languages)

Agile: Iterative development with rapid prototyping The Bestgrid team at Auckland host the Sakai site. Very easy

to use package with out of the box Wiki for sharing info across collaborators, Common email address and repository, and resource location

Page 41: BeSTGRID OpenGridForum 29 GIN session

A collaborative partnership, to deliver eScience Infrastructure

Design Goals• Level of commitment from partner institutions is high – necessitating high trust in partnership agreement. If this is gained then the model allows “deep” collaboration – currently unseen in NZ

• Investment will be from an informed “needs” basis, delivering efficiency gains in meeting these needs

• Capacity will be shared and accessible nationally – a single front door

• Heterogeneous array of computing platforms available nationally

• Coordinated outreach to priority research areas and next generation researchers

eScience Infrastructure

During the current planning and transition phase, this project is referred to as:

National eScience Infrastructure (NeSI)

Page 42: BeSTGRID OpenGridForum 29 GIN session

NZ eResearch Infrastructure

Page 43: BeSTGRID OpenGridForum 29 GIN session

Aims• Provision of a wide scope of coordinated HPC & HTC platforms and eScience services• Aim to secure government investment through matching• Capacity is determined by affordability and demand (fit for purpose), scale of

services must match the available funding, use of resources fits machine capability• Focus on co-ordination rather than control (through Boards, and joint procurement

of capital investments)• Investors & Noninvestors incentivized to coordinate with NeSI, to varying degrees• Investors contribute an agreed portion of system capacity and infrastructure services

(outside of their operational needs) for a merit pool (under agreed SLA)• NeSI effectively purchases services from partners and does not own assets• Merit pool access is uniform for all• Complementary education funding (TEC)• Offer strong incentives to science community to change thinking around research

infrastructure—by offering ratio of matching funds for platforms and services

Page 44: BeSTGRID OpenGridForum 29 GIN session

Consumers

Investors

Procurement Planning

Operational management

Data& Grid middleware to manage access

Outreach

NeSI Governance

• investing partners

• transparent & collaborative

NIWA investment

Discount on existing computational services, including:

• People

• HPC capacity

Auckland investment

• People

• New cluster

• New storage

• Portion of facilities, power, depreciation costs

Investor / Partner institutions• Access through a “single front door”• Specialised concentrations of capability at each institution• Receive government coinvestment• Capacity available to institutions reflects their level of investment• Managed and metered access to resources, across all resource types• Institutions’ access costs covered by investment

Services

Canterbury investment

• People

• HPC

• New storage

• Portion of facilities, power, depreciation costs

Canterbury

HPCCanterbury

HPC

Government Investment

• Invest over 4 years, plus Out-years

• Annual LSRI review with NeSI

AucklandClusterAucklandCluster

HPC

NIW

A

HPC

NIW

A

AgResearch

• People

• New cluster

• New storage

• Portion of facilities, power, depreciation costs

Cluster

AgResearch

Cluster

AgResearch

Managed Access

Legend

Financial flows

Access

Research institutions (non investors)• Access through a “single front door”• Capacity scaled out from partners capabilities• Managed and metered access to resources, across all resource types• Access fee initially calculated as partial costs of metered use of resources and reviewed annually

Industry• Access through a “single front door”• Capacity scaled out from partners capabilities• Managed and metered access to resources, across all resource types• Access fee calculated as full costs of metered use of resources

How NeSI is comprised

Page 45: BeSTGRID OpenGridForum 29 GIN session

1-2 months

Begin process of funding agreement

with MoRST

CEO and VC

sign-off

Secto

rM

oR

ST &

Govt

e-Science Transition working group to agree and

write a full investment case,

business model and implementation

plan.

June | July | August | September | October | November | December |

Morsties contribute advice to full business plan development

Analyse and

negotiate support

from Treasury, TEC, FRSTFinancials formatted by Geoff Palmer, MoRST

Cabinet Paper

To Minister

6/10

Cabinet Office and Committee 20/10

App

rova

l

Cabinet sits 17/11 for

decision 20/11

Iterations of final business

plan

** Sign off by

wider stakeholder

group**

Contr

act

$

Imp

lem

enta

tion

Appro

pri

ati

on

Cabinet Paper

negotiated with govt agenciesMoRST/

Treasury

Julie and Kathleen to prepare Cabinet Paper

Preview to Minister

Page 46: BeSTGRID OpenGridForum 29 GIN session

eResearch

Researchers need support for “e”, to:accessstoreshareanalysesimulaterepresentdiscoverdiscuss

a.k.a. eScience, Cyber-Infrastructure, e-Infrastructure

Data Storage – single sitesData Fabric – single fabric across multiple sitesData Transfer – experience and variety of toolsComputational Services – multiple sites, accessible via command lines and graphical toolsWeb sites – Wikis, file sharing, email lists (Sakai); Content Management (Drupal)

Video Collaboration – Desktop (EVO) avcc.karen.net.nz

Identity Federation – access shared services using existing logins – NZ Access Federation

Data Storage – single sitesData Fabric – single fabric across multiple sitesData Transfer – experience and variety of toolsComputational Services – multiple sites, accessible via command lines and graphical toolsWeb sites – Wikis, file sharing, email lists (Sakai); Content Management (Drupal)

Video Collaboration – Desktop (EVO) avcc.karen.net.nz

Identity Federation – access shared services using existing logins – NZ Access Federation

Page 47: BeSTGRID OpenGridForum 29 GIN session

eResearch

Collaboration, Sharing– across boundaries: institutions, geographies,

disciplines, sectors, technologies

– between parties: people, computer systems, machines, instruments

– with diversity: size and scope of problems, potential solutions, technical approaches

Page 48: BeSTGRID OpenGridForum 29 GIN session

Can I use BeSTGRID?www.bestgrid.org/getstarted

Can I join BeSTGRID?www.bestgrid.org/join

www.bestgrid.org/contact

Page 49: BeSTGRID OpenGridForum 29 GIN session

Thanks to all these people!The University of Auckland• Stephen Cope• Mark Gahegan• Yuriy Halytskyy• Nick Jones• Andrey Kharuk• Gene Soudlenkov• Markus Binsteiner

Auckland University Technology• Slava Kitaev• Gene Soudlenkov

Canterbury University• Tim David• Vladimir Mencl

Industrial Research Limited• John Burnell• Peter McGavin

Landcare Research• Robert Gibb• Aaron Hicks• Niels Hoffmann• David Medyckyj-Scott

Lincoln University• Stuart Charters

Massey University• Martin Johnson• Guy Kloss

Otago University• Mik Black• Marcus Davy• Matthew Grant• Tim Molteno

Victoria University Wellington• Kevin Buckley• John Hine

Waikato University•Joseph Lane

MoRST•Wynn Ingram•Kathleen Logan•George Slim•Julie Watson

REANNZ / KAREN•Bill Choquette•Donald Clarke•Vicki Lindsay

Any many others...•Nevil Brownlee•Paul Bonnington•Tim Chaffe•Matthew Cocker•Neil Gemmell•Robin Harrington•Chris Messom•Sam Searle

Page 50: BeSTGRID OpenGridForum 29 GIN session

Where shall we go, from here?

www.bestgrid.org@bestgrid

Nick JonesDirector, BeSTGRIDCo-Director eResearch, Centre for eResearchUniversity of [email protected]