BeSTGRID OpenGridForum 29 GIN session
BeSTGRID overview and advances
Developing NZ research infrastructure
www.bestgrid.org
twitter: @bestgrid
Nick Jones
Director, BeSTGRID
Co-Director eResearch, Centre for eResearch, University of Auckland
[email protected]
eResearch: research “e”lectronic
Researchers need support for “e”, to:
access, store, share, analyse, simulate, represent, discover, discuss
(a.k.a. eScience, Cyber-Infrastructure, e-Infrastructure)
Doesn’t the Internet do this?
eResearch isn’t for everyone…
Collaboration, Sharing
– across boundaries: institutions, geographies, disciplines, sectors, technologies
– between parties: people, computer systems, machines, instruments
– with diversity: size and scope of problems, potential solutions, technical approaches
“BeSTGRID aims to enhance e-Research capability in New Zealand by providing the skill base to help the various research disciplines engage with new eResearch services”
“BeSTGRID aims to kick start centralised infrastructure with some capital investment at key institutions”
BeSTGRID: since 2006
KAREN
Current | Planned
Identity provider
Virtual Applications
Computation Cluster
Storage
Who uses our grid?
Successes: within NZ
National grid infrastructure, primarily focused on Bioscience and Geoscience applications & services, data storage and sharing, computation, along with many others
Established the foundations of a shared research infrastructure service delivery model (governance, management, services) across research sector institutions (Universities, CRIs)
Piloted first federated identity management service in NZ, now moving to production through the NZ Access Federation (nzfed.auckland.ac.nz)
Acting as the basis for sector wide HPC and eResearch programme:
– National eScience Infrastructure proposal: 2010 – 2015
“BeSTGRID could well be New Zealand’s earliest dedicated eResearch infrastructure providers – even predating KAREN. Since 2006, BeSTGRID has been delivering services and tools to support research and research collaboration on shared data sets, and in accessing computational resources.”
KAREN hyphen, Issue 09, September 2009
BeSTGRID: since 2006
What did we learn?
Stay Connected, Collaborate
• With Scientists, Administrators, Technologists, Policy makers, Funders
• Locally AND Internationally
Don’t reinvent the wheel
• Others have been here before – development where necessary
• Work with communities to localise: technologies, approaches, strategies, configurations, programmes
* VDT – OpenScienceGrid
* Globus – Argonne
* Shibboleth – Internet2
* iRODS – DICE / RENCI
* Sakai – Sakai Foundation
* OCI – NSF
* PRAGMA – UCSD
* EVO – CalTech
* AccessGrid – Argonne
* GenePattern – MIT Broad
* caBIG – NIH
* UK eScience – JISC
* eFramework – JISC
* Shibboleth – JISC, SWITCH
* SLCS – SWITCH
* Sakai – JISC
* Grisu – ARCS
* Confluence – Atlassian
* NCRIS – DIISR
* QFAB – QCIF
What did we learn?
Change is difficult within Institutions
– Medium to long term timeframes required with researchers, ITS, SMT, research support units
– “Final Report: A Workshop on Effective Approaches to Campus Research Computing Cyberinfrastructure”, Klingenstein, Morooney, Olshansky, 2006
Collaboration seen as critical, yet complex and expensive
– Working with user communities essential to mutual understanding of needs and opportunities, and to developing capabilities
– Need significant capability before international collaborations (and related learning) possible
– New forms of organisation and support required
– “Beyond Being There: A Blueprint for Advancing the Design, Development, and Evaluation of Virtual Organizations”, Cummings, Finholt, Foster, Kesselman, Lawrence, 2008
– “The Importance of Long-Term Science and Engineering Infrastructure for Digital Discovery”, Wilkins-Diehr, Alameda, Droegemeier, Gannon, 2008
Government and Institutional Investment well below international levels
– Australia: NCRIS + Super Science
– EU: FP7 e-Infrastructure call + National Grid Initiatives + Identity + HPC + Networks in member states
– UK: JISC + NGS + DCC + AAA
– US: Office of Cyberinfrastructure, NSF, NIH, DoE all funding major Cyberinfrastructure initiatives
Education programs lacking
– Who is training the next generation?
– Need to grow awareness of career paths in computational and data intensive sciences for computer science and software engineering graduates
MoRST eResearch programme
In Vote RS&T 2008/09, funding was appropriated to develop capability within the Kiwi Advanced Research and Education Network user group to make effective use of KAREN:
• Advanced Video Conferencing Collaboration and Support Centre
• Federated Identity and Access Management
• Towards a Federated Approach – strategy development
• Technical Support and Resources
• Semantic Data and Public Access to Research
• BeSTGRID Grid Middleware Initiative
• Hardship fund for remote site connection to KAREN
BeSTGRID 2009 – 2011
• MoRST Dec 2008 funding request for Middleware development for eResearch in NZ
• Provides ongoing support for BeSTGRID
– Widened from founders through active participation of new Universities and CRIs
• Aims:
• Coordinate access to compute and data resources
• Provide discipline specific services and applications
• Build a sustainable resource administration, middleware and applications & services development community
Compute and storage platforms
Shared investment
Heterogeneous architectures

Grid Middleware
Common across compute and storage platforms
Provides simplified & secure access to job queues & data sharing

Applications and tools
Collaboration, Sharing
Managed, secure, stable, reliable
‘Science as a service’

Research communities
Deep engagement to map problems onto HPC and Grid

Governance / Priorities
Responsive, focused, consultative, by member institutions
Architecture Diagram
Virtual Gateway Machine 1
GT4, GridFTP, GridSphere etc.
Virtual Gateway Machine 2
GT4, GridFTP, Grisu Server
Virtual Gateway Machine 3
gLite etc.
Geo-Sciences
Bio-Sciences
Physics
Institutional Setup: UoA, UC, MU (VMs same)
Clusters
Supercomputers
Superclusters
Certificate Authority (APAC CA)
or Shibbolised MyProxy
LRM (Local Resource Manager, one per cluster)
Impediments
• Focused on end user tools, democratisation of resources
– Java GUIs ok, though much desire for web applications
– Authentication & Authorisation
• Matching attributes between X.509 & Shibboleth, delegation of credentials, ownership of resources
• Passing credentials upwards and downwards
• Matching Groups to Virtual Organisations, Privileges
• Maturity of implementation, especially with an enthusiastic developer community
– Adaptability and responsiveness – is this what we desire? Tradeoff sometimes quality versus modularity, reuse, subsumption
– Tension forming at the application and user interface layers – lack of standards
– Issue with narrow funnel at the middleware – richness of bottom layers abstracted away
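The attribute-matching impediment above can be illustrated with a minimal sketch: deriving grid Virtual Organisation groups from Shibboleth-asserted attributes. All attribute values, group names, and the mapping itself are hypothetical, not BeSTGRID's actual configuration.

```python
# Hypothetical sketch: mapping federation-asserted SAML attributes onto
# grid Virtual Organisation (VO) groups. Names are illustrative only.

# Illustrative mapping from an IdP-asserted scoped affiliation to VO groups.
AFFILIATION_TO_VO = {
    "member@auckland.ac.nz": ["/nz/bestgrid/bio"],
    "member@canterbury.ac.nz": ["/nz/bestgrid/geo"],
}

def vo_groups(attributes):
    """Derive VO groups from a dict of SAML attributes (multi-valued)."""
    groups = []
    for affiliation in attributes.get("eduPersonScopedAffiliation", []):
        groups.extend(AFFILIATION_TO_VO.get(affiliation, []))
    return sorted(set(groups))

attrs = {
    "eduPersonPrincipalName": "researcher@auckland.ac.nz",
    "eduPersonScopedAffiliation": ["member@auckland.ac.nz"],
}
print(vo_groups(attrs))  # ['/nz/bestgrid/bio']
```

In practice this mapping is the hard part the slide alludes to: the X.509 world identifies users by certificate distinguished name while the federation asserts attributes, and the two must be reconciled at the gateway.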
Site admin
Faculty IT / Institution IT
Technical Working Group
Geoscience Leads
Bioscience Leads
eResearch Centres
Steering Committee
Project Management
Take lead from community
• Lead Users from each community
– Define requirements
– Evaluate developments
– Promote services – joint promotion
– Disseminate learning – seminars, case studies
• Technology group
– “User needs” inform requirements
– Scans undertaken for technologies to acquire
– Incremental development iterations
Case Study: Drug Discovery
Virtual High-Throughput Screening pipeline
– Auckland Cancer Society Research Centre
• Dr. Jack Flanagan
– via the Institute for Molecular Bioscience, Queensland
– Limited IT support
– Constrained to desktop scale of resources
Auckland Cancer Society Research Centre
Jack Flanagan: Molecular Mechanics
Molecular Docking & Scoring
Molecules
Maurice Wilkins Centre
3D Structure, Mechanics
Shared Storage
X-ray crystallography
Dunbar Lab
Shaun Lott: Biological Structure
Virtual servers
Databases of Molecules
3D Docking libraries
Lab specific modules
• Auckland Cancer Society Research Centre
• Dunbar Lab
• Maurice Wilkins Centre labs
Data Storage
Computation Grid
Gene sequence DB
Case Study: Genomics
Genomics
– NZ Genomics Ltd
• National Genomics Infrastructure Service
– Otago, AgResearch, Auckland
– HPC partnership under development
Mik Black, Otago
Genomics processing pipeline
NZ Genomics
R, Java, custom codes
Cris Print, Auckland
Identity Federation
Virtual servers
Databases
Lab specific Modules
• Otago & Auckland
• Contributing globally, MIT Broad Institute
Data Storage
Computation Grid
Shared Storage
Databases
Data
Bioscience projects
• BioMirror
– Service: Database hosting, 5TB storage, FTP
– Partner: Bioinformatics Institute
– A public service in New Zealand for high-speed access to up-to-date DNA & protein biological sequence databanks
• GenePattern
– Service: Genomics processing portal
– Partners: Integration Genomics, Otago
– Reuse existing and deploy custom genomics processing codes
– Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP (2006). GenePattern 2.0. Nature Genetics 38(5): 500–501. doi:10.1038/ng0506-500
• CellML Simulator
– Service: Run simulations of CellML models on computational cluster
– Partner: Auckland Bioengineering Institute
• GOLD Protein Docking
– Service: GOLD protein docking on computational cluster
– Partner: Auckland Cancer Society Research Centre
– Molecular Mechanics and Structural Biology
Australia | New Zealand
VLBI – 1. Trickle back
Data, Compute, Electronics
anzska
Data Streaming
Local Data
NZ Antennas & DSP
eVLBI – 2. Real time
Tread-P
40 – 80 MHz
Data Transfer
1. Trickle back – GridFTP
2. Real time – UDP
Antenna site connection < 1 Gb/s
Trans-Tasman bandwidth
1. Trickle back ~ 200 Mb/s
2. Real time ~ 1 Gb/s dedicated
Antenna Storage ~ 10 – 20 TB
Post Processor
Correlation
Central Processing Facility
Australian Antennas
1. Trickle back
2. Real time
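A quick back-of-envelope check of the figures above: at ~200 Mb/s of trans-Tasman bandwidth, trickling back a full 10–20 TB antenna store takes several days, which is why a dedicated ~1 Gb/s path is needed for the real-time mode. Decimal units are assumed and protocol overhead is ignored.

```python
# Back-of-envelope: days to trickle back an antenna's store over the
# ~200 Mb/s trans-Tasman allocation (1 TB = 1e12 bytes; overhead ignored).

def transfer_days(terabytes, megabits_per_second):
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86400  # seconds per day

print(round(transfer_days(10, 200), 1))  # 4.6 days for 10 TB
print(round(transfer_days(20, 200), 1))  # 9.3 days for 20 TB
```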
Physics projects
• NZ SKA related initiatives
– Service: Data transfer, storage, and compute
– Partner: Slava Kitaeff, AUT, Tread project leader
– Design and deploy data transfer, storage, and compute services
• CMS Experiment – Auckland Node
– Service: Hosted compute cluster nodes
– Partner: Philip Alfrey, FRST Doctoral Fellow
– Deploy European middleware (gLite) to connect Philip into the regional and global CMS experiment on the LHC at CERN, integrating with the Worldwide LHC Computing Grid
Data Sharing & Collaboration
Social Sciences Data Service
www.nzssds.org.nz
Sakai
– Wikis
– File resources
– Email lists
sakai.bestgrid.org
Making use of all that noise
Each seismometer pair provides a seismic profile
Seismographic Information Service
Paul Grimwood
Tomography
Sub-surface seismic velocity structure constructed from an analysis of the velocities at which waves of different frequencies propagate
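A minimal sketch of the idea behind ambient-noise tomography as described here: estimate the travel time between a seismometer pair from the lag of maximal cross-correlation, then convert it to an apparent velocity over the known station separation. The sampling rate, distance, and synthetic traces are invented for illustration; this is not the actual GNS Science / VUW processing code.

```python
# Illustrative sketch: recover an apparent seismic velocity between a
# seismometer pair from the lag at which their traces best correlate.

def best_lag(a, b):
    """Positive lag (in samples) of b relative to a maximising correlation."""
    n = len(a)
    def xcorr(lag):
        return sum(a[i] * b[i + lag] for i in range(n - lag))
    return max(range(n // 2), key=xcorr)

FS = 100.0          # sampling rate in Hz (assumed)
DISTANCE_KM = 3.0   # station separation (assumed; GNS and VUW are close)

# Synthetic traces: station B records the same pulse 1.0 s after station A.
station_a = [0.0] * 1000
station_a[100] = 1.0
station_b = [0.0] * 1000
station_b[200] = 1.0

lag = best_lag(station_a, station_b)   # 100 samples
travel_time = lag / FS                 # 1.0 s
print(DISTANCE_KM / travel_time)       # apparent velocity: 3.0 km/s
```

Real pipelines cross-correlate long noise records and invert many such pair-wise travel times, per frequency band, into the velocity structure shown on the slide.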
Collaboration – What have we found useful?
Must have shared and common understanding
Regular meetings (fortnightly)
Video-conferencing: EVO. Briefly examined and theoretically suitable, but GNS Science and VUW are very close and we prefer contact in person
Communication/people skills (IT people and scientists tend to talk different languages)
Agile: iterative development with rapid prototyping
The BeSTGRID team at Auckland host the Sakai site. Very easy to use package with out-of-the-box Wiki for sharing info across collaborators, common email address and repository, and resource location
A collaborative partnership, to deliver eScience Infrastructure
Design Goals• Level of commitment from partner institutions is high – necessitating high trust in partnership agreement. If this is gained then the model allows “deep” collaboration – currently unseen in NZ
• Investment will be from an informed “needs” basis, delivering efficiency gains in meeting these needs
• Capacity will be shared and accessible nationally – a single front door
• Heterogeneous array of computing platforms available nationally
• Coordinated outreach to priority research areas and next generation researchers
eScience Infrastructure
During the current planning and transition phase, this project is referred to as:
National eScience Infrastructure (NeSI)
NZ eResearch Infrastructure
Aims
• Provision of a wide scope of coordinated HPC & HTC platforms and eScience services
• Aim to secure government investment through matching
• Capacity is determined by affordability and demand (fit for purpose): scale of services must match the available funding, use of resources fits machine capability
• Focus on co-ordination rather than control (through Boards, and joint procurement of capital investments)
• Investors & non-investors incentivised to coordinate with NeSI, to varying degrees
• Investors contribute an agreed portion of system capacity and infrastructure services (outside of their operational needs) for a merit pool (under agreed SLA)
• NeSI effectively purchases services from partners and does not own assets
• Merit pool access is uniform for all
• Complementary education funding (TEC)
• Offer strong incentives to science community to change thinking around research infrastructure, by offering a ratio of matching funds for platforms and services
Consumers
Investors
Procurement Planning
Operational management
Data & Grid middleware to manage access
Outreach
NeSI Governance
• investing partners
• transparent & collaborative
NIWA investment
Discount on existing computational services, including:
• People
• HPC capacity
Auckland investment
• People
• New cluster
• New storage
• Portion of facilities, power, depreciation costs
Investor / Partner institutions
• Access through a “single front door”
• Specialised concentrations of capability at each institution
• Receive government co-investment
• Capacity available to institutions reflects their level of investment
• Managed and metered access to resources, across all resource types
• Institutions’ access costs covered by investment
Services
Canterbury investment
• People
• HPC
• New storage
• Portion of facilities, power, depreciation costs
Canterbury HPC
Government Investment
• Invest over 4 years, plus Out-years
• Annual LSRI review with NeSI
Auckland Cluster
NIWA HPC
AgResearch
• People
• New cluster
• New storage
• Portion of facilities, power, depreciation costs
AgResearch Cluster
Managed Access
Legend
Financial flows
Access
Research institutions (non-investors)
• Access through a “single front door”
• Capacity scaled out from partners’ capabilities
• Managed and metered access to resources, across all resource types
• Access fee initially calculated as partial costs of metered use of resources and reviewed annually
Industry
• Access through a “single front door”
• Capacity scaled out from partners’ capabilities
• Managed and metered access to resources, across all resource types
• Access fee calculated as full costs of metered use of resources
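A hypothetical sketch of the metered access-fee model above: non-investor research institutions initially pay partial costs of metered use, industry pays full costs. The core-hour rate and the partial-cost fraction are invented for illustration; the actual figures would be set in the business plan and reviewed annually.

```python
# Hypothetical sketch of NeSI's metered access-fee model.
# Both constants are assumptions for illustration only.

RATE_PER_CORE_HOUR = 0.10  # NZD per core-hour (assumed)
PARTIAL_FRACTION = 0.5     # non-investors' partial-cost share (assumed)

def access_fee(core_hours, consumer):
    """Fee for metered use by consumer type ('research' or 'industry')."""
    full_cost = core_hours * RATE_PER_CORE_HOUR
    if consumer == "research":   # non-investor research institution
        return full_cost * PARTIAL_FRACTION
    if consumer == "industry":   # industry pays full metered cost
        return full_cost
    raise ValueError(f"unknown consumer type: {consumer}")

print(access_fee(10_000, "research"))  # 500.0
print(access_fee(10_000, "industry"))  # 1000.0
```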
How NeSI is comprised
Timeline: June | July | August | September | October | November | December
• e-Science Transition working group to agree and write a full investment case, business model and implementation plan (1–2 months)
• MoRST staff contribute advice to full business plan development
• Begin process of funding agreement with MoRST; CEO and VC sign-off; sign-off by wider stakeholder group
• Analyse and negotiate support from Treasury, TEC, FRST (financials formatted by Geoff Palmer, MoRST)
• Cabinet Paper negotiated with govt agencies (MoRST / Treasury); Julie and Kathleen to prepare Cabinet Paper; preview to Minister
• Cabinet Paper to Minister 6/10; Cabinet Office and Committee 20/10; Cabinet sits 17/11 for decision 20/11
• Iterations of final business plan
• Sector, MoRST & Govt: Approval, Appropriation, Contract ($), Implementation
eResearch
Researchers need support for “e”, to:
access, store, share, analyse, simulate, represent, discover, discuss
a.k.a. eScience, Cyber-Infrastructure, e-Infrastructure
Data Storage – single sites
Data Fabric – single fabric across multiple sites
Data Transfer – experience and variety of tools
Computational Services – multiple sites, accessible via command lines and graphical tools
Web sites – Wikis, file sharing, email lists (Sakai); Content Management (Drupal)
Video Collaboration – Desktop (EVO) avcc.karen.net.nz
Identity Federation – access shared services using existing logins – NZ Access Federation
eResearch
Collaboration, Sharing
– across boundaries: institutions, geographies, disciplines, sectors, technologies
– between parties: people, computer systems, machines, instruments
– with diversity: size and scope of problems, potential solutions, technical approaches
Can I use BeSTGRID? www.bestgrid.org/getstarted
Can I join BeSTGRID? www.bestgrid.org/join
www.bestgrid.org/contact
Thanks to all these people!
The University of Auckland
• Stephen Cope
• Mark Gahegan
• Yuriy Halytskyy
• Nick Jones
• Andrey Kharuk
• Gene Soudlenkov
• Markus Binsteiner
Auckland University of Technology
• Slava Kitaeff
• Gene Soudlenkov
Canterbury University
• Tim David
• Vladimir Mencl
Industrial Research Limited
• John Burnell
• Peter McGavin
Landcare Research
• Robert Gibb
• Aaron Hicks
• Niels Hoffmann
• David Medyckyj-Scott
Lincoln University
• Stuart Charters
Massey University
• Martin Johnson
• Guy Kloss
Otago University
• Mik Black
• Marcus Davy
• Matthew Grant
• Tim Molteno
Victoria University Wellington
• Kevin Buckley
• John Hine
Waikato University
• Joseph Lane
MoRST
• Wynn Ingram
• Kathleen Logan
• George Slim
• Julie Watson
REANNZ / KAREN
• Bill Choquette
• Donald Clarke
• Vicki Lindsay
And many others...
• Nevil Brownlee
• Paul Bonnington
• Tim Chaffe
• Matthew Cocker
• Neil Gemmell
• Robin Harrington
• Chris Messom
• Sam Searle
Where shall we go, from here?
www.bestgrid.org
twitter: @bestgrid
Nick Jones
Director, BeSTGRID
Co-Director eResearch, Centre for eResearch, University of Auckland
[email protected]