Cyberinfrastructure and the Transformation of Science, Education and Engineering
Tony Hey
Director of UK e-Science Core Program
J.C.R. Licklider’s Vision
“Lick had this concept of the intergalactic network which he believed was everybody could use computers anywhere and get at data anywhere in the world. He didn’t envision the number of computers we have today by any means, but he had the same concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job. The vision was really Lick’s originally.”
Larry Roberts – Principal Architect of the ARPANET
A Definition of e-Science
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
John Taylor
Director General of Research Councils
Office of Science and Technology
Purpose of the e-Science initiative is to allow scientists to do faster, different, better research
The e-Science Paradigm
• The Integrative Biology Project involves the University of Oxford (and others) in the UK and the University of Auckland in New Zealand
Models of electrical behaviour of heart cells developed by Denis Noble’s team in Oxford
Mechanical models of beating heart developed in Auckland
Need to be able to build a ‘Virtual Organisation’ allowing routine access for researchers to specific resources in the UK and New Zealand
[Figure: Groups A and B each keep private resources while sharing resources and generic services over a common fabric: e-Infrastructure/Cyberinfrastructure for Research]
Grids in Education
• Education is a classic distributed organization
• New multi-disciplinary curricula require distributed experts interacting with mentors and students
• Education requires rich integration of data sources with people and computing
• Grids will ‘democratize’ resources, enabling universal and ubiquitous access
• Learning management systems such as WebCT, Blackboard, Placeware, WebEx and Groove all have natural Grid implementations
[Figure: Overlapping heterogeneous dynamic Grid islands: Information Grid, Enterprise Grid, Compute Grid and Campus Grid, plus a dynamic light-weight peer-to-peer collaboration training Grid linking teacher and students]
Grid Middleware Infrastructure
• Global e-Science Infrastructure must support genuine needs of users/applications
• Supporting ‘routine’ collaboration between institutions in different countries requires a set of robust middleware services and agreed ‘policies’
• Require global Authentication, Authorization and Accounting (AAA) Services
• Need robust and secure middleware services supported on top of network services
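The three AAA functions can be separated in a short sketch. It assumes a hypothetical virtual organisation whose agreed policy maps authenticated researcher identities to the resources they may use. Authentication is reduced to a token lookup (a stand-in for certificates), authorization to a policy check, and accounting to a usage ledger. The class, token and resource names are illustrative only and do not belong to any real Grid middleware.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Hypothetical VO model; not the API of any real Grid middleware."""

    # Authentication: credential token -> researcher identity
    # (a stand-in for certificate-based authentication).
    credentials: dict[str, str]
    # Authorization: identity -> resources that the agreed VO policy allows.
    policy: dict[str, set[str]]
    # Accounting: (identity, resource) -> CPU hours consumed.
    ledger: defaultdict = field(default_factory=lambda: defaultdict(float))

    def authenticate(self, token: str) -> str:
        if token not in self.credentials:
            raise PermissionError("authentication failed")
        return self.credentials[token]

    def authorize(self, identity: str, resource: str) -> None:
        if resource not in self.policy.get(identity, set()):
            raise PermissionError(f"{identity} may not use {resource}")

    def run_job(self, token: str, resource: str, cpu_hours: float) -> str:
        identity = self.authenticate(token)             # who are you?
        self.authorize(identity, resource)              # what may you do?
        self.ledger[(identity, resource)] += cpu_hours  # what did you use?
        return identity


# Usage: a (hypothetical) Oxford researcher using an Auckland resource shared via the VO.
vo = VirtualOrganisation(
    credentials={"tok-ox": "oxford/researcher1"},
    policy={"oxford/researcher1": {"auckland-cluster"}},
)
vo.run_job("tok-ox", "auckland-cluster", 2.5)
print(dict(vo.ledger))  # {('oxford/researcher1', 'auckland-cluster'): 2.5}
```

In this sketch, usage is charged only after both checks pass. A refused request therefore never appears in the accounting ledger.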
Global Terabit Research Network
The Grid software and resources run on top of high performance global networks
UK e-Science Funding
First Phase: 2001–2004
• Application Projects
– £74M
– All areas of science and engineering
• Core Programme
– £15M OST
– £20M DTI: collaborative industrial projects
Second Phase: 2003–2006
• Application Projects
– £96M
– All areas of science and engineering
• Core Programme
– £16M OST
– DTI Technology Fund
GridPP Presentation to PPARC Grid Steering Committee, 26 July 2001
Steve Lloyd, Tony Doyle, John Gordon
Powering the Virtual Universe
http://www.astrogrid.ac.uk (Edinburgh, Belfast, Cambridge, Leicester, London, Manchester, RAL)
Multi-wavelength images showing the jet in M87, from top to bottom: Chandra X-ray, HST optical, Gemini mid-IR, VLA radio. AstroGrid will provide advanced Grid-based federation and data mining tools to facilitate better and faster scientific output.
Picture credits: “NASA / Chandra X-ray Observatory / Herman Marshall (MIT)”, “NASA/HST/Eric Perlman (UMBC)”, “Gemini Observatory/OSCIR”, “VLA/NSF/Eric Perlman (UMBC)/Fang Zhou, Biretta (STScI)/F Owen (NRA)”
Comb-e-Chem Project
[Figure: X-Ray e-Lab (diffractometer, video) and Properties e-Lab connected through Grid middleware to analysis, simulation, properties and a structures database]
DAME Project
[Figure: DAME architecture linking in-flight data, ground station, global network (e.g. SITA), airline, maintenance centre, DS&S Engine Health Center and data centre, with Internet, e-mail and pager links]
GEODISE Project
[Figure: GEODISE architecture. Components: engineer, GEODISE portal, Intelligent Application Manager, Intelligent Resource Provider, application service provider, computation (parallel machines, clusters, pay-per-use Internet resource providers), optimisation (OPTIONS system, optimisation archive), CAD systems (CADDS, IDEAS, ProE, CATIA, ICAD), analysis (CFD, FEM, CEM), licenses and code, session database, design archive, knowledge repository, ontology for engineering, computation, optimisation and design search, visualization, traceability, reliability, security and QoS; middleware: Globus, Condor, SRB]
Computational science
• Molecular dynamics
• Mesoscale modelling
• High throughput experiments
• High performance visualization
• Computational steering
• Terascale parallel computing
myGrid Project
• Imminent ‘deluge’ of data
• Highly heterogeneous
• Highly complex and inter-related
• Convergence of data and literature archives
Nucleotide Annotation Workflows
Discovery Net Project
[Figure: download sequence from reference server; execute distributed annotation workflow against NCBI, EMBL, TIGR, SNP, InterPro, SMART, SWISSPROT, GO and KEGG; interactive editor & visualisation; save to Distributed Annotation Server]
1800 clicks, 500 Web accesses and 200 copy/paste operations: 3 weeks of work replaced by 1 workflow executing in a few seconds
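The claim here is that a manual annotation task, made up of many separate web lookups and copy/paste steps, becomes one executable workflow. A minimal sketch of that idea follows. It assumes each workflow step is a plain function that takes a record and returns it enriched. All step names are hypothetical, and so is the hard-coded sequence. The placeholder annotations (GC fraction, stop codon) stand in for real service calls, and no NCBI, EMBL or other remote service is contacted. This is not Discovery Net's actual API.

```python
from typing import Callable

Record = dict  # a sequence plus its accumulated annotations


def download_sequence(record: Record) -> Record:
    # Hypothetical stand-in for fetching a sequence from a reference server.
    record["sequence"] = "ATGGCGTACGCTTAG"
    return record


def annotate_gc_content(record: Record) -> Record:
    # Placeholder annotation; a real workflow would query services such as InterPro or GO.
    seq = record["sequence"]
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    record.setdefault("annotations", {})["gc_fraction"] = gc
    return record


def annotate_stop_codon(record: Record) -> Record:
    # Placeholder annotation: does the sequence end in a standard stop codon?
    seq = record["sequence"]
    record.setdefault("annotations", {})["ends_with_stop"] = seq[-3:] in {"TAA", "TAG", "TGA"}
    return record


def save_annotations(record: Record) -> Record:
    # Stand-in for saving to a Distributed Annotation Server; here we only flag it.
    record["saved"] = True
    return record


def run_workflow(steps: list[Callable[[Record], Record]], record: Record) -> Record:
    """Apply each step in order, passing the record along as a workflow engine would."""
    for step in steps:
        record = step(record)
    return record


result = run_workflow(
    [download_sequence, annotate_gc_content, annotate_stop_codon, save_annotations],
    {"id": "example-1"},
)
print(result["annotations"])
```

The design point is that the ordered list of steps is itself the reusable artefact. Re-running the whole annotation on a new sequence costs one call instead of hundreds of clicks.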
BioSim Grid
An e-Science challenge: non-trivial
NASA IPG as a possible paradigm
Need to integrate rigorously in order to deliver accurate and hence biomedically useful results
Noble (2002) Nature Rev. Mol. Cell Biol. 3:460
Sansom et al. (2000) Trends Biochem. Sci. 25:368
[Figure: modelling scales: molecular, cellular, organism]
GENIE: Delivering e-Science to the environmental scientist
http://www.genie.ac.uk/
MIAS Devices Project
• Easy plug and play of sensors
• Wireless connection using 802.11
• Positioning information from GPS
• Mobile medical technologies on a distributed Grid
[Figure: device with sensor bus and GPS aerial]
eDiamond Applications of SMF
• Training and differential diagnosis: “Find one like it”
• Teleradiology and QC: VirtualMammo
• Epidemiology: SMF-computed breast density
• Advanced CAD: SMF-CAD workstation
e-Science Core Program (CP)
Overall Rationale: Four major functions of CP
– Assist development of essential, well-engineered, generic, Grid middleware usable by both e-scientists and industry
– Provide necessary infrastructure support for UK e-Science Research Council projects
– Collaborate with the international e-Science and Grid communities
– Work with UK industry to develop industrial-strength Grid middleware
Support for e-Science Projects
• Grid Support Centre in operation
– supported Grid middleware & users
– see www.grid-support.ac.uk
• National e-Science Institute
– Research Seminars
– Training Programme
– see www.nesc.ac.uk
• e-Science Certificate Authority
– Issue digital certificates for projects
– Goal is ‘single sign-on’
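‘Single sign-on’ means a researcher authenticates once with their CA-issued certificate. Later requests then use a short-lived credential, so there is no passphrase to re-enter at every resource. Globus GSI takes this approach with proxy certificates. The sketch below shows only the expiry-and-signature idea, under loud assumptions. An HMAC key from the Python standard library stands in for X.509 public-key signatures, and the key is hypothetically known to every verifying resource, which real PKI avoids. The function names and the 12-hour lifetime are illustrative.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key standing in for a real certificate's private key.
CA_SECRET = b"hypothetical-ca-key"


def _sign(payload: bytes) -> str:
    return hmac.new(CA_SECRET, payload, hashlib.sha256).hexdigest()


def sign_on(subject: str, lifetime_s: int = 12 * 3600, now: Optional[float] = None) -> dict:
    """After one (assumed) successful login, issue a short-lived signed credential."""
    issued = time.time() if now is None else now
    body = {"subject": subject, "expires": issued + lifetime_s}
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "sig": _sign(payload)}


def resource_accepts(cred: dict, now: Optional[float] = None) -> bool:
    """Each resource checks the signature and expiry; the user is not prompted again."""
    payload = json.dumps(cred["body"], sort_keys=True).encode()
    if not hmac.compare_digest(cred["sig"], _sign(payload)):
        return False  # credential was altered after issue
    current = time.time() if now is None else now
    return current < cred["body"]["expires"]


# Usage: one sign-on, then many resource accesses until the credential expires.
cred = sign_on("/C=UK/O=eScience/CN=researcher", now=0)
print(resource_accepts(cred, now=3600))       # True
print(resource_accepts(cred, now=13 * 3600))  # False (expired)
```

A short lifetime limits the damage if the credential is stolen. That trade-off is why proxy-style credentials are usually valid for hours rather than years.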
UK e-Science Grid
[Map of UK e-Science Grid sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL, Hinxton]
Access Grid – Group Conferencing
Multi-site group-to-group conferencing system
Continuous audio and video contact with all participants
Globally deployed
All UK e-Science Centres have AG rooms
Widely used for technical and management meetings
e-Science Centres of Excellence
• Birmingham/Warwick – Modelling
• Bristol – Media
• UCL – Networking
• Leeds, York, Sheffield - White Rose Grid
• Lancaster – Social Science
• Leicester – Astronomy
• Reading - Environment
UK e-Science Grid: Second Phase OGSA Grid
[Map of UK e-Science Grid sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RL, Hinxton]
SuperJANET4 Networking Research Projects
• GRS: GRID resource management
• MB-NG: QoS features
• GRIDprobe: backbone passive monitoring at 10 Gbps
• FutureGRID: P2P architecture
• GridMcast: multicast-enabled data distribution
[Figure: project layers: Network Infrastructure, GRID Infrastructure, Service Infrastructure]
UKLight
• Infrastructure to support the UK part of an international facility for network R&D, the Global Lambda Integrated Facility
– e.g. international Gigabit Ethernet ‘channels’
• An example of the kind of infrastructure which might be incorporated into SuperJANET5
• Implemented with:
– Telco wavelength services
– Multiplexed channels (SDH) to carry Gigabit Ethernet
[Figure: UKLight, showing connections to selected international peer facilities. UK researchers and local research equipment connect through the extended JANET development network to UKLight London, the international point-of-access. Peer facilities shown: StarLight (Chicago), NetherLight (Amsterdam), CERN, CzechLight, CA*net, Abilene and GEANT. Links are labelled 10 Gb/s or 2.5 Gb/s and marked as existing or proposed connections.]
The UK Grid Experience: Phase 1
• UK Programme on Grids for e-Science
– £75M for e-Science Applications
• UK Grid Core Programme for Industry
– £35M for collaborative industrial R&D
– Over 80 UK companies participating
– Over £30M industrial contributions
• Engineering, Pharmaceutical, Petrochemical
• IT companies, Commerce, Media
e-Science CP – Next Steps
• Deploy ‘production National Grid Service’ based on four dedicated JISC nodes plus the two UK Supercomputer Facilities
– Develop operational policies, security, …
– Gain experience with genuine user community
• Develop ‘OGSA’-based e-Science Grid
– Based on two OGSA Grid projects and e-Science Centres
– Work with EGEE project
Research Prototype Middleware to Production Quality
• Research projects are not funded to do the regression testing, configuration and QA required to produce production quality middleware
• Common rule of thumb (Brooks) is that it requires at least 10 times more effort to take ‘proof of concept’ research software to production quality
Key issue for UK e-Science projects is to ensure that there is some documented, maintainable, robust Grid middleware by the end of the 5-year £250M initiative
Open Grid Services Architecture
• Development of Web Services
• OGSA and WSRF will provide Naming / Authorization / Security / Privacy / …
• Projects looking at higher-level services: Workflow, Transactions, Data Mining, Knowledge Discovery …
• Exploit synergy: commercial Internet with Grid Services
The UK Open Middleware Infrastructure Institute (OMII)
• Repository for UK-developed Open Source ‘e-Science/Cyber-infrastructure’ Middleware
• Documentation, specification, QA and standards
• Fund work to bring ‘research project’ software up to ‘production strength’
• Fund middleware projects for identified ‘gaps’
• Work with US NSF, EU projects and others
• Supported by major IT companies
Southampton selected as the OMII site
2.4 Petabytes Today
Digital Curation Centre (DCC)
• In the next 5 years e-Science projects will produce more scientific data than has been collected in the whole of human history
• In 20 years we can guarantee that the operating system, the spreadsheet program and the hardware used to store data will not exist
• Research curation technologies and best practice
• Need to liaise closely with individual research communities, data archives and libraries
Edinburgh, with Glasgow, CLRC and UKOLN, selected as site of DCC
The UK ‘Dual Support’ System
UK Government provides two streams of public funding for university research:
– Funding provided to the universities for research infrastructure: salaries of permanent academic staff, premises, libraries & central computing costs
– Funding from the Research Councils for specific projects, in response to proposals submitted & approved through peer review
This ‘Well Founded Laboratory’ concept needs to be extended to support ‘virtual laboratories’
Initial Research Infrastructure Portfolio
• Prototype ‘National Grid Service’ based on dedicated Compute and Data Clusters
• Semantic Grid development projects
• AccessGrid Support Service
• e-Social Science Training material
• Intelligent Text Mining Service for Biosciences
• Digital Curation Centre
New Research Infrastructure Funding
• £3M for Security Development Projects
– Combine Shibboleth with PERMIS Authorization Services
– Joint project with NSF Internet2 NMI project on Security Services for ‘Virtual Organizations’
• £2M+ for Identified JCSR Topics
– Data Handling, Visualization, Knowledge Management, Collaborative Tools, Human Factors
– New ‘Grids for Education’ program?
• £3M for ‘Collaborative e-Research Environments’
– Data Analysis and Visualization Centres?
New Research Infrastructure Funding 2
• £3.4M for ‘National Middleware Services’
– Deployment of National Authentication Framework based on Shibboleth
– Provide support for Digital Library and e-Science communities
• Goal is to develop global AAA infrastructure
– Australia and Switzerland exploring this route
– Work with US NMI and EU EGEE projects
– Work with Internet2, TERENA and the NRENs
What is Shibboleth?
• An architecture developed by the Internet2 middleware community
• NOT an authentication scheme (relies on home site infrastructure to do this)
• NOT an authorisation scheme (leaves this to the resource owner)
• BUT an open, standards-based protocol for securely transferring attributes between home site and resource site
• Also provided as an open-source reference software implementation
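The division of labour above fits in a short toy model. The home site authenticates the user against its own infrastructure and releases signed attributes. The resource site verifies the assertion and applies its own authorisation rule to those attributes. It never learns the user's password and need not learn the username. Several assumptions apply. Real Shibboleth exchanges SAML assertions signed with X.509 keys. Here a JSON payload and an HMAC key stand in for them, and the key is hypothetically shared by both sites. The attribute name `affiliation`, the class names and the example values are illustrative only.

```python
import hashlib
import hmac
import json

# Hypothetical key standing in for the home site's assertion-signing key.
HOME_SITE_KEY = b"hypothetical-home-site-key"


def _sign(attributes: dict) -> str:
    payload = json.dumps(attributes, sort_keys=True).encode()
    return hmac.new(HOME_SITE_KEY, payload, hashlib.sha256).hexdigest()


class HomeSite:
    """Identity provider: authentication stays with the user's own institution."""

    def __init__(self, users: dict[str, tuple[str, dict]]):
        # username -> (password, attributes); a stand-in for the campus login system
        self._users = users

    def login_and_release(self, username: str, password: str, released: list[str]) -> dict:
        record = self._users.get(username)
        if record is None or not hmac.compare_digest(record[0].encode(), password.encode()):
            raise PermissionError("home site authentication failed")
        # Release only the requested attributes; the username itself is not sent.
        body = {name: record[1][name] for name in released if name in record[1]}
        return {"attributes": body, "sig": _sign(body)}


class ResourceSite:
    """Resource owner: verifies the assertion, then applies its OWN access rule."""

    def __init__(self, required_affiliation: str):
        self.required_affiliation = required_affiliation

    def grant(self, assertion: dict) -> bool:
        if not hmac.compare_digest(_sign(assertion["attributes"]), assertion["sig"]):
            return False  # attributes were not vouched for by the home site
        return assertion["attributes"].get("affiliation") == self.required_affiliation


# Usage
home = HomeSite({"alice": ("s3cret", {"affiliation": "member@example.ac.uk",
                                      "email": "alice@example.ac.uk"})})
assertion = home.login_and_release("alice", "s3cret", released=["affiliation"])
library = ResourceSite(required_affiliation="member@example.ac.uk")
print(assertion["attributes"])   # {'affiliation': 'member@example.ac.uk'}
print(library.grant(assertion))  # True
```

Because only `affiliation` is released, the resource site can make its access decision without seeing the user's e-mail address or username. That attribute-based model is what the slide describes.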
UK e-Science Timeframes
2001 2002 2003 2004 2005 2006 2007
SR2000 * * *
SR2002 * * *
SR2004 * * *
SJ5/AAA Service * *
LHC/LCG *
e-Science Infrastructure beyond 2006
1. Persistent UK e-Science Research Grid
2. Grid Operations Centre
3. Open Middleware Infrastructure Institute
4. National e-Science Institute
5. Digital Curation Centre
6. AccessGrid Support Service
7. e-Science/Grid Legal Service
8. International Standards Activity
Cyberinfrastructure and Universities
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor, 2001
• Need to break down the barriers between the Victorian ‘bastions’ of science: biology, chemistry, physics, …
• Problems with tenure and publications
• Develop ‘permeable’ structures that promote rather than hinder multidisciplinary collaboration
• Need to engage University IT Service Departments: Computing, Library, …
A Definition of the Grid
‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’
Tony Blair, 2002