CS5038 The Electronic Society
Lecture 10: e-Science
Lecture Outline
• Background: “Big Science”
• Grid Computing
• Standards for Grid Computing
• e-Science – what is it?
• e-Science Examples:
    • Social Simulations – modelling land-use change
    • Particle Physics (LHC), Astronomy (VirtualObservatory)
    • Environmental Sciences – Climate Change
    • Engineering – Aircraft Maintenance
    • Economics – Predicting Markets
    • Bio-informatics – Simulated Biology
    • Healthcare – Cancer Diagnosis
e-Science - Background: “Big Science”
• During the early part of the 20th century, science became crucial in warfare
• World War II: scientists developed new weapons and tools
    • proximity fuse, radar, atomic bomb, cryptography
• Led to a new form of research facility: the government-sponsored laboratory
    • thousands of technicians and scientists, managed by universities
• Enabled hitherto impossible scientific projects
• Heavy investment by government and industrial interests:
    • blurred the line between public and private research
Criticisms:
• Undermines basic principles of the scientific method: results difficult to verify.
• Access to facilities limited to those who are accomplished -> elitism.
• Increased government funding often implies a military agenda
• Subverts the Enlightenment-era ideal of science as a quest for knowledge.
• Increased administrative overhead – e.g. filling out grant requests
• Connections between academic, governmental, and industrial interests
    • Concern about scientists’ objectivity (e.g. pharmaceutical industry)
Internet was born from "Big Science"
• August 1991, CERN (Switzerland): new World Wide Web project
Grid Computing
• Grid computing evolved from the computational needs of “Big Science”
• “Grid computing uses the resources of many separate computers connected by a network (usually the internet) to solve large-scale computation problems.”
• A conceptual framework rather than a physical resource:
    • flexible computational provisioning beyond the local administrative domain.
• Involves sharing computing power:
    • heterogeneous resources (based on different platforms, hardware/software architectures, and computer languages),
    • located in different places
    • belonging to different administrative domains
    • using open standards.
• Requires security: to allow remote users to control computing resources.
• Special Purpose Grid – Example: SETI@home project
• General Purpose Grid – Example: Parabon Computation (Commercial)
In terms of function, three types of grid:
• Computational grids: computationally-intensive operations.
• Data grids: sharing and management of large amounts of distributed data.
• Equipment grids: control equipment remotely and analyse data produced.
    • e.g. controlling a telescope
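The idea of splitting one large computation across many separate machines can be sketched in a few lines of Python. This is a toy illustration only: a local thread pool stands in for remote grid nodes, the per-unit work is a placeholder, and the names `split_into_work_units`, `process_unit` and `run_on_grid` are hypothetical rather than part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(data, unit_size):
    """Cut a large input into independent work units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def process_unit(unit):
    """Work done by one node: here just a sum of squares (placeholder)."""
    return sum(x * x for x in unit)


def run_on_grid(data, unit_size=1000, nodes=4):
    """Farm work units out to a pool of workers and combine the results.

    Threads stand in for remote machines (an assumption for this sketch);
    a real grid would also need scheduling, data transfer and security
    across administrative domains.
    """
    units = split_into_work_units(data, unit_size)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_results = list(pool.map(process_unit, units))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_on_grid(list(range(10_000))))
```

The key property is that each work unit can be processed independently, which is what makes projects of the SETI@home kind possible.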
Grid Standards - Globus
• Globus Alliance is an association – mainly universities (e.g. Chicago, Edinburgh, Southern California)
• Developing fundamental technologies needed to build grid computing infrastructures
• Most grids in Europe and North America use the Globus Toolkit as their core middleware.
• Globus software provides (e.g.):
    • Resource management: Grid Resource Allocation & Management Protocol (GRAM)
    • Information Services: Monitoring and Discovery Service (MDS)
    • Security Services: Grid Security Infrastructure (GSI)
    • Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP
• XML-based web services allow access to services/applications
• Grid computing and web services converge: Grid Service
• Open Grid Services Architecture (OGSA): vision is to describe and build a well-defined suite of standard interfaces and behaviours that serve as a common framework for all Grid-enabled systems and applications.
e-Science
What is e-Science? - science enabled by electronic infrastructure
• Computationally intensive
• Uses highly distributed network environments
• Requires access to immense data sets
• May require Grid Computing
• High performance visualisation back to the individual user scientists
Examples:
• Social Simulations – modelling land-use change
• Particle Physics (LHC), Astronomy (VirtualObservatory)
• Environmental Sciences – Climate Change
• Engineering – Aircraft Maintenance
• Economics – Predicting Markets
• Bio-informatics – Simulated Biology
• Healthcare – Cancer Diagnosis
Middleware: data communication, data integration
Organisations: requires large and complex infrastructure
• Research labs, large universities, governments (e.g. UK)
e-Science Examples: Particle Physics
Large Hadron Collider (LHC) at CERN
• Currently the most developed e-Science infrastructure
• LHC due to start generating data in 2007/8/9??
• Massive amount of data generated
    • Estimated at 10 petabytes each year (peta = 10^15)
• Thousands of researchers across the world will be involved in the LHC experiments and in analysing results.
GridPP
• UK’s contribution to analysing this data deluge.
• Six-year, £33m project
• Collaboration of around 100 researchers in 19 UK university particle physics groups, CCLRC and CERN.
• More than 100,000 PCs, spread across one hundred institutions around the world.
• Three main areas of work:
    • Applications to allow physicists to submit data to the Grid for analysis
    • Middleware to manage the distribution of computing jobs around the grid and deal with security
    • Deploying computing infrastructure at sites across the UK, to build a prototype Grid.
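To get a feel for the data volumes quoted above, the short calculation below converts 10 petabytes per year into an average data rate and a per-machine share. It assumes, purely for illustration, that data is produced evenly through the year and spread evenly over 100,000 PCs; neither assumption comes from the slides.

```python
PETA = 10**15  # peta = 10^15

bytes_per_year = 10 * PETA          # estimated LHC output per year
seconds_per_year = 365 * 24 * 3600  # ignoring leap years

# Average sustained rate if data were produced evenly all year (assumption).
avg_bytes_per_second = bytes_per_year / seconds_per_year

# Share per machine if data were spread evenly over 100,000 PCs (assumption).
pcs = 100_000
bytes_per_pc = bytes_per_year // pcs

print(f"{avg_bytes_per_second / 10**6:.0f} MB/s on average")
print(f"{bytes_per_pc / 10**9:.0f} GB per PC per year")
```

Under these assumptions the average rate is roughly 317 MB/s, and each PC would handle about 100 GB per year.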
e-Science Examples: Astronomy
Astrogrid
• £10M project to build a data-grid for UK astronomy
• Forms the UK’s contribution to a global VirtualObservatory
• Three main strands to VirtualObservatory:
    1. International standards for astronomical data, metadata, and software interoperability
    2. New software infrastructure using emerging technology: web services and the Grid.
    3. Science user tools to exploit the new infrastructure, which will bring the VO to the astronomer’s desktop.
• Goals of Astrogrid (mainly strand 2):
    • Datagrid for key UK databases
    • Datamining facilities for interrogating those databases, e.g. search for ‘cloaked’ objects
    • A uniform archive query and data-mining interface
    • A facility for users to upload code to run their own algorithms on the datamining machines
    • An exploration of techniques for open-ended resource discovery
e-Science Examples: Climate Change
Climateprediction.net
• To address the enormous variation in current climate predictions
• Existing climate models have to include the effects of small-scale physical processes (such as clouds) through simplifications (parameterisations)
    • Results can be out by an order of magnitude
• Experimental Objective: Ensemble Forecasting
    • Run thousands of climate models with slightly different physics in order to represent the whole range of uncertainties in all the parameterisations (parameters are varied within their current range of uncertainty)
• The project has already recruited 37,000 users
• Project Goal: to make the first fully probability-based fifty-year forecast of human-induced climate change using a full-scale 3-D atmosphere-ocean climate simulation model.
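Ensemble forecasting, as described above, can be illustrated with a deliberately tiny stand-in for a climate model. Everything numeric here is an assumption for illustration: the one-line `toy_model`, the forcing value and the parameter range are hypothetical and bear no relation to the project's actual simulation model. The point is only the shape of the method: vary a parameter within its uncertainty range, run many times, and report a distribution instead of one number.

```python
import random
import statistics


def toy_model(sensitivity, forcing=4.0):
    """Toy stand-in for a climate model: warming = sensitivity * forcing.

    Both the formula and the default forcing are illustrative assumptions.
    """
    return sensitivity * forcing


def run_ensemble(n_runs, low, high, seed=0):
    """Run the toy model n_runs times, varying one parameter within [low, high]."""
    rng = random.Random(seed)
    return [toy_model(rng.uniform(low, high)) for _ in range(n_runs)]


def summarise(results):
    """Report the spread of outcomes rather than a single prediction."""
    ordered = sorted(results)
    n = len(ordered)
    return {
        "p05": ordered[int(0.05 * (n - 1))],
        "median": statistics.median(ordered),
        "p95": ordered[int(0.95 * (n - 1))],
    }


if __name__ == "__main__":
    print(summarise(run_ensemble(n_runs=1000, low=0.4, high=1.2)))
```

Because each run is independent, the runs can be handed out to volunteers' machines and the results collected afterwards.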
e-Science Example: Aircraft Maintenance
DAME project
• £3.2 million, 3 years, commenced Jan 2002.
• 4 universities: York, Sheffield, Oxford, Leeds
• Industrial partners: Rolls-Royce, Data Systems, Cybula Ltd
• Aim: aerospace diagnostics
    • Remote, secure access to flight data and other operational data and resources
    • Rapid data mining and analysis of fault data
    • Distributed search on massive data collections using scalable, neural network type methods for comparing data with archived fleet engine data.
    • Each flight could produce up to 1GB of vibration data
• The DAME workbench (portal)
    • Analysis tools for the engine diagnosis process
    • Central control point for automated workflows
    • Manages distributed diagnosis team and virtual organisations
    • Manages issues of security and user roles.
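The distributed search idea can be sketched as follows. Note the substitution: DAME is described as using neural network type methods, whereas this sketch uses plain Euclidean distance as a simple stand-in. The site names, record IDs and three-point "signatures" in the usage example are made up, and `search_site` and `distributed_search` are hypothetical names.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_site(archive, query, k):
    """Run at one data centre: return its k closest records as (distance, id)."""
    scored = ((distance(sig, query), rec_id) for rec_id, sig in archive.items())
    return heapq.nsmallest(k, scored)


def distributed_search(sites, query, k=2):
    """Ask every site for its local top-k, then merge into a global top-k.

    Only short result lists cross the network, not the raw archives.
    """
    candidates = []
    for archive in sites.values():
        candidates.extend(search_site(archive, query, k))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    # Made-up signatures held at two hypothetical data centres.
    sites = {
        "europe": {"E1": [0.1, 0.2, 0.3], "E2": [0.9, 0.8, 0.7]},
        "america": {"A1": [0.1, 0.2, 0.4], "A2": [0.5, 0.5, 0.5]},
    }
    for dist, rec_id in distributed_search(sites, [0.1, 0.2, 0.3]):
        print(rec_id, round(dist, 3))
```

Searching where the data lives matters when each flight can add up to 1GB of vibration data: moving queries is far cheaper than moving archives.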
e-Science Example: Aircraft Maintenance
[Figure labels: Engine flight data; Airline office; Maintenance Centre; European data center; London Airport; New York Airport; American data center; Grid; Diagnostics Centre]
e-Science Example: Predicting Markets
The INWA Grid project (Innovation Node: Western Australia):
• Investigating suitability of existing Grid technologies for secure, commercial data mining.
• The three-continent Grid:
    • Edinburgh Parallel Computing Centre (EPCC)
    • Curtin University in Western Australia (WA)
    • Chinese Academy of Sciences in Beijing.
• Data mining to predict customer trends, develop new products and better meet customer needs.
    • Samples drawn from a region + publicly available data -> build a clearer picture of regional behaviour within the economy
    • But: need a distributed-aggregated approach to preserve anonymity
• Resources
    • UK mortgage data + UK property data
    • Australian telco data + Australian property data
    • Compute power at EPCC + Curtin
• Scenario
    • A bank wants to predict if home owners are likely to move house within 5 years of taking out a mortgage to buy the house
    • Bank wants to use its own data and publicly available data to help improve the prediction
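One way to read "distributed-aggregated" is that each data holder computes summaries locally and only those summaries are combined. The sketch below illustrates that reading under stated assumptions: the record fields `region` and `moved_within_5_years` are hypothetical, and the small-count suppression rule (`min_count`) is an added illustration, not something described in the slides.

```python
from collections import Counter


def local_summary(records):
    """Run inside one data holder: count movers and totals per region.

    Only these counts leave the site; individual records never do.
    """
    totals, movers = Counter(), Counter()
    for rec in records:
        totals[rec["region"]] += 1
        if rec["moved_within_5_years"]:
            movers[rec["region"]] += 1
    return {"totals": totals, "movers": movers}


def combine(summaries, min_count=3):
    """Merge per-site counts and report a move rate per region.

    Regions with fewer than min_count records are suppressed so that
    very small groups are not exposed (an assumption of this sketch).
    """
    totals, movers = Counter(), Counter()
    for s in summaries:
        totals.update(s["totals"])
        movers.update(s["movers"])
    return {
        region: movers[region] / n
        for region, n in totals.items()
        if n >= min_count
    }
```

In the bank scenario, the bank and the public data holder would each call `local_summary` on their own records, and only the resulting counts would be passed to `combine`.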
e-Science Example: Simulated Biology
BioSimGrid project
• Aim: to make the results of large-scale computer simulations of biomolecules more accessible to the biological community.
• Simulations of the motions of proteins are a key component in understanding how the structure of a protein is related to its dynamic function.
• Data distributed between University of California, San Diego and Oxford.
• Simulations were run using different programs and protocols
    • Data in very different formats.
• Software tools for interrogation and data-mining
• Generic analysis tools (Python), visualisation (VMD)
• Annotation of simulation data
• Readily modifiable simple example scripts
• Underlying data storage structure hidden
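As an idea of what a simple, readily modifiable analysis script might look like, the sketch below computes the root-mean-square deviation (RMSD) of each frame of a trajectory from the first frame. RMSD is chosen here as a common generic analysis; the slides do not say which analyses BioSimGrid provides, and the in-memory trajectory format (lists of (x, y, z) tuples) is an assumption, since the underlying storage is meant to be hidden from the user.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two frames of atom coordinates.

    Each frame is a list of (x, y, z) tuples in the same atom order.
    No structural alignment is performed (a simplification).
    """
    if len(frame) != len(reference):
        raise ValueError("frames must contain the same number of atoms")
    squared = [
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(frame, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rmsd_series(trajectory):
    """RMSD of every frame relative to the first frame of the trajectory."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]
```

A script like this depends only on the frame coordinates, so it would work unchanged whichever simulation program produced the data.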
e-Science Examples: Cancer Diagnosis
Telemedicine on the Grid
• Multi-site videoconferencing
• Real-time delivery of microscope imagery
• Communication and archiving of radiological images
• Supports multi-disciplinary meetings for the review of cancer diagnoses and treatment.
• Remote access to computational medical simulations of tumours and other cancer-related problems
• Data-mining of patient record databases
• Improved clinical decision making.
• Currently clinicians travel large distances
    • Grid technology can provide access to appropriate clinical information and images across the network.