
CS5038 The Electronic Society
Lecture 10: e-Science

Lecture Outline

• Background: “Big Science”
• Grid Computing
• Standards for Grid Computing
• e-Science – what is it
• e-Science Examples:
  • Social Simulations – modelling land-use change
  • Particle Physics (LHC), Astronomy (VirtualObservatory)
  • Environmental Sciences – Climate Change
  • Engineering – Aircraft Maintenance
  • Economics – Predicting Markets
  • Bio-informatics – Simulated Biology
  • Healthcare – Cancer Diagnosis


e-Science - Background: “Big Science”

• During the early part of the 20th century, science became crucial in warfare
• World War II: scientists developed new weapons and tools
  • proximity fuse, radar, atomic bomb, cryptography
• Led to a new form of research facility: the government-sponsored laboratory
  • thousands of technicians and scientists, managed by universities
• Enabled hitherto impossible scientific projects
• Heavy investment by government and industrial interests:
  • blurred the line between public and private research
• Criticisms:
  • Undermines basic principles of the scientific method: results are difficult to verify.
  • Access to facilities is limited to those who are accomplished -> elitism.
  • Increased government funding often implies a military agenda
  • Subverts the Enlightenment-era ideal of science as a quest for knowledge.
  • Increased administrative overhead – e.g. filling out grant requests
  • Connections between academic, governmental, and industrial interests
    • Concern about scientists’ objectivity (e.g. pharmaceutical industry)
• Internet was born from "Big Science"
  • August 1991, CERN (Switzerland): new World Wide Web project


Grid Computing

• Grid computing evolved from the computational needs of “Big Science”
• “Grid computing uses the resources of many separate computers connected by a network (usually the internet) to solve large-scale computation problems.”
• A conceptual framework rather than a physical resource:
  • flexible computational provisioning beyond the local administrative domain.
• Involves sharing computing power:
  • heterogeneous resources (based on different platforms, hardware/software architectures, and computer languages),
  • located in different places,
  • belonging to different administrative domains,
  • using open standards.
• Requires security: to allow remote users to control computing resources.
• Special Purpose Grid – Example: SETI@home project
• General Purpose Grid – Example: Parabon Computation (Commercial)
• In terms of function, three types of grid:
  • Computational grids: computationally intensive operations.
  • Data grids: sharing and management of large amounts of distributed data.
  • Equipment grids: control equipment remotely and analyse the data produced, e.g. controlling a telescope
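
The idea of a computational grid can be illustrated with a minimal Python sketch: a large job is cut into independent work units, the units are farmed out to several workers, and the partial results are combined. This is only an illustration under stated assumptions. The worker "nodes" are local threads standing in for remote machines, and the function names (`split_into_work_units`, `process_unit`, `run_on_grid`) are hypothetical, not part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(data, unit_size):
    """Cut a large input into independent work units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def process_unit(unit):
    """Stand-in for a computationally intensive task (here: sum of squares)."""
    return sum(x * x for x in unit)


def run_on_grid(data, unit_size=1000, n_nodes=4):
    """Farm work units out to n_nodes workers and combine the partial results.

    Threads simulate remote nodes; a real grid would also handle security,
    scheduling and node failure, none of which is modelled here.
    """
    units = split_into_work_units(data, unit_size)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(process_unit, units))
    return sum(partials)


signal = list(range(10_000))
total = run_on_grid(signal)
```

The key design point is that each work unit is independent, so the nodes never need to talk to each other; this is what lets a special purpose grid such as SETI@home use loosely connected home PCs.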


Grid Standards - Globus

• Globus Alliance is an association – mainly Universities (e.g. Chicago, Edinburgh, Southern California)
• Developing fundamental technologies needed to build grid computing infrastructures
• Most grids in Europe and North America use the Globus Toolkit as their core middleware.
• Globus software provides (e.g.):
  • Resource management: Grid Resource Allocation & Management Protocol (GRAM)
  • Information Services: Monitoring and Discovery Service (MDS)
  • Security Services: Grid Security Infrastructure (GSI)
  • Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP
• XML-based web services allow access to services/applications
• Grid computing and web services converge: Grid Service
• Open Grid Services Architecture (OGSA): vision is to describe and build a well-defined suite of standard interfaces and behaviours that serve as a common framework for all Grid-enabled systems and applications.


e-Science

• What is e-Science? – science enabled by electronic infrastructure
  • Computationally intensive
  • Uses highly distributed network environments
  • Requires access to immense data sets
  • May require Grid Computing
  • High performance visualisation back to the individual user scientists
• Examples:
  • Social Simulations – modelling land-use change
  • Particle Physics (LHC), Astronomy (VirtualObservatory)
  • Environmental Sciences – Climate Change
  • Engineering – Aircraft Maintenance
  • Economics – Predicting Markets
  • Bio-informatics – Simulated Biology
  • Healthcare – Cancer Diagnosis
• Middleware: data communication, data integration
• Organisations: requires large and complex infrastructure
  • Research Labs, Large Universities, Governments (e.g. UK)


e-Science Examples: Particle Physics

Large Hadron Collider (LHC) at CERN
• Currently the most developed e-Science infrastructure
• LHC due to start generating data in 2007/8/9??
• Massive amount of data generated
  • Estimated at 10 petabytes each year (peta = 10^15)
• Thousands of researchers across the world will be involved in the LHC experiments and in analysing results.

GridPP
• UK’s contribution to analysing this data deluge.
• Six-year, £33m project
• Collaboration of around 100 researchers in 19 UK University particle physics groups, CCLRC and CERN.
• More than 100,000 PCs, spread across one hundred institutions around the world.
• Three main areas of work:
  • Applications to allow physicists to submit data to the Grid for analysis
  • Middleware to manage the distribution of computing jobs around the grid and deal with security
  • Deploying computing infrastructure at sites across the UK, to build a prototype Grid.
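
To give a feel for the scale of the figures above, the following back-of-the-envelope Python calculation converts 10 petabytes per year into an average data rate and a per-PC share. It assumes, purely for illustration, that data is produced evenly through the year and shared equally across 100,000 PCs; neither assumption comes from the slides.

```python
PETA = 10**15                       # peta = 10^15, as on the slide
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years

bytes_per_year = 10 * PETA          # estimated LHC output per year
n_pcs = 100_000                     # "more than 100,000 PCs"

# Average sustained rate if data arrived evenly all year (an assumption).
avg_rate_mb_s = bytes_per_year / SECONDS_PER_YEAR / 10**6

# Share of one year's data per PC if split equally (an assumption).
per_pc_gb = bytes_per_year / n_pcs / 10**9

print(f"average rate: {avg_rate_mb_s:.0f} MB/s")
print(f"per-PC share: {per_pc_gb:.0f} GB per year")
```

Under these assumptions the average rate is a little over 300 MB/s, sustained all year, and each PC would be responsible for about 100 GB of data per year.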


e-Science Examples: Astronomy

Astrogrid
• £10M project to build a data-grid for UK astronomy
• Forms the UK’s contribution to a global VirtualObservatory
• Three main strands to VirtualObservatory
  1. International standards for astronomical data, metadata, and software interoperability
  2. New software infrastructure using emerging technology: web services and the Grid.
  3. Science user tools to exploit the new infrastructure will bring the VO to the astronomer’s desktop.
• Goals of Astrogrid (mainly strand 2):
  • Datagrid for key UK databases
  • Datamining facilities for interrogating those databases - e.g. search for ‘cloaked’ objects
  • A uniform archive query and data-mining interface
  • A facility for users to upload code to run their own algorithms on the datamining machines
  • An exploration of techniques for open-ended resource discovery


e-Science Examples: Climate Change

Climateprediction.net
• To address the enormous variation in current climate predictions
• Existing climate models have to include the effects of small-scale physical processes (such as clouds) through simplifications (parameterisations)
  • Results can be out by an order of magnitude
• Experimental Objective: Ensemble Forecasting
  • Run thousands of climate models with slightly different physics in order to represent the whole range of uncertainties in all the parameterisations (parameters are varied within their current range of uncertainty)
• The project has already recruited 37,000 users
• Project Goal: to make the first fully probability-based fifty-year forecast of human-induced climate change using a full-scale 3-D atmosphere-ocean climate simulation model.
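
Ensemble forecasting as described above can be sketched in a few lines of Python: draw each uncertain parameter from its range, run the model once per draw, and summarise the spread of results as a probability-style range. Everything specific here is an assumption for illustration: the parameter names, their ranges, and the one-line `toy_model` are invented and bear no relation to the real Climateprediction.net model.

```python
import random
import statistics

# Hypothetical parameter ranges (low, high); not taken from any real climate model.
PARAM_RANGES = {
    "cloud_feedback": (0.2, 1.0),
    "ocean_mixing": (0.5, 1.5),
}


def toy_model(params, years=50):
    """Toy stand-in for a climate model: warming in degrees after `years`."""
    rate = 0.02 * (1.0 + params["cloud_feedback"]) / params["ocean_mixing"]
    return rate * years


def sample_params(rng):
    """Vary every parameter within its current range of uncertainty."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def run_ensemble(n_members, seed=0):
    """Run one model per ensemble member, each with slightly different physics."""
    rng = random.Random(seed)
    return [toy_model(sample_params(rng)) for _ in range(n_members)]


results = sorted(run_ensemble(1000))
summary = {
    "p05": results[int(0.05 * len(results))],
    "median": statistics.median(results),
    "p95": results[int(0.95 * len(results))],
}
```

Because every ensemble member is an independent run, the members map naturally onto volunteers' PCs: each user runs one model with one parameter set and returns only the result.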


e-Science Example: Aircraft Maintenance

DAME project
• £3.2 Million, 3 years, commenced Jan 2002.
• 4 Universities:
  • York, Sheffield, Oxford, Leeds
• Industrial Partners:
  • Rolls-Royce, Data Systems, Cybula Ltd
• Aim: aerospace diagnostics
  • Remote, secure access to flight data and other operational data and resources
  • Rapid data mining and analysis of fault data
  • Distributed search on massive data collections using scalable, neural network type methods for comparing data with archived fleet engine data.
  • Each flight could produce up to 1GB of vibration data
• The DAME workbench (portal)
  • Analysis tools for the engine diagnosis process
  • Central control point for automated workflows
  • Manages distributed diagnosis team and virtual organisations
  • Manages issues of security and user roles.


e-Science Example: Aircraft Maintenance

[Figure: diagram of the DAME distributed diagnostics scenario. Labels: Engine flight data; Airline office; Maintenance Centre; European data center; London Airport; New York Airport; American data center; Grid; Diagnostics Centre]


e-Science Example: Predicting Markets

• The INWA Grid project (Innovation Node: Western Australia):
  • Investigating suitability of existing Grid technologies for secure, commercial data mining.
• The three-continent Grid:
  • Edinburgh Parallel Computing Centre (EPCC)
  • Curtin University in Western Australia (WA)
  • Chinese Academy of Sciences in Beijing.
• Data mining to predict customer trends, develop new products and better meet customer needs.
• Samples drawn from a region + publicly available data -> build a clearer picture of regional behaviour within the economy
  • But: need a distributed-aggregated approach to preserve anonymity
• Resources
  • UK mortgage data + UK property data
  • Australian telco data + Australian property data
  • Compute power at EPCC + Curtin
• Scenario
  • A bank wants to predict if home owners are likely to move house within 5 years of taking out a mortgage to buy the house
  • Bank wants to use its own data and publicly available data to help improve the prediction
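
One way to read the "distributed-aggregated approach" is that each data owner reduces its raw records to summary counts on its own site, and only those counts travel across the Grid. The Python sketch below illustrates that reading; it is an assumption, not the INWA project's actual method, and the record fields (`region`, `moved_within_5y`) and sample data are hypothetical.

```python
from collections import Counter


def local_aggregate(records):
    """Run at the data owner's site: reduce raw records to per-region counts."""
    totals, moved = Counter(), Counter()
    for rec in records:
        totals[rec["region"]] += 1
        if rec["moved_within_5y"]:
            moved[rec["region"]] += 1
    return {"totals": totals, "moved": moved}


def combine(aggregates):
    """Run centrally: merge site-level counts into per-region move rates."""
    totals, moved = Counter(), Counter()
    for agg in aggregates:
        totals.update(agg["totals"])
        moved.update(agg["moved"])
    return {region: moved[region] / totals[region] for region in totals}


# Hypothetical records held at two separate sites; raw rows never leave a site.
site_a = [
    {"region": "north", "moved_within_5y": True},
    {"region": "north", "moved_within_5y": False},
    {"region": "south", "moved_within_5y": False},
]
site_b = [
    {"region": "north", "moved_within_5y": True},
    {"region": "south", "moved_within_5y": True},
]

rates = combine([local_aggregate(site_a), local_aggregate(site_b)])
```

Sharing counts rather than rows is only a first step: very small groups can still identify individuals, so a real deployment would need further safeguards such as minimum group sizes.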


e-Science Example: Simulated Biology

BioSimGrid project
• Aim: to make the results of large-scale computer simulations of biomolecules more accessible to the biological community.
• Simulations of the motions of proteins are a key component in understanding how the structure of a protein is related to its dynamic function.
• Data distributed between University of California, San Diego and Oxford.
• Simulations were run using different programs and protocols
  • Data in very different formats.

• Software tools for interrogation and data-mining
• Generic analysis tools (python), visualisation VMD
• Annotation of simulation data
• Readily modifiable simple example scripts
• Underlying data storage structure hidden
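
The combination of "data in very different formats" and "underlying data storage structure hidden" suggests a thin adapter layer: one loader per format, all returning the same in-memory representation, so that generic analysis tools work on any simulation. The Python sketch below shows the pattern only. Both file formats and all function names are hypothetical and are not BioSimGrid's actual formats or API.

```python
def parse_format_a(text):
    """Hypothetical format A: one frame per line, coordinates separated by spaces."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]


def parse_format_b(text):
    """Hypothetical format B: frames separated by ';', coordinates by ','."""
    return [[float(v) for v in frame.split(",")] for frame in text.strip().split(";")]


# Registry of format adapters; callers never touch the parsers directly.
PARSERS = {"a": parse_format_a, "b": parse_format_b}


def load_trajectory(text, fmt):
    """Return a trajectory as a list of frames, whatever the source format."""
    return PARSERS[fmt](text)


def mean_per_frame(trajectory):
    """Generic analysis tool: average coordinate value in each frame."""
    return [sum(frame) / len(frame) for frame in trajectory]


# The same simulation stored in two formats yields identical trajectories.
traj_a = load_trajectory("1.0 2.0 3.0\n2.0 3.0 4.0", "a")
traj_b = load_trajectory("1.0,2.0,3.0;2.0,3.0,4.0", "b")
```

Supporting a new simulation program then means adding one parser to the registry; the analysis scripts themselves stay unchanged.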


e-Science Examples: Cancer Diagnosis

Telemedicine on the Grid
• Multi-site videoconferencing
• Real-time delivery of microscope imagery
• Communication and archiving of radiological images
• Supports multi-disciplinary meetings for the review of cancer diagnoses and treatment.
• Remote access to computational medical simulations of tumours and other cancer-related problems
• Data-mining of patient record databases
• Improved clinical decision making.
• Currently clinicians travel large distances
• Grid technology can provide access to appropriate clinical information and images across the network.