Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 1
The Data Deluge and the Grid
Steve Lloyd
Professor of Experimental Particle Physics
Inaugural Lecture
Outline
• What is Data?
• Where it comes from: e-Science
• The CERN LHC and Experiments
• What is the Grid?
• GridPP
• Challenges ahead
What is Data?
Anything that can be expressed as numbers
Raw Information → Numbers → Binary Digits
• Pictures: store the amount of Red, Green and Blue
• Sound: store the loudness at each time
• Lots of Pictures + Sound = DVD Video
• Electrical Signals: store voltage or current
• Text: every character has a numerical code
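A minimal Python illustration of the idea that pictures, sound and text all reduce to numbers. The example word, the pixel colour, the 0-255 colour scale and the loudness samples are illustrative assumptions, not taken from the lecture.

```python
# Text: every character has a numerical code (here, its Unicode code point).
text = "Grid"
codes = [ord(ch) for ch in text]

# Pictures: each dot stores the amount of red, green and blue.
# Assumption: the common 0-255 scale per colour.
orange_pixel = (255, 165, 0)

# Sound: store the loudness at each moment in time (made-up samples).
loudness_samples = [0, 12, 25, 12, 0, -12, -25, -12]

print(codes)           # [71, 114, 105, 100]
print(orange_pixel)    # (255, 165, 0)
```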
Digital Data
Numbers are stored as Binary digits
1 Bit = 0 or 1: can store yes/no or on/off
1 Byte = 8 bits: can store numbers from 0 to 255 (enough for a character a-z, A-Z, *£$@<...)
25 = 0×128 + 0×64 + 0×32 + 1×16 + 1×8 + 0×4 + 0×2 + 1×1 = 00011001
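The place-value breakdown of 25 can be reproduced by subtracting the place values 128, 64, ..., 1 in turn. A small sketch; the function name `to_binary_byte` is hypothetical, and Python's built-in `format(n, "08b")` gives the same answer.

```python
def to_binary_byte(n: int) -> str:
    """Return n as 8 binary digits by testing each place value from 128 down to 1."""
    if not 0 <= n <= 255:
        raise ValueError("a single byte holds only 0 to 255")
    bits = []
    for place in (128, 64, 32, 16, 8, 4, 2, 1):
        if n >= place:
            bits.append("1")   # this place value is "on"
            n -= place
        else:
            bits.append("0")   # this place value is "off"
    return "".join(bits)

print(to_binary_byte(25))   # 00011001
print(format(25, "08b"))    # 00011001, using Python's built-in formatting
```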
1 kiloByte = ~1,000 Bytes: typical Word document ~30 kB
1 MegaByte = ~1,000,000 Bytes: a floppy disk ~1.4 MB, a CD ~700 MB
1 GigaByte = ~1,000,000,000 Bytes: typical PC hard drive 20-120 GB
1 TeraByte = ~1,000,000,000,000 Bytes
1 PetaByte = ~1,000,000,000,000,000 Bytes: ~1.4 million CDs
1 ExaByte = ~1,000,000,000,000,000,000 Bytes
World Annual Book Production
World Annual Information Production
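The "~" in these definitions reflects that the same unit names are sometimes counted in steps of 1,024 rather than 1,000. A sketch that expresses a byte count in the units above, assuming decimal steps of 1,000; `human_readable` and `UNITS` are hypothetical names.

```python
UNITS = ["Bytes", "kB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(n_bytes: float) -> str:
    """Divide by 1,000 until the value drops below 1,000 (or units run out)."""
    value = float(n_bytes)
    unit = 0
    while value >= 1000 and unit < len(UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:.1f} {UNITS[unit]}"

print(human_readable(30_000))        # 30.0 kB  (a typical Word document)
print(human_readable(1_400_000))     # 1.4 MB   (a floppy disk)
print(human_readable(10 * 10**15))   # 10.0 PB  (a year of LHC data)
```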
Data Analysis
What is done with data?
• Nothing
• Read it, listen to it, watch it
• Analyse it
A computer program (a "Job"):
Read A
Read B
C = A + B
Print C
Input 2 and 3 → output 5
• Calculate how proteins fold
• Calculate what the weather is going to do
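The four-line "Job" on this slide translates almost word for word into Python. A minimal sketch; the function name `job` is hypothetical.

```python
def job(a: float, b: float) -> float:
    """The slide's example job: read A, read B, compute C = A + B."""
    c = a + b
    return c

a, b = 2, 3      # "Read A", "Read B"
c = job(a, b)    # "C = A + B"
print(c)         # "Print C": 5
```

Real jobs such as protein folding or weather forecasting follow the same pattern of reading input, computing and writing output, only with vastly more data and arithmetic.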
e-Science
In the UK this sort of activity has become known as "e-Science"
"e-Science will change the dynamic of the way Science is undertaken"
"Science increasingly done through distributed global collaborations enabled by the internet using very large data collections, terascale computing resources and high performance visualisation"
Dr John Taylor - Director General of Research Councils:
"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it"
Astronomy
Crab Nebula: Optical, Radio, Infra-red, X-ray
Jet in M87: HST optical, Gemini mid-IR, VLA radio, Chandra X-ray
Virtual Observatories
Earth Observation
1 TB/day
Images: ozone map, Ottawa, Trafalgar Square
Species 2000
To enumerate all ~1.7 million known species of plants, animals, fungi and microbes on Earth for studies of biodiversity
A federation of initially 18 taxonomic databases - eventually ~ 200 databases
From protozoa to platypus to primates
Bioinformatics
Healthcare
Dynamic Brain Atlas
Breast Screening
Scanning
Remote Consultancy
Collaborative Engineering
Real-time collection
Multi-source Data Analysis
Unitary Plan Wind Tunnel
Archival storage
Digital Curation
Digitization of almost anything
To create Digital Libraries and Museums
The CERN LHC
4 Large Experiments
The world’s most powerful particle accelerator - 2007
7,000 tonnes, 42 m long, 22 m wide, 22 m high
2,000 physicists, 150 institutes, 34 countries
ATLAS Detector
(About the height of a 5 storey building)
ATLAS Pit
The Higgs
• Primary objective of the LHC:
• What is the origin of Mass? Is it the Higgs Particle?
Massless Particle – Travels at the speed of light
Low Mass Particle – Travels slower
High Mass Particle – Travels slower still
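One way to make this picture quantitative (a framing added here, not taken from the lecture) is to compare particles that carry the same energy E. Combining the relativistic relations E² = (pc)² + (mc²)² and v = pc²/E gives the speed as a function of mass:

```latex
\frac{v}{c} \;=\; \frac{pc}{E} \;=\; \sqrt{1 - \left(\frac{mc^{2}}{E}\right)^{2}}
```

With m = 0 this gives v = c; the larger m is relative to E, the further v falls below c.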
Starting from this event…
We are looking for this “signature”
Selectivity: 1 in 10^13
Like looking for 1 person in a thousand world populations
Or for a needle in 20 million haystacks!
LHC Data Challenge
• ~100,000,000 electronic channels
• 800,000,000 proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year
• (10 Million GBytes = 14 Million CDs)
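The figures on this slide can be cross-checked with simple arithmetic. A sketch that assumes the ~700 MB per CD figure from the Digital Data slide; the variable names are illustrative.

```python
interactions_per_second = 800_000_000   # proton-proton interactions per second
higgs_per_second = 0.0002

# Fraction of interactions that produce a Higgs particle
fraction = higgs_per_second / interactions_per_second
print(f"1 in {1 / fraction:.1e}")   # 1 in 4.0e+12

# One year of data (10 PB) expressed in ~700 MB CDs
data_per_year_bytes = 10 * 10**15
cd_bytes = 700 * 10**6
print(f"{data_per_year_bytes / cd_bytes / 1e6:.1f} million CDs")   # 14.3 million CDs
```

The raw Higgs fraction, about 1 in 4 × 10^12, is of the same order of magnitude as the 1 in 10^13 selectivity, and the CD count matches the 14 million quoted.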
LHC Computing
Requirements
• CPU Power (Reconstruction, Simulation, User Analysis etc.): 50,000 of today's PCs
• 'Tape' Storage: 20 PetaBytes (= 20 M GBytes)
• Disk Storage: 2.5 PetaBytes (= 2.5 M GBytes)
Distributed Computing Solution: "The Grid"
The Grid
Ian Foster / Carl Kesselman:
"A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
'Grid' means different things to different people
All agree it's a funding opportunity!
Electricity Grid
Analogy with the Electricity Power Grid
'Standard Interface'
Power Stations
Distribution Infrastructure
Computing Grid
Computing and Data Centres
Fibre Optics of the Internet
What is the Grid?
Single PC: PROGRAMS (Word/Excel, Email/Web, Your Program, Games) run on the OPERATING SYSTEM, which drives the CPU, Disks etc.
Grid: Your Program, submitted from a User Interface Machine, runs on the MIDDLEWARE (Resource Broker, Information Service), which drives many CPU Clusters and Disk Clusters.
Middleware is the Operating System of a distributed computing system
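The division of labour between the Information Service and the Resource Broker can be sketched as a toy model. Everything here is a hypothetical illustration: the `Site` fields, the site names and the matching rule ("enough free CPUs, data already present, then most free CPUs") are assumptions, and real grid middleware used its own, far richer matching.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpus: int
    has_dataset: bool

# Hypothetical Information Service snapshot: what each cluster reports about itself.
INFORMATION_SERVICE = [
    Site("cluster-a", free_cpus=10, has_dataset=True),
    Site("cluster-b", free_cpus=120, has_dataset=False),
    Site("cluster-c", free_cpus=64, has_dataset=True),
]

def resource_broker(sites, cpus_needed, needs_dataset=True):
    """Choose a site for a job; return None if nothing currently matches."""
    candidates = [
        s for s in sites
        if s.free_cpus >= cpus_needed and (s.has_dataset or not needs_dataset)
    ]
    if not candidates:
        return None   # no match: in practice the job would wait
    return max(candidates, key=lambda s: s.free_cpus)

chosen = resource_broker(INFORMATION_SERVICE, cpus_needed=32)
print(chosen.name)   # cluster-c
```

The user only describes what the job needs; which cluster actually runs it is the middleware's decision, much as an operating system decides which CPU runs a program.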
What is the Grid?
From this:
To this:
SETI@home
• A distributed computing project - not really a Grid project
• You pull the data from them, rather than them submitting the job to you
Arecibo telescope in Puerto Rico
Users - 5,240,038
Results received – 1,632,106,991
Years of CPU Time – 2,121,057
Extraterrestrials found – 0
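The "pull" model can be sketched with a simple work queue. The queue of work units, the chunk names and `volunteer_pc` are hypothetical stand-ins, not SETI@home's actual protocol; the point is who initiates. Here the volunteer's PC asks for work, whereas on a Grid a resource broker sends (pushes) a job to a chosen site.

```python
from queue import Queue, Empty

# Hypothetical central server: a queue of work units (chunks of telescope data).
work_units = Queue()
for chunk_id in range(5):
    work_units.put(f"chunk-{chunk_id}")

results = {}

def volunteer_pc(name):
    """A volunteer's idle PC repeatedly asks the server for the next work unit."""
    while True:
        try:
            unit = work_units.get_nowait()   # pull: the client initiates the request
        except Empty:
            return                           # nothing left to analyse
        results[unit] = f"analysed by {name}"   # stand-in for the real signal search

for pc in ("home-pc-1", "home-pc-2"):
    volunteer_pc(pc)

print(len(results))   # 5
```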
Entropia
Uses idle cycles on Home PCs for profit and non-profit projects:
FightAIDS@Home:
• 60,000 Machines
• 1,400 years of CPU time
Rebranding!
GridPP
19 UK Universities, CCLRC (RAL & Daresbury) and CERN
Funded by the Particle Physics and Astronomy Research Council (PPARC)
GridPP1 - 2001-2004 £17m "From Web to Grid"
GridPP2 - 2004-2007 £15m "From Prototype to Production"
International Collaboration
• EU DataGrid (EDG) 2001-2004: Middleware Development Project
• US and other Grid projects: Interoperability
• LHC Computing Grid (LCG): Grid Deployment Project for LHC
• EU Enabling Grids for e-Science in Europe (EGEE) 2004-2006: Grid Deployment Project for all disciplines
Application Development
ATLAS LHCb CMS
BaBar (SLAC) SAMGrid (FermiLab) QCDGrid
Middleware Development
Configuration Management
Storage Interfaces
Network Monitoring
Security
Information Services
Grid Data Management
Tier Structure
'Tier-0' – where the data comes from
'Tier-1' – major centres in large countries
'Tier-2' – smaller centres in large countries or smaller countries
Diagram: CERN Tier-0 at the centre, linked to Tier-1 centres (UK, US, Italy, Germany, France, ...), which in turn serve Tier-2 centres (four UK Tier-2s, Spain, Poland, ...).
Tier structure not necessarily appropriate for all disciplines
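The tier hierarchy is a tree, so finding the chain of centres between CERN and a given site is a simple tree walk. A sketch assuming data flows strictly down the tree; only the UK branch is filled in, using the UK Tier-2 groupings named in this lecture, and `TIERS` and `route` are hypothetical names.

```python
# Simplified hierarchy: parent centre -> centres it serves.
TIERS = {
    "CERN Tier-0": ["UK Tier-1", "US Tier-1", "Italy Tier-1", "Germany Tier-1", "France Tier-1"],
    "UK Tier-1": ["ScotGrid", "NorthGrid", "SouthGrid", "London"],
}

def route(target, node="CERN Tier-0", path=None):
    """Depth-first search: return the list of centres from Tier-0 down to target."""
    path = (path or []) + [node]
    if node == target:
        return path
    for child in TIERS.get(node, []):
        found = route(target, child, path)
        if found:
            return found
    return None   # target is not in this tree

print(" -> ".join(route("London")))   # CERN Tier-0 -> UK Tier-1 -> London
```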
UK Tier-1/A Centre
• High quality data services
• National and International Role
• UK focus for International Grid development
• 700 Dual CPU
• 80 TB Disk
• 60 TB Tape (Capacity 1 PB)
Grid Operations Centre
UK Tier-2 Centres
ScotGrid: Durham, Edinburgh, Glasgow
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
London: Brunel, Imperial, QMUL, RHUL, UCL
Mostly funded by HEFCE
The Grid at QM
The Queen Mary e-Science High Throughput Cluster
174 PCs (348 CPUs)
40 TByte Disk Storage
Part of the London Tier-2 Centre
The LCG Grid
89 Sites
9,056 CPUs
3 PBytes Disk
Grid Snapshot
Challenges
CD stack with 1 year of LHC data (~20 km)
(Ex-)Concorde (15 km)
We are here (1 km)
• Scaling to full size: ~10,000 → 100,000 CPUs
• Stability, Robustness etc.
• Security (Hackers' Paradise!)
• Sharing resources (in an RAE environment!)
• International Collaboration
• Continued funding beyond the start of LHC!
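A rough check of the CD-stack comparison, assuming a bare CD is 1.2 mm thick (no cases) and taking the 14 million CDs per year quoted for the LHC. Both the thickness and the variable names are assumptions for illustration.

```python
CD_THICKNESS_MM = 1.2        # assumption: standard bare disc, no jewel cases
cds_per_year = 14_000_000    # one year of LHC data, as quoted earlier

stack_km = cds_per_year * CD_THICKNESS_MM / 1_000_000   # mm -> km
print(f"{stack_km:.1f} km")   # 16.8 km
```

The result is of the same order as the ~20 km on the slide; the exact height depends on the disc thickness assumed.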
Further Info
http://www.gridpp.ac.uk