Steve LloydInaugural Lecture - 24 November 2004 Slide 1 The Data Deluge and the Grid Steve Lloyd...


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 1

The Data Deluge and the Grid

Steve Lloyd

Professor of Experimental Particle Physics

Inaugural Lecture


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 2

Outline

• What is Data?
• Where it comes from: e-Science
• The CERN LHC and Experiments
• What is the Grid?
• GridPP
• Challenges ahead


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 3

What is Data?

Anything that can be expressed as numbers

Raw Information → Numbers → Binary Digits

• Pictures: store the amount of Red, Green and Blue
• Sound: store the loudness at each time
• Lots of Pictures + Sound = DVD Video
• Electrical Signals: store the voltage or current
• Text: every character has a numerical code
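The chain Raw Information → Numbers → Binary Digits can be illustrated for text with a short Python sketch. It assumes that a character's Unicode code point (identical to its ASCII code for a-z, A-Z) serves as its "numerical code"; the function name `text_to_bits` is hypothetical.

```python
def text_to_bits(text: str) -> list[str]:
    """Map each character to its numeric code, then to 8 binary digits."""
    bits = []
    for ch in text:
        code = ord(ch)  # character -> number (Unicode code point)
        if code > 255:
            raise ValueError(f"{ch!r} does not fit in one byte")
        bits.append(format(code, "08b"))  # number -> 8 binary digits, zero-padded
    return bits

print(text_to_bits("Hi"))  # ['01001000', '01101001']
```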


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 4

Digital Data

Numbers are stored as Binary digits

• 1 Bit = 0 or 1: can store yes/no or on/off
• 1 Byte = 8 bits: can store numbers from 0 to 255 (enough for a character a-z, A-Z, *£$@<...)

25 = 0×128 + 0×64 + 0×32 + 1×16 + 1×8 + 0×4 + 0×2 + 1×1 = 00011001
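The place-value expansion above can be checked mechanically. A minimal Python sketch (the helper name `byte_expansion` is illustrative) writes any number from 0 to 255 as a sum of powers of two and as an 8-bit string:

```python
def byte_expansion(n: int) -> tuple[str, str]:
    """Return the place-value sum and the 8-bit binary string for 0 <= n <= 255."""
    if not 0 <= n <= 255:
        raise ValueError("one byte holds only 0 to 255")
    weights = [128, 64, 32, 16, 8, 4, 2, 1]
    digits = [(n // w) % 2 for w in weights]  # the bit belonging to each power of two
    terms = " + ".join(f"{d}×{w}" for d, w in zip(digits, weights))
    return terms, "".join(str(d) for d in digits)

terms, bits = byte_expansion(25)
print(f"25 = {terms} = {bits}")
```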

1 kiloByte = ~1,000 Bytes: typical Word document ~30 kB

1 MegaByte = ~1,000,000 Bytes: a floppy disk ~1.4 MB, a CD ~700 MB

1 GigaByte = ~1,000,000,000 Bytes: typical PC hard drive 20-120 GB

1 TeraByte = ~1,000,000,000,000 Bytes

1 PetaByte = ~1,000,000,000,000,000 Bytes: ~1.4 million CDs

1 ExaByte = ~1,000,000,000,000,000,000 Bytes

World Annual Book Production

World Annual Information Production
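The "~1.4 million CDs" figure follows from dividing a PetaByte by the capacity of one CD. A Python sketch, assuming the slide's decimal prefixes (1 kB = 1,000 bytes) and ~700 MB per CD:

```python
# Decimal prefixes, as on the slide (1 kB = 1,000 bytes, not 1,024)
KB, MB, GB, TB, PB, EB = (10 ** (3 * k) for k in range(1, 7))
CD = 700 * MB  # approximate capacity of one CD

def cds_needed(n_bytes: int) -> float:
    """Number of ~700 MB CDs needed to hold n_bytes."""
    return n_bytes / CD

print(f"1 PB ≈ {cds_needed(PB) / 1e6:.1f} million CDs")  # 1 PB ≈ 1.4 million CDs
```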


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 6

e-Science

In the UK this sort of activity has become known as "e-Science"

"e-Science will change the dynamic of the way Science is undertaken"

"Science increasingly done through distributed global collaborations enabled by the internet using very large data collections, terascale computing resources and high performance visualisation"

Dr John Taylor - Director General of Research Councils:

"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it"


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 7

Astronomy

Crab Nebula: Optical, Radio, Infra-red, X-ray

Jet in M87: HST optical, Gemini mid-IR, VLA radio, Chandra X-ray

Virtual Observatories


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 8

Earth Observation

1 TB/day

Ozone map

Ottawa

Trafalgar Square


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 9

Species 2000

To enumerate all ~1.7 million known species of plants, animals, fungi and microbes on Earth for studies of biodiversity

A federation of initially 18 taxonomic databases - eventually ~ 200 databases

From protozoa to platypus to primates


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 10

Bioinformatics


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 12

Collaborative Engineering

Real-time collection

Multi-source Data Analysis

Unitary Plan Wind Tunnel

Archival storage


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 13

Digital Curation

Digitization of almost anything

To create Digital Libraries and Museums


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 14

The CERN LHC

4 Large Experiments

The world’s most powerful particle accelerator - 2007


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 15

ATLAS Detector

7,000 tonnes; 42 m long, 22 m wide, 22 m high (about the height of a 5-storey building)

2,000 physicists, 150 institutes, 34 countries


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 16

ATLAS Pit


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 17

The Higgs

Primary objective of the LHC:

• What is the origin of mass?
• Is it the Higgs particle?

Massless Particle – Travels at the speed of light

Low Mass Particle – Travels slower

High Mass Particle – Travels slower still
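The comparison above can be made quantitative for particles that carry the same total energy E. Special relativity gives v/c = √(1 − (mc²/E)²), so a massless particle moves at c and a heavier one moves more slowly. A Python sketch in natural units (c = 1, masses and energies in GeV); the energy and masses chosen are arbitrary illustrative values, not LHC figures.

```python
import math

def speed_fraction(mass_gev: float, energy_gev: float) -> float:
    """v/c for a particle of rest mass m and total energy E (natural units, c = 1)."""
    if energy_gev < mass_gev:
        raise ValueError("total energy cannot be below the rest-mass energy")
    return math.sqrt(1.0 - (mass_gev / energy_gev) ** 2)

E = 100.0  # the same total energy for every particle, in GeV (illustrative)
for m in (0.0, 1.0, 50.0):  # massless, low mass, high mass (illustrative)
    print(f"m = {m:5.1f} GeV  ->  v/c = {speed_fraction(m, E):.6f}")
```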


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 18

Starting from this event…

We are looking for this “signature”

Selectivity: 1 in 10¹³

Like looking for 1 person in a thousand world populations

Or for a needle in 20 million haystacks!

LHC Data Challenge

• ~100,000,000 electronic channels

• 800,000,000 proton-proton interactions per second

• 0.0002 Higgs per second

• 10 PBytes of data a year (10 million GBytes = 14 million CDs)
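The selectivity and data-volume figures on this slide can be cross-checked with back-of-envelope Python. It uses only the slide's numbers plus the ~700 MB per CD from the Digital Data slide; the interaction and Higgs rates give a few times 10¹², the same order of magnitude as the quoted 1 in 10¹³.

```python
interactions_per_s = 800_000_000  # proton-proton interactions per second
higgs_per_s = 0.0002              # Higgs particles produced per second

selectivity = higgs_per_s / interactions_per_s
print(f"Selectivity ≈ 1 in {1 / selectivity:.1e}")  # 1 in 4.0e+12

data_per_year = 10 * 10**15  # 10 PBytes in bytes (decimal prefixes)
cd_capacity = 700 * 10**6    # ~700 MB per CD
print(f"{data_per_year / 10**9:,.0f} GBytes ≈ "
      f"{data_per_year / cd_capacity / 1e6:.0f} million CDs")  # 10,000,000 GBytes ≈ 14 million CDs
```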


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 19

LHC Computing

Requirements:

• CPU power (reconstruction, simulation, user analysis etc.): 50,000 of today's PCs
• 'Tape' storage: 20 PetaBytes (= 20 M GBytes)
• Disk storage: 2.5 PetaBytes (= 2.5 M GBytes)

Distributed computing solution: "The Grid"


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 20

The Grid

Ian Foster / Carl Kesselman:

"A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."

'Grid' means different things to different people

All agree it's a funding opportunity!


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 22

Computing Grid

Computing and Data Centres

Fibre Optics of the Internet


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 23

What is the Grid?

Single PC: PROGRAMS (Word/Excel, Email/Web, Your Program, Games) → OPERATING SYSTEM → Disks, CPU etc

Grid: Your Program → User Interface Machine → MIDDLEWARE (Resource Broker, Information Service) → CPU Clusters, Disk Cluster

Middleware is the Operating System of a distributed computing system
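The division of labour in the diagram can be sketched as a toy Python model: an information service publishes the state of each cluster, and a resource broker uses that information to choose where a job runs. This is a hypothetical simplification, not the matchmaking algorithm of the EDG/LCG middleware; the site names, fields and the "most free CPUs" rule are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    free_cpus: int
    datasets: set[str]  # datasets held on this cluster's disks

class InformationService:
    """Publishes the current state of every registered cluster (toy version)."""
    def __init__(self, clusters: list[Cluster]):
        self._clusters = clusters

    def query(self) -> list[Cluster]:
        return list(self._clusters)

class ResourceBroker:
    """Chooses a cluster for each job using information-service data (toy version)."""
    def __init__(self, info: InformationService):
        self.info = info

    def submit(self, job_name: str, cpus: int, dataset: str) -> str:
        # Only clusters with enough free CPUs and the required data are eligible
        candidates = [c for c in self.info.query()
                      if c.free_cpus >= cpus and dataset in c.datasets]
        if not candidates:
            raise RuntimeError(f"no cluster can run {job_name}")
        best = max(candidates, key=lambda c: c.free_cpus)  # assumed rule: most free CPUs
        best.free_cpus -= cpus  # reserve the CPUs on the chosen cluster
        return best.name

info = InformationService([
    Cluster("site-a", free_cpus=40, datasets={"run1"}),
    Cluster("site-b", free_cpus=120, datasets={"run2"}),
    Cluster("site-c", free_cpus=80, datasets={"run1", "run2"}),
])
broker = ResourceBroker(info)
print(broker.submit("my-analysis", cpus=10, dataset="run1"))  # site-c
```

The user only talks to the broker; which cluster actually runs the job is decided by the middleware, much as an operating system decides which CPU runs a program on a single PC.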


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 24

What is the Grid?

From this:

To this:


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 25

SETI@home

• A distributed computing project - not really a Grid project

• You pull the data from them, rather than them submitting jobs to you

Arecibo telescope in Puerto Rico

Users - 5,240,038

Results received – 1,632,106,991

Years of CPU Time – 2,121,057

Extraterrestrials found – 0
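The pull model can be sketched in Python: each volunteer PC asks a central server for the next work unit when it is idle, whereas a Grid resource broker pushes jobs to the resources it selects. The server, queue and work-unit names below are hypothetical; this is not SETI@home's actual client protocol.

```python
from collections import deque
from typing import Optional

class WorkServer:
    """Central server holding chunks of telescope data to be analysed (toy model)."""
    def __init__(self, n_units: int):
        self.pending = deque(f"unit-{i}" for i in range(n_units))
        self.results: dict[str, str] = {}

    def request_work(self) -> Optional[str]:
        # Pull model: the volunteer asks; the server never sends work unprompted
        return self.pending.popleft() if self.pending else None

    def return_result(self, unit: str, result: str) -> None:
        self.results[unit] = result

def volunteer_pc(name: str, server: WorkServer, max_units: int) -> None:
    """An idle home PC pulling and processing work units until none remain."""
    for _ in range(max_units):
        unit = server.request_work()
        if unit is None:
            break
        server.return_result(unit, f"processed by {name}")

server = WorkServer(n_units=5)
volunteer_pc("pc-1", server, max_units=3)
volunteer_pc("pc-2", server, max_units=3)
print(len(server.results), len(server.pending))  # 5 0
```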


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 26

Entropia

Uses idle cycles on Home PCs for profit and non-profit projects:

FightAIDS@Home:
• 60,000 machines
• 1,400 years of CPU time

Rebranding!


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 27

GridPP

19 UK Universities, CCLRC (RAL & Daresbury) and CERN

Funded by the Particle Physics and Astronomy Research Council (PPARC)

GridPP1 - 2001-2004 £17m "From Web to Grid"

GridPP2 - 2004-2007 £15m "From Prototype to Production"


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 28

International Collaboration

• EU DataGrid (EDG) 2001-2004: Middleware Development Project
• US and other Grid projects: Interoperability
• LHC Computing Grid (LCG): Grid Deployment Project for LHC
• EU Enabling Grids for e-Science in Europe (EGEE) 2004-2006: Grid Deployment Project for all disciplines


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 29

Application Development

ATLAS LHCb CMS

BaBar (SLAC) SAMGrid (FermiLab) QCDGrid


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 30

Middleware Development

Configuration Management

Storage Interfaces

Network Monitoring

Security

Information Services

Grid Data Management


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 31

Tier Structure

'Tier-0' – where the data comes from

'Tier-1' – major centres in large countries

'Tier-2' – smaller centres in large countries or smaller countries

CERN Tier-0

Tier-1 centres: UK, US, Italy, Germany, France, . . .

Tier-2 centres: four UK Tier-2s, Spain, Poland, . . .

Tier structure not necessarily appropriate for all disciplines
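The tier structure is a tree rooted at the Tier-0, where the data comes from. A minimal Python sketch models it as nested nodes; the parent-child links are illustrative assumptions (the four UK Tier-2 names are the regional centres listed on the UK Tier-2 slide), and only a subset of Tier-1s is shown.

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Centre:
    name: str
    tier: int
    children: list["Centre"] = field(default_factory=list)

    def add(self, child: "Centre") -> "Centre":
        """Attach a centre one tier below this one and return it."""
        if child.tier != self.tier + 1:
            raise ValueError("each link goes down exactly one tier")
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple[int, "Centre"]]:
        """Yield (depth, centre) pairs, each parent before its children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

cern = Centre("CERN", tier=0)
uk = cern.add(Centre("UK Tier-1", tier=1))
for region in ("ScotGrid", "NorthGrid", "SouthGrid", "London"):
    uk.add(Centre(region, tier=2))
cern.add(Centre("US Tier-1", tier=1))

for depth, centre in cern.walk():
    print("  " * depth + f"Tier-{centre.tier}: {centre.name}")
```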


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 32

UK Tier-1/A Centre

• High quality data services
• National and International Role
• UK focus for International Grid development
• 700 dual-CPU machines
• 80 TB disk
• 60 TB tape (capacity 1 PB)
• Grid Operations Centre


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 33

UK Tier-2 Centres

ScotGrid: Durham, Edinburgh, Glasgow

NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield

SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick

London: Brunel, Imperial, QMUL, RHUL, UCL

Mostly funded by HEFCE


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 34

The Grid at QM

The Queen Mary e-Science High Throughput Cluster

174 PCs (348 CPUs), 40 TByte disk storage

Part of the London Tier-2 Centre


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 35

The LCG Grid

89 Sites

9,056 CPUs

3 PBytes Disk


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 36

Grid Snapshot


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 37

Challenges

(Ex-)Concorde (15 km)

CD stack with 1 year of LHC data (~20 km)

We are here (1 km)

• Scaling to full size ~10,000 → 100,000 CPUs
• Stability, robustness etc.
• Security (Hackers' Paradise!)
• Sharing resources (in an RAE environment!)
• International collaboration
• Continued funding beyond the start of the LHC!


Steve Lloyd Inaugural Lecture - 24 November 2004 Slide 38

Further Info

http://www.gridpp.ac.uk