Page 1: Tony Doyle a.doyle@physics.gla.ac.uk GridPP – From Prototype To Production, HEPiX Meeting, Edinburgh, 25 May 2004.

Page 2:

Tony Doyle - University of Glasgow

Outline

• GridPP Project
• Introduction
• UK Context
• Components:
  A. Management
  B. Middleware
  C. Applications
  D. Tier-2
  E. Tier-1
  F. Tier-0
• Challenges:
  – Middleware Validation
  – Improving Efficiency
  – Meeting Experiment Requirements
  – ..Via The Grid?
  – Work Group Computing
  – Events.. To Files.. To Events
  – Software Distribution
  – Distributed Analysis
• Historical Perspective
• What is the Grid Anyway?
• Is GridPP a Grid?
• Summary

Page 3:

GridPP – A UK Computing Grid for Particle Physics

GridPP

19 UK Universities, CCLRC (RAL & Daresbury) and CERN

Funded by the Particle Physics and Astronomy Research Council (PPARC)

GridPP1 - Sept. 2001-2004 £17m "From Web to Grid"

GridPP2 – Sept. 2004-2007 £16(+1)m "From Prototype to Production"

Page 4:

GridPP in Context

[Diagram, not to scale. Labels: UK Core e-Science Programme; Institutes; Tier-2 Centres; Tier-1/A; CERN; LCG; EGEE; GridPP; Experiments; Grid Support Centre; Middleware, Security, Networking; Apps Dev; Apps Int]

Page 5:

GridPP1 Components

[Pie chart, 6/Feb/2004. Slice values: £3.57m, £5.67m, £3.74m, £2.08m, £1.84m]

• CERN: LHC Computing Grid Project (LCG) - Applications, Fabrics, Technology and Deployment
• DataGrid: European DataGrid (EDG) - Middleware Development
• Tier-1/A: UK Tier-1/A Regional Centre - Hardware and Manpower
• Applications: Grid Application Development - LHC and US Experiments + Lattice QCD
• Operations: Management, Travel etc

Page 6:

GridPP2 Components

[Pie chart, May 2004. Slice values: £0.75m, £2.62m, £3.02m, £0.88m, £0.69m, £2.75m, £2.79m, £1.00m, £2.40m. Slice labels: Tier-1/A Hardware; Tier-2 Operations; Applications; M/S/N; LCG-2; Mgr; Travel; Ops; Tier-1/A Operations]

A. Management, Travel, Operations
B. Middleware, Security, Network Development
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
D. Tier-2 Deployment: 4 Regional Centres - M/S/N support and System Management
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
F. LHC Computing Grid Project (LCG Phase 2) [review]
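As a quick consistency check, the nine slice values on this chart can be added up and compared with the GridPP2 headline figure of £16(+1)m. The sketch below is only illustrative arithmetic: the values are copied from the slide, while the variable and function names are made up here, and no value is assigned to a particular component because the slide text does not preserve that pairing.

```python
from decimal import Decimal

# Slice values in £m, copied from the GridPP2 Components chart (May 2004).
# The pairing of values to components is not recoverable from the slide
# text, so the values are summed without being assigned to labels.
GRIDPP2_SLICES_M = ["0.75", "2.62", "3.02", "0.88", "0.69",
                    "2.75", "2.79", "1.00", "2.40"]


def total_budget_m(slices):
    """Sum slice values exactly, using Decimal to avoid float rounding."""
    return sum((Decimal(s) for s in slices), Decimal("0"))


total = total_budget_m(GRIDPP2_SLICES_M)
print(f"GridPP2 chart total: £{total}m")
```

The total can then be read against the "£16(+1)m" quoted for GridPP2 on the project overview slide.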

Page 7:

A. GridPP Management

[Organisation chart. Labels:]
• Collaboration Board
• Project Management Board
• Project Leader
• Project Manager
• Technical (Deployment) Board
• Experiments (User) Board
• (Production Manager)
• (Dissemination Officer)
• GGF, LCG, EDG (EGEE), UK e-Science, Liaison
• GridPP1 (GridPP2)
• Project Map
• Risk Register

Page 8:

A. Management Structure in LCG Context

[Diagram. Labels:]
• CB; PMB
• Deployment Board: Tier1/Tier2, Testbeds, Rollout; Service specification & provision
• User Board: Requirements; Application Development; User feedback
• Metadata; Workload; Network; Security; Info. Mon.; Storage
• ARDA; Expmts; EGEE; LCG

Page 9:

B. Middleware, Security and Network Development

GridPP2 Project: Managing the Middleware

[Diagram. The management-structure labels (ARDA, Expmts, EGEE, LCG, Deployment Board, User Board, PMB, Metadata, Workload, Network, Security, Info. Mon., Storage) are shown alongside four layers:]
I. Experiment Layer
II. Application Middleware
III. Grid Middleware
IV. Facilities and Fabrics

Page 10:

B. Middleware, Security and Network Development

M/S/N builds upon UK strengths as part of International development

• Configuration Management
• Storage Interfaces
• Network Monitoring
• Security
• Information Services
• Grid Data Management

[Diagram group labels: Security; Middleware; Networking]

Page 11:

C. Application Development

• GANGA
• SAMGrid
• Lattice QCD
• AliEn → ARDA
• CMS
• BaBar

[SAM architecture diagram. Labels:]
• Client Applications: Web; Python codes, Java codes; Command line; D0 Framework C++ codes
• Request Formulator and Planner
• Collective Services: Catalog protocols; Significant Event Logger; Naming Service; Database Manager; Catalog Manager; SAM Resource Management; Batch Systems - LSF, FBS, PBS, Condor; Data Mover; Job Services; Storage Manager; Job Manager; Cache Manager; Request Manager
• SAM-named components: “Dataset Editor”; “File Storage Server”; “Project Master”; “Station Master”; “Stager”; “Optimiser”
• Connectivity and Resource: CORBA; UDP; File transfer protocols - ftp, bbftp, rcp; GridFTP; Mass Storage systems protocols e.g. encp, hpss
• Authentication and Security: GSI; SAM-specific user, group, node, station registration; Bbftp ‘cookie’
• Fabric: Tape Storage Elements; Disk Storage Elements; Compute Elements; LANs and WANs; Resource and Services Catalog; Replica Catalog; Meta-data Catalog; Code Repository

Legend:
• Name in “quotes” is SAM-given software component name
• Indicates component that will be replaced, added or enhanced using PPDG and Grid tools

Page 12:

D. UK Tier-2 Centres

• NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
• SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
• ScotGrid *: Durham, Edinburgh, Glasgow
• LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL

Current UK Status: 10 Sites via LCG

Page 13:

D. The UK Testbed: Hidden Sector

Page 14:

E. The UK Tier-1/A Centre

• High quality data services
• National and International Role
• UK focus for International Grid development

April 2004:
• 700 Dual CPU
• 80TB Disk
• 60TB Tape (Capacity 1PB)

[Figure labels: LHCb; ATLAS; CMS; BaBar; Grid Operations Centre]

Page 15:

Real Time Grid Monitoring

[Screenshot: LCG2, 24 May 2004]

Page 16:

E. Grid Operations

• Grid Operations Centre
  – Core Operational Tasks
  – Monitor infrastructure, components and services
  – Troubleshooting
  – Verification of new sites joining Grid
  – Acceptance tests of new middleware releases
  – Verify suppliers are meeting SLA
  – Performance tuning and optimisation
  – Publishing use figures and accounts
  – Grid information services
  – Monitoring services
  – Resource brokering
  – Allocation and scheduling services
  – Replica data catalogues
  – Authorisation services
  – Accounting services

• Grid Support Centre
  – Core Support Tasks
  – Running UK Certificate Authority

Page 17:

F. Tier 0 and LCG: Foundation Programme

• Aim: build upon Phase 1

• Ensure development programmes are linked

• Project management:

GridPP LCG

• Shared expertise:

• LCG establishes the global computing infrastructure

• Allows all participating physicists to exploit LHC data

• Earmarked UK funding to be reviewed in Autumn 2004

Required Foundation: LCG Fabric, Technology and Deployment

Page 18:

The Challenges Ahead I: Implementing the Validation Process

[Flow diagram. Stages: Unit Test; Build; Certification; Production]

• Actors: WPs; Build System; Integration Team; Test Group; Apps. Representatives; Users
• Testbeds: Development Testbed ~15 CPU; Certification Testbed ~40 CPU; Application Testbed ~1000 CPU
• Steps: add unit tested code to repository; run nightly build & auto. tests; individual WP tests; integration; overall release tests; Grid certification; application certification; problem reports; fix problems
• Release states: tagged package; release candidate; tagged releases; tagged release selected for certification; certified releases; certified release selected for deployment; certified public release for use by apps.
• 24x7

Process to: Test frameworks; Test support; Test policies; Test documentation; Test platforms/compilers
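The staged flow on this slide (Unit Test, Build, Certification, Production, with problem reports feeding back for fixes) can be sketched as a small state machine. This is a minimal illustration, not the project's actual tooling: the `Stage`, `advance` and `run_release` names are hypothetical, and the rule that a failed test keeps the release at its current stage until it is fixed is an assumption made for the sketch.

```python
from enum import Enum


class Stage(Enum):
    # Stage names are taken from the slide; the ordering follows the diagram.
    UNIT_TEST = "Unit Test"
    BUILD = "Build"
    CERTIFICATION = "Certification"
    PRODUCTION = "Production"


ORDER = list(Stage)  # Enum members iterate in definition order


def advance(stage, tests_passed):
    """Move to the next stage on success; stay put on failure (assumed rule)."""
    if not tests_passed:
        return stage  # problem report raised: fix and retest at this stage
    idx = ORDER.index(stage)
    return ORDER[min(idx + 1, len(ORDER) - 1)]


def run_release(results):
    """Walk one release through the stages given a sequence of test outcomes."""
    stage = Stage.UNIT_TEST
    history = [stage]
    for passed in results:
        stage = advance(stage, passed)
        history.append(stage)
    return history


if __name__ == "__main__":
    for s in run_release([True, False, True, True]):
        print(s.value)
```

In this example the release fails once at the Build stage, is retested there, and then proceeds through Certification to Production.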

Page 19:

The Challenges Ahead II: Improving Grid “Efficiency”

[Chart: Efficiency (Successful Jobs / Jobs submitted), 0% to 100%, by month from Dec-02 to Feb-04. Series: CMS EDGv1.4; ATLAS EDGv1.4; LHCb EDGv1.4; LCG1 (EDG v2.0); EDG appl. TB v2.x]
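The efficiency plotted on this slide is defined as successful jobs divided by jobs submitted. A minimal sketch of that ratio follows; the function name and the job counts are hypothetical placeholders, since the slide text does not preserve the underlying numbers.

```python
def grid_efficiency(successful_jobs, submitted_jobs):
    """Efficiency as defined on the slide: successful jobs / jobs submitted."""
    if submitted_jobs <= 0:
        raise ValueError("submitted_jobs must be positive")
    if not 0 <= successful_jobs <= submitted_jobs:
        raise ValueError("successful_jobs must lie between 0 and submitted_jobs")
    return successful_jobs / submitted_jobs


# Hypothetical (successful, submitted) counts, for illustration only;
# these are NOT values read from the chart.
EXAMPLE_COUNTS = {
    "month A": (450, 1000),
    "month B": (750, 1000),
}

if __name__ == "__main__":
    for label, (ok, total) in EXAMPLE_COUNTS.items():
        print(f"{label}: {grid_efficiency(ok, total):.0%}")
```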

Page 20:

The Challenges Ahead III: Meeting Experiment Requirements (UK)

[Chart: CPU requirement (kSI2000 year) by year, 2004 to 2007, axis 0 to 12000, grouped as LHC and Non-LHC. Series: ATLAS; CMS; LHCb; ALICE; Phenomenology; ZEUS; UKQCD; UKDMC; MINOS; MICE; LISA; D0; CDF; BaBar; ANTARES]

[Chart: Disk requirement (TB) by year, 2004 to 2007, axis 0 to 2500, grouped as LHC and Non-LHC. Series: ATLAS; CMS; LHCb; ALICE; Phenomenology; UKQCD; UKDMC; MINOS; MICE; D0; CRESST; CDF; BaBar; ANTARES]

Total Requirement:

Year            2004   2005   2006   2007
CPU [kSI2000]   2395   4066   6380   9965
Disk [TB]        369    735   1424   2285
Tape [TB]        376    752   1542   2623
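To make the scale of the challenge concrete, the growth implied by the Total Requirement table can be computed directly. The sketch below uses only the figures from the table; the dictionary layout and the `growth_factor` helper are illustrative names introduced here.

```python
# Total UK requirement by year, copied from the table on this slide.
REQUIREMENTS = {
    "CPU [kSI2000]": {2004: 2395, 2005: 4066, 2006: 6380, 2007: 9965},
    "Disk [TB]": {2004: 369, 2005: 735, 2006: 1424, 2007: 2285},
    "Tape [TB]": {2004: 376, 2005: 752, 2006: 1542, 2007: 2623},
}


def growth_factor(series, start=2004, end=2007):
    """Ratio of the requirement in `end` to the requirement in `start`."""
    return series[end] / series[start]


if __name__ == "__main__":
    for name, series in REQUIREMENTS.items():
        print(f"{name}: x{growth_factor(series):.1f} from 2004 to 2007")
```

On these figures the CPU requirement grows by a factor of about four between 2004 and 2007, and the disk and tape requirements by factors of about six to seven.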

In International Context - Q2 2004 LCG Resources:

Page 21:

The Challenges Ahead IV: Using (Anticipated) Grid Resources

Dynamic Grid Optimisation over JANET Network

        2004                2007
CPU     ~7,000 1GHz CPUs    ~30,000 1GHz CPUs
Disk    ~400 TB disk        ~2200 TB disk

(note x2 scale change)

Page 22:

The Challenges Ahead V: Work Group Computing

Page 23:

The Challenges Ahead VI: Events.. to Files.. to Events

[Diagram. Labels: RAW, ESD, AOD and TAG data shown at several tiers; Tier-0 (International); Tier-1 (National); Tier-2 (Regional); Tier-3 (Local); “Interesting Events List”; RAW Data Files; ESD Data Files; AOD Data Files; TAG Data; Event 1, Event 2, Event 3]

• VOMS-enhanced Grid certificates to access databases via metadata
• Non-Trivial..
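The "events to files to events" problem can be illustrated with a toy lookup: start from an "Interesting Events List" selected on TAG data, then find which RAW, ESD or AOD files hold those events. This is a sketch under stated assumptions only: the catalogue contents, file names and the `files_for_events` helper are all hypothetical, and a real system would query a metadata database under Grid authorisation rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical catalogue: (event id, data type) -> file holding that event.
# File names and the event-to-file layout are invented for illustration.
CATALOGUE = {
    (1, "RAW"): "raw_0001.dat",
    (2, "RAW"): "raw_0001.dat",
    (3, "RAW"): "raw_0002.dat",
    (1, "ESD"): "esd_0001.dat",
    (2, "ESD"): "esd_0001.dat",
    (3, "ESD"): "esd_0001.dat",
    (1, "AOD"): "aod_0001.dat",
    (2, "AOD"): "aod_0001.dat",
    (3, "AOD"): "aod_0001.dat",
}


def files_for_events(event_ids, data_type, catalogue):
    """Group the requested events by the file that holds them."""
    files = defaultdict(list)
    for event_id in event_ids:
        files[catalogue[(event_id, data_type)]].append(event_id)
    return dict(files)


if __name__ == "__main__":
    interesting_events = [1, 3]  # e.g. selected from TAG data
    print(files_for_events(interesting_events, "RAW", CATALOGUE))
```

Grouping by file matters because data is moved and staged file by file, while physicists select event by event.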

Page 24:

The Challenges Ahead VII: software distribution

• ATLAS Data Challenge (DC2) this year to validate world-wide computing model
• Packaging, distribution and installation:
  – Scale: one release build takes 10 hours, produces 2.5 GB of files
  – Complexity: 500 packages, Mloc, 100s of developers and 1000s of users
  – ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
  – needs ‘push-button’ easy installation..

[Data-flow diagram. Labels:]
• Step 1: Monte Carlo Data Challenges: Physics Models; Monte Carlo Truth Data; Detector Simulation; MC Raw Data; Reconstruction; MC Event Summary Data; MC Event Tags
• Step 2: Real Data: Trigger System; Data Acquisition; Level 3 trigger; Raw Data; Trigger Tags; Calibration Data; Run Conditions; Reconstruction; Event Summary Data (ESD); Event Tags

Page 25:

The Challenges Ahead VIII: distributed analysis

Complex workflow… LCG/ARDA Development

1. AliEn (ALICE Grid) provided a pre-Grid implementation [Perl scripts]

2. ARDA provides a framework for PP application middleware

Page 26:

Historical Perspective

• I wrote in 1990 a program called "WorlDwidEweb", a point and click hypertext editor which ran on the "NeXT" machine. This, together with the first Web server, I released to the High Energy Physics community at first, and to the hypertext and NeXT communities in the summer of 1991.

• Tim Berners-Lee

• The first three years were a phase of persuasion, aided by my colleague and first convert Robert Cailliau, to get the Web adopted…

• We needed seed servers to provide incentive and examples, and all over the world inspired people put up all kinds of things…

• Between the summers of 1991 and 1994, the load on the first Web server ("info.cern.ch") rose steadily by a factor of 10 every year…

Page 27:

What is The Grid Anyway?

From Particle Physics Perspective

The Grid is:

not hype, but surrounded by it

a working prototype running on testbed(s)…

about seamless discovery of PC resources around the world

using evolving standards for interoperation

the basis for particle physics computing in the 21st Century

not (yet) as transparent as end-users want it to be

Page 28:

What is “The Grid” Anyway?

Is GridPP a Grid?

1. Coordinates resources that are not subject to centralized control

2. … using standard, open, general-purpose protocols and interfaces

3. … to deliver nontrivial qualities of service

1. YES. This is why development and maintenance of a UK-EU-US testbed is important

2. YES... Globus/CondorG/EDG meet this requirement. Common experiment application layers are also important here.

3. NO(T YET)… Experiments define whether this is true - currently only ~100,000 jobs submitted via the testbed c.f. internal component tests of up to 10,000 jobs per day. Next step: LCG-2 deployment outcome… this year

http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf

Page 29:

GridPP Summary: From Web to Grid

GridPP – Theory and Experiment

• UK GridPP started 1/9/01
• EU DataGrid: First Middleware ~1/9/01
• Development requires a testbed with feedback
  – “Operational Grid”
• Fit into UK e-Science structures
• Experience in distributed computing essential to build and exploit the Grid
• Scale in UK? 0.5 PBytes and 2,000 distributed CPUs: GridPP in Sept 2004

• Grid jobs are being submitted now.. user feedback loop is important..
• All experiments have immediate requirements
• Current Experiment Production: “The Grid” is a small component
• Non-technical issues:
  – Recognising context
  – Building upon expertise
  – Defining roles
  – Sharing resources
• Major deployment activity is LCG
  – We contribute significantly to LCG and our success depends critically on LCG
• “Production Grid” will be difficult to realise: GridPP2 planning underway as part of LCG/EGEE
• Many Challenges Ahead..

Page 30:

GridPP Summary: From Prototype to Production

[Timeline diagram:]
• 2001: Separate Experiments, Resources, Multiple Accounts
• 2004: Prototype Grids
• 2007: 'One' Production Grid

[Labels: BaBar; D0; CDF; ATLAS; CMS; LHCb; ALICE; 19 UK Institutes; RAL Computer Centre; CERN Computer Centre; SAMGrid; BaBarGrid; LCG; EDG; GANGA; EGEE; ARDA; UK Prototype Tier-1/A Centre; CERN Prototype Tier-0 Centre; 4 UK Prototype Tier-2 Centres; UK Tier-1/A Centre; CERN Tier-0 Centre; 4 UK Tier-2 Centres]