Tony Doyle - University of Glasgow 3 February 2005 Science Committee Meeting GridPP Status Report...


Page 1

GridPP Status Report

Tony Doyle

Page 2

Contents

• What was GridPP1?

• What is GridPP2?

• Challenges abound

• LCG

– Issues

• Deployment Status (9-28-30/1/05) – UK Grid

• The UK mountain climb

• Summary

Page 3

What was GridPP1?

• A team that built a working prototype grid of significant scale

– > 2,000 (9,000) CPUs

– > 1,000 (5,000) TB of available storage

– > 1,000 (6,000) simultaneous jobs

• A complex project where 88% of the milestones were completed and all metrics were within specification

[Figure: GridPP1 project map, status date 1-Jan-04. GridPP Goal: To develop and deploy a large scale science Grid in the UK for the use of the Particle Physics community. Top-level areas: CERN, DataGrid, Applications, Infrastructure, Interoperability, Dissemination, Resources. Element labels: LCG Creation, Applications, Fabric, Technology, Deployment, WP1 to WP8, ATLAS, ATLAS/LHCb, LHCb, CMS, BaBar, CDF/D0, UKQCD, Other, Tier-A, Tier-1, Tier-2, Testbed, Rollout, Data Challenges, Int. Standards, Open Source, Worldwide Integration, UK Integration, Monitoring, Developing, Engagement, Participation, Presentation. Legend: Metric OK, Metric not OK, Task complete, Task overdue, Due within 60 days, Task not due soon, Not Active, No Task or metric.]

A Success

“The achievement of something desired, planned, or attempted”

Page 4

Executive Summary I

• “The GridPP1 Project is now complete: following 3 years of development, a prototype Grid has been established, meeting the requirements of the experiments and fully integrated with LCG, currently the World’s largest Grid. Starting from this strong foundation, a more complex project, GridPP2, has now started, with an extended team in the UK working towards a production Grid deployed for the benefit of all experiments by September 2007.”

• We achieved (almost exactly) what we stated we would do in building a prototype…

Page 5

Executive Summary II

• “2004 was a pivotal year, marked by extraordinary and rapid change with respect to Grid deployment, in terms of scale and throughput. The scale of the Grid in the UK is more than 2000 CPUs and 1PB of disk storage (from a total of 9,000 CPUs and over 5PB internationally), providing a significant fraction of the total resources required by 2007. A peak load of almost 6,000 simultaneous jobs in August, with individual Resource Brokers able to handle up to 1,000 simultaneous jobs, gives confidence that the system should be able to scale up to the required 100,000 CPUs by 2007. A careful choice of sites leads to acceptable (>90%) throughput for the experiments, but the inherent complexity of the system is apparent and many operational improvements are required to establish and maintain a production Grid of the required scale. Numerous issues have been identified that are now being addressed as part of GridPP2 planning in order to establish the required resource for particle physics computing in the UK.”

• Most projects fail in going from prototype to production…

• There are many issues: a methodical approach is required.

Page 6

What is GridPP2?

[Figure: GridPP2 project map. 0. Production Grid. GridPP2 Goal: To develop and deploy a large scale production quality grid in the UK for the use of the Particle Physics community. Top-level areas: LCG, M/S/N, LHC Apps, Non-LHC Apps, Management, External. Element labels: Grid Deployment, Grid Technology, Computing Fabric, Grid Operations, Metadata, Data & Storage, Workload, Security, Information & Monitoring, Network, Applications, ATLAS, CMS, LHCb, Ganga, LHC Deployment, BaBar, CDF, D0, UKQCD, PhenoGrid, SAMGrid, Portal, Planning, Deployment, Tier-A, Tier-1, Tier-2, Middleware Support, Experiment Support, Interoperability, Dissemination, Engagement, Knowledge Transfer.]

Structures agreed and in place (except LCG phase-2)

• 253 Milestones, 112 Monitoring Metrics at present.

• Must deliver a “Production Grid”: robust, reliable, resilient, secure, stable service delivered to end-user applications.

• The Collaboration aims to develop, deploy and operate a very large Production Grid in the UK for use by the worldwide particle physics community.

Page 7

What are the Grid challenges?

Must:

• share data between thousands of scientists with multiple interests

• link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres

• ensure all data accessible anywhere, anytime

• grow rapidly, yet remain reliable for more than a decade

• cope with different management policies of different centres

• ensure data security

• be up and running routinely by 2007

Page 8

What are the Grid challenges?

Data Management, Security and Sharing

1. Software process

2. Software efficiency

3. Deployment planning

4. Link centres

5. Share data

6. Manage data

7. Install software

8. Analyse data

9. Accounting

10. Policies

Page 9

Where do we start? Issues

https://edms.cern.ch/file/495809/2.2/LCG2-Limitations_and_Requirements.pdf

First large-scale Grid production problems being addressed…at all levels

“LCG-2 MIDDLEWARE PROBLEMS AND REQUIREMENTS FOR LHC EXPERIMENT DATA CHALLENGES”

Overall efficiency ~60% -> ~90%

¼ -> ½ of the problems

¾ -> ½ of the problems

Page 10

GridPP Deployment Status (9-28-30/1/05)

Three Grids on Global scale in HEP (similar functionality)

Grid              sites      CPUs
• LCG (GridPP)    104 (16)   10,000 (2242)
• Grid3 [USA]     29         2800
• NorduGrid       30         3200

GridPP deployment is part of LCG (currently the largest Grid in the world)

The future Grid in the UK is dependent upon LCG releases
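To put the site and CPU counts quoted above in proportion, the following is a minimal sketch that works out each Grid's share of the combined CPU count and GridPP's share of LCG. The numbers are taken from the slide; the variable and function names are illustrative only, and the combined total simply adds the three Grids as listed.

```python
# Site and CPU counts as quoted on the slide (snapshot, late January 2005).
grids = {
    "LCG": {"sites": 104, "cpus": 10_000},
    "Grid3 [USA]": {"sites": 29, "cpus": 2_800},
    "NorduGrid": {"sites": 30, "cpus": 3_200},
}
# GridPP is part of LCG, so its numbers are a subset of the LCG row.
gridpp = {"sites": 16, "cpus": 2_242}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Combined CPU count of the three Grids listed on the slide.
total_cpus = sum(g["cpus"] for g in grids.values())

# Share of the combined CPU count held by each Grid.
cpu_share_by_grid = {name: share(g["cpus"], total_cpus) for name, g in grids.items()}

# GridPP's share of LCG sites and CPUs.
gridpp_site_share = share(gridpp["sites"], grids["LCG"]["sites"])
gridpp_cpu_share = share(gridpp["cpus"], grids["LCG"]["cpus"])

for name, pct in cpu_share_by_grid.items():
    print(f"{name}: {pct}% of listed CPUs")
print(f"GridPP: {gridpp_site_share}% of LCG sites, {gridpp_cpu_share}% of LCG CPUs")
```

On these figures LCG accounts for well over half of the listed CPUs, and GridPP contributes roughly a fifth of LCG's CPUs from about 15% of its sites.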

         totalCPU  freeCPU  runJob  waitJob  seAvail TB  seUsed TB  maxCPU  avgCPU
Total    2242      915      591     784      936.87      4.45       10648   2232
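The status counters above can be read as one row of a monitoring table. The sketch below pairs each column name with its value and derives two simple indicators. The interpretation of the columns is an assumption (totalCPU and freeCPU as total and idle CPUs, runJob and waitJob as running and queued jobs); the slide itself does not define them.

```python
# Column names and the "Total" row exactly as they appear on the slide.
columns = [
    "totalCPU", "freeCPU", "runJob", "waitJob",
    "seAvail TB", "seUsed TB", "maxCPU", "avgCPU",
]
row = "2242 915 591 784 936.87 4.45 10648 2232"

# Pair each column name with its numeric value.
status = dict(zip(columns, (float(v) for v in row.split())))

# Assumption: CPUs that are not free are occupied.
busy_cpus = status["totalCPU"] - status["freeCPU"]
busy_fraction = busy_cpus / status["totalCPU"]

# Assumption: waitJob counts queued jobs, runJob counts running jobs.
queued_per_running = status["waitJob"] / status["runJob"]

print(f"busy CPUs: {busy_cpus:.0f} ({busy_fraction:.1%} of total)")
print(f"queued jobs per running job: {queued_per_running:.2f}")
```

Under these assumptions roughly 59% of the UK CPUs were occupied at the time of the snapshot, with somewhat more jobs waiting than running.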

Page 11

UK Grid

The whole is better than the sum of the parts..

Page 12

Applications

There is a (slightly wonky?) wheel

Use it to get to where you need to be

ZEUS uses LCG

• needs the Grid to respond to increasing demand for MC production

• up to 6 million Geant events per week on Grid since August 2004

1. The system developed for the large LHC experiments works (more) effectively for other (less resource-intensive) applications

2. Experiments need to work together with deployment team/sites

3. The de-facto deployment standard is LCG – it ~works. We can add components as required, to meet each experiment’s needs

Page 13

Dissemination

much has happened..

more people are reading about it..

Fri 28 Jan 2005: GridPP2 gets its first term report
Tue 11 Jan 2005: BaBar UK moves into the Grid era
Wed 5 Jan 2005: LHCb-UK members get up to speed with the Grid
Thu 9 Dec 2004: GridPP in Pittsburgh
Mon 6 Dec 2004: GridPP website busier than ever
Wed 24 Nov 2004: Optorsim 2.0 released
Mon 15 Nov 2004: ZEUS produces 5 million Grid events
Tue 26 Oct 2004: CERN 50th anniversary reception
Mon 18 Oct 2004: GridPP at CHEP'04
Mon 4 Oct 2004: LHCb data challenge first phase a success for LCG and UK
Mon 4 Oct 2004: Networking in Nottingham - GLIF launch meeting
Mon 6 Sep 2004: GridPP going for Gold - website award at AHM
Wed 1 Sep 2004: GridPP at the All Hands Meeting
Wed 18 Aug 2004: R-GMA included in latest LCG release
Tue 27 Jul 2004: LCG2 administrators learn tips and tricks in Oxford
Fri 2 Jul 2004: Take me to your (project) leader
Fri 25 Jun 2004: ScotGrid's 2nd birthday: ScotGrid clocks up 1 million CPU hours
Fri 18 Jun 2004: Meet your production manager
Wed 9 Jun 2004: GridPP10 report and photographs
Wed 2 Jun 2004: CERN recognizes UK's outstanding contribution to Grid computing
Wed 19 May 2004: UK particle physics Grid takes shape
Mon 10 May 2004: A new monitoring map for GridPP
Tue 4 May 2004: Press reaction to EGEE launch
Tue 27 Apr 2004: GridPP at the EGEE launch conference
Thu 8 Apr 2004: LCG2 released
Thu 8 Apr 2004: University of Warwick joins GridPP
Thu 1 Apr 2004: Grid computing steps up a gear: the start of EGEE
Mon 22 Mar 2004: EDG gets glowing final review
Tue 16 Mar 2004: Grids and Web Services meeting, 23 April, London
Fri 27 Feb 2004: EU DataGrid Software License approved by OSI
Fri 20 Feb 2004: GridPP Middleware workshop, March 4-5 2004, UCL
Tue 17 Feb 2004: Version 1.0 of the Optorsim grid simulation tool released by EU DataGrid
Thu 12 Feb 2004: Summary and photographs of the 9th GridPP Collaboration Meeting

138,976 hits in December

Page 14

The UK mountain climb has started..

Annual data storage: 2.4-2.8 PetaBytes per year? (~20%)

10 Million SPECint2000

10,000 PCs (3 GHz Pentium 4)

CD stack (~4 km)

We are here (0.4 km)

Quantitatively, we’re ~10% of the way there in terms of UK CPU (~2,000 out of ~10,000) and disk (~1 out of ~10 PB)

In production terms, left base camp

step-by-step plan in place…

For the Ben Nevis climb?
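As a rough check on the CD-stack picture, the sketch below converts the quoted 2.4-2.8 PetaBytes per year into the height of a stack of CDs and compares the "We are here" marker with the full stack. The disc capacity (700 MB) and thickness (1.2 mm, no cases) are assumptions introduced here, not figures from the slide, so the result is only an order-of-magnitude illustration.

```python
# Assumptions (not stated on the slide): a standard CD holds about 700 MB
# and is 1.2 mm thick; discs are stacked without cases.
CD_CAPACITY_BYTES = 700e6
CD_THICKNESS_M = 1.2e-3


def cd_stack_height_km(petabytes):
    """Height in km of a stack of CDs holding the given number of petabytes."""
    discs = petabytes * 1e15 / CD_CAPACITY_BYTES
    return discs * CD_THICKNESS_M / 1000.0


# Annual storage range quoted on the slide.
low_km = cd_stack_height_km(2.4)
high_km = cd_stack_height_km(2.8)

# "We are here (0.4 km)" against the "~4 km" stack quoted on the slide.
progress = 0.4 / 4.0

print(f"stack height: {low_km:.1f} to {high_km:.1f} km")
print(f"progress up the stack: {progress:.0%}")
```

With these assumptions the stack comes out at a little over 4 km, in line with the "~4 km" on the slide, and the 0.4 km marker corresponds to the quoted ~10%.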


Page 15

Summary GRIDPP-PMB-40-EXEC

• The Grid is a reality

• A project was/is needed

• Under control

• LCG2 support: SC case presn. 3/2/05

• 16 UK sites are on the Grid

– MoUs, planning, deployment, monitoring

– each underway as part of GridPP2

• Developments estd., R-GMA deployed

• gLite designed inc. web services

• Interfaces developed, testing phase

• Area transformed

• Incorporation in HEP programme..

• Introduction

• Project Management

• Resources

• LCG

• Deployment

– Tier-1/A production + Tier-2 resources

• M/S/N

• EGEE

• Applications

• Dissemination

• Beyond GridPP2