Tier1A Status Andrew Sansum 30 January 2003. Overview Systems Staff Projects.

Page 1

Tier1A Status

Andrew Sansum, 30 January 2003

Page 2

Overview

• Systems
• Staff
• Projects

Page 3

Lots of Services

• Disk farm
• CPU farm: CDF, BaBar, Suns
• Testbeds
• Core services: AFS, Datastore
• Support systems

Page 4

Lots of Operating Systems

• Production Farm
  – Red Hat 6.2 (close to end of life)
  – Red Hat 7.2 (in production; BaBar)
  – Red Hat 7.3 (close to trial service for LHC)

• CDF Service
  – Red Hat 7.1 (Kerberised Fermi distribution)
  – Red Hat 7.3 (possible future release)

• Solaris Service
  – Solaris 2.6 / Solaris 8

• EDG Testbed(s)
  – Red Hat 6.2, moving to Red Hat 7.3

Page 5

Lots of EDG Testbeds!

• Production Testbed (CE, SE, 3 × WN + NM)
• Development Testbed (CE, SE, 1 × WN)
• R-GMA Testbed (CE, SE, WN and RB)
• WP5 SE
• WP3/WP5 development systems
• EDG UI
• CE for Red Hat 7.2 service

Page 6

Lots of Grid Testbeds!

[Diagram: Tier1A and BaBar grid testbeds]

Page 7

New Hardware

• Disk
  – Expect 40 TB
  – Continue with existing IDE technology, but from a different manufacturer

• CPU
  – Expect 100 CPUs
  – Move to Pentium 4 or possibly AMD

Page 8

Some New Staff

GridPP Staff: Traylen, Radden, Bly

ESC/PPD System Staff: Wheeler, White, Sansum, Saunders, Ross, Folkes, Strong

Management: Kelsey, Gordon, Sansum, ...

BITD Support: Networking, Operations, User Reg, AFS

Experiment Support Staff (RAL and elsewhere)

Users

Page 9

Lots of New Projects

• Basic fabric performance monitoring (Ganglia)
• Resource CPU accounting (based on PBS accounts/MySQL)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (end of March)
• Network performance tests (CERN/Bristol, possibly also WP7)
• Get ready for LCG (February deployment?)
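The CPU accounting item can be sketched as follows: read job-end ("E") records from PBS accounting logs and sum CPU and wall-clock hours per group in a database. This is a minimal sketch, assuming the common PBS/Torque accounting line layout `date time;record_type;job_id;key=value ...`; Python's built-in `sqlite3` stands in for the MySQL back end named on the slide, and the sample records, table and column names are hypothetical.

```python
import sqlite3

# Hypothetical sample of PBS accounting lines (assumed format:
# "MM/DD/YYYY HH:MM:SS;record_type;job_id;key=value key=value ...").
SAMPLE_LOG = """\
01/27/2003 10:15:02;E;1001.pbs-server;user=alice group=babar resources_used.cput=01:30:00 resources_used.walltime=02:00:00
01/27/2003 11:40:10;S;1002.pbs-server;user=bob group=cdf
01/27/2003 12:05:44;E;1002.pbs-server;user=bob group=cdf resources_used.cput=00:45:00 resources_used.walltime=01:00:00
01/28/2003 09:00:00;E;1003.pbs-server;user=carol group=babar resources_used.cput=03:00:00 resources_used.walltime=03:30:00
"""


def hms_to_hours(hms: str) -> float:
    """Convert a PBS 'HH:MM:SS' duration (hours may exceed 24) to hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600


def parse_end_records(log_text: str):
    """Yield (job_id, user, group, cpu_hours, wall_hours) for job-end ('E') records."""
    for line in log_text.splitlines():
        fields = line.split(";", 3)
        if len(fields) != 4 or fields[1] != "E":
            continue  # skip malformed lines and non-end records (Q, S, D, ...)
        _, _, job_id, message = fields
        attrs = dict(kv.split("=", 1) for kv in message.split() if "=" in kv)
        yield (
            job_id,
            attrs.get("user", "unknown"),
            attrs.get("group", "unknown"),
            hms_to_hours(attrs.get("resources_used.cput", "0:0:0")),
            hms_to_hours(attrs.get("resources_used.walltime", "0:0:0")),
        )


def load_accounting(conn: sqlite3.Connection, log_text: str) -> None:
    """Store job-end records in a hypothetical 'jobs' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, username TEXT, grp TEXT, "
        "cpu_hours REAL, wall_hours REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?)",
        parse_end_records(log_text),
    )
    conn.commit()


def usage_by_group(conn: sqlite3.Connection) -> dict:
    """Return {group: (total_cpu_hours, total_wall_hours)}."""
    rows = conn.execute(
        "SELECT grp, SUM(cpu_hours), SUM(wall_hours) FROM jobs GROUP BY grp ORDER BY grp"
    )
    return {grp: (cpu, wall) for grp, cpu, wall in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_accounting(conn, SAMPLE_LOG)
    for grp, (cpu, wall) in usage_by_group(conn).items():
        print(f"{grp}: {cpu:.2f} CPU h, {wall:.2f} wall h")
```

Keying the table on the job ID means re-reading the same log file does not double-count jobs.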

Page 10

Ganglia Monitoring

• Urgently needed live performance and utilisation monitoring
  – RAL Ganglia Monitoring (live)
  – RAL Ganglia Monitoring (static)

• Scalable solution based on multicast
• Very rapidly deployable, with reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
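Ganglia's `gmond` daemon reports cluster state as an XML document, by default to any client that connects on TCP port 8649. The sketch below parses a small, made-up XML sample of that general shape and reports the mean one-minute load and total CPU count per cluster. The element and attribute names follow the usual gmond layout (`GANGLIA_XML`/`CLUSTER`/`HOST`/`METRIC`) but should be checked against the deployed Ganglia version; the cluster and host names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET

# Made-up, trimmed gmond-style XML; real output has more attributes and metrics.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
  <CLUSTER NAME="tier1a-cpu">
    <HOST NAME="node01">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node02">
      <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 10.0) -> bytes:
    """Read the XML dump that gmond writes to a TCP client before closing."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def cluster_load(xml_text) -> dict:
    """Return {cluster: (mean load_one over hosts, total cpu_num)}."""
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("CLUSTER"):
        loads, cpus = [], 0
        for host in cluster.iter("HOST"):
            metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
            if "load_one" in metrics:
                loads.append(float(metrics["load_one"]))
            cpus += int(metrics.get("cpu_num", 0))
        mean_load = sum(loads) / len(loads) if loads else 0.0
        result[cluster.get("NAME")] = (mean_load, cpus)
    return result


if __name__ == "__main__":
    print(cluster_load(SAMPLE_XML))
    # Against a live gmond (hypothetical host name):
    # print(cluster_load(fetch_gmond_xml("gmond.example.org")))
```

In a real deployment this aggregation is done by `gmetad` and the Ganglia web front end; the sketch only illustrates the data flow from the multicast-fed `gmond` to a summary.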

Page 12

New CA Deployed

• Now fully deployed by the e-Science Centre (Jens and Alastair Mills)

• In use in the UK core Grid
• Several PP groups have RAs (registration authorities) defined
• Approved by EDG, but not yet in the EDG distribution
• Once it is in the EDG distribution, a termination date for the old CA will be set

Page 13

New Scheduler (MAUI)

• With Red Hat 7.2, now using the MAUI scheduler over PBS

• Some problems with MAUI scheduling on wall-clock time; now corrected.

• Still testing algorithms, but we essentially have a range of strategies we can apply.

• Will make changes to the queue structure in due course
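As an illustration of the kind of strategy a scheduler like MAUI offers, the sketch below computes a decayed fair-share figure: recent accounting windows count more than older ones, and a group's priority rises when its decayed usage falls below its target share. This is a generic sketch loosely modeled on the fair-share idea behind MAUI's windowed, decayed usage settings (parameters such as FSINTERVAL, FSDEPTH and FSDECAY); it is not MAUI's exact formula, and the group names, targets, weight and usage numbers are hypothetical.

```python
from typing import Dict, List


def decayed_shares(window_usage: Dict[str, List[float]], decay: float) -> Dict[str, float]:
    """Each group's fraction of total decayed usage.

    window_usage[group][0] is the most recent window; window i is weighted by decay**i.
    """
    decayed = {
        group: sum(u * decay**i for i, u in enumerate(usage))
        for group, usage in window_usage.items()
    }
    total = sum(decayed.values())
    return {group: (d / total if total else 0.0) for group, d in decayed.items()}


def fairshare_priority(
    window_usage: Dict[str, List[float]],
    targets: Dict[str, float],
    decay: float = 0.5,
    weight: float = 100.0,
) -> Dict[str, float]:
    """Priority adjustment per group: weight * (target share - decayed actual share).

    Positive values boost groups under their target; negative values damp groups over it.
    """
    shares = decayed_shares(window_usage, decay)
    return {group: weight * (targets[group] - shares.get(group, 0.0)) for group in targets}


if __name__ == "__main__":
    # Hypothetical CPU hours in the two most recent windows (newest first).
    usage = {"babar": [60.0, 40.0], "cdf": [20.0, 60.0]}
    targets = {"babar": 0.5, "cdf": 0.5}
    print(fairshare_priority(usage, targets))
```

With a decay of 0.5, BaBar's decayed usage is 60 + 0.5 × 40 = 80 against CDF's 20 + 0.5 × 60 = 50, so BaBar is over its 50% target and CDF receives the positive adjustment. The decay factor, number of windows and weight are the tuning knobs such a scheme exposes.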

Page 14

New Helpdesk Software

• Old helpdesk (Remedy): mail-based and unfriendly.

• With additional staff, urgently need to deploy new solution.

• Expect new system to be based on free software (Bugzilla, Request Tracker …)

• Hope that deployed system will also meet needs of Testbed and Tier 2 sites.

• Expect deployment by end of March.

Page 15

Network Performance Tests

• Simon Metson, Nick White, +….
• Preparing for CMS production: must be able to move data to CERN at 100-200 Mbit/s.

• Currently achieving an aggregate 350 Mbit/s to Bristol, but under 100 Mbit/s to CERN.

• Main problem seems to be within the CMS infrastructure
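To put the CERN target in perspective, the sketch below converts a sustained rate into the time needed to move a dataset. The 1 TB dataset size is a hypothetical example; decimal units are assumed (1 TB = 10^12 bytes, 1 Mbit/s = 10^6 bit/s) and protocol overheads are ignored.

```python
def transfer_hours(size_bytes: float, rate_mbit_s: float) -> float:
    """Hours to move size_bytes at a sustained rate_mbit_s (decimal units, no overhead)."""
    bits = size_bytes * 8
    seconds = bits / (rate_mbit_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    one_tb = 1e12  # hypothetical dataset size in bytes
    for rate in (100, 200, 350):
        print(f"{rate:>3} Mbit/s: {transfer_hours(one_tb, rate):5.1f} h per TB")
```

At 100 Mbit/s a terabyte takes about 22 hours; at 200 Mbit/s, about 11 hours.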

Page 16

BaBar Batch CPU Use at RAL

[Chart: BaBar CPU hours per week (normalised to P450) against week beginning, for SP, UK users and non-UK users, with the MOU level marked; y-axis from 0 to 120,000.]

Full usage at full efficiency of BaBar CPUs = 106,624 hours/week; 59,733 hours/week according to the MOU.
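The two capacity figures above can be related with a little arithmetic: the MOU level is about 56% of the full-efficiency capacity, and dividing by the 168 hours in a week turns a weekly total into an equivalent number of continuously busy P450-normalised CPUs. The function name and the example weekly value are hypothetical; only the two constants come from the slide.

```python
HOURS_PER_WEEK = 7 * 24  # 168

FULL_CAPACITY = 106_624  # P450-normalised CPU hours/week at full usage and efficiency (slide)
MOU_LEVEL = 59_733       # P450-normalised CPU hours/week according to the MOU (slide)


def usage_summary(weekly_hours: float) -> dict:
    """Express one week's BaBar CPU hours relative to capacity and to the MOU."""
    return {
        "equivalent_cpus": weekly_hours / HOURS_PER_WEEK,
        "fraction_of_capacity": weekly_hours / FULL_CAPACITY,
        "fraction_of_mou": weekly_hours / MOU_LEVEL,
    }


if __name__ == "__main__":
    print(f"Capacity ~ {FULL_CAPACITY / HOURS_PER_WEEK:.1f} P450-equivalent CPUs")
    print(f"MOU ~ {MOU_LEVEL / FULL_CAPACITY:.1%} of capacity")
    example_week = 80_000  # hypothetical weekly total, not read from the chart
    print(usage_summary(example_week))
```

Full capacity corresponds to roughly 635 P450-equivalent CPUs running continuously.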

Page 17

Successes (2002)

• Five additional staff online since January 2002.

• Fully engaged in EDG testbed. Making an impact in EDG: Steve

• Tier1A installation went very well in March/April/May

• Tier A service ramp-up excellent:
  – Most successful of the Tier A services; SLAC seem pleased, so far.

Page 18

Challenges

• Complete 2002/2003 tender/deployment
• Carry out major EU tenders for 2003/2004
• Expand use of Tier 1
• Need to evolve strategy to cope with diversity of requirements
• Deploy the LCG Testbed (What/When?)
• Enhance automation / out-of-hours cover
• Improve reporting to GridPP (accountability)