GridPP3 project status Sarah Pearce 24 April 2010 GridPP25 Ambleside.


GridPP3 project status

Sarah Pearce 24 April 2010

GridPP25 Ambleside

2GridPP24, Ambleside

Skiddaw

• The 4th highest mountain in England (or the 3rd, depending)

• The “simplest of the mountains of this height to ascend”

• A well trodden tourist track

• The first summit of the ‘Bob Graham Round’ fell running challenge

• The view from the top is ‘panoramic’

24/8/10


Since the last meeting

• LHC continues to take data – see Pete’s talk

• EGEE finished and EGI started – see Jeremy and Andy’s discussion

• Tier-1 running well

• Tier-2s procuring more equipment from the 2nd round of hardware grants

• GridPP4 proposal reviewed and accepted – see Dave’s talk


Tier-1

• CPU hardware delivered and commissioned in time to meet WLCG pledge

• One tranche of disk delivery still going through acceptance

• Procurements for next round of CPU and disk have started

• Testing for upgrade to CASTOR 2.1.9 (from 2.1.7)

• Operations very stable


Tier-2s

• RHUL cluster successfully running in new RHUL machine room

• UCL-Central removed from list of UK sites

• All grants for 2nd tranche of hardware issued: sites procuring hardware to meet 2010 pledge.

• Several sites made significant upgrades, including:

– Sheffield (inc. air con/temperature monitoring equipment)

– Lancaster kit for new machine room

– Cambridge increased disk and CPU

– IC moved site outside firewall – x2 improvement in performance

• Some issues with staffing (Durham, likely at Bristol)

– Discussion today at PMB/DB on how to cover sites with small amounts (or no) dedicated staff


EGI, EGI-InSPIRE etc.

• EGI started operations on 1 May 2010

– Governed by EGI Council

– Executive Board reports to Council – Neil Geddes elected member of the EB

– Key staff now in Amsterdam (except Neasan)

– First Technical Forum will be Sept 14-17 in Amsterdam

• EGI-InSPIRE also started

– Grant Agreement with EC not signed yet – so no money so far

• e-ScienceTalk will start 1 September

– funds UK staff at IC and QMUL


UKI CPU contribution (LHC)

[Figure panels: CPU August 2010 – GStat2.0; Since April 2010; Country stats]


UKI VOs

[Figure panels: Since April 2010; Previous year]


UKI Tier-1 & Tier-2 contributions

[Figure panels: Since April 2010; Previous year]


Storage

• From GStat (and previous talks…)

[Figure panels: September 2008; March 2009; September 2009; April 2010]

• From GStat2.0 (today)


ProjectMap Q210


Project map - statistics

[Figure panels: Metrics; Milestones]


Experiments

ATLAS

• T1 data acceptance from CERN, T1s and T2s up from 79% to 96%

• Data availability in T2 storage is green, but this hides quite significant SE issues at some sites

LHCb

• Sharp drop in the proportion of production computing taking place in the UK, from 28% to 16% - early user jobs at CERN

• Issue with data transfer from the T2s to RAL (1.2.5)

• Ganga milestone delayed (Integrate XML job summary from Dirac into Ganga) due to setting up new DAST

CMS

• Some data loss at T1 and T2 but not considered significant by CMS

• Going well – CMS recognises the UK’s contribution

Other experiments

• MINOS, D0 and Babar mainly this quarter

• Red milestones for experiment satisfaction/user support questionnaire – waiting on ATLAS reply


Grid services

Operations

• 2.1.3 Fraction job slots used (Target 80%, achieved 37%). Overall occupancy low this quarter.

Security

• No incidents this quarter

Networking

• No red metrics. Second (resilient) OPN link from RAL is operational

Data and storage

• Record FTS transfer rates (2.4.4), with an average of over 370 MB/s sustained over the whole quarter

• Still questions over published storage values
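
To put the FTS rate in context, a rough conversion from sustained rate to total volume is sketched below. It assumes a 91-day quarter (April to June) and decimal units (1 PB = 10^9 MB); neither is stated on the slide, and the function name is illustrative only. Because the slide says "over 370 MB/s", the result is a lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_transferred_pb(rate_mb_per_s: float, days: int) -> float:
    """Total volume in PB (decimal units) moved at a sustained average rate."""
    total_mb = rate_mb_per_s * days * SECONDS_PER_DAY
    return total_mb / 1e9  # 1 PB = 1e9 MB in decimal units


# Assumption: the quarter is 91 days long (April to June).
print(round(total_transferred_pb(370, 91), 2))  # 2.91
```

On these assumptions, 370 MB/s sustained for the quarter corresponds to roughly 2.9 PB transferred.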


Tier-1

• T1 operating extremely well. Nearly all metrics for front-end systems at 100%.

• CASTOR SAM tests at 100% for the first time (3.4.8)

• Red metrics for farm occupancy (43%, against a target of 80%, 3.2.11)

• Red milestone for 2009 disk hardware accepted. One tranche of disk capacity failed acceptance – firmware fix and running again.

• Red milestone on moving out of Atlas centre – revised and will be met next quarter


Tier-2s

• % of promised CPU available – green for all Tier-2s (metric 2). % of disk red for NorthGrid, but procurements underway. Next quarter will be measured against 2010 pledge.

• SAM availability and reliability tests green or orange (so above 90%) for most Tier-2s (metrics 3&4). Range of issues at SouthGrid sites.

• Other red metrics:

• CPU utilisation (wall clock time & CPU time, metrics 7/8) LondonGrid, SouthGrid – but generally low

• Number of management meetings NorthGrid (metric 11)

• Staff changes at several sites (Durham, Glasgow, Manchester, QMUL)


Management and external

Project execution – red metrics

• All quarterly reports in by target time (though some earlier than others…)

• Red metric for no. of UB meetings

Rest of Map

• No red metrics

• EGEE/EGI metrics being revised to reflect EGI start


Risk register


• 3 high level risks

– Recruitment and retention – more of an issue as we get closer to GridPP4

– Sudden loss of key staff – as above

– Uncertain long term funding. GridPP4 approved, but government funding an issue everywhere


Finances - summary


Finances

• Substantial reduction in the Tier-1 FY10 hardware line

– STFC requested reduced capital spend of £1.1m

– New experiment resource requirements from C-RRB in April 2010. Overall (to 2015) reduction in disk and CPU but increase in custodial storage.

• Second tranche of Tier-2 hardware grants all issued

• Bridging posts for EGEE-funded staff

• Travel costs £173k for 09/10 – within budget

• Small amount of funding for R-GMA over 6 months


And the view is…


Panoramic?