UKI-SouthGrid Overview and Oxford Status Report
Pete Gronbech, SouthGrid Technical Coordinator
HEPSYSMAN, RAL, 10th June 2010
SouthGrid June 2010
UK Tier 2 reported CPU – Historical View to present
[Figure: monthly reported CPU in K SPECint2000 hours, Jun-09 to Jun-10, for UK-London-Tier2, UK-NorthGrid, UK-ScotGrid and UK-SouthGrid]
SouthGrid Sites Accounting as reported by APEL
[Figure: monthly CPU in K SPECint2000 hours, May-09 to Jun-10, for JET, BHAM, BRIS, CAM, OX and RALPPD]
• Sites are upgrading to SL5 and recalibrating their published SI2K values.
• RALPP figures seem low, even after my compensation for the site publishing an SI2K value of 1000 instead of 2500.
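The RALPP compensation amounts to rescaling the reported hours by the ratio of the correct to the published SI2K value. A minimal Python sketch, assuming the accounted K SPECint2000 hours scale linearly with the published SI2K value; the function name and the example figure are hypothetical, not taken from the APEL data.

```python
def compensate_si2k_hours(reported_hours: float,
                          published_si2k: float = 1000.0,
                          correct_si2k: float = 2500.0) -> float:
    """Rescale accounted SI2K hours that were normalised with the wrong
    published SI2K value (assumes accounting scales linearly with it)."""
    if published_si2k <= 0:
        raise ValueError("published SI2K value must be positive")
    return reported_hours * (correct_si2k / published_si2k)


# Hypothetical example: 100000 K SI2K hours reported with the wrong value.
print(compensate_si2k_hours(100000))  # 250000.0
```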
Site Resources at Q4 09
Site         HEPSPEC06   CPU (kSI2K, converted from HEPSPEC06)   Storage (TB)
EDFA-JET          1772                                     442            1.5
Birmingham        3344                                     836            114
Bristol           1836                                     459            110
Cambridge         1772                                     443            140
Oxford            3332                                     833            160
RALPPD           12928                                    3232            633
Totals           24984                                    6246         1158.5
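The CPU column matches a conversion of 4 HEPSPEC06 per kSI2K (for example 3344 / 4 = 836 for Birmingham). A short Python sketch that recomputes the converted values and the totals from the table; the factor of 4 is the commonly used WLCG convention and is an assumption here, since the slide does not state it. Individual values may differ from the table by one through rounding (JET's 1772 HS06 gives 443, where the table shows 442).

```python
# Site resources at Q4 09, copied from the table above:
# site -> (HEPSPEC06, storage in TB)
SITES = {
    "EDFA-JET": (1772, 1.5),
    "Birmingham": (3344, 114),
    "Bristol": (1836, 110),
    "Cambridge": (1772, 140),
    "Oxford": (3332, 160),
    "RALPPD": (12928, 633),
}

HS06_PER_KSI2K = 4  # assumed conversion factor (WLCG convention)


def hs06_to_ksi2k(hs06: float) -> float:
    """Convert a HEPSPEC06 rating to kSI2K."""
    return hs06 / HS06_PER_KSI2K


total_hs06 = sum(hs06 for hs06, _ in SITES.values())
total_tb = sum(tb for _, tb in SITES.values())

for site, (hs06, tb) in SITES.items():
    print(f"{site:<12} {hs06:>6} HS06 = {hs06_to_ksi2k(hs06):7.1f} kSI2K, {tb:>6} TB")
print(f"{'Totals':<12} {total_hs06:>6} HS06 = "
      f"{hs06_to_ksi2k(total_hs06):7.1f} kSI2K, {total_tb:>6} TB")
```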
Resources at Q1 10
GridPP3 hardware-generated MoU for 2010, 2011 and 2012
Site      2010 TB   2011 TB   2012 TB
bham          179        95       124
bris           22        27        35
cam           108       135       174
ox            203       255       328
RALPPD        364       440       583

Site      2010 HS06   2011 HS06   2012 HS06
bham           1450        2119        2724
bris            661        1173        1429
cam            1148        1445        1738
ox             2034        2483        2974
RALPPD         6499       13109       16515
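Summing the per-site MoU figures gives the SouthGrid-wide totals for each year. A small Python sketch that recomputes them from the tables; the helper name `yearly_totals` is hypothetical.

```python
# GridPP3 hardware-generated MoU figures, copied from the tables above.
# site -> {year: value}
MOU_TB = {
    "bham": {2010: 179, 2011: 95, 2012: 124},
    "bris": {2010: 22, 2011: 27, 2012: 35},
    "cam": {2010: 108, 2011: 135, 2012: 174},
    "ox": {2010: 203, 2011: 255, 2012: 328},
    "RALPPD": {2010: 364, 2011: 440, 2012: 583},
}
MOU_HS06 = {
    "bham": {2010: 1450, 2011: 2119, 2012: 2724},
    "bris": {2010: 661, 2011: 1173, 2012: 1429},
    "cam": {2010: 1148, 2011: 1445, 2012: 1738},
    "ox": {2010: 2034, 2011: 2483, 2012: 2974},
    "RALPPD": {2010: 6499, 2011: 13109, 2012: 16515},
}


def yearly_totals(table: dict[str, dict[int, int]]) -> dict[int, int]:
    """Sum the per-site figures for each year across SouthGrid."""
    totals: dict[int, int] = {}
    for per_year in table.values():
        for year, value in per_year.items():
            totals[year] = totals.get(year, 0) + value
    return totals


print(yearly_totals(MOU_TB))    # {2010: 876, 2011: 952, 2012: 1244}
print(yearly_totals(MOU_HS06))  # {2010: 11792, 2011: 20329, 2012: 25380}
```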
Oxford
• Preparing a tender to purchase hardware with the second tranche of GridPP3 money.
• Problem with ATLAS pool accounts on the DPM: mapping worked for some users but not for others. Increasing the number of pool accounts fixed it.
• Ewan is working on KVM and iSCSI for VMs. SL5 has VirtIO support as both host and guest. Shared storage will give us VMware-style live migration for free, without the hassle of VMware tools.
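One common way for pool-account mapping to work for some users but not others is pool exhaustion: each new grid identity leases a free account from a fixed pool, and once every account is leased the next mapping fails. The slide does not state the root cause, so treat this as an assumed explanation. The Python sketch below is a generic toy lease model, not DPM's actual mapping code; the account names (atlas001, ...) and DNs are hypothetical.

```python
class PoolAccountMapper:
    """Toy model of pool-account leasing (hypothetical, not DPM code)."""

    def __init__(self, prefix: str, size: int) -> None:
        self.prefix = prefix
        self.size = 0
        self.free: list[str] = []
        self.leases: dict[str, str] = {}  # user DN -> leased account
        self.grow(size)

    def grow(self, new_size: int) -> None:
        """Add accounts up to new_size, as when the pool is enlarged."""
        self.free.extend(f"{self.prefix}{i:03d}"
                         for i in range(self.size + 1, new_size + 1))
        self.size = max(self.size, new_size)

    def map_user(self, dn: str) -> str:
        # A returning DN keeps the account it already leased.
        if dn in self.leases:
            return self.leases[dn]
        if not self.free:
            raise RuntimeError(f"no free pool account for {dn}")
        account = self.free.pop(0)
        self.leases[dn] = account
        return account


mapper = PoolAccountMapper("atlas", size=2)
print(mapper.map_user("/CN=user1"))  # atlas001
print(mapper.map_user("/CN=user2"))  # atlas002
try:
    mapper.map_user("/CN=user3")     # pool exhausted: this user fails
except RuntimeError as err:
    print(err)
mapper.grow(4)                       # enlarge the pool
print(mapper.map_user("/CN=user3"))  # atlas003
```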
Grid Cluster setup: SL5 Worker Nodes (Oxford)
• CEs: T2ce04 (LCG-CE), T2ce05 (LCG-CE)
• Batch server: t2torque02
• Worker nodes: T2wn40, T2wn5x, T2wn6x, T2wn7x, T2wn8x, T2wn85 (gLite 3.2, SL5)
Grid Cluster setup: CREAM CE & pilot setup (Oxford)
• t2ce02: CREAM CE (gLite 3.2, SL5) with T2wn41 (glexec enabled) and t2scas01
• t2ce06: CREAM CE (gLite 3.2, SL5) with T2wn40-87
Grid Cluster setup: NGS integration setup (Oxford)
• ngsce-test.oerc.ox.ac.uk
• ngs.oerc.ox.ac.uk
• Worker nodes: wn40, wn5x, wn6x, wn7x, wn8x
ngsce-test is a virtual machine with the gLite CE software installed.
The gLite WN software is installed from a tarball in an NFS shared area visible to all the WNs.
PBSPro logs are rsynced to ngsce-test so that APEL accounting can identify which PBS jobs were grid jobs.
This contributed 1.2% of Oxford's total work during Q1.
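The matching step can be pictured as a join on the batch job ID: only PBS accounting records whose job ID also appears in the grid CE's own logs are counted as grid jobs. A simplified Python sketch, assuming the semicolon-separated PBS accounting record format (`date time;record type;job id;key=value ...`); the sample records and the set of grid job IDs are hypothetical, and this is not APEL's actual parser.

```python
def parse_pbs_record(line: str):
    """Split one PBS accounting line into (record type, job id, attributes)."""
    _timestamp, rec_type, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return rec_type, job_id, attrs


def grid_job_records(lines, grid_job_ids):
    """Keep job-end ('E') records whose job id was submitted via the grid CE."""
    matched = []
    for line in lines:
        rec_type, job_id, attrs = parse_pbs_record(line)
        if rec_type == "E" and job_id in grid_job_ids:
            matched.append((job_id, attrs))
    return matched


# Hypothetical sample data: two finished jobs, only one of them from the grid.
pbs_log = [
    "06/01/2010 10:00:00;E;1001.ngs;user=ngs001 resources_used.walltime=01:00:00",
    "06/01/2010 10:05:00;E;1002.ngs;user=localuser resources_used.walltime=02:00:00",
]
grid_ids = {"1001.ngs"}  # e.g. job ids recorded by the CE at submission time

for job_id, attrs in grid_job_records(pbs_log, grid_ids):
    print(job_id, attrs["resources_used.walltime"])  # 1001.ngs 01:00:00
```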
Oxford Tier-2 Cluster – Jan 2009
Located at Begbroke. Tendering for upgrade.
• Decommissioned January 2009, saving approx. 6.6 kW (originally installed April 2004)
• 17th November 2008 upgrade: 26 servers = 208 job slots, 60 TB disk
• 22 servers = 176 job slots, 100 TB disk storage
Grid Cluster Network setup (Oxford)
• Three 3Com 5500 switches joined by backplane stacking cables (96 Gbps full duplex)
• Disk pool nodes (T2se0n, 20 TB each) connected by dual-channel bonded 1 Gbps links
• Worker nodes
10 gigabit is too expensive, so the new tender will maintain a ratio of 1 gigabit per ~10 TB using channel bonding.
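The 1 gigabit per ~10 TB rule translates directly into a link count per storage server. A minimal Python sketch under that stated ratio; the function name is hypothetical, and using raw rather than usable (post-RAID) capacity is an assumption that overstates the link count.

```python
import math


def bonded_links_needed(storage_tb: float, tb_per_gbit: float = 10.0) -> int:
    """Number of 1 Gbps links to bond so each link serves at most ~tb_per_gbit TB."""
    if storage_tb <= 0:
        return 0
    return math.ceil(storage_tb / tb_per_gbit)


print(bonded_links_needed(20))      # 2: the existing dual-channel bond
print(bonded_links_needed(36 * 2))  # 8 for a 36-bay x 2 TB server (raw capacity)
```

For the existing 20 TB disk pool nodes this reproduces the dual-channel bond already in place.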
Tender for new Kit
• Storage likely to be 36-bay × 2 TB Supermicro servers
• Compute node options based on Supermicro Twin² (twin squared) with:
– AMD 8 core (best value for money)
– AMD 12 core
– Intel 4 core
– Intel 6 core
– 3 GB RAM per core and dual SAS local disks
• APC racks, PDUs and UPSs
• 3Com 5500G network switches to extend our existing infrastructure
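With 3 GB RAM per core, the memory per node follows from the core count of each CPU option. A Python sketch assuming dual-socket nodes and four nodes per twin-squared chassis; both figures are assumptions for illustration and are not stated in the tender summary.

```python
RAM_PER_CORE_GB = 3    # from the tender: 3 GB RAM per core
SOCKETS_PER_NODE = 2   # assumption: dual-socket nodes
NODES_PER_CHASSIS = 4  # assumption: four nodes per twin-squared chassis

CPU_OPTIONS = {  # cores per CPU for each option listed in the tender
    "AMD 8 core": 8,
    "AMD 12 core": 12,
    "Intel 4 core": 4,
    "Intel 6 core": 6,
}


def node_sizing(cores_per_cpu: int) -> tuple[int, int, int]:
    """Return (cores per node, RAM per node in GB, cores per chassis)."""
    cores = cores_per_cpu * SOCKETS_PER_NODE
    return cores, cores * RAM_PER_CORE_GB, cores * NODES_PER_CHASSIS


for name, cores_per_cpu in CPU_OPTIONS.items():
    cores, ram_gb, chassis_cores = node_sizing(cores_per_cpu)
    print(f"{name}: {cores} cores/node, {ram_gb} GB RAM/node, "
          f"{chassis_cores} cores/chassis")
```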
ATLAS Monitoring
PBSWEBMON
Ganglia
Command Line
• showq | more
• pbsnodes -l
• qstat -an
• ont2wns df -hl
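ont2wns appears to be a local helper that runs a command on every worker node. As a hypothetical Python equivalent, one could build one ssh invocation per node and run them only when asked. The lowercase host names, the t2wn40 to t2wn87 range (taken from the CREAM slide) and the use of plain ssh are all assumptions.

```python
import subprocess


def worker_nodes(first: int = 40, last: int = 87, prefix: str = "t2wn") -> list[str]:
    """Hypothetical worker-node host names, e.g. t2wn40 ... t2wn87."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def ssh_commands(command: list[str], hosts: list[str]) -> list[list[str]]:
    """Build one non-interactive ssh invocation per host (nothing is executed)."""
    return [["ssh", "-o", "BatchMode=yes", host, *command] for host in hosts]


def run_on_hosts(command: list[str], hosts: list[str]) -> dict[str, str]:
    """Run the command on each host in turn and collect stdout per host."""
    results = {}
    for host, argv in zip(hosts, ssh_commands(command, hosts)):
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
        results[host] = proc.stdout
    return results


# Dry run: show what would be executed for `df -hl` on the first two nodes.
for argv in ssh_commands(["df", "-hl"], worker_nodes(40, 41)):
    print(" ".join(argv))
```

Keeping command construction separate from execution makes it easy to check the host list before touching the cluster.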
Local Campus Network Monitoring
Gridmon
Patch levels: Pakiti v1 vs v2
Monitoring tools etc
Site    Pakiti                   Ganglia   Pbswebmon   SCAS, glexec, Argus
JET     No                       Yes       No          No
Bham    Yes                      Yes       No          No, yes, yes
Brist   Yes (v1)                 Yes       No          No
Cam     No                       Yes       No          No
Ox      v1 production, v2 test   Yes       Yes         Yes, yes, no
RALPP   v1                       Yes       No          No (but started on SCAS)
Conclusions
• SouthGrid sites' utilisation is improving
• Many sites have had recent upgrades; others are putting out tenders
• Will be purchasing new hardware with the GridPP3 second tranche
• Monitoring for production running is improving