INFN – Tier1 Site Status Report
Vladimir Sapunenko on behalf of Tier1 staff
HEPiX, CERN, 6-May-08
Overview
• Introduction
• Infrastructural Expansion
• Farming
• Network
• Storage and databases
• Conclusions
Introduction
Location: INFN - CNAF, Bologna; floor -2, a ~1000 m2 hall in the basement.
• Multi-experiment Tier-1 (~20 VOs, including the LHC experiments, CDF, BABAR and others)
• Participating in the LCG, EGEE and INFNGRID projects
• One of the main nodes of the GARR network
In a nutshell:
• about 3 MSI2K with ~2000 CPUs/cores, to be expanded to ~9 MSI2K by June
• about 1 PB of disk space (tender for a further 1.6 PB)
• 1 PB tape library (additional 10 PB tape library by Q2 2008)
• Gigabit Ethernet network, with some 10 Gigabit links on the LAN and WAN
Resources are assigned to experiments on a yearly basis.
Infrastructural Expansion
Expansion of the electrical power and cooling systems; work is in progress right now.
Main constraints:
• Dimensional - limited room height (hmax = 260 cm) → no floating floor
• Environmental - noise and electromagnetic insulation required due to the proximity of offices and classrooms
Key aspects: reliability, redundancy, maintainability
• 2 (+1) transformers: 2500 kVA each (~2000 kW each)
• 2 rotating (no-break) electric generators (EG), i.e. UPS+EG in one machine to save room: 1700 kVA each (~1360 kW each), to be integrated with the 1250 kVA EG already installed
• 2 independent electric lines feeding each row of racks: 2n redundancy
• 7 chillers: ~2 MW at Tair = 40°C; 2 (+2) pumps for chilled water circulation
• High-capacity precision air conditioning units (50 kW each) inside the high-density islands (APC)
• Air treatment and conditioning units (30 kW each) outside the high-density islands: UTA and UTL
[Slide: floor plan] The TIER1 in 2009 (floor -2): room 1 (migration, then storage), room 2 (farming), the electric panelboard room, sites for chilled water piping from/to floor -1 (chillers), and sites not involved in the expansion. Remote control of the systems' critical points.
Farming
The farming service maintains all computing resources and provides access to them. Main aspects:
• Automatic unattended installation of all nodes via Quattor
• Advanced job scheduling via the LSF scheduler
• Customized monitoring and accounting system via RedEye, developed in-house
• Remote control of all computing resources via KVM and IPMI, plus customized scripts
Resources can be accessed in 2 ways:
• GRID: the preferred solution, using a so-called "User Interface" node. Requires a VO to be set up. Secure: X.509 certificates are used for authentication/authorization.
• Direct access to LSF: discouraged. Faster (a plain UNIX account on a front-end machine is enough), but limited to the Tier1 only and insecure.
Node installation and configuration
Quattor (www.quattor.org) is a CERN-developed toolkit for automatic unattended installation:
• Kickstart handles the initial installation (RedHat); Quattor takes over after the first reboot
• Nodes are configured according to administrator requirements
• Very powerful: allows per-node customizations, but can also easily install 1000 nodes with the same configuration in 1 hour (see the sketch after this slide)
• Currently we only support Linux
The HEP community chose Scientific Linux (www.scientificlinux.org):
• Version currently deployed at CNAF: 4.x, identical to RedHat AS
• Good hardware support, big software library available on-line
• Supported Grid middleware is gLite 3.1
• We install the SL CERN release (www.cern.ch/linux) with some useful customizations
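The following is a conceptual sketch, in Python rather than Quattor's actual Pan template language, of the idea described above: every node starts from a shared base profile and only a few receive per-node overrides. All hostnames and profile keys are hypothetical.

```python
# Conceptual sketch only: Quattor itself uses Pan templates, not Python.
# It illustrates "per-node customizations on top of a common configuration".

BASE_PROFILE = {
    "os": "Scientific Linux 4.x",
    "middleware": "gLite 3.1",
    "lsf_client": True,
}

# Most of the worker nodes need no entry here at all.
NODE_OVERRIDES = {
    "wn-042.cnaf.infn.it": {"lsf_client": False, "role": "test"},  # hypothetical node
}

def build_profile(hostname: str) -> dict:
    """Merge the shared base profile with any per-node customization."""
    profile = dict(BASE_PROFILE)
    profile.update(NODE_OVERRIDES.get(hostname, {}))
    return profile

if __name__ == "__main__":
    print(build_profile("wn-001.cnaf.infn.it"))  # plain worker node
    print(build_profile("wn-042.cnaf.infn.it"))  # customized node
```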
Job scheduling
Job scheduling is done via the LSF scheduler: 848 WNs, 2032 CPUs/cores, 3728 slots.
Queue abstraction:
• One job queue per VO is deployed; there are no time-oriented queues
• Each experiment submits jobs to its own queue only
• Resource utilization limits are set on a per-queue basis
Hierarchical fairshare scheduling is used to calculate the priority in resource access:
• All slots are shared; there are no VO-dedicated resources and all nodes belong to a single big cluster
• One group per VO, with subgroups supported
• A share (namely a resource quota) is assigned to each group in a hierarchical way
• Priority is directly proportional to the share and inversely proportional to the historical resource usage (illustrated below)
MPI jobs are supported.
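A minimal illustration of the proportionality stated above; this is not LSF's actual fairshare formula, and the VO names, shares and usage figures are hypothetical.

```python
# Higher assigned share and lower past usage -> higher dynamic priority.
historical_usage = {"atlas": 3200.0, "cms": 1800.0, "lhcb": 400.0}  # e.g. CPU-hours (hypothetical)
assigned_share   = {"atlas": 40,     "cms": 40,     "lhcb": 20}      # quota per VO group (hypothetical)

def dynamic_priority(group: str) -> float:
    """Simplified: priority proportional to share, inversely proportional to usage."""
    return assigned_share[group] / (1.0 + historical_usage[group])

for vo in sorted(assigned_share, key=dynamic_priority, reverse=True):
    print(f"{vo}: priority {dynamic_priority(vo):.5f}")
```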
[Slide: plot] CNAF Tier1 KSpecInt2000 history - declared/available KSI2K monitoring from Mar-07 to Jul-08 (y-axis 0-10000 KSI2K). Annotations: before migration, 848 WNs, 2032 CPUs/cores, 3728 slots; after migration, 452 WNs, 1380 CPUs/cores, 2230 slots; 11 twin quad-core servers added, giving 476 WNs, 540 CPUs/cores, 2450 slots; expected delivery (from the new tender) of 6312.32 KSI2K.
[Slide: diagram] LHC Network General Layout. WAN side: GARR reached via a 7600 router (2x10 Gb/s); a dedicated 10 Gb/s LHC-OPN link; a 10 Gb/s general-purpose link carrying T1-T1 traffic (except FZK), T1-T2 traffic and CNAF general-purpose traffic; a 10 Gb/s LHC-OPN CNAF-FZK & T0-T1 backup link. LAN side: Extreme BD10808 and BD8810 core switches; worker nodes attached to Extreme Summit 450/400 switches, with 2x1 Gb/s, 4x1 Gb/s and 2x10 Gb/s links; storage servers (disk servers, CASTOR stagers) connected via Fibre Channel through an FC director to the SAN storage devices. In case of network congestion: uplink upgrade from 4x1 Gb/s to 10 Gb/s or 2x10 Gb/s.
Storage @ CNAF
Implementation of the 3 Storage Classes needed for LHC (their behaviour is sketched in the toy model below):
• Disk0 Tape1 (D0T1): CASTOR (testing GPFS/TSM/StoRM). Space is managed by the system; data are migrated to tape and deleted from disk when the staging area is full.
• Disk1 Tape0 (D1T0): GPFS/StoRM. Space is managed by the VO. Used by CMS, LHCb, ATLAS.
• Disk1 Tape1 (D1T1): CASTOR (moving to GPFS/TSM/StoRM). Space is managed by the VO (i.e. if the disk is full, the copy fails). A large disk buffer with a tape back end and no garbage collector.
Deployment of an Oracle database infrastructure for the back-ends of Grid applications.
Advanced backup service for both disk-based and database data: Legato, RMAN, TSM (in the near future).
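A toy Python model of the three storage classes listed above, only to make their space-management semantics explicit; the policy logic is illustrative, not the actual CASTOR/StoRM implementation.

```python
from dataclasses import dataclass

@dataclass
class StorageClass:
    name: str
    disk_copies: int         # the "Disk N" part of the name
    tape_copies: int         # the "Tape N" part of the name
    vo_managed: bool         # True: the VO manages space (a copy fails when disk is full)
    garbage_collected: bool  # True: the system frees disk when the staging area is full

D0T1 = StorageClass("D0T1", disk_copies=0, tape_copies=1, vo_managed=False, garbage_collected=True)
D1T0 = StorageClass("D1T0", disk_copies=1, tape_copies=0, vo_managed=True,  garbage_collected=False)
D1T1 = StorageClass("D1T1", disk_copies=1, tape_copies=1, vo_managed=True,  garbage_collected=False)

def on_disk_full(sc: StorageClass) -> str:
    """What happens when the disk area fills up, per the slide's description."""
    if sc.garbage_collected:
        return "migrate cold files to tape and delete them from disk"
    return "refuse new copies until the VO frees space"

for sc in (D0T1, D1T0, D1T1):
    print(sc.name, "->", on_disk_full(sc))
```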
CASTOR deployment
• ~40 disk servers attached to a SAN with full redundancy: FC 2 Gb/s or 4 Gb/s connections (dual-controller hardware and Qlogic SANsurfer Path Failover software, or vendor-specific software)
• Disk arrays: STK FlexLine 600, IBM FastT900
• Core services run on machines with SCSI disks, hardware RAID1 and redundant power supplies
• Tape servers and disk servers have lower-level hardware, like the WNs
• 15 tape servers
• ACSLS 7.0 runs on a Sun Blade v100 (Solaris 9.0) with 2 internal IDE disks in software RAID1
• STK L5500 silos (5500 slots, 200 GB cartridges, capacity ~1.1 PB)
• 16 tape drives; 3 Oracle databases (DLF, Stager, Nameserver)
• LSF plug-in for scheduling
• SRM v2 (2 front-ends), SRM v1 (phasing out)
Storage evolution
Previous tests demonstrated weaknesses in CASTOR's behaviour; even if some issues are now solved, we want to investigate and deploy an alternative way of implementing the D1T1 and D0T1 storage classes.
Great expectations come from the use of TSM together with GPFS and StoRM.
Ongoing GPFS/TSM/StoRM integration tests:
• StoRM needs to be modified to support DxT1; some non-trivial modifications are required for D0T1
• A short-term solution for D1T1, based on customized scripts, has been successfully tested by LHCb
• The solution for D0T1 is much more complicated and is currently under test (a sketch of the kind of migration policy involved follows below)
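The sort of policy such a D0T1 setup could follow is sketched below, assuming a GPFS staging area with an HSM (e.g. TSM) behind it; the path, watermarks and hsm_* helpers are hypothetical placeholders, not a real TSM API.

```python
import os

STAGING_AREA = "/gpfs/d0t1/staging"   # hypothetical path
HIGH_WATERMARK = 0.90                 # start releasing above 90% usage
LOW_WATERMARK = 0.75                  # stop once usage drops below 75%

def disk_usage(path: str) -> float:
    """Fraction of the file system that is currently used."""
    st = os.statvfs(path)
    return 1.0 - st.f_bavail / st.f_blocks

def hsm_is_migrated(path: str) -> bool:
    # Placeholder: a real setup would ask the HSM whether a tape copy exists.
    return True

def hsm_release(path: str) -> None:
    # Placeholder: a real setup would drop the disk copy, keeping only the stub.
    print("releasing", path)

def release_cold_files() -> None:
    """When the staging area is nearly full, free disk space for files
    that already have a tape copy, least recently used first."""
    if disk_usage(STAGING_AREA) < HIGH_WATERMARK:
        return
    files = sorted(
        (os.path.join(dp, f) for dp, _, fs in os.walk(STAGING_AREA) for f in fs),
        key=os.path.getatime,
    )
    for path in files:
        if disk_usage(STAGING_AREA) < LOW_WATERMARK:
            break
        if hsm_is_migrated(path):
            hsm_release(path)
```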
Why StoRM and GPFS/TSM?
StoRM is a Grid-enabled Storage Resource Manager (SRM v2.2) that allows Grid applications to interact with storage resources through standard POSIX calls (see the sketch below).
GPFS 3.2 is IBM's high-performance cluster file system:
• Greatly reduced administrative overhead
• Redundancy at the level of I/O server failure
• HSM support and ILM features in both GPFS and TSM permit the creation of a very efficient solution
• GPFS in particular has demonstrated robustness and high performance, and showed better performance in a SAN environment when compared to the CASTOR, dCache and Xrootd solutions
• Long experience at CNAF (>3 years): ~27 GPFS file systems in production (~720 net TB), mounted on all farm WNs
TSM is a high-performance backup/archiving solution from IBM; TSM 5.5 implements HSM and is also used in the HEP world (e.g. FZK, NDGF, and CERN for backup).
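A small illustration of the POSIX-access point above: once the StoRM-managed space is visible as a GPFS mount on the worker nodes, a job can read and write its data with ordinary file I/O, with no storage-specific client library involved. The path below is hypothetical.

```python
DATA_FILE = "/gpfs/atlas/user/example/output.dat"  # hypothetical GPFS path

# Write with plain POSIX I/O ...
with open(DATA_FILE, "wb") as f:
    f.write(b"detector data ...")

# ... and read it back the same way.
with open(DATA_FILE, "rb") as f:
    payload = f.read()

print(len(payload), "bytes read via plain POSIX calls")
```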
GPFS deployment evolution
• Started from a single cluster: all WNs and I/O nodes in one cluster
• Some manageability problems were observed
• The cluster of servers was then separated from the one of WNs
• Access to a remote cluster's file system has proven to be as efficient as access to a local one
• Decided to also separate the cluster with the HSM back end
Oracle Database Service
Main goals: high availability, scalability, reliability. Achieved through a modular architecture based on the following building blocks:
• Oracle ASM for storage management: implementation of redundancy and striping in an Oracle-oriented way
• Oracle Real Application Cluster (RAC): the database is shared across several nodes with failover and load-balancing capabilities
• Oracle Streams: geographical data redundancy
Current deployment: 32 servers, 19 of them configured in 7 clusters; 40 database instances; storage: 5 TB (20 TB raw).
Availability rate: 98.7% in 2007, where Availability (%) = Uptime / (Uptime + Target Downtime + Agent Downtime), as illustrated below.
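A quick check of the availability formula above; the uptime and downtime figures used here are hypothetical (the actual 2007 numbers are not given in the slides), chosen only to reproduce a ~98.7% result.

```python
def availability(uptime_h: float, target_downtime_h: float, agent_downtime_h: float) -> float:
    """Availability (%) = Uptime / (Uptime + Target Downtime + Agent Downtime)."""
    return 100.0 * uptime_h / (uptime_h + target_downtime_h + agent_downtime_h)

# e.g. ~8646 hours up, 80 h target downtime, 34 h agent downtime in a year (hypothetical)
print(f"{availability(8646, 80, 34):.1f}%")  # -> 98.7%
```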
Backup
• At present, backup to tape is based on Legato Networker 3.3
• Database on-line backup through RMAN; one copy is also stored on tape via the Legato-RMAN plug-in
• A future migration to IBM TSM is foreseen
• Certified interoperability between GPFS and TSM
• TSM provides not only backup and archiving methods but also migration capabilities
• It is possible to exploit TSM migration in order to implement the D1T1 and D0T1 storage classes (StoRM/GPFS/TSM integration)
Conclusions
INFN – Tier1 is facing a big infrastructural improvement which will allow it to fully meet the experiments' requirements for LHC.
Farming and network services are already well consolidated and are able to grow in terms of computing capacity and network bandwidth without deep structural modifications.
The storage service has achieved a good degree of stability; the remaining issues are mainly due to the implementation of the D0T1 and D1T1 storage classes. An integration between StoRM, GPFS and TSM is under development and promises to be a definitive solution for the outstanding problems.