The ALICE Tier-2’s in Italy
Roberto Barbera(*)
Univ. of Catania and INFN
Workshop CCR INFN 2006
Otranto, 08.06.2006
(*) Many thanks to A. Dainese, D. Di Bari, S. Lusso, and M. Masera for providing slides and information for this presentation.
Workshop CCR INFN 2006, Otranto, 08.06.2006
Outline
- The ALICE computing model and its parameters
- ALICE and the Grid(s): layout, implementation, recent results
- ALICE Tier-2's in Italy: Catania, Torino, Bari, LNL-PD
- Summary and conclusions
The ALICE computing model (1/2)
pp:
- Quasi-online data distribution and first reconstruction at T0
- Further reconstructions at T1's
AA:
- Calibration, alignment and pilot reconstructions during data taking
- Data distribution and first reconstruction at T0 during the four months after the AA run
- Further reconstructions at T1's
One copy of RAW kept at T0 and one distributed across the T1's.
The ALICE computing model (2/2)
T0: first-pass reconstruction; storage of one copy of RAW, of the calibration data and of the first-pass ESD's.
T1: reconstructions and scheduled analysis; storage of the second, collective copy of RAW and of one copy of all data to be kept; disk replicas of ESD's and AOD's.
T2: simulation and end-user analysis; disk replicas of ESD's and AOD's.
Parameters of the ALICE computing model

Parameter                                     Unit   pp      PbPb
T1's                                          #      7
T2's                                          #      23
Raw event size                                MB     0.2x5   12.5
Recording rate                                Hz     100     100
ESD size                                      MB     0.04    2.50
AOD size                                      kB     4       250
Event catalogue entry                         kB     10      10
Running time                                  s      10^7    10^6
Events / year                                 #      10^9    10^8
Reconstruction passes (avg)                   #      3
RAW duplication                               #      2
AOD/ESD duplication                           #      2
Scheduled analysis passes / rec. event / y    #      3
Chaotic analysis passes / rec. event / y      #      20
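The annual RAW data volume follows directly from these parameters (rate x running time x event size). The following is an illustrative back-of-the-envelope sketch, not a figure from the slides:

```python
# Rough annual RAW data volume from the model parameters above.
# The helper function and its name are for illustration only.

def raw_volume_tb(event_size_mb, rate_hz, running_time_s):
    """Annual RAW volume in TB = event size x (rate x running time)."""
    events_per_year = rate_hz * running_time_s
    return event_size_mb * events_per_year / 1e6  # MB -> TB

# pp: 0.2x5 MB/event, 100 Hz, 10^7 s  ->  10^9 events/year
pp_tb = raw_volume_tb(event_size_mb=0.2 * 5, rate_hz=100, running_time_s=1e7)
# PbPb: 12.5 MB/event, 100 Hz, 10^6 s ->  10^8 events/year
aa_tb = raw_volume_tb(event_size_mb=12.5, rate_hz=100, running_time_s=1e6)

print(f"pp RAW per year:   {pp_tb:.0f} TB")   # ~1000 TB (1 PB)
print(f"PbPb RAW per year: {aa_tb:.0f} TB")   # ~1250 TB (1.25 PB)
```

With the RAW duplication factor of 2 from the table, each of these volumes is stored twice (once at T0, once spread over the T1's).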
ALICE & the Grid(s)
[Diagram: the ALICE user interacts with the ALICE Task Queue (TQ) and the Central Catalogue (CAT); ALICE agents & daemons, built on the ROOT/AliRoot computing framework, dispatch work to LCG, NorduGrid (NG) and OSG resources.]
Legend: TQ = Task Queue (central job DB); CAT = Central Catalogue.
Implementation: the “VO-Box”
[Diagram: job and data flow on an LCG site. The ALICE Task Queue (TQ) submits job requests through the LCG Resource Broker (RB) to the LCG CE, and a JobAgent runs on the WN. The VO-Box hosts the ALICE site services, serves PackMan requests and configuration, and registers files with their SURL in the LFC and with their LFN in the ALICE File Catalogue; data are stored on the LCG SE.]
Who does what?
- Configure, submit and track jobs: user interface with massive-production support; job DB (production and user); user and role management
- Install software on sites: package managers
- Distribute and execute jobs: Workload Management System (broker, L&B, ...); Computing Element software; information services; interactive analysis jobs (PROOF)
- Store and catalogue data: data catalogues (file, replica, metadata, local, ...); Storage Element software
- Move data around: file transfer services and schedulers
- Access data files: I/O services (xrootd); file management (SRM)
- Monitor all that stuff: transport infrastructure, sensors, web presentation (MonALISA)
...and on top of that: enforce security!
(The slide marks some of these areas as "MIXED", i.e. provided partly by AliEn and partly by the LCG middleware.)
Some statistics and results for SC3/PDC05
In the last two months of 2005:
- 22,500 jobs (Pb+Pb and p+p); average CPU time: 8 hours
- Data volume produced: 20 TB (90% in CASTOR2 at CERN, 10% at remote sites)
- 22 Resource Centres participating:
  - 4 T1's: CERN, CNAF, GridKa, CCIN2P3
  - 18 T2's: Bari, Clermont (FR), GSI (D), Houston (USA), ITEP (RUS), JINR (RUS), KNU (UKR), Muenster (D), NIHAM (RO), OSC (USA), PNPI (RUS), SPbSU (RUS), Prague (CZ), RMKI (HU), SARA (NL), Sejong (SK), Torino, UiB (NO)
- Job share per site: T1's: CERN 19%, CNAF 17% (CPU 20%), GridKa 31%, CCIN2P3 22%; T2's: 11% in total
- AliRoot failure rate: 2.5%
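From these figures, the total CPU consumed during the exercise can be estimated. This is an illustrative back-of-the-envelope sketch, not a number quoted on the slides:

```python
# Estimate of the total CPU time for SC3/PDC05,
# using the figures quoted above (22,500 jobs, ~8 CPU-hours each).

jobs = 22_500
avg_cpu_hours = 8

total_hours = jobs * avg_cpu_hours
total_years = total_hours / 24 / 365.25  # CPU-hours -> CPU-years

print(f"Total CPU time: {total_hours:,} hours = {total_years:.1f} CPU-years")
# Total CPU time: 180,000 hours = 20.5 CPU-years
```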
Job execution profile during SC3
2,450 jobs running (25% more than the entire lxbatch capacity at CERN).
Negative slope: an AliEn problem during output retrieval, fixed in the following release.
Use of the INFN Grid by the LHC experiments: jobs/VO (Sep 2005 - Dec 2005)
- ~811,000 jobs in total; ~388,000 jobs without INFN-T1
- ALICE: 8% of the total number of jobs on the national grid
Reminder: VO = Virtual Organization (i.e. an experiment).
Use of the INFN Grid by the LHC experiments: CPU/VO (Sep 2005 - Dec 2005)
- Total CPU time: ~358 years, 7 months, 11 days; ~98 years, 2 months, 18 days without INFN-T1
- ALICE: 14% of the CPU time outside the T1
ALICE jobs per site
Warning: job agents and real jobs are accounted in the same way.
ALICE Tier-2’s in Italy
Four candidates: Bari, Catania, LNL-PD, and Torino (T2 projects available at http://www.to.infn.it/~masera/TIER2/).
The team of ALICE referees, with representatives of the INFN Management Board, visited all Tier-2 candidates between 10/2005 and 02/2006.
The referees' decision was communicated at a meeting in Rome on 10/03/2006: Catania and Torino approved; Bari and LNL-PD "incubated" (kept on "life support" until real ALICE needs are proved by a real test of the computing model in production mode).
Network connectivity of the ALICE Tier-2's
Catania (1/5) - Computing room
[Photos: present installation and planned future expansion]
Space available for installations: ~160 m²
Catania (2/5) - Infrastructure
Traditional system
High-density system
Catania (3/5) - CPU: 150 kSI2k
- SuperMicro dual AMD dual-core Opteron 275 with 4 GB RAM, in a 1U configuration
- IBM LS20 "blades" with dual AMD dual-core Opteron 280 and 4 GB RAM (by June)
- LSF 6.1 as LRMS
Catania (4/5) - Storage
- 21+ TB of disk
- FC-to-SATA systems, plus more traditional DAS with EIDE-to-SCSI controllers
- Filesystem: GPFS
Catania (5/5) - Statistics
[Plot: last month's activity]
Torino (1/5) – Computing Room
Torino (2/5) - Present installation
- Present solutions: blade servers (IBM) and 1U dual-processor machines
- Guidelines for the future: minimize space; minimize power consumption
Torino (3/5) - Resources
CPU:
- 38 Intel Xeon 2.40 GHz CPUs; 12 Intel Xeon 3.06 GHz CPUs; 45 Intel dual-processor machines (<= 4 years old, 14 of them blades)
Disk:
- ~6 TB dedicated to ALICE; 2 TB shared among various VO's (Classic SE); 1 dCache SE with an internal disk of ~80 GB for tests
- ~15 TB of disk space for ALICE will be commissioned soon: a StorageTek FLX210 with 3 FLC200 expansions
Filesystem:
- ext3 for the Classic SE; not yet defined for the new storage system; tests with xrootd for local and remote access (through a proxy) are scheduled
LRMS:
- Torque/Maui, the default one coming with the INFN Grid release
Farms: one open to all VO's; one dedicated to ALICE (at the moment).
Torino (4/5) - Resources
Future evolution:
- Many nodes (~20, the most recent ones) are being migrated from the ALICE farm to the LCG farm, exploiting the forthcoming upgrade to gLite 3.0
- New WN's (80 cores, 130 kSI2k), recently bought, will be installed and configured very soon
Networking:
- All WN's are on a hidden LAN (only outbound connectivity is allowed); NAT is done by an Extreme Networks switch; almost all connections are Gigabit Ethernet
Monitoring:
- MRTG and Nagios for local control of the farm
Torino (5/5) - Usage
[Plots: number of jobs seen by the local scheduler; number of jobs seen by LCG; central ALICE monitoring]
Bari (1/2)
Bari is a Tier-2 candidate for both ALICE and CMS.
Bari also supports other VO's; priorities are given to the various VO's in proportion to the budgets with which the resources were acquired.
In the last two years Bari has provided resources to ALICE for both PDC04 and SC3, and will do so for SC4.
Bari (2/2)
- One 2-CPU 700 MHz PIII (aligrid1.ba.infn.it), 40 GB HD
- One 2-CPU 1 GHz PIII (alicegrid2.ba.infn.it), 160 GB HD
- Three 2-CPU Intel Xeon 1.8 GHz (alicegrid4 - alicegrid6, VOBOX), 3 x 80 GB HDs
- One 2-CPU Intel Xeon 1.8 GHz (alicegrid3.ba.infn.it, SE for PDC04) with 0.7 TB of data
- One 2-CPU Intel Xeon 2.4 GHz (alicegrid5.ba.infn.it, SE for Finuda) with 1.5 TB of disk space
- Three 2-CPU Intel Xeon 2.4 GHz, 80 GB HD
- One 2-CPU Intel Xeon 2.4 GHz (alicegrid7.ba.infn.it), 80 GB HD, software repository + Quattor installation server
- One 2-CPU dual-core Opteron 275, 120 GB HD
- Three 2-CPU Intel Xeon 2.8 GHz, 80 GB HD
- One 2-CPU Intel Xeon 3.0 GHz EM64T, 2 disk arrays x 2.5 TB (5 TB total), to be configured with xrootd for SC4
ALICE jobs at Bari (monitored by MonALISA)
LNL-PD
Background:
- LNL-PD is an approved Tier-2 for CMS
- Many years' experience in running a T2 prototype for CMS
Size of the existing Tier-2 for CMS:
- CPU: ~200 kSI2k (almost all dual-core "blades")
- Storage: EIDE-to-SCSI DAS with 3ware controllers + Storage Area Network
- LRMS: LSF
- Monitoring: Ganglia (local) + GridICE
ALICE at LNL-PD
ALICE activities already carried out:
- ALICE VO-box installed in 02/2006
- Site testing with small productions: OK
- Big ALICE production in April-May via LCG
Activities foreseen for the rest of 2006:
- Participation in PDC06 (~10 kSI2k of dedicated resources, plus the possibility to use CMS resources if/when available)
- Installation of an ALICE storage system with xrootd (~1 TB at the beginning)
ALICE jobs at LNL-PD (monitored by GridICE)
[Plot: ALICE jobs, 15 April 2006 - 15 May 2006]
Common issues
- Need for a common solution for the infrastructure (to improve the economy of scale).
- Need for an affordable, reliable, and scalable storage solution.
- Need for a better organization of the distributed support for the Tier-2's.
- Although new technologies ("blades" with low-power CPU's) help a bit, power consumption at Tier-2 sites is becoming an increasingly important economic concern; strict guidelines and a dedicated budget should be defined centrally by the INFN Management.
The future: PDC06 (June 2006)
Checks of the distributed computing model:
- From raw data to ESD
- Data transfers among sites
- Calibration and alignment
- Analysis
The SC3 experience has helped a lot to improve AliEn (current version: 2.10).
Intense development of AliRoot to include calibration and alignment code for all sub-detectors and to reduce the percentage of run-time failures. Huge effort by the Italian groups in many sites.
Resources ramp up at the INFN Tier-2's

Integrated estimates at the Tier-2's (present ramp-up; year = acquisition):
              2006  2007  2008  2009  2010
CPU (kSI2k)    460  1070  2520  5000  6000
DISK (TB)      160   379   894  1773  2128
CPU/DISK      2.88  2.82  2.82  2.82  2.82

New resources (differential):
              2006  2007  2008  2009  2010
CPU (kSI2k)    160   610  1450  2480  1000
DISK (TB)      115   219   514   879   355

Replacements:
              2006  2007  2008  2009  2010
CPU (kSI2k)      0    80     0   220   160
DISK (TB)        0    15     0    30   115

Total acquisitions (per year):
              2006  2007  2008  2009  2010
CPU (kSI2k)    160   690  1450  2700  1160
DISK (TB)      115   234   514   909   470

Costs (P. Capiluppi & A. Masoni):
              2006  2007  2008  2009  2010
CPU (kEur)      92   261   369   446   144
DISK (kEur)    258   329   450   498   160
Tot (kEur)     351   590   819   944   304
Grand total: 3008 k€
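The acquisition tables are internally consistent: the yearly totals are the sum of new resources and replacements, and the grand total is the sum of the yearly cost totals. A small sketch of that check, using the figures quoted in the slide:

```python
# Consistency check on the acquisition tables above:
# total acquisitions per year = new resources + replacements,
# and the grand total is the sum of the yearly cost totals.

new_cpu = [160, 610, 1450, 2480, 1000]   # kSI2k, 2006-2010
rep_cpu = [0, 80, 0, 220, 160]
tot_cpu = [160, 690, 1450, 2700, 1160]

assert [n + r for n, r in zip(new_cpu, rep_cpu)] == tot_cpu

yearly_costs = [351, 590, 819, 944, 304]  # kEur, 2006-2010
print("Grand total:", sum(yearly_costs), "k€")  # 3008 k€
```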
Summary and conclusions
- The ALICE computing model has been finalized and is now ready to face the forthcoming data from the LHC.
- INFN has identified the first official Tier-2's for ALICE.
- Both for the design and for the day-by-day operation of an LHC Tier-2, a strong collaboration between the experiments, the INFN Grid Project, the INFN CCR, and the Computing & Network Services at the various INFN departments is of vital importance.