LHCb Computing and Grid Status
Glenn Patrick, LHCb(UK), Dublin – 23 August 2005
Slide 2: Computing completes TDRs
Jan 2000 – June 2005
Slide 3: LHCb – June 2005
[Photograph of the LHCb cavern, 3 June 2005, showing the HCAL, ECAL, the muon filters MF1–MF3 and MF4, and the LHCb magnet.]
Slide 4: Grid World
[Online System diagram: front-end electronics (FE) feed a multiplexing layer and the readout network (7.1 GB/s) under the control of the TFC system and ECS, with a TRM, sorter, L1-decision path and storage system. Level-1 traffic: 1000 kHz, 5.5 GB/s; HLT traffic: 40 kHz, 1.6 GB/s. 94 SFCs, each with its own switch, feed a CPU farm of ~1600 CPUs – scalable in depth (more CPUs, <2200) and in width (more detectors in Level-1).
Trigger chain: 40 MHz → Level-0 (hardware) → 1 MHz → Level-1 (software) → 40 kHz → HLT (software) → 2 kHz.
Output: ~250 MB/s total to Tier 0; raw data at 2 kHz, 50 MB/s, written to Tier 0 and distributed to the Tier 1s.]
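As a rough cross-check of these figures (a sketch only: the inputs are the numbers in the diagram above, while the per-SFC, per-CPU and per-event values are derived here rather than quoted on the slide):

    # Inputs quoted on the slide
    hlt_input_rate_hz = 40e3              # events/s into the HLT farm
    hlt_bandwidth_bps = 1.6e9             # HLT traffic, bytes/s
    n_sfc, n_cpu = 94, 1600               # sub-farm controllers, farm CPUs
    raw_rate_hz, raw_bw_bps = 2e3, 50e6   # HLT accept rate, raw-data rate to Tier 0

    # Derived figures
    per_sfc_MB_s = hlt_bandwidth_bps / n_sfc / 1e6    # ~17 MB/s per SFC
    per_cpu_hz = hlt_input_rate_hz / n_cpu            # ~25 events/s per farm CPU
    raw_event_kB = raw_bw_bps / raw_rate_hz / 1e3     # ~25 kB per raw event
    print(per_sfc_MB_s, per_cpu_hz, raw_event_kB)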
Slide 5: HLT Output
Stream               b-exclusive   dimuon   D*    b-inclusive   Total
Trigger rate (Hz)    200           600      300   900           2000
Fraction             10%           30%      15%   45%           100%
Events/year (10^9)   2             6        3     9             20
200 Hz hot stream: will be fully reconstructed on the online farm in real time. The "hot stream" (RAW + rDST) is written to Tier 0.
2 kHz: RAW data written to Tier 0 for reconstruction at CERN and the Tier 1s.
Dimuon (J/ψ → μμ): calibration for proper-time resolution.
D* (D* → D0(Kπ)π): clean peak allows PID calibration.
b-inclusive: understand the bias on other B selections.
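The events/year column follows from the trigger rates if one assumes the canonical 10^7 seconds of data taking per year (an assumption used here for illustration, not stated on the slide):

    seconds_per_year = 1e7    # canonical LHC year of data taking (assumption)
    rates_hz = {"b-exclusive": 200, "dimuon": 600, "D*": 300, "b-inclusive": 900}
    for stream, rate in rates_hz.items():
        print(stream, rate * seconds_per_year / 1e9, "x 10^9 events/year")
    print("total", sum(rates_hz.values()) * seconds_per_year / 1e9)   # 20 x 10^9, as in the table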
Slide 6: Data Flow
Processing chain: Simulation (Gauss) → Digitisation (Boole) → Reconstruction (Brunel) → Analysis (DaVinci).
Data products: MC Truth, Raw Data, DST, Stripped DST, Analysis Objects.
Framework: Gaudi, with a common Detector Description, Conditions Database and Event Model / Physics Event Model.
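A minimal sketch of how the chain fits together, purely for orientation: the application names come from the slide, while run_app() and the file names are illustrative placeholders, not the real Gaudi or production machinery.

    def run_app(app, options, inputs, output):
        """Stand-in for running one Gaudi application step with its job options."""
        print("would run:", app, "with", options, "on", inputs, "->", output)
        return output

    sim  = run_app("Gauss",   "gauss.opts",   [],     "mctruth.sim")     # Simulation
    digi = run_app("Boole",   "boole.opts",   [sim],  "raw.digi")        # Digitisation
    reco = run_app("Brunel",  "brunel.opts",  [digi], "event.dst")       # Reconstruction
    anal = run_app("DaVinci", "davinci.opts", [reco], "analysis.root")   # Analysis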
Slide 7: Grid Architecture
Tier 1 centre (RAL) + 4 virtual Tier 2 centres.
LCG-2/EGEE: the world's largest Grid – ~16,000 CPUs and 5 PB over 192 sites in ~39 countries. GridPP provides ~3,000 CPUs at 20 UK sites.
Slide 8: Grid Ireland
EGEE is made up of regions. The UKI region consists of 3 federations: GridPP, Grid Ireland and the National Grid Service. (We are here.)
Slide 9: LHCb Computing Model
14 candidates.
The CERN Tier 1 is essential for accessing the "hot stream" for: 1. first alignment & calibration; 2. first high-level analysis.
Slide 10: LHC Comparison
Experiment   Tier 1                                                     Tier 2
ALICE        Reconstruction, chaotic analysis                           MC production, chaotic analysis
ATLAS        Reconstruction, scheduled analysis/skimming, calibration   Simulation, analysis, calibration
CMS          Reconstruction                                             Analysis for 20–100 users, all simulation production
LHCb         Reconstruction, scheduled stripping, chaotic analysis      MC production, no analysis
Slide 11: Distributed Data
RAW DATA: 500 TB. CERN = master copy; 2nd copy distributed over the six Tier 1s.
STRIPPING: 140 TB/pass/copy. Pass 1: during data taking at CERN and Tier 1s (7 months). Pass 2: after data taking at CERN and Tier 1s (1 month).
RECONSTRUCTION: 500 TB/pass. Pass 1: during data taking at CERN and Tier 1s (7 months). Pass 2: during the winter shutdown at CERN, Tier 1s and the online farm (2 months). Pass 3: during the shutdown at CERN, Tier 1s and the online farm. Pass 4: before the next year's data taking at CERN and Tier 1s (1 month).
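As a consistency check of the quoted RAW volume (a sketch only: the 25 kB event size is inferred from the 2 kHz / 50 MB/s figures earlier in the talk, not stated on this slide):

    events_per_year = 20e9                 # from the HLT output table
    raw_event_size_B = 25e3                # inferred: 50 MB/s / 2 kHz
    raw_TB = events_per_year * raw_event_size_B / 1e12
    print(raw_TB)                          # 500 TB, matching the slide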
Slide 12: Stripping Job – 2005
[Workflow diagram: read INPUTDATA and stage the files in one go; check the file status (staged / not yet staged) against the Prod DB; split the staged files into group1, group2, …, groupN; for each group, check file integrity (bad-file info is sent back to the Prod DB) and run DaVinci stripping on the good files; a merging process combines the output DST and ETC, and the file info is sent to the Prod DB.]
Usage of SRM.
Stripping runs on reduced DSTs (rDST). Pre-selection algorithms categorise events into streams. Events that pass are fully reconstructed and full DSTs are written.
CERN, CNAF and PIC used so far – sites based on CASTOR.
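A sketch of the stripping-job logic in the diagram above, assuming hypothetical helper names (stage_files, is_staged, check_integrity, run_davinci_stripping, merge, report_to_prod_db) in place of the real production tools:

    def stage_files(lfns): print("stage in one go:", len(lfns), "files")
    def is_staged(lfn): return True                        # pretend staging succeeded
    def check_integrity(lfn): return not lfn.endswith(".bad")
    def run_davinci_stripping(lfns): return "stripped_%d.dst" % len(lfns)
    def merge(outputs): return "merged.dst", "merged.etc"  # DST and Event Tag Collection
    def report_to_prod_db(what, payload): print("Prod DB <-", what, payload)

    def strip(input_lfns, group_size=10):
        stage_files(input_lfns)                            # read INPUTDATA, stage in 1 go
        staged = [f for f in input_lfns if is_staged(f)]
        groups = [staged[i:i + group_size] for i in range(0, len(staged), group_size)]
        stripped = []
        for group in groups:                               # group1 ... groupN
            good = []
            for f in group:
                if check_integrity(f):                     # check file integrity
                    good.append(f)
                else:
                    report_to_prod_db("bad file", f)       # send bad-file info
            stripped.append(run_davinci_stripping(good))
        dst, etc = merge(stripped)                         # merging process: DST + ETC
        report_to_prod_db("outputs", [dst, etc])           # send file info
        return dst, etc

    strip(["lfn_%03d.rdst" % i for i in range(25)] + ["lfn_bad_file.bad"])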
Slide 13: LHCb Resource Profile
Global CPU (MSI2k.yr)   2006   2007   2008    2009    2010
CERN                    0.27   0.54   0.90    1.25    1.88
Tier-1's                1.33   2.65   4.42    5.55    8.35
Tier-2's                2.29   4.59   7.65    7.65    7.65
TOTAL                   3.89   7.78   12.97   14.45   17.88

Global DISK (TB)        2006   2007   2008    2009    2010
CERN                    248    496    826     1095    1363
Tier-1's                730    1459   2432    2897    3363
Tier-2's                7      14     23      23      23
TOTAL                   984    1969   3281    4015    4749

Global MSS (TB)         2006   2007   2008    2009    2010
CERN                    408    825    1359    2857    4566
Tier-1's                622    1244   2074    4285    7066
TOTAL                   1030   2069   3433    7144    11632
Slide 14: Comparisons – CPU
Tier 1 CPU – integrated (Nick Brook).
[Plot: integrated Tier 1 CPU requirement (MSI2k) per month from January 2008 to December 2010 for LHCb, CMS, ATLAS and ALICE; vertical scale 0–160 MSI2k.]
Slide 15: Comparisons – Disk
LCG TDR – LHCC, 29.6.2005 (Jürgen Knobloch).
[Plot: disk requirement (PB) per year, 2007–2010, vertical scale 0–160 PB, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by site class (CERN, Tier-1, Tier-2). Annotation: 54% pledged.]
Slide 16: UK Tier 1 Status
Total available (August 2005): CPU = 796 KSI2K (500 dual CPU), Disk = 187 TB (60 servers), Tape = 340 TB.
Minimum required Tier 1 in 2008: CPU = 4732 KSI2K, Disk = 2678 TB, Tape = 2538 TB.
LHCb(UK) 2008 (15% share): CPU = 663 KSI2K, Disk = 365 TB, Tape = 311 TB.
LHCb(UK) 2008 (1/6 share): CPU = 737 KSI2K, Disk = 405 TB, Tape = 346 TB.
Slide 17: UK Tier 1 Utilisation
Hardware purchase scheduled for early 2005 postponed; PPARC discussions ongoing.
[Utilisation plot: capacity versus Grid and non-Grid use, ~70%.]
Grid use is increasing. CPU is "undersubscribed" (but the efficiency of Grid jobs may be a problem): LHCb 69% for Jan–July 2005, while CPU/walltime < 50% for some ATLAS jobs.
Slide 18: UK Tier 1 Exploitation
[Plots of UK Tier 1 exploitation by experiment, showing BaBar, LHCb and ATLAS for 2004 and for 2005 (up to 17.8.05).]
Slide 19: UK Tier 1 Storage
The classic SE is not sufficient as an LCG storage solution. SRM is now the agreed interface to storage resources. Lack of SRM prevented data stripping at the UK Tier 1.
This year, a new storage infrastructure was deployed for the UK Tier 1.
Storage Resource Manager (SRM) – an interface providing a combined view of secondary and tertiary storage to Grid clients.
dCache – a disk pool management system jointly developed by DESY and Fermilab: a single namespace to manage 100s of TB of data; access via GridFTP and SRM; interfaced to the RAL tape store.
CASTOR under evaluation as a replacement for the home-grown (ADS) tape service. CCLRC to deploy a 10,000-tape robot?
LHCb now has a disk allocation of 8.2 TB, with 4×1.6 TB under dCache control (c.f. BaBar = 95 TB, ATLAS = 19 TB, CMS = 40 TB).
The Computing Model says the LHCb Tier 1 should have ~122 TB in 2006…
Slide 20: UK Tier 2 Centres
Committed resources available to the experiment at Tier-2 in 2007, relative to the size of an average Tier-2 in the experiment's computing model:

             CPU                              Disk
             ALICE   ATLAS   CMS    LHCb      ALICE   ATLAS   CMS    LHCb
London       0.0     1.0     0.8    0.4       0.0     0.2     0.3    11.0
NorthGrid    0.0     2.5     0.0    0.3       0.0     1.3     0.0    12.1
ScotGrid     0.0     0.2     0.0    0.2       0.0     0.0     0.0    39.6
SouthGrid    0.2     0.5     0.2    0.3       0.0     0.1     0.0    6.8

Hopefully, more resources will come from future funding bids, e.g. SRIF3, April 2006 – March 2008.
Under-delivered – Tier 1 + Tier 2 (March 2005): CPU = 2277 KSI2K out of 5184 KSI2K; Disk = 280 TB out of 968 TB. Improving as hardware is deployed at the Tier 2 institutes.
Slide 21: Tier 2 Exploitation
Over 40 sites in the UKI federation of EGEE + over 20 Virtual Organisations.
[Accounting plot, 17 Aug, Grid Operations Centre: GridPP only, does not include Grid Ireland; experiments shown include LHCb, CMS, ATLAS and BaBar.]
800 data points – an improved accounting prototype is on the way… but you get the idea: Tier 2 sites are a vital LHCb Grid resource.
Slide 22: DIRAC Architecture
[Architecture diagram.
User interfaces: Production manager, GANGA UI, User CLI, Job monitor, BK query web page, FileCatalog browser.
DIRAC services: Job Management Service, JobMonitorSvc, JobAccountingSvc (with Accounting DB), InformationSvc, FileCatalogSvc, MonitoringSvc, BookkeepingSvc.
DIRAC resources: DIRAC CEs at DIRAC sites served by Agents; the LCG Resource Broker in front of CE 1, CE 2 and CE 3; DIRAC Storage holding disk files accessed via gridftp, bbftp and rfio.]
A Services Oriented Architecture.
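An illustrative sketch of the pull-scheduling idea behind this architecture: an agent at a site asks the central Job Management Service for a job matching the local resources. The class and method names are invented for illustration and are not the real DIRAC interfaces.

    class JobManagementService:
        """Toy central service holding the task queue (illustrative only)."""
        def __init__(self, queue):
            self.queue = queue
        def request_job(self, site_capabilities):
            for job in list(self.queue):
                if job["requirements"] <= site_capabilities:   # simple subset matching
                    self.queue.remove(job)
                    return job
            return None

    def agent(jms, site_capabilities):
        """Site agent: pull work from the central service and hand it to the local CE."""
        while True:
            job = jms.request_job(site_capabilities)
            if job is None:
                break            # a real agent would sleep and then poll again
            print("running", job["name"], "on the local computing element")

    jms = JobManagementService([{"name": "DaVinci_1", "requirements": {"lhcb-software"}}])
    agent(jms, site_capabilities={"lhcb-software", "srm-storage"})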
Slide 23: Data Challenge 2004
[Production plot, LHCb DC'04: events (M) per month (May–August) plus the total, scale 0–200 M, split into DIRAC and LCG contributions. Annotations: DIRAC alone; LCG in action; 1.8×10^6 events/day; LCG paused; Phase 1 completed; 3–5×10^6 events/day; LCG restarted; 187 M produced events.]
20 DIRAC sites + 43 LCG sites were used. Data were written to the Tier 1s.
• Overall, 50% of events were produced using LCG.
• At the end, 75% were produced by LCG.
The UK was the second largest producer (25%) after CERN.
Slide 24: RTTC – 2005
Real Time Trigger Challenge – May/June 2005.
150 M minimum-bias events to feed the online farm and test the software trigger chain.
Completed in 20 days (169 M events) on 65 different sites: 95% produced at LCG sites, 5% at "native" DIRAC sites.
Average of 10 M events/day; average of 4,000 CPUs.
Country                          Events produced
UK                               60 M
Italy                            42 M
Switzerland                      23 M
France                           11 M
Netherlands                      10 M
Spain                            8 M
Russia                           3 M
Greece                           2.5 M
Canada                           2 M
Germany                          0.3 M
Belgium                          0.2 M
Sweden                           0.2 M
Romania, Hungary, Brazil, USA    0.8 M
(UK share ≈ 37%)
Slide 25: Looking Forward
[Timeline 2005–2008: SC3, SC4, cosmics, first beams, first physics, full physics run, LHC service operation.]
Next challenge: SC3 – Sept. 2005.
Analysis at Tier 1s – Nov. 2005.
Start of the DC06 processing phase – May 2006.
Alignment/calibration challenge – October 2006.
Ready for data taking – April 2007.
Excellent support from the UK Tier 1 at RAL. Two application support posts at the Tier 1 were appointed in June 2005, BUT the LHCb(UK) technical co-ordinator is still to be appointed.
Slide 26: LHCb and SC3
Phase 1 (Sept. 2005):
a) Movement of 8 TB of digitised data from CERN/Tier 0 to the LHCb Tier 1 centres in parallel over a 2-week period (~10k files). Demonstrate automatic tools for data movement and bookkeeping.
b) Removal of replicas (via LFN) from all Tier 1 centres.
c) Redistribution of 4 TB of data from each Tier 1 centre to Tier 0 and the other Tier 1 centres over a 2-week period. Demonstrate that data can be redistributed in real time to meet stripping demands.
d) Moving of stripped DST data (~1 TB, 190k files) from CERN to all Tier 1 centres.
Phase 2 (Oct. 2005):
a) MC production in Tier 2 centres, with DST data collected in Tier 1 centres in real time, followed by stripping in Tier 1 centres (2 months). Data are stripped as they become available.
b) Analysis of stripped data in Tier 1 centres.
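A rough feel for what Phase 1a implies, as a sketch (aggregate over all Tier 1 centres; the averages are derived here, not quoted on the slide):

    volume_TB, n_files, days = 8.0, 10000, 14
    avg_rate_MB_s = volume_TB * 1e6 / (days * 24 * 3600)   # ~6.6 MB/s sustained from CERN
    avg_file_GB = volume_TB * 1e3 / n_files                # ~0.8 GB per file
    print(avg_rate_MB_s, avg_file_GB)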
Slide 27: GridPP Status
GridPP1 – Prototype Grid: £17M, complete (September 2001 – August 2004).
GridPP2 – Production Grid: £16M, ~20% complete (September 2004 – August 2007).
Beyond August 2007?
Funding from September 2007 will be incorporated as part of PPARC’s request for planning input for LHC exploitation.
To be considered by panel (G. Lafferty, S. Watts & P. Harris) providing input to the Science Committee in the autumn.
Input from ALICE, ATLAS, CMS, LHCb and GRIDPP.
Slide 28: LCG Status
[Timeline 2005–2008: SC3, SC4, cosmics, first beams, first physics, full physics run, LHC service operation.]
[Diagram: LCG-2 (= EGEE-0) moving from prototyping (2004) to product (2005); LCG-3 (= EGEE-x?) to follow as the product. We are here.]
LCG has two phases.
Phase 1: 2002 – 2005. Build a service prototype based on existing grid middleware; gain experience in running a production grid service; produce the TDR for the final system. LCG and experiment TDRs submitted.
Phase 2: 2006 – 2008. Build and commission the initial LHC computing environment.
Slide 29: UK – Workflow Control
Production Desktop – Gennady Kuznetsov (RAL).
[Workflow diagram: the primary event and spill-over events are processed through Steps – Sim (Gauss), Digi (Boole) and Reco (Brunel) – with separate chains for B and minimum-bias (MB) events (Gauss B / Gauss MB, Boole B / Boole MB, Brunel B / Brunel MB). Each Step is built from Modules such as software installation, Gauss execution, check logfile, directory listing and bookkeeping report.]
Used for the RTTC and for current production/stripping.
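A toy sketch of the Step/Module idea described above: each production Step (Sim, Digi, Reco) is assembled from reusable Modules. The class and module names are illustrative, not the actual Production Desktop code.

    class Module:
        def __init__(self, name):
            self.name = name
        def run(self):
            print("module:", self.name)

    class Step:
        def __init__(self, name, modules):
            self.name, self.modules = name, modules
        def run(self):
            print("step:", self.name)
            for m in self.modules:
                m.run()

    install, logcheck = Module("software installation"), Module("check logfile")
    listing, report = Module("dir listing"), Module("bookkeeping report")
    sim  = Step("Sim (Gauss)",   [install, Module("Gauss execution"),  logcheck, listing, report])
    digi = Step("Digi (Boole)",  [install, Module("Boole execution"),  logcheck, listing, report])
    reco = Step("Reco (Brunel)", [install, Module("Brunel execution"), logcheck, listing, report])

    for step in (sim, digi, reco):   # a production workflow is a sequence of Steps
        step.run()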
Slide 30: UK – LHCb Metadata and ARDA
Carmine Cioffi (Oxford).
[Diagram: Web browser → Tomcat servlet → ARDA client API → (TCP/IP streaming) → ARDA server → Bookkeeping; a GANGA application also talks to the ARDA server through the ARDA client API.]
A testbed is under way to measure performance with the ARDA and ORACLE servers.
Slide 31: UK – GANGA Grid Interface
Karl Harrison (Cambridge), Alexander Soroko (Oxford), Alvin Tan (Birmingham), Ulrik Egede (Imperial), Andrew Maier (CERN), Kuba Moscicki (CERN).
[Ganga 4 architecture diagram: jobs are defined via scripts; Ganga prepares and configures the application (Gaudi, Athena), stores and retrieves job definitions, and submits to a backend (AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF) with submit, kill, get output and update status operations, plus split, merge, monitor and dataset selection.]
Ganga 4 beta release: 8th July.
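For flavour, the kind of interactive job definition Ganga is built around, as a sketch to be typed inside a ganga session (where Job, Executable and the backend classes are predefined); the exact class names in the Ganga 4 beta may differ from this illustration.

    j = Job()
    j.name = "DaVinci_test"
    j.application = Executable(exe="/bin/echo", args=["hello from the Grid"])
    j.backend = LCG()     # swap for Dirac(), LSF() or Local() without changing the rest
    j.submit()            # later: j.kill(), j.status, and retrieval of the output sandbox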
Slide 32: UK – Analysis with DIRAC
Stuart Patterson (Glasgow).
Software installation + analysis via the DIRAC WMS.
[Diagram: a DIRAC job describes its data as LFNs; matching against the Task-Queue checks for all SEs which have the data (if no data are specified, any site can match); an Agent pulls the job, which executes on a worker node, installs the software and reads its data from the closest SE.]
DIRAC API for analysis job submission; the resulting job description (JDL) looks like:
[
  Requirements     = other.Site == "DVtest.in2p3.fr";
  Arguments        = "jobDescription.xml";
  JobName          = "DaVinci_1";
  OutputData       = { "/lhcb/test/DaVinci_user/v1r0/LOG/DaVinci_v12r11.alog" };
  parameters       = [ STEPS = "1"; STEP_1_NAME = "0_0_1" ];
  SoftwarePackages = { "DaVinci.v12r11" };
  JobType          = "user";
  Executable       = "$LHCBPRODROOT/DIRAC/scripts/jobexec";
  StdOutput        = "std.out";
  Owner            = "paterson";
  OutputSandbox    = { "std.out", "std.err", "DVNtuples.root", "DaVinci_v12r11.alog", "DVHistos.root" };
  StdError         = "std.err";
  ProductionId     = "00000000";
  InputSandbox     = { "lib.tar.gz", "jobDescription.xml", "jobOptions.opts" };
  JobId            = ID
]
Pacman / DIRAC installation tools – see later talk!
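A hypothetical sketch of what an analysis-submission API can look like: it assembles the fields seen in the JDL above and would hand them to the WMS. The class and method names are invented for illustration and are not the real DIRAC API.

    class AnalysisJob:
        def __init__(self, name):
            self.fields = {"JobName": name, "JobType": "user",
                           "InputSandbox": [], "OutputSandbox": ["std.out", "std.err"]}
        def set_application(self, package):               # e.g. "DaVinci.v12r11"
            self.fields["SoftwarePackages"] = [package]
        def set_options(self, opts):
            self.fields["InputSandbox"].append(opts)      # e.g. "jobOptions.opts"
        def add_output(self, *files):
            self.fields["OutputSandbox"].extend(files)
        def to_jdl(self):
            body = ";\n".join("  %s = %r" % kv for kv in self.fields.items())
            return "[\n" + body + "\n]"

    job = AnalysisJob("DaVinci_1")
    job.set_application("DaVinci.v12r11")
    job.set_options("jobOptions.opts")
    job.add_output("DVNtuples.root", "DVHistos.root")
    print(job.to_jdl())                                   # would be submitted to the DIRAC WMS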
Conclusion
Half way there! But the climb gets steeper and there may be more mountains beyond 2007.
[Climb diagram: DC03 and DC04 are behind us; 2005 – Monte Carlo production on the Grid; 2007 – data taking, with data stripping, distributed analysis and distributed reconstruction still to come.]