DØ MC and Data Processing on the Grid
Brad Abbott, University of Oklahoma, D0SAR, Sept 21, 2006
Computing at DØ
• Provide the necessary resources for primary processing of data, reprocessing, fixing, skimming, data analysis, MC production, data handling, data verification…
• Provide this in a timely manner to allow researchers to analyze data efficiently.
Challenges
• Collecting data at ~50 events/sec
• Processing time is ~70 GHz-sec/event
• ~900 CPUs on the DØ farm running 24/7 to keep up with data
• Need millions of Monte Carlo events
• Store data to tape and allow easy access (SAM)
• Need the ability to reprocess and fix data in a timely manner
• Provide computing resources to analyzers
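The first three bullets can be checked against each other. A back-of-the-envelope sketch (not from the original slides; it ignores scheduling gaps and other efficiency losses):

```python
# Back-of-the-envelope check of the farm sizing implied above.
# Ignores scheduling gaps and other efficiency losses.
event_rate_hz = 50          # events collected per second
cost_ghz_sec = 70           # processing cost per event, in GHz-seconds
farm_cpus = 900             # CPUs on the DØ farm

sustained_ghz = event_rate_hz * cost_ghz_sec   # compute needed to keep up
per_cpu_ghz = sustained_ghz / farm_cpus        # implied speed per CPU

print(sustained_ghz)             # 3500 GHz sustained
print(round(per_cpu_ghz, 1))     # ~3.9 GHz per CPU
```

Keeping up with the detector requires ~3.5 THz of sustained compute, which is why ~900 CPUs running around the clock are needed.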
Local Facilities
• 70 TB of project disk (CluedØ/CAB)
• CAB
  – 2.2 THz of CPU (comparable to the FNAL production farm)
  – 235 TB of SAM cache
  – More CPU/disk on order
• CluedØ
  – An incredible resource, by the people for the people!
  – 1+ THz
  – SAM cache
  – 70 TB (nodes) + 160 TB (servers)
Usage
What does a typical week look like? Monday report, August 14, 2006 (a typical week):

| Analysis station | Data analyzed | Events | Projects |
| --- | --- | --- | --- |
| clued0 | 15.09 TB | 402M | 646 |
| fnal-cabsrv2 | 115.51 TB | 2685M | 1611 |
| fnal-cabsrv1 | 85.56 TB | 2358M | 985 |
| D0 total | 216.16 TB | 5446M | 3242 |
Analysis over time
• Events consumed by station since "the beginning of SAM time"
• Integrates to 300B events consumed
• (Plot legend: cabsrv in blue and red; clued0 in grey)
D0 Vital Statistics

| Metric | 1997 (projections) | 2006 |
| --- | --- | --- |
| Peak (average) data rate (Hz) | 50 (20) | 100 (35) |
| Events collected | 600M/year | 1.5B |
| Raw data size (kB/event) | 250 | 250 |
| Reconstructed data size (kB/event) | 100 (5) | 80 |
| User format (kB/event) | 1 | 40 |
| Tape storage | 280 TB/year | 1.6 PB on tape |
| Tape reads/writes (weekly) | | 30 TB / 7 TB |
| Analysis/cache disk | 7 TB/year | 220 TB |
| Reconstruction time (GHz-sec/event) | 2.00 | 50 (120) |
| Monte Carlo chain | full GEANT | full GEANT |
| User analysis time (GHz-sec/event) | ? | 1 |
| User analysis weekly reads | ? | 5+E18B events |
| Primary reconstruction farm size (THz) | 0.6 | 2.4 |
| Central analysis farm size (THz) | 0.6 | 2.2 |
| Remote resources | ? | ~2.5 THz (grid) |
Current Computing Status
• Overall very good
• Reconstruction keeping up with data taking
• Data handling working well
• Remote sites used for MC, reprocessing, processing, and fixing
• Significant analysis CPU
Future challenges
• Larger data sets
  – Luminosities > 200E30
• Increased sharing of manpower with the LHC
  – Reduced manpower for DØ
• Tight budgets
  – Need to use shared resources
• Higher-luminosity events take significantly longer to process than previously; computing resources need to deal with this.
• Need to plan on luminosities of 400E30.
DØ computing model
• Distributed computing, moving toward automated use of common tools on the grid
• Scalable
• Work with the LHC, not against it, for increased resources
• Need to conform to standards
• DØ is a running experiment and is taking data; need to take a prudent approach to computing
• SAMGrid
SAMGrid
• SAM: data handling
  – Over 7 PB consumed last year
  – Up to 1 PB/month
• SAMGrid:
  – JIM: job submission and monitoring
  – SAM + JIM = SAMGrid
  – 20 native execution sites
  – Automated submission to other grids
Progression on Remote Farms
• MC → data reprocessing → processing → skimming* → analysis*
• Facilities: dedicated farms → shared farm → OSG/LCG
• Automation: expert → regional farmer → any user*

*Not yet implemented
Data Reprocessing on Grid
• Reprocessing of data: 1 billion events (250 TB from raw)
  – SAMGrid as default, using shared resources
  – 3.5 THz for 6 months; the largest such effort in HEP
• Refixing: 1.4B events in 6 weeks
  – Used SAMGrid with automated use of LCG and OSG
• Finished on time; very successful
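As a sanity check (a sketch, not from the original slides, assuming 30-day months and full CPU efficiency), the quoted 3.5 THz for 6 months lines up with the per-event reconstruction cost:

```python
# Rough cross-check of the reprocessing campaign quoted above.
# Assumes 30-day months and 100% CPU efficiency.
capacity_ghz = 3.5 * 1000            # 3.5 THz of sustained compute
seconds = 6 * 30 * 86400             # 6 months of wall time
events = 1.0e9                       # 1 billion events reprocessed

ghz_sec_per_event = capacity_ghz * seconds / events
print(round(ghz_sec_per_event, 1))   # ~54.4 GHz-sec/event
```

This is close to the ~50 GHz-sec/event reconstruction time on the vital-statistics slide, so the campaign numbers hang together.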
Processing on Grid
• Prefer not to do primary processing on the grid.
• Can do processing at a few select sites that have been well certified (this has been shown: the cable-swap data were processed at OU).
• Certification of the grid is problematic.
• Do not need to worry about fair share, availability of nodes, etc.
Cable swap data at OU
• First time that primary processing was performed at a remote site for DØ
• Processed 9463 files
• Total of 3421.6 GB
• Events: 18,391,876
• Took ~3 months, partly because only ~70 of the available 270 CPUs could be used
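From those figures one can estimate the per-event throughput (a rough sketch, not from the slides, assuming ~90 days of wall time and steady use of all 70 CPUs):

```python
# Rough per-event throughput for the OU cable-swap processing.
# Assumes ~90 days of wall time and steady use of all 70 CPUs.
events = 18_391_876
cpus = 70
wall_seconds = 90 * 86400

cpu_sec_per_event = cpus * wall_seconds / events
print(round(cpu_sec_per_event, 1))   # ~29.6 CPU-seconds per event
```

At the few-GHz clock speeds of the era, ~30 CPU-seconds per event is broadly consistent with the ~70 GHz-sec/event processing cost quoted earlier.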
MC Production resources
• All MC produced offsite
• MC is less stringent, i.e. more can always be made
• Native SAMGrid producers: CMS-FNAL, Gridka, LTU, LU, MSU, OU (2), SPRACE, TATA, Westgrid, Wuppertal, FZU
• Non-SAMGrid: Lyon and Nikhef
• LCG: 21 CEs (10 UK, 6 FR, 3 NL, 1 CZ, 1 DE)
• OSG: 8 CEs (UNL, IU, Purdue, SPGRID, OCHEP, TOPDAWG, UWM, CMS-FNAL)
Monte Carlo
• More than 250 million events produced
• Up to 10 million events/week
• LCG and OSG
• 59% SAMGrid
• By region: 80.4% Europe, 15.7% N. America, 3.5% S. America, 0.3% Asia
Current plans
• Reprocessing of Run IIB data needed
• 300 million events
• Takes ~80 GHz-sec/event to process
• Expect to need ~2000 CPUs for 4 months to reprocess the data
• Utilize OSG sites much more extensively
• SAM v7 (one version of SAM)
• Plan on beginning in November
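A quick check of that estimate (a sketch under assumed 30-day months and continuous running, not from the slides) shows the sustained per-CPU speed it implies:

```python
# Rough check of the Run IIB reprocessing estimate quoted above.
# Assumes 30-day months and continuous running.
events = 300e6
cost_ghz_sec = 80                    # per-event processing cost
cpus = 2000
seconds = 4 * 30 * 86400             # 4 months of wall time

sustained_ghz = events * cost_ghz_sec / seconds
per_cpu_ghz = sustained_ghz / cpus
print(round(sustained_ghz))          # ~2315 GHz sustained
print(round(per_cpu_ghz, 2))         # ~1.16 GHz per CPU
```

Roughly 1.2 GHz of sustained work per CPU leaves comfortable headroom for scheduling gaps and inefficiency on mid-2000s hardware, so the 4-month estimate is conservative rather than tight.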
Current plans (cont.)
• Overall priority is to reduce manpower needs for the mid and long term by ensuring additional functionality is developed quickly: first in SAMGrid mode, with rapid transfer to automated forwarding nodes
• CAB running as part of FermiGrid
• Moving full functionality to the forwarding mechanisms
• Automated production of MC with OSG
• SAM shifters take over responsibility for submitting jobs
• Automated submission to use the full power of interoperability/grid resources
OSG/LCG
Conclusions
• The DØ computing model has been very successful
• MC and data processing continue to move toward grid resources
• LCG has been used more heavily in the past, but OSG will soon be more heavily utilized
• Remote computing is critical for the continued success of DØ