
Distributed IT Infrastructure for U.S. ATLAS

Rob Gardner

Indiana University

DOE/NSF Review of U.S. ATLAS and CMS Computing Projects

Brookhaven National Laboratory
NOVEMBER 14-17, 2000


Outline

Requirements

Approach

Organization

Resource Requirements

Schedule

Fallback Issues


Distributed IT Infrastructure

A wide-area computational infrastructure for U.S. ATLAS

A network of distributed computing devices

A network of distributed data caches & stores

Connectivity

Physicists with data (laptop-scale sources: LOD)

Computers with data (at all scales)

Physicists with each other (collaboration)

Distributed information, portals

Efforts

Data Grid R&D

Strategic “remote” sites (Tier 2s)

Distributed IT support at the Tier 1 center
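As an entirely illustrative way to picture "a network of distributed computing devices and data caches," the minimal Python sketch below models the planned topology as plain data: one Tier 1 plus roughly five Tier 2 sites, each holding cached datasets. The site names and cache contents are assumptions invented for this sketch, not the actual site selection.

```python
# Minimal model of the planned U.S. ATLAS topology: a Tier 1 center plus
# ~5 Tier 2 regional centers, each acting as a data cache/store.
# Site names and cached datasets are illustrative placeholders.

infrastructure = {
    "Tier1":   {"role": "national facility", "caches": {"RAW", "ESD", "AOD"}},
    "Tier2-A": {"role": "regional center",   "caches": {"AOD"}},
    "Tier2-B": {"role": "regional center",   "caches": {"AOD", "ESD-skim"}},
    "Tier2-C": {"role": "regional center",   "caches": {"AOD"}},
    "Tier2-D": {"role": "regional center",   "caches": {"MC-AOD"}},
    "Tier2-E": {"role": "regional center",   "caches": {"AOD"}},
}

def sites_with(dataset):
    """The basic connectivity question: which computers hold a given dataset?"""
    return [name for name, info in infrastructure.items()
            if dataset in info["caches"]]

# A physicist at laptop (LOD) scale could pull AOD from any of these sites.
print(sites_with("AOD"))
```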


Requirements

Access

Efficient access to resources at the Tier 1 facility

Data distribution to remote computing devices

Information

A secure infrastructure to locate, monitor, and manage collections of distributed resources

Analysis planning framework

Resource estimation

"Matchmaker" tools to optimally connect physicists, CPU, data, etc. (a toy sketch follows at the end of this slide)

Scalable

Add arbitrarily large numbers of computing devices as they become available

Add arbitrarily large numbers of data sources as they become available
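The "matchmaker" requirement above is about pairing a physicist's job with a site that has both the data and spare CPU. The toy Python sketch below is purely illustrative: the site names, capacities, and scoring rule are assumptions, not the Grid middleware the project would actually adopt; it only shows the kind of decision such a tool automates.

```python
# Toy resource matchmaker: pick the site that already holds the requested
# dataset and has the most spare CPU.  Illustrative only; real matchmaking
# would come from Grid middleware, not this script.

sites = [
    # (name,     datasets held,             spare CPU in SpecInt95)
    ("Tier1",    {"AOD-2000", "ESD-2000"},  15_000),
    ("Tier2-A",  {"AOD-2000"},              30_000),
    ("Tier2-B",  {"MC-GEN"},                45_000),
]

def match(dataset, cpu_needed):
    """Return the best site for a job, or None if nothing fits."""
    candidates = [
        (spare, name)
        for name, held, spare in sites
        if dataset in held and spare >= cpu_needed
    ]
    return max(candidates)[1] if candidates else None

print(match("AOD-2000", 10_000))   # -> Tier2-A (has the data, most spare CPU)
print(match("MC-GEN", 50_000))     # -> None   (no site has enough spare CPU)
```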


Approach

~5 strategic remote sites (Tier 2s)

Scale of each facility: MONARC estimates

ATLAS NCB/WWC (World Wide Computing Group)

National Tier 1 facility: 209K SpecInt95 CPU, 365 TB online disk, 2 PB tertiary storage

Tier 2 = Tier 1 * 20%
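The "Tier 2 = Tier 1 × 20%" rule can be applied directly to the MONARC-based Tier 1 figures above; the few lines below just do that arithmetic. The planned "typical Tier 2" on a later slide (about 50K SpecInt95 of CPU and 70 TB of online disk) is in the same ballpark.

```python
# Apply the "Tier 2 = 20% of Tier 1" scaling rule to the MONARC-based
# Tier 1 estimates quoted above (plus ~2 PB of tertiary storage at Tier 1).
tier1 = {"cpu_specint95": 209_000, "disk_tb": 365}
scale = 0.20

tier2 = {resource: value * scale for resource, value in tier1.items()}
print(tier2)  # {'cpu_specint95': 41800.0, 'disk_tb': 73.0}
# Compare with the planned typical Tier 2: ~50K SpecInt95 and ~70 TB disk.
```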


Role of Tier 2 Centers

User Analysis

Standard configuration optimized for analysis at the AOD level

ESD objects required for some analysis

Primary Resource for Monte Carlo Simulation

"Spontaneous" production-level ESD skims (autonomy)

Data distribution caches

Remote data stores

HSM used to archive AODs

MC data of all types (GEN, RAW, ESD, AOD, LOD) from all Tier 2’s & users


Typical Tier 2

CPU: 50K SpecInt95 (Tier 1: 209K)

Commodity Pentium/Linux

Estimated 144 dual-processor nodes (Tier 1: 640)

Online storage: 70 TB disk (Tier 1: 365 TB)

High Performance Storage Area Network

Baseline: Fibre Channel RAID array
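The node count follows from the CPU target once a per-processor rating is assumed. The slide's own numbers imply roughly 50,000 / (144 × 2) ≈ 174 SpecInt95 per processor for the commodity Pentium/Linux boxes; that inferred rating is the assumed input in the sketch below.

```python
import math

# Back out the Tier 2 node count from the CPU target, assuming a per-processor
# rating consistent with the slide's own figures
# (50K SpecInt95 across 144 dual-processor nodes -> ~174 SpecInt95 per CPU).
target_specint95 = 50_000
specint95_per_cpu = 174          # assumed commodity Pentium rating
cpus_per_node = 2                # dual-processor Linux boxes

nodes = math.ceil(target_specint95 / (specint95_per_cpu * cpus_per_node))
print(nodes)  # -> 144, matching the estimate of 144 dual-processor nodes
```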


‘Remote’ Data Stores

Exploit existing infrastructure

Mass store infrastructure at 2 of the 5 Tier 2 centers

Assume existing HPSS (or equivalent) license, tape silo, robot

Augment with drives, media, mover nodes, and disk cache

Each site contributes a 0.3-0.5 PB store

AOD archival, MC ESD+AOD archival


Organization

Facilities Subproject 2.3.2

WBS Number   Description
2.3.2        Distributed IT Infrastructure
2.3.2.1      Specify ATLAS requirements
2.3.2.2      Design and Model Grid Architecture
2.3.2.3      Integration of Grid Software
2.3.2.4      Grid testbeds
2.3.2.5      Wide Area Network Integration
2.3.2.6      Collaborative tools for Tier 2
2.3.2.7      Tier 2 Regional Center at Location A
2.3.2.8      Tier 2 Regional Center at Location B
2.3.2.9      Tier 2 Regional Center at Location C
2.3.2.10     Tier 2 Regional Center at Location D
2.3.2.11     Tier 2 Regional Center at Location E
2.3.2.12     Tertiary Storage at Tier 2 Regional Centers


Personnel

MANPOWER ESTIMATE SUMMARY IN FTEs

WBS No: 2   Funding Type: Infrastructure   (as of 11/13/00 8:08:38 PM)
Description: US ATLAS Computing   Institutions: All   Funding Source: All

               FY 01  FY 02  FY 03  FY 04  FY 05  FY 06  Total
IT I             1.0    4.0    6.0   10.0   10.0    7.0   38.0
IT II            0.0    1.0    2.0    2.0    5.0    5.0   15.0
Physicist        1.0    1.0    1.0    1.0    1.0    0.0    5.0
TOTAL LABOR      2.0    6.0    9.0   13.0   16.0   12.0   58.0


Tier 2 Costs

WBS Number    Description                                    FY 01  FY 02  FY 03  FY 04  FY 05  FY 06  Total
                                                              (k$)   (k$)   (k$)   (k$)   (k$)   (k$)   (k$)
2.3.2.7       Tier 2 Regional Center at Location A             412    462    436    439    691   1007   3448
2.3.2.7.1       Tier 2 Facility Hardware                       248    244    213    207    459    784   2155
2.3.2.7.2       Tier 2 Facility Software                       140    186    186    186    186    186   1069
2.3.2.7.3       Tier 2 Facility Administration                  24     33     37     46     46     37    224
2.3.2.8       Tier 2 Regional Center at Location B               0    500    498    403    721   1008   3131
2.3.2.8.1       Tier 2 Facility Hardware                         0    336    275    171    489    785   2057
2.3.2.8.2       Tier 2 Facility Software                         0    140    186    186    186    186    883
2.3.2.8.3       Tier 2 Facility Administration                   0     24     37     46     46     37    191
2.3.2.9       Tier 2 Regional Center at Location C               0      0      0    315    745   1032   2093
2.3.2.9.1       Tier 2 Facility Hardware                         0      0      0     89    411    698   1198
2.3.2.9.2       Tier 2 Facility Software                         0      0      0     75     75     75    224
2.3.2.9.3       Tier 2 Facility Administration                   0      0      0     21     37     37     95
2.3.2.10      Tier 2 Regional Center at Location D               0      0      0    315    745   1032   2093
2.3.2.10.1      Tier 2 Facility Hardware                         0      0      0     89    411    698   1198
2.3.2.10.2      Tier 2 Facility Software                         0      0      0     75     75     75    224
2.3.2.10.3      Tier 2 Facility Administration                   0      0      0     21     37     37     95
2.3.2.11      Tier 2 Regional Center at Location E               0      0      0    315    745   1032   2093
2.3.2.11.1      Tier 2 Facility Hardware                         0      0      0     89    411    698   1198
2.3.2.11.2      Tier 2 Facility Software                         0      0      0     75     75     75    224
2.3.2.11.3      Tier 2 Facility Administration                   0      0      0     21     37     37     95
2.3.2.12      Tertiary Storage at Tier 2 Regional Centers        0      0    258    343    387    757   1745
2.3.2.12.1      Tertiary Storage at Regional Center              0      0    258    151    176    401    987
2.3.2.12.2      Tertiary Storage at Regional Center              0      0      0    192    210    355    758

TOTAL (WBS 2.3.2.7-2.3.2.12)                                                                           14603
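As a quick consistency check on the bottom line, the few lines of Python below (figures copied from the table above; labels abbreviated for readability) sum the top-level WBS totals and reproduce the 14,603 k$ grand total.

```python
# Top-level Tier 2 cost totals in k$ from the table above (WBS 2.3.2.7-2.3.2.12).
totals_k = {
    "2.3.2.7  Location A":        3448,
    "2.3.2.8  Location B":        3131,
    "2.3.2.9  Location C":        2093,
    "2.3.2.10 Location D":        2093,
    "2.3.2.11 Location E":        2093,
    "2.3.2.12 Tertiary storage":  1745,
}
print(sum(totals_k.values()))  # -> 14603, matching the grand total
```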


Schedule

R&D Tier 2's – FY '01 & FY '02

Initial Development & Test, 1% to 2% scale

Start Grid testbed: ATLAS-GriPhyN

Data Challenges – FY '03 & FY '04

Production Tier 2's – FY '04 & FY '05

Operation – FY '05, FY '06 & beyond

Full Scale System Operation, 20% ('05) to 100% ('06) (as for Tier 1)


Testbed '01

[Map of the Testbed '01 sites: Brookhaven National Laboratory, Argonne National Laboratory, UC Berkeley / LBNL-NERSC, Boston University, Indiana University, U Michigan, and the University of Texas at Arlington, with HPSS sites indicated. Network connectivity shown includes ESnet, Abilene, CalREN, NTON, MREN, and NPACI.]


Fallback Issues

What is the impact of limited support for the planned distributed infrastructure?

Several scenarios are of course possible:

U.S. ATLAS would face a serious shortfall in analysis capability

Shortfall in simulation capacity

Analysis groups would have less autonomy

University groups would likely augment their facilities through supplemental requests and large-scale proposals to establish multidisciplinary "centers"

We could end up with 1 Tier 1 and 32 "Tier 2" centers

An incoherent, messy infrastructure, difficult to manage

Not the best way to optimize physics discovery