February 12, 2000: Towards a US and LHC Grid Environment for Experiments Harvey Newman (CIT)
Towards a US (and LHC) Grid Environment for HENP Experiments
CHEP 2000 Grid Workshop
Harvey B. Newman, Caltech
Padova, February 12, 2000
Data Grid Hierarchy: Integration, Collaboration, Marshal Resources

[Diagram: the multi-tier computing model]
- Online System → ~100 MBytes/sec → Offline Farm (~20 TIPS) → ~100 MBytes/sec → CERN Computer Center (Tier 0)
  - Bunch crossing every 25 nsec; 100 triggers per second; each event is ~1 MByte in size
  - Raw rate off the detector: ~PBytes/sec
- Tier 0 → ~2.4 Gbits/sec → Tier 1 Regional Centers: Fermilab (~4 TIPS), France, Italy, Germany
- Tier 1 → ~622 Mbits/sec (or Air Freight) → Tier 2 Centers (~1 TIPS each)
- Tier 2 → 100-1000 Mbits/sec → Tier 3: Institute servers (~0.25 TIPS)
- Tier 3 → Tier 4: physicists' workstations
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
- Data for these channels should be cached by the institute server (physics data cache)
- Units: 1 TIPS = 25,000 SpecInt95; a PC (today) = 10-15 SpecInt95
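A quick sanity check of the slide's numbers, as a sketch (the midpoint PC rating of 12.5 SpecInt95 is an assumption within the quoted 10-15 range):

```python
# Back-of-the-envelope check of the hierarchy's headline numbers.

TRIGGER_RATE_HZ = 100          # events per second after the trigger
EVENT_SIZE_MB = 1.0            # ~1 MByte per event

# Rate out of the online system:
raw_rate_mb_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
print(f"Online -> offline rate: ~{raw_rate_mb_s:.0f} MBytes/sec")

# CPU-unit conversion used on the slide:
SPECINT95_PER_TIPS = 25_000
PC_SPECINT95 = 12.5            # assumed midpoint of the quoted 10-15 SpecInt95
pcs_per_tier2 = SPECINT95_PER_TIPS / PC_SPECINT95
print(f"A ~1 TIPS Tier2 center is roughly {pcs_per_tier2:.0f} year-2000 PCs")
```

The first figure matches the ~100 MBytes/sec link drawn between the online system and the offline farm.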
To Solve: the LHC "Data Problem"
- The proposed LHC computing and data handling will not support FREE access, transport or processing for more than a small part of the data
- Balance between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently accessed datasets
- Strategies must be studied and prototyped, to ensure both acceptable turnaround times and efficient resource utilisation

Problems to be Explored
- How to meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
- Prioritise hundreds of requests from local and remote communities, consistent with local and regional policies
- Ensure that the system is dimensioned, used and managed optimally for the mixed workload
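The prioritisation bullet can be sketched as a policy-weighted queue; the policy classes, weights and names below are invented for illustration:

```python
import heapq
import itertools

# Hypothetical sketch: serving local vs. remote requests under a site policy.
POLICY_WEIGHT = {"local": 0, "regional": 1, "remote": 2}  # lower = served first
_arrival = itertools.count()

def submit(queue, origin, user, dataset):
    # (policy weight, arrival order): FIFO within each policy class
    heapq.heappush(queue, (POLICY_WEIGHT[origin], next(_arrival), user, dataset))

def next_request(queue):
    _, _, user, dataset = heapq.heappop(queue)
    return (user, dataset)

q = []
submit(q, "remote", "alice", "ESD-run42")
submit(q, "local", "bob", "AOD-run42")
print(next_request(q))  # the local request is served first
```

A real scheduler would also age requests so remote users are never starved; the tuple ordering makes that easy to add as a third key.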
Regional Center Architecture (example by I. Gaines, MONARC)

[Diagram: a regional center]
- Inputs: network from CERN; network from Tier 2 and simulation centers; tapes
- Core facilities: tape mass storage and disk servers; database servers
- Support services: physics software development; R&D systems and testbeds; info servers and code servers; web servers and telepresence servers; training, consulting, help desk
- Workloads:
  - Production reconstruction (Raw/Sim → ESD): scheduled, predictable; experiment/physics groups
  - Production analysis (ESD → AOD, AOD → DPD): scheduled; physics groups
  - Individual analysis (AOD → DPD and plots): chaotic; physicists' desktops
- Outputs: to Tier 2 centers, local institutes, CERN; tapes
Grid Services Architecture [*]:

- Applications: HEP data-analysis related applications
- Application Toolkits: remote viz toolkit, remote computation toolkit, remote data toolkit, remote sensors toolkit, remote collaboration toolkit, ...
- Grid Services: protocols, authentication, policy, resource management, instrumentation, data discovery, etc.
- Grid Fabric: networks, data stores, computers, display devices, etc.; associated local services (local implementations)

[*] Adapted from Ian Foster
Grid Hierarchy Goals: Better Resource Use and Faster Turnaround
- "Grid" integration and (de facto standard) common services, to ease development, operation, management and security
- Efficient resource use and improved responsiveness through:
  - Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
  - Resource discovery; query estimation (redirection); co-scheduling; prioritization; local and global allocations
  - Network and site "instrumentation": performance tracking, monitoring, forward prediction, problem trapping and handling
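Query estimation and redirection can be sketched as a cost comparison across candidate sites; the site names, link rates and queue figures below are invented for illustration:

```python
# Hypothetical sketch: redirect a request to the replica site with the
# lowest estimated delivery time (queue wait + transfer time).

def estimate_seconds(size_mb, site):
    queue_wait = site["queued_jobs"] * site["avg_job_s"]
    transfer = size_mb / site["link_mb_s"]
    return queue_wait + transfer

sites = {
    "tier1": {"link_mb_s": 30.0, "queued_jobs": 8, "avg_job_s": 60.0},
    "tier2": {"link_mb_s": 10.0, "queued_jobs": 0, "avg_job_s": 60.0},
}

def redirect(size_mb):
    return min(sites, key=lambda s: estimate_seconds(size_mb, sites[s]))

# A 2 GB request: the busy Tier 1 loses to the idle Tier 2 despite
# its faster link, because the queue wait dominates.
print(redirect(2000.0))
```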
GriPhyN: First Production-Scale "Grid Physics Network"
- Develop a new integrated distributed system, while meeting the primary goals of the US LIGO, SDSS and LHC programs
- Unified Grid system concept; hierarchical structure
- ~Twenty centers, with three sub-implementations: 5-6 each in the US for LIGO, CMS, ATLAS; 2-3 for SDSS
- Emphasis on training, mentoring and remote collaboration
- Focus on LIGO, SDSS (+ BaBar and Run2) handling of real data, and LHC Mock Data Challenges with simulated data
- Making the process of discovery accessible to students worldwide

GriPhyN web site: http://www.phys.ufl.edu/~avery/mre/
White paper: http://www.phys.ufl.edu/~avery/mre/white_paper.html
Grid Development Issues
- Integration of applications with Grid middleware
  - A performance-oriented user application software architecture is required, to deal with the realities of data access and delivery
  - Application frameworks must work with system state and policy information ("instructions") from the Grid
  - O(R)DBMSs must be extended to work across networks, e.g. data transport and catalog update "invisible" to the DBMS
- Interfacility cooperation at a new level, across world regions
  - Agreement on the choice and implementation of standard Grid components, services, security and authentication
  - Interface the common services locally, to match heterogeneous resources, performance levels and local operational requirements
  - Accounting and "exchange of value" software to enable cooperation
Roles of Projects for HENP Distributed Analysis
- RD45, GIOD: networked object databases
- Clipper/GC: high-speed access to object or file data; FNAL/SAM for processing and analysis; SLAC/OOFS distributed file system + Objectivity interface
- NILE, Condor: fault-tolerant distributed computing with heterogeneous CPU resources
- MONARC: LHC computing models: architecture, simulation, strategy, politics
- PPDG: first distributed data services and Data Grid system prototype
- ALDAP: OO database structures and access methods for astrophysics and HENP data
- GriPhyN: production-scale Data Grid
- APOGEE: simulation/modeling, application + network instrumentation, system optimization/evaluation
Other ODBMS Tests
- DRO WAN tests with CERN: production on CERN's PCSF and file movement to Caltech
- Objectivity/DB: creation of a 32,000-database federation
- Tests with Versant (fallback ODBMS)

[Plot: object create and commit times (milliseconds, 0-20,000) vs. update number (time of day), for create LAN, create WAN, commit LAN and commit WAN; saturated hours ~10 kbits/sec, unsaturated ~1 Mbit/sec]
The China Clipper Project: A Data-Intensive Grid (ANL-SLAC-Berkeley)
- Goal: develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks
- Builds on expertise and assets at ANL, LBNL and SLAC; NERSC, ESnet
- Builds on Globus middleware and a high-performance distributed storage system (DPSS from LBNL)
- Initial focus on large DOE HENP applications: RHIC/STAR, BaBar
- Demonstrated data rates to 57 MBytes/sec
Grand Challenge Architecture
- An order-optimized prefetch architecture for data retrieval from multilevel storage in a multiuser environment
- Queries select events and specific event components based upon tag attribute ranges
  - Query estimates are provided prior to execution
  - Queries are monitored for progress and multi-use
- Because event components are distributed over several files, processing an event requires delivery of a "bundle" of files
- Events are delivered in an order that takes advantage of what is already on disk, with multiuser policy-based prefetching of further data from tertiary storage
- GCA intercomponent communication is CORBA-based, but physicists are shielded from this layer
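The "order-optimized" idea above can be sketched in a few lines: serve first the events whose file bundles are already cached, so tape staging is deferred and batched. The event and file names are invented for illustration:

```python
# Sketch: order event delivery by how many of each event's bundle files
# still need to be staged from tertiary storage.

def delivery_order(bundles, on_disk):
    # bundles: {event_id: set of files it needs}; on_disk: set of cached files
    def missing(ev):
        return len(bundles[ev] - on_disk)
    return sorted(bundles, key=missing)  # fewest missing files first

bundles = {
    "ev1": {"f1", "f2"},
    "ev2": {"f1", "f3", "f4"},   # would force two tape fetches
    "ev3": {"f2"},
}
print(delivery_order(bundles, on_disk={"f1", "f2"}))
```

`sorted` is stable, so events tied on missing-file count keep their submission order; a policy module would refine the tie-break.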
GCA System Overview

[Diagram: multiple Clients query GCA/STACS, which uses an index and a file catalog to stage event files from HPSS via pftp; clients read the staged event files, event tags, and (other) disk-resident event data]
STorage Access Coordination System (STACS)

[Diagram: the Query Estimator uses a bit-sliced index to return an estimate for each query; the Query Monitor reports query status and the cache map, and passes lists of file bundles and events to the Cache Manager; the Cache Manager, guided by the Policy Module and the file catalog, turns requests for file caching and purging into pftp and file-purge commands]
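A bit-sliced (bitmap) index, as used by the Query Estimator, answers tag-attribute range queries by ORing per-bin bit vectors and counting bits, without touching the event data. A minimal sketch, with invented attribute values and binning:

```python
# Sketch of a binned bitmap index for fast range-query estimation.

def build_bitmaps(values, bins):
    # one bitmap per bin; a Python int serves as the bit vector
    bitmaps = [0] * len(bins)
    for i, v in enumerate(values):
        for b, (lo, hi) in enumerate(bins):
            if lo <= v < hi:
                bitmaps[b] |= 1 << i
    return bitmaps

tag = [5.2, 48.0, 12.5, 90.1, 33.3]          # one attribute value per event
bins = [(0, 10), (10, 40), (40, 100)]
bm = build_bitmaps(tag, bins)

# Estimate "tag >= 10": OR the qualifying bins' bitmaps, then count set bits
mask = bm[1] | bm[2]
print(bin(mask).count("1"), "events estimated to match")
```

The estimate is instant because only the (compact) bitmaps are scanned; the real index also maps matching events back to their file bundles.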
The Particle Physics Data Grid (PPDG)
- First-year goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of order one Petabyte
- Participants: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U. Wisc/CS
- Site-to-Site Data Replication Service at 100 MBytes/sec, between a primary site (data acquisition, CPU, disk, tape robot) and a secondary site (CPU, disk, tape robot)
- Multi-Site Cached File Access Service, linking a primary site (DAQ, tape, CPU, disk, robot), satellite sites (tape, CPU, disk, robot) and university sites (CPU, disk, users)
The Particle Physics Data Grid (PPDG)
- Aim: the ability to query and partially retrieve hundreds of terabytes across wide-area networks within seconds
- PPDG uses advanced services in three areas:
  - Distributed caching: to allow rapid data delivery in response to multiple requests
  - Matchmaking and request/resource co-scheduling: to manage workflow and use computing and network resources efficiently; to achieve high throughput
  - Differentiated services: to allow particle-physics bulk data transport to coexist with interactive and real-time remote collaboration sessions, and other network traffic
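The distributed-caching service can be sketched as a per-site file cache with least-recently-used eviction; the class, capacity and file names below are illustrative, not PPDG's actual design:

```python
from collections import OrderedDict

# Minimal sketch of a site file cache with LRU eviction.
class FileCache:
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.files = OrderedDict()          # name -> size_mb, oldest first

    def request(self, name, size_mb, fetch):
        if name in self.files:
            self.files.move_to_end(name)    # cache hit: mark recently used
            return "hit"
        while self.files and sum(self.files.values()) + size_mb > self.capacity:
            self.files.popitem(last=False)  # evict least recently used
        fetch(name)                         # stand-in for a tertiary-storage fetch
        self.files[name] = size_mb
        return "miss"

cache = FileCache(capacity_mb=100)
fetched = []
cache.request("f1", 60, fetched.append)
cache.request("f2", 60, fetched.append)     # evicts f1 to make room
print(cache.request("f1", 60, fetched.append))  # a miss again
```

Real grid caches replace pure LRU with policy-based retention (pinning files that pending queries will need), which is exactly what the Cache Manager and Policy Module above provide.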
PPDG: Architecture for Reliable High-Speed Data Delivery

[Diagram: object-based and file-based application services sit above a cache manager, file access service, matchmaking service, cost estimation, file fetching service, file replication index, end-to-end network services, mass storage manager, resource management, and file movers, spanning a site boundary / security domain. Future additions: file and object export; cache and state tracking; forward prediction]
First-Year PPDG "System" Components

Middleware components (initial choice; see the PPDG proposal):
- Object- and file-based application services: Objectivity/DB (SLAC-enhanced); GC query object, event iterator, query monitor; FNAL SAM system
- Resource management: start with human intervention (but begin to deploy resource discovery and management tools: Condor, SRB)
- File access service: components of OOFS (SLAC)
- Cache manager: GC cache manager (LBNL)
- Mass storage manager: HPSS, Enstore, OSM (site-dependent)
- Matchmaking service: Condor (U. Wisconsin)
- File replication index: MCAT (SDSC)
- Transfer cost estimation service: Globus (ANL)
- File fetching service: components of OOFS
- File mover(s): SRB (SDSC); site-specific
- End-to-end network services: Globus tools for QoS reservation
- Security and authentication: Globus (ANL)
Condor Matchmaking: A Resource Allocation Paradigm
- Parties use ClassAds to advertise properties, requirements and ranking to a matchmaker
- ClassAds are self-describing (no separate schema)
- ClassAds combine query and data

http://www.cs.wisc.edu/condor

[Diagram: an owner agent / environment agent represents the resource through local resource management; a customer agent / application agent represents the application]
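The ClassAd idea — each side publishes attributes plus a Requirements predicate over the other side's ad, and a Rank to order acceptable matches — can be sketched with plain dictionaries. This is a simplification of Condor's actual ClassAd language; all names and numbers are invented:

```python
# Sketch of two-sided ClassAd-style matchmaking.

machine_ads = [
    {"Name": "m1", "Memory": 256, "Arch": "INTEL",
     "Requirements": lambda job: job["Owner"] != "banned"},
    {"Name": "m2", "Memory": 1024, "Arch": "INTEL",
     "Requirements": lambda job: True},
]

job_ad = {"Owner": "alice", "ImageSize": 512,
          "Requirements": lambda m: m["Arch"] == "INTEL" and m["Memory"] >= 512,
          "Rank": lambda m: m["Memory"]}

def matchmake(job, machines):
    # both sides' Requirements must hold; the job's Rank breaks ties
    matches = [m for m in machines
               if job["Requirements"](m) and m["Requirements"](job)]
    return max(matches, key=job["Rank"])["Name"] if matches else None

print(matchmake(job_ad, machine_ads))
```

Note how the ad carries both data (attributes) and query (the predicates), which is exactly the "ClassAds combine query and data" point on the slide.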
Agents for Remote Execution in Condor

[Diagram: on the submission side, the customer agent and application agent manage the request queue, data and object files, and checkpoint files; on the execution side, the owner agent and execution agent run the application process, with remote I/O and checkpointing back to the submission side]
Beyond Traditional Architectures: Mobile Agents (Java Aglets)

"Agents are objects with rules and legs" -- D. Taylor

Mobile agents:
- Execute asynchronously
- Reduce network load: local conversations
- Overcome network latency, and some outages
- Adaptive; robust, fault-tolerant
- Naturally heterogeneous
- Extensible concept: agent hierarchies

[Diagram: an application interacting with a service through a hierarchy of agents]
Using the Globus Tools
- Tests with "gsiftp", a modified FTP server/client that allows control of the TCP buffer size
- Transfers of Objectivity database files from the Exemplar to:
  - itself
  - an O2K at Argonne (via CalREN2 and Abilene)
  - a Linux machine at INFN (via the US-CERN transatlantic link)
- Target /dev/null in multiple streams (1 to 16 parallel gsiftp sessions)
- Aggregate throughput measured as a function of the number of streams and the send/receive buffer sizes
[Plots: gsiftp rate (kBytes/sec) vs. buffer size (kBytes, 0-2500) for a single stream over HiPPI, and for a single stream to Argonne; aggregate gsiftp rate to Argonne vs. number of parallel streams (0-18)]
Results:
- ~25 MBytes/sec on HiPPI loop-back
- ~4 MBytes/sec to Argonne by tuning the TCP window size
- Saturating the available bandwidth to Argonne
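Why the buffer size matters: TCP cannot keep more than one window's worth of data in flight per round trip, so a single stream's throughput is capped at buffer_size / RTT (the bandwidth-delay product rule). A quick illustration, with assumed RTTs rather than measured ones:

```python
# Throughput ceiling of one TCP stream: buffer_size / round-trip time.

def max_rate_mb_s(buffer_kb, rtt_ms):
    return (buffer_kb / 1024.0) / (rtt_ms / 1000.0)

# Illustrative wide-area RTT of 50 ms (an assumption, not a measurement):
print(f"64 KB buffer, 50 ms RTT : {max_rate_mb_s(64, 50):.2f} MB/s")
print(f"1 MB buffer,  50 ms RTT : {max_rate_mb_s(1024, 50):.2f} MB/s")
```

This is also why the parallel-stream tests help: n streams multiply the effective in-flight window by n when a single socket's buffer cannot be made large enough.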
Distributed Data Delivery and LHC Software Architecture

Software architectural choices:
- Traditional, single-threaded applications: wait for data location, arrival and reassembly
- OR
- Performance-oriented (complex): I/O requests up front; multi-threaded; data driven; responds to an ensemble of (changing) cost estimates; possible code movement as well as data movement; loosely coupled, dynamic
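The performance-oriented choice can be sketched as issuing all I/O requests up front and consuming the data as it arrives, instead of a serial fetch-then-wait loop. `fetch` here is a stand-in for a remote read, with simulated latency:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(chunk_id):
    time.sleep(0.05)                  # simulated network latency per chunk
    return f"data-{chunk_id}"

def analyse(chunks):
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetch, c) for c in chunks]   # all requests up front
        return [f.result() for f in futures]                # consume as data arrives

results = analyse(range(8))
print(len(results), "chunks processed")
```

With 8 overlapping requests the wall-clock cost is roughly one latency instead of eight; a data-driven framework would additionally reorder consumption by arrival and by cost estimates.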
GriPhyN Foundation
- Build on the distributed-system results of the GIOD, MONARC, NILE, Clipper/GC and PPDG projects
- Long-term vision in three phases:
  1. Read/write access to high-volume data and processing power: Condor/Globus/SRB + NetLogger components to manage jobs and resources
  2. A WAN-distributed, data-intensive Grid computing system: tasks move automatically to the "most effective" node in the Grid; scalable implementation using mobile-agent technology
  3. A "Virtual Data" concept for multi-PB distributed data management, with large-scale agent hierarchies: transparently match data to sites, manage data replication or transport, and co-schedule data and compute resources
- Build on VRVS developments for remote collaboration
GriPhyN/APOGEE: Production Design of a Data Analysis Grid

Instrumentation, simulation, optimization, coordination:
- SIMULATION of a production-scale Grid hierarchy: provide a toolset for HENP experiments to test and optimize their data analysis and resource usage strategies
- INSTRUMENTATION of Grid prototypes: characterize the Grid components' performance under load; validate the simulation; monitor, track and report system state, trends and "events"
- OPTIMIZATION of the Data Grid: genetic algorithms or other evolutionary methods; deliver an optimization package for HENP distributed systems; applications to other experiments, accelerator and other control systems, and other fields
- COORDINATION with experiment-specific projects: CMS, ATLAS, BaBar, Run2
Grid (IT) Issues to be Addressed
- Dataset compaction; data caching and mirroring strategies
- Using large time quanta or very-high-bandwidth bursts for large data transactions
- Query estimators and query monitors (cf. the GCA work)
  - Enable flexible, resilient prioritisation schemes (marginal utility)
  - Query redirection, fragmentation, priority alteration, etc.
- Pre-emptive and realtime data/resource matchmaking
- Resource discovery; data and CPU location brokers
- Co-scheduling and queueing processes
- State, workflow and performance-monitoring instrumentation; tracking and forward prediction
- Security: authentication (for resource allocation/usage and priority); running a certificate authority
CMS Example: Data Grid Program of Work (I)
- FY 2000: Build basic services; "1 million event" samples on proto-Tier2s
  - For HLT milestones and detector/physics studies with ORCA
  - MONARC Phase 3 simulations for study/optimization
- FY 2001: Set up the initial Grid system based on PPDG deliverables, at the first Tier2 centers and Tier1-prototype centers
  - High-speed site-to-site file replication service
  - Multi-site cached file access
  - CMS Data Challenges in support of the DAQ TDR
  - Shakedown of preliminary PPDG (+ MONARC and GIOD) system strategies and tools
- FY 2002: Deploy the Grid system at the second set of Tier2 centers
  - CMS Data Challenges for the Software and Computing TDR and the Physics TDR
Data Analysis Grid Program of Work (II)
- FY 2003: Deploy Tier2 centers at the last set of sites
  - 5%-scale Data Challenge in support of the Physics TDR
  - Production-prototype test of the Grid hierarchy system, with the first elements of the production Tier1 center
- FY 2004: 20% production (online and offline) CMS Mock Data Challenge, with all Tier2 centers and the partly completed Tier1 center
  - Build a production-quality Grid system
- FY 2005 (Q1-Q2): Final production CMS (online and offline) shakedown
  - Full distributed-system software and instrumentation
  - Using the full capabilities of the Tier2 and Tier1 centers
Summary
- The HENP/LHC data handling problem: multi-Petabyte scale, binary pre-filtered data, resources distributed worldwide
  - It has no analog now, but will be increasingly prevalent in research and industry by ~2005
  - Development of a robust, PB-scale networked data access and analysis system is mission-critical
- An effective partnership exists, HENP-wide, through many R&D projects: RD45, GIOD, MONARC, Clipper, Globus, Condor, ALDAP, PPDG, ...
- An aggressive R&D program is required to develop:
  - Resilient, "self-aware" systems for data access, processing and analysis across a hierarchy of networks
  - Solutions that could be widely applicable to data problems in other scientific fields and industry, by LHC startup
- Focus on Data Grids for next-generation physics
LHC Data Models: 1994-2000

HEP data models are complex!
- Rich hierarchy of hundreds of complex data types (classes)
- Many relations between them
- Different access patterns (multiple viewpoints)

OO technology:
- OO applications deal with networks of objects (and containers)
- Pointers (or references) are used to describe relations

Existing solutions do not scale. Solution suggested by RD45: an ODBMS coupled to a Mass Storage System.

Construction of "compact" datasets for analysis: rapid access/navigation/transport
[Diagram: an Event object references a TrackList, a Tracker and a Calorimeter; each Track in the TrackList references a HitList of Hits]
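The network-of-objects model in the diagram above can be sketched in a few lines. This is an illustrative toy, not the RD45/ODBMS implementation: class and field names are assumptions chosen to mirror the diagram, and Python references stand in for ODBMS object references.

```python
# Sketch of the Event -> TrackList -> Track -> HitList -> Hit relations.
# In an ODBMS these would be persistent object references; here they are
# ordinary in-memory references.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Hit:
    channel: int
    energy: float

@dataclass
class Track:
    hits: List[Hit] = field(default_factory=list)   # the track's "HitList"

@dataclass
class Event:
    tracks: List[Hit] = field(default_factory=list)       # "TrackList"
    tracker_hits: List[Hit] = field(default_factory=list) # "Tracker"
    calo_hits: List[Hit] = field(default_factory=list)    # "Calorimeter"

# Analysis navigates the references: event -> track -> hits.
ev = Event(tracks=[Track(hits=[Hit(1, 0.5), Hit(2, 1.2)])])
total = sum(h.energy for t in ev.tracks for h in t.hits)
```

Different "viewpoints" (trigger, reconstruction, analysis) then follow different subsets of these references, which is why access patterns vary so widely.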
Web-Based Server-Farm Networks Circa 2000: Dynamic (Grid-Like) Content Delivery Engines

Akamai, Adero, Sandpiper: 1,200 → thousands of network-resident servers; 25 → 60 ISP networks; 25 → 30 countries; 40+ corporate customers; $25 B capitalization

- Resource discovery: build a "weathermap" of the server network (state tracking)
- Query estimation; matchmaking/optimization; request rerouting
- Virtual IP addressing: one address per server farm
- Mirroring, caching
- (1,200) autonomous-agent implementation

Content Delivery Networks (CDN)
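The weathermap/matchmaking idea above can be sketched as follows. This is a hypothetical illustration, not Akamai's actual algorithm: the state fields and the scoring rule are assumptions.

```python
# Toy matchmaker: agents report per-server state into a "weathermap",
# and each request is rerouted to the best non-saturated candidate.
from dataclasses import dataclass

@dataclass
class ServerState:
    name: str
    load: float    # 0.0 (idle) .. 1.0 (saturated)
    rtt_ms: float  # measured round-trip time to the client region

def best_server(weathermap, max_load=0.9):
    # Exclude saturated servers, then prefer low latency, penalized by load.
    candidates = [s for s in weathermap if s.load < max_load]
    return min(candidates, key=lambda s: s.rtt_ms * (1.0 + s.load))

weathermap = [
    ServerState("farm-eu", load=0.2, rtt_ms=120.0),
    ServerState("farm-us", load=0.5, rtt_ms=20.0),
    ServerState("farm-asia", load=0.95, rtt_ms=15.0),  # saturated: skipped
]
choice = best_server(weathermap)
```

The same pattern (state tracking plus a cost function for matchmaking) is what a data grid needs for replica selection, with dataset location added to the state.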
Strawman Tier 2 Evolution

                          2000              2005
Linux Farm:               1,200 SI95        20,000 SI95 [*]
Disks on CPUs:            4 TB              50 TB
RAID Array:               1 TB              30 TB
Tape Library:             1-2 TB            50-100 TB
LAN Speed:                0.1 - 1 Gbps      10 - 100 Gbps
WAN Speed:                155 - 622 Mbps    2.5 - 10 Gbps
Collaborative             MPEG2 VGA         Realtime HDTV
Infrastructure:           (1.5 - 3 Mbps)    (10 - 20 Mbps)

[*] Reflects lower Tier 2 component costs due to less demanding usage. Some of the CPU will be used for simulation.
US CMS S&C Spending Profile
[Chart: US CMS Software and Computing Project spending, $M (0 - 18) per fiscal year (FY1999 - FY2005), broken out into CAS: People, UF: People, UF: Hardware, Tier2: People, Tier2: Hardware, and Operations/Year]
2006 is a model year for the operations phase of CMS
GriPhyN Cost

System support       $  8.0 M
R&D                  $ 15.0 M
Software             $  2.0 M
Tier 2 networking    $ 10.0 M
Tier 2 hardware      $ 50.0 M
Total                $ 85.0 M
Grid Hierarchy Concept: Broader Advantages

- Partitioning of users into "proximate" communities for support, troubleshooting and mentoring
- Partitioning of facility tasks, to manage and focus resources
- Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
- Lower tiers of the hierarchy → more local control
Storage Request Brokers (SRB)

- Name transparency: access to data by attributes stored in an RDBMS (MCAT)
- Location transparency: logical collections (by attributes) spanning multiple physical resources
- Combined location and name transparency means that datasets can be replicated across multiple caches and data archives (PPDG)
- Data management protocol transparency: SRB with custom-built drivers in front of each storage system; the user does not need to know how the data is accessed, since the SRB deals with the local file system managers
- SRBs (agents) authenticate themselves and users using the Grid Security Infrastructure (GSI)
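The two transparencies above amount to a double indirection: attributes resolve to a logical name, and the logical name resolves to one of several physical replicas. A minimal sketch, with plain dicts standing in for the MCAT catalog (the dataset names and URLs are invented for illustration; the real SRB/MCAT API differs):

```python
# Name transparency: attribute query -> logical collection name.
ATTRIBUTES = {
    ("channel", "higgs"): "lhc/higgs/candidates-1999",
}

# Location transparency: logical name -> physical replicas on
# different storage systems, each behind its own protocol driver.
CATALOG = {
    "lhc/higgs/candidates-1999": [
        "hpss://archive.example.org/run1999/higgs.db",   # tape archive
        "unix://tier2.example.edu/cache/higgs.db",       # disk cache
    ],
}

def resolve(attr_key, prefer="unix"):
    """Find a dataset by attributes, then pick a replica by protocol."""
    logical = ATTRIBUTES[attr_key]
    replicas = CATALOG[logical]
    # The caller never names a physical site or storage system.
    for url in replicas:
        if url.startswith(prefer + "://"):
            return url
    return replicas[0]   # fall back to the archive copy

url = resolve(("channel", "higgs"))
```

Protocol transparency then lives in the per-scheme drivers: whichever replica is chosen, the user issues the same read call.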
Role of Simulation for Distributed Systems

- Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimization of complex distributed systems
  - From battlefields to agriculture; from the factory floor to telecommunications systems
- Discrete event simulations, with an appropriately high level of abstraction, are just beginning to be part of the HEP culture
  - Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models (event-oriented)
  - MONARC (process-oriented; Java 2 threads + class library)
- These simulations are very different from HEP "Monte Carlos": "time" intervals and interrupts are the essentials
- Simulation is a vital part of the study of site architectures, network behavior, and data access/processing/delivery strategies, for HENP Grid design and optimization
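The essential difference from a physics Monte Carlo is that "time" advances only at scheduled events. A minimal sketch of that discrete-event style, here modelling FIFO job dispatch on a small farm (a toy in the MONARC spirit, not the MONARC framework itself; the job list and CPU count are invented):

```python
# Discrete-event toy: each CPU is represented only by the time at which
# it next becomes free; the clock jumps between these event times.
import heapq

def simulate(jobs, n_cpus):
    """Return the finish time of each (arrival, duration) job, FIFO order."""
    cpu_free = [0.0] * n_cpus       # event times: when each CPU frees up
    heapq.heapify(cpu_free)
    finish_times = []
    for arrival, duration in jobs:
        free_at = heapq.heappop(cpu_free)   # earliest-available CPU
        start = max(arrival, free_at)       # job may have to queue
        end = start + duration
        finish_times.append(end)
        heapq.heappush(cpu_free, end)
    return finish_times

# Three jobs on two CPUs: the third job must wait for a CPU to free up.
done = simulate([(0.0, 5.0), (0.0, 3.0), (1.0, 2.0)], n_cpus=2)
```

A grid simulation adds more event types (network transfers, tape mounts, cache hits) to the same queue-of-events core, which is what makes the abstraction level tractable.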
Monitoring Architecture: Use of NetLogger in CLIPPER

- End-to-end monitoring of grid assets is necessary to:
  - Resolve network throughput problems
  - Dynamically schedule resources
- Add precision-timed event monitor agents to:
  - ATM switches
  - Storage servers
  - Testbed computational resources
- Produce trend analysis modules for the monitor agents
- Make results available to applications
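The core of this style of monitoring is that every agent emits precision-timestamped event records that can be correlated end-to-end. A rough sketch in a NetLogger-like key=value style (the field names only approximate the ULM format, and `nl_event` is an invented helper, not the NetLogger API):

```python
# Toy monitor agent: one timestamped key=value record per event.
import time

def nl_event(program, event, **fields):
    """Format one monitoring record with a high-resolution timestamp."""
    stamp = time.time()   # real agents would use a synchronized clock (NTP)
    extras = " ".join(f"{k.upper()}={v}" for k, v in fields.items())
    return f"DATE={stamp:.6f} PROG={program} NL.EVNT={event} {extras}".strip()

# A storage-server agent brackets a transfer with start/end events;
# a trend-analysis module later subtracts the timestamps to get throughput.
start = nl_event("storage-agent", "TRANSFER.START", host="srv1", bytes=1048576)
end = nl_event("storage-agent", "TRANSFER.END", host="srv1", bytes=1048576)
```

Because the records carry the emitting host and program, events from switches, servers and compute nodes can be merged on one timeline, which is what makes throughput problems diagnosable end-to-end.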