DATA GRIDS for Science and Engineering Worldwide Analysis at Regional Centers Harvey B. Newman...
-
Upload
cecil-bryant -
Category
Documents
-
view
222 -
download
1
Transcript of DATA GRIDS for Science and Engineering Worldwide Analysis at Regional Centers Harvey B. Newman...
DATA GRIDS for DATA GRIDS for Science and EngineeringScience and Engineering
Worldwide Analysis at Regional CentersWorldwide Analysis at Regional Centers Harvey B. NewmanHarvey B. Newman
Professor of Physics, CaltechProfessor of Physics, Caltech
Islamabad, August 21, 2000Islamabad, August 21, 2000
LHC Vision: Data Grid HierarchyLHC Vision: Data Grid Hierarchy
Tier 1
Tier2 Center
Online System
Offline Farm,CERN Computer
Ctr > 20 TIPS
FranceCentre
FNAL Center Italy Center UK Center
InstituteInstituteInstituteInstitute ~0.25TIPS
Workstations
~100 MBytes/sec
~2.5 Gbits/sec
100 - 1000
Mbits/sec
Bunch crossing per 25 nsecs; 100 triggers per second. Event is ~1 MByte in size
Physicists work on analysis “channels”
Each institute has ~10 physicists working on one or more channels
Physics data cache
~PByte/sec
~0.6-2.5 Gbits/sec
Tier2 CenterTier2 CenterTier2 Center
~622 Mbits/sec
Tier 0 +1
Tier 3
Tier 4
Tier2 Center Tier 2
Experiment
On-demand creation of powerfulOn-demand creation of powerfulvirtual computing and data systemsvirtual computing and data systems
Grid: Flexible, high-performance access to all significant resources
Sensor nets
http://
http://
WebWeb: Uniform : Uniform access to access to HTML HTML documentsdocuments
Data Stores
Computers
Softwarecatalogs
Colleagues
Grids: Next Generation WebGrids: Next Generation Web
RD45, RD45, GIODGIOD Networked Object DatabasesNetworked Object Databases Clipper/GC Clipper/GC High speed access, processing and analysis High speed access, processing and analysis FNAL/SAM FNAL/SAM of files and object dataof files and object data SLAC/OOFS SLAC/OOFS Distributed File System + Objectivity Interface Distributed File System + Objectivity Interface NILE, CondorNILE, Condor Fault Tolerant Distributed ComputingFault Tolerant Distributed Computing
MONARCMONARC LHC Computing Models: LHC Computing Models: Architecture, Simulation, StrategyArchitecture, Simulation, Strategy
PPDGPPDG First Distributed Data Services and First Distributed Data Services and Data Grid System PrototypeData Grid System Prototype
ALDAPALDAP OO Database Structures & Access OO Database Structures & Access Methods for Astrophysics and HENP DataMethods for Astrophysics and HENP Data
GriPhyN GriPhyN Production-Scale Data Grids Production-Scale Data Grids EU Data GridEU Data Grid
Roles of ProjectsRoles of Projectsfor HENP Distributed Analysisfor HENP Distributed Analysis
Grid Services Architecture [*]Grid Services Architecture [*]
GridGridFabricFabric
GridGridServicesServices
ApplnApplnToolkitsToolkits
ApplnsApplns
Data stores, networks, computers, display Data stores, networks, computers, display devices,… ; associated local servicesdevices,… ; associated local services
Protocols, authentication, policy, resource Protocols, authentication, policy, resource discovery & management, instrumentation,... discovery & management, instrumentation,...
......RemotRemot
eevizviz
toolkittoolkit
RemotRemotee
comp.comp.toolkittoolkit
RemotRemotee
datadatatoolkittoolkit
RemotRemotee
sensorssensorstoolkittoolkit
RemotRemotee
collab.collab.toolkittoolkit
A Rich Set of HEP Data-Analysis A Rich Set of HEP Data-Analysis Related ApplicationsRelated Applications
[*] [*] Adapted from Ian Foster: there are computing grids, Adapted from Ian Foster: there are computing grids, access (collaborative) grids, data grids, ...access (collaborative) grids, data grids, ...
The Grid MiddlewareThe Grid MiddlewareServices ConceptServices Concept
Standard services thatStandard services that
Provide uniform, high-level access to a wide Provide uniform, high-level access to a wide range of resources (including networks)range of resources (including networks)
Address interdomain issues: security, policyAddress interdomain issues: security, policy
Permit application-level management and Permit application-level management and monitoring of end-to-end performancemonitoring of end-to-end performance
Broadly deployed, like Internet ProtocolsBroadly deployed, like Internet Protocols
Enabler of application-specific tools as well Enabler of application-specific tools as well as applications themselvesas applications themselves
Application Example:Application Example:Condor Numerical OptimizationCondor Numerical Optimization
Exact solution of “nug30” quadratic Exact solution of “nug30” quadratic assignment problem on June 16, 2000assignment problem on June 16, 2000 14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,
13, 26,17,30,6,20,19,8,18,7,27,12,11,2313, 26,17,30,6,20,19,8,18,7,27,12,11,23 Used “MW” framework that maps branch and Used “MW” framework that maps branch and
bound problem to master-worker structurebound problem to master-worker structure Condor-G delivered 3.46E8 CPU seconds in Condor-G delivered 3.46E8 CPU seconds in
7 days (peak 1009 processors), using parallel 7 days (peak 1009 processors), using parallel computers, workstations, and clusterscomputers, workstations, and clusters
MetaNEOS: Argonne, Northwestern, Wisconson
Emerging Emerging Data GridData Grid User Communities User Communities
NSF Network for Earthquake Engineering NSF Network for Earthquake Engineering Simulation Grid (NEES)Simulation Grid (NEES) Integrated instrumentation, Integrated instrumentation,
collaboration, simulationcollaboration, simulation Grid Physics Network (GriPhyN)Grid Physics Network (GriPhyN)
ATLAS, CMS, LIGO, SDSSATLAS, CMS, LIGO, SDSS Particle Physics Data Grid (PPDG)Particle Physics Data Grid (PPDG) EU Data GridEU Data Grid Access Grid; VRVS: supporting Access Grid; VRVS: supporting
group-based collaborationgroup-based collaborationAndAnd
The Human Genome ProjectThe Human Genome Project The Earth System Grid and EOSDISThe Earth System Grid and EOSDIS Federating Brain DataFederating Brain Data Computed MicrotomographyComputed Microtomography The Virtual Observatory (US + Int’l) The Virtual Observatory (US + Int’l)
The Particle Physics Data Grid The Particle Physics Data Grid (PPDG)(PPDG)
First Round Goal: First Round Goal: Optimized cached read access to 10-100 Gbytes Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to ~1 Petabytedrawn from a total data set of 0.1 to ~1 Petabyte
Matchmaking, Resource Co-Scheduling: SRB, Condor, HRM, GlobusMatchmaking, Resource Co-Scheduling: SRB, Condor, HRM, Globus
PRIMARY SITEPRIMARY SITEData Acquisition,Data Acquisition,
CPU, Disk, CPU, Disk, Tape RobotTape Robot
REGIONAL SITEREGIONAL SITECPU, Disk, CPU, Disk, Tape RobotTape Robot
Site to Site Data Replication Service
100 Mbytes/sec
ANL, BNL, Caltech, FNAL, JLAB, LBNL, ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CSSDSC, SLAC, U.Wisc/CS
Multi-Site Cached File Access Service
UniversityUniversityCPU, Disk, CPU, Disk,
UsersUsersPRIMARY SITEPRIMARY SITE
DAQ, Tape, DAQ, Tape, CPU, CPU,
Disk, RobotDisk, Robot
Regional SiteRegional SiteTape, CPU, Tape, CPU, Disk, RobotDisk, Robot
UniversityUniversityCPU, Disk, CPU, Disk,
UsersUsers
UniversityUniversityCPU, Disk, CPU, Disk,
UsersUsers
UniversityUniversityCPU, Disk, CPU, Disk,
UsersUsers
UniversityUniversityCPU, Disk, CPU, Disk,
UsersUsers
Regional SiteRegional SiteTape, CPU, Tape, CPU, Disk, RobotDisk, Robot
GriPhyN: PetaScale GriPhyN: PetaScale Virtual Data GridsVirtual Data Grids
Build the Foundation for Petascale Virtual Data GridsBuild the Foundation for Petascale Virtual Data Grids
Virtual Data Tools
Request Planning &
Scheduling ToolsRequest Execution & Management Tools
Transforms
Distributed resources(code, storage,
computers, and network)
Resource Management
Services
Resource Management
Services
Security and Policy
Services
Security and Policy
Services
Other Grid ServicesOther Grid
Services
Interactive User Tools
Production TeamIndividual Investigator
Workgroups
Raw data source
WorkPackageNumber
Work Package title Leadcontractor
WP1 Grid Workload Management INFN
WP2 Grid Data Management CERN
WP3 Grid Monitoring Services PPARC
WP4 Fabric Management CERN
WP5 Mass Storage Management PPARC
WP6 Integration Testbed CNRS
WP7 Network Services CNRS
WP8 High Energy Physics Applications CERN
WP9 Earth Observation Science Applications ESA
WP10 Biology Science Applications INFN
WP11 Dissemination and Exploitation INFN
WP12 Project Management CERN
EU-Grid ProjectEU-Grid ProjectWork PackagesWork Packages
Grid Tools for CMS “HLT” Grid Tools for CMS “HLT” Production: A. Samar (Caltech)Production: A. Samar (Caltech)
Distributed Distributed Job Job ExecutionExecution and and Data Handling:Data Handling:
GoalsGoals
TransparencyTransparency PerformancePerformance Security Security Fault ToleranceFault Tolerance AutomationAutomation
Submit job
Replicate data
Replicatedata
Site A Site B
Site C
Jobs are executed locally or
remotely Data is always
written locally Data is replicated
to remote sites
Job writes data locally
A.Samar, M. Hafeez (Caltech) with CERN and FNAL
GRIDs In 2000: SummaryGRIDs In 2000: Summary
Grids are changing the way we do science Grids are changing the way we do science and engineeringand engineering
Key services and concepts have been Key services and concepts have been identified, and development has startedidentified, and development has started
Major IT challenges remainMajor IT challenges remain Opportunities for CollaborationOpportunities for Collaboration
Transition of services and applications to Transition of services and applications to production use is starting to occurproduction use is starting to occur
In future more sophisticated integrated services In future more sophisticated integrated services and toolsets could drive advances in many and toolsets could drive advances in many fields of science and engineeringfields of science and engineering
High Energy Physics, facing the need for High Energy Physics, facing the need for Petascale Virtual Data, is an early adopterPetascale Virtual Data, is an early adopterand leading Data Grid developerand leading Data Grid developer
The GRID BOOKThe GRID BOOK
Book published by Book published by Morgan KaufmanMorgan Kaufman www.mkp.com/gridswww.mkp.com/grids
GlobusGlobus www.globus.orgwww.globus.org
Grid ForumGrid Forum www.gridforum.orgwww.gridforum.org
French GRID Initiative PartnersFrench GRID Initiative Partners
Computing centresComputing centres:: IDRIS CNRS High Performance Computing CentreIDRIS CNRS High Performance Computing Centre IN2P3 Computing CentreIN2P3 Computing Centre CINES, centre de calcul intensif de l’enseignementCINES, centre de calcul intensif de l’enseignement CRIHAN centre régional d’informatique à RouenCRIHAN centre régional d’informatique à Rouen
Network departments:Network departments: UREC CNRS network departmentUREC CNRS network department GIP RenaterGIP Renater
Computing Science CNRS & INRIA labs:Computing Science CNRS & INRIA labs: Université Joseph FourierUniversité Joseph Fourier ID-IMAG ID-IMAG LAASLAAS RESAMRESAM LIP and PSMN (Ecole Normale LIP and PSMN (Ecole Normale
Supérieure de Lyon)Supérieure de Lyon)Industry:Industry:
Société Communication et Systèmes Société Communication et Systèmes EDF R&D departmentEDF R&D department
Applications development teams (HEP, Applications development teams (HEP, Bioinformatics,Earth Observation):Bioinformatics,Earth Observation):
IN2P3, CEA, Observatoire de Grenoble, Laboratoire IN2P3, CEA, Observatoire de Grenoble, Laboratoire de Biométrie, Institut Pierre Simon Laplacede Biométrie, Institut Pierre Simon Laplace
LHC Tier 2 Center In 2001LHC Tier 2 Center In 2001
OC-12
Router
DLT
RAID
FCG
bEth
FEth
OC
-3
VRVSMPEG2
OC
-3
FEth SwitchFEth Switch
FEth SwitchFEth SwitchGEth Switch
Data S
erver
ESG Prototype ESG Prototype Inter-communication DiagramInter-communication Diagram
LBNLGSI-wuftpd
LLNL
Disk
PCMDI
Request Manager
ISIGSI-
wuftpd
Disk
SDSCGSI-pftpd
HPSS
Disk
ANLGSI-
wuftpd
Disk
NCARGSI-
wuftpd
Disk
LBNL
Diskon
Clipper
HPSS
HRM
ANLReplica Catalog
GIS with NWS
GSI-ncftp
GS
I-ncftpGSI-n
cftp
LDAP Script
LDAP C API or Script
GSI-ncftp
GSI-ncftpGSI-ncftp CORBA
GriPhyN ScopeGriPhyN Scope
Several scientific disciplinesSeveral scientific disciplines US-CMSUS-CMS High Energy PhysicsHigh Energy Physics US-ATLASUS-ATLAS High Energy PhysicsHigh Energy Physics LIGOLIGO Gravity wave experimentGravity wave experiment SDSSSDSS Sloan Digital Sky SurveySloan Digital Sky Survey
Requesting $70M from NSF to build GridsRequesting $70M from NSF to build Grids 4 Grid implementations, one per experiment4 Grid implementations, one per experiment Tier2 hardware, networking, people, R&DTier2 hardware, networking, people, R&D Common problems for different Common problems for different
implementationsimplementations PartnershipPartnership with CS professionals, with CS professionals,
IT, industryIT, industry R&D from NSF ITR Program ($12M)R&D from NSF ITR Program ($12M)
Data Grids: Better Global Resource Data Grids: Better Global Resource Use Use andand Faster Turnaround Faster Turnaround
Efficient resource use and improved responsiveness Efficient resource use and improved responsiveness through:through:
Treatment of the ensemble of site and network Treatment of the ensemble of site and network resources as an integrated (loosely coupled) systemresources as an integrated (loosely coupled) system
Resource discovery, prioritizationResource discovery, prioritization
Data caching, query estimation, co-scheduling, Data caching, query estimation, co-scheduling, transaction managementtransaction management
Network and site “instrumentation”: performance Network and site “instrumentation”: performance tracking, monitoring, problem trapping and tracking, monitoring, problem trapping and
handlinghandling
Emerging ProductionEmerging ProductionGridsGrids
NASA Information Power Grid
NSF National Technology
Grid
EU HEP Data Grid Project
Grid (IT) Issues to be AddressedGrid (IT) Issues to be Addressed
Data caching and mirroring strategies Data caching and mirroring strategies Object Collection Extract/Export/Transport/Import Object Collection Extract/Export/Transport/Import
for large or highly distributed data transactionsfor large or highly distributed data transactions Query estimators, Query Monitors (cf. ATLAS/GC work)Query estimators, Query Monitors (cf. ATLAS/GC work)
Enable flexible, resilient prioritisation schemesEnable flexible, resilient prioritisation schemes Query redirection, priority alteration, fragmentation, etc.Query redirection, priority alteration, fragmentation, etc.
Pre-Emptive and realtime data/resource matchmakingPre-Emptive and realtime data/resource matchmaking Resource discoveryResource discovery Co-scheduling and queueing Co-scheduling and queueing
State, workflow, & performance-monitoring State, workflow, & performance-monitoring instrumentation; tracking and forward predictioninstrumentation; tracking and forward prediction
Security: Authentication (for resource allocation/usage Security: Authentication (for resource allocation/usage and priority); running an international certificate authorityand priority); running an international certificate authority
Why Now?Why Now?
The Internet as infrastructureThe Internet as infrastructure Increasing bandwidth, advanced services;Increasing bandwidth, advanced services;
a need to explore higher throughputa need to explore higher throughput
Advances in storage capacityAdvances in storage capacity A Terabyte for ~$40k (or $ 10k)A Terabyte for ~$40k (or $ 10k)
Increased availability of compute Increased availability of compute resourcesresources Dense (Web) Server-Clusters, Dense (Web) Server-Clusters,
supercomputers, etc.supercomputers, etc.
Advances in application conceptsAdvances in application concepts Simulation-based design, advanced scientific Simulation-based design, advanced scientific
instruments, collaborative engineering, ...instruments, collaborative engineering, ...
PPDG Work at Caltech and PPDG Work at Caltech and SLACSLAC
Work on the NTON connections between Caltech and Work on the NTON connections between Caltech and SLACSLAC Test with 8 OC3 adapters on the Caltech Exemplar Test with 8 OC3 adapters on the Caltech Exemplar
multiplexed across to a SLAC Cisco GSR router. multiplexed across to a SLAC Cisco GSR router. Limited throughput due to small MTU in the GSR.Limited throughput due to small MTU in the GSR.
A Dell dual Pentium III based server with A Dell dual Pentium III based server with two OC12 two OC12 (622 Mbps) ATM cards(622 Mbps) ATM cards. Configured to allow . Configured to allow aggregate transfer of more then aggregate transfer of more then 100 Mbytes/seconds100 Mbytes/seconds in both directions Caltech in both directions Caltech SLAC. SLAC.
So far reached 40 Mbytes/sec on one OC12So far reached 40 Mbytes/sec on one OC12 Monitoring tools installed at Caltech/CACRMonitoring tools installed at Caltech/CACR
PingER installed to monitor WAN HEP connectivityPingER installed to monitor WAN HEP connectivity AA Surveyor Surveyor device will be installed soon, for very device will be installed soon, for very
precise measurement of network traffic speeds precise measurement of network traffic speeds Investigations into a distributed resource management Investigations into a distributed resource management
architecture that co-manages processors and dataarchitecture that co-manages processors and data
ParticipantsParticipants
Main partners:Main partners: CERN, INFN(I), CNRS(F), PPARC(UK), CERN, INFN(I), CNRS(F), PPARC(UK), NIKHEF(NL), ESA-Earth Observation NIKHEF(NL), ESA-Earth Observation
Other sciences:Other sciences: Earth Observation, Biology, MedicineEarth Observation, Biology, Medicine
Industrial participation:Industrial participation: CS SI/F, DataMat/I, IBM/UKCS SI/F, DataMat/I, IBM/UK
Associated partners:Associated partners: Czech Republic, Finland, Czech Republic, Finland, Germany, Hungary, Spain, Sweden (mostly computer Germany, Hungary, Spain, Sweden (mostly computer scientists)scientists)
Work with US:Work with US:Underway; Formal collaboration being establishedUnderway; Formal collaboration being established
Industry and Research Project ForumIndustry and Research Project Forum with representatives from:with representatives from:
Denmark, Greece, Israel, Japan, Norway, Denmark, Greece, Israel, Japan, Norway, Poland, Portugal, Russia, SwitzerlandPoland, Portugal, Russia, Switzerland
GriPhyN: First Production Scale GriPhyN: First Production Scale “Grid Physics Network”“Grid Physics Network”
Develop a New Form of Integrated Distributed Develop a New Form of Integrated Distributed System, while Meeting Primary Goals of the System, while Meeting Primary Goals of the
LIGO, SDSS and LHC Scientific ProgramsLIGO, SDSS and LHC Scientific Programs
Focus on Tier2 Centers at UniversitiesFocus on Tier2 Centers at Universities In a Unified Hierarchical Grid of Five LevelsIn a Unified Hierarchical Grid of Five Levels
18 Centers; with Four Sub-Implementations 18 Centers; with Four Sub-Implementations 5 Each in US for LIGO, CMS, ATLAS; 3 for SDSS5 Each in US for LIGO, CMS, ATLAS; 3 for SDSS Near Term Focus on LIGO, SDSS handling of real Near Term Focus on LIGO, SDSS handling of real
data; LHC “Data Challenges” with simulated datadata; LHC “Data Challenges” with simulated data Cooperation with PPDG, MONARC and Cooperation with PPDG, MONARC and
EU Grid ProjectEU Grid Project
http://www.phys.ufl.edu/~avery/GriPhyN/http://www.phys.ufl.edu/~avery/GriPhyN/
An effective collaboration betweenAn effective collaboration betweenPhysicists, Astronomers, and Computer ScientistsPhysicists, Astronomers, and Computer Scientists
Virtual DataVirtual Data A hierarchy of compact data formsA hierarchy of compact data forms, , user user
collections and remote data transformations collections and remote data transformations is essentialis essential
Even with future Gbps networksEven with future Gbps networks Coordination among multiple sites is requiredCoordination among multiple sites is required Coherent strategies are needed for data location, Coherent strategies are needed for data location,
transport, caching and replication, structuring,transport, caching and replication, structuring,and resource co-scheduling for efficient accessand resource co-scheduling for efficient access
GriPhyN: Petascale Virtual GriPhyN: Petascale Virtual Data GridsData Grids
Sloan Digital Sky SurveySloan Digital Sky SurveyData GridData Grid
Three main functions:Three main functions: Raw data processing on a Grid (FNAL)Raw data processing on a Grid (FNAL) Rapid turnaround with TBs of dataRapid turnaround with TBs of data Accessible storage of all image dataAccessible storage of all image data
Fast science analysis environmentFast science analysis environment(JHU)(JHU)
Combined data access + analysis Combined data access + analysis of calibrated dataof calibrated data
Distributed I/O layer and processing Distributed I/O layer and processing layer; shared by whole collaborationlayer; shared by whole collaboration
Public data accessPublic data access SDSS data browsing for SDSS data browsing for
astronomers, and studentsastronomers, and students Complex query engine for the publicComplex query engine for the public