The Grid: Beyond the Hype
Ian Foster
Argonne National Laboratory
University of Chicago
Globus Alliance
www.mcs.anl.gov/~foster
Seminar, Duke, September 14, 2004
Grid Hype
Energy Internet
The Shape of Grids to Come? Internet Hype?
eScience & Grid: 6 Theses
1. Scientific progress depends increasingly on large-scale distributed collaborative work
2. Such distributed collaborative work raises challenging problems of broad importance
3. Any effective attack on those problems must involve close engagement with applications
4. Open software & standards are key to producing & disseminating required solutions
5. Shared software & service infrastructure are essential application enablers
6. A cross-disciplinary community of technology producers & consumers is needed
Global Knowledge Communities: E.g., High Energy Physics
The Grid
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
1. Enable integration of distributed resources
2. Using general-purpose protocols & infrastructure
3. To achieve better-than-best-effort service
The Grid (2)
Dynamically link resources/services
From collaborators, customers, eUtilities, … (members of an evolving “virtual organization”)
Into a “virtual computing system”: a dynamic, multi-faceted system spanning institutions and industries
Configured to meet instantaneous needs, for multi-faceted QoX for demanding workloads: security, performance, reliability, …
Problem-Driven, Collaborative Research Methodology
[Cycle diagram: Design → Build → Deploy → Apply → Analyze, linking Computer Science, Software & Standards, Infrastructure, Discipline Advances, and a Global Community]
Problem-Driven, Collaborative Research Methodology
[Roadmap diagram repeated: Design → Build → Deploy → Apply → Analyze]
Resource/Service Integration as a Fundamental Challenge
Many sources of data, services, & computation
Discovery: registries organize services of interest to a community
Access: data integration activities may require access to, & exploration/analysis of, data at many locations
Exploration & analysis may involve complex, multi-step workflows
Resource management is needed to ensure progress & arbitrate competing demands
Security & policy must underlie access & management decisions
Scale Metrics: Participants, Data, Tasks, Performance, Interactions, …
[Scatter plot: CPU count (10 to 100,000) vs. collaboration size (0 to 2,500), with points for the Earth Simulator, an atmospheric chemistry group, LHC experiments, astronomy, gravitational wave, nuclear, and current accelerator experiments]
Profound Technical Challenges
How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:
Negotiate & manage trust; access & integrate data; construct & reuse workflows; plan complex computations; detect & recover from failures; capture & share knowledge; represent & enforce policies; achieve end-to-end QoX; move data rapidly & reliably; support collaborative work; define primitive protocols; build reusable software; package & deliver software; deploy & operate services; operate infrastructure; upgrade infrastructure; perform troubleshooting; etc., etc., etc.
Grid Technologies Address Key Requirements
Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations: dynamic, autonomous, domain independent
On-demand, ubiquitous access to computing, data, and services; mechanisms for creating and managing workflow within such federations
New capabilities constructed dynamically and transparently from distributed services: service-oriented, virtualization
Computer Science Contributions
Protocols and/or tools for use in dynamic, scalable, multi-institutional, computationally & data-rich settings, for:
Large-scale distributed system architecture
Cross-org authentication
Scalable community-based policy enforcement
Robust & scalable discovery
Wide-area scheduling
High-performance, robust, wide-area data management
Knowledge-based workflow generation
High-end collaboration
Resource & service virtualization
Distributed monitoring & manageability
Application development
Wide-area fault tolerance
Infrastructure deployment & management
Resource provisioning & quality of service
Performance monitoring & modeling
Collaborative Workflow: Virtual Data
[Diagram: the virtual data system links data, transformations, and derivations via created-by, execution-of, and consumed-by/generated-by relationships]
“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
“I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
“I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
“I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
www.griphyn.org/chimera
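The four queries above all reduce to walks over a recorded derivation graph. As an illustration (the catalog structure and every name below are invented for this sketch, not the Chimera API), a few lines of Python can answer the calibration-error question by finding all data transitively derived from a given item:

```python
# Minimal virtual-data catalog sketch (illustrative; not the Chimera API).
# Each derivation records a transformation, the data it consumes, and the
# data it generates.
derivations = [
    {"transform": "calibrate", "consumes": ["raw1"],     "generates": ["cal1"]},
    {"transform": "extract",   "consumes": ["cal1"],     "generates": ["objects1"]},
    {"transform": "cluster",   "consumes": ["objects1"], "generates": ["clusters1"]},
]

def downstream(data_item):
    """All derived data transitively generated from data_item.
    Answers: 'I found a calibration error -- what must I recompute?'"""
    stale, frontier = set(), {data_item}
    while frontier:
        nxt = set()
        for d in derivations:
            if frontier & set(d["consumes"]):
                for out in d["generates"]:
                    if out not in stale:
                        stale.add(out)
                        nxt.add(out)
        frontier = nxt
    return stale

# A calibration error in raw1 invalidates everything derived from it:
print(sorted(downstream("raw1")))  # ['cal1', 'clusters1', 'objects1']
```

Walking the same graph in the other direction, from a data item back to the transformations that produced it, would answer the trust and reuse questions in the same way.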
Adaptive Unstructured Multicast
“UMM: A dynamically adaptive, unstructured multicast overlay,” M. Ripeanu et al.
[Figure: overlay delay (ms) vs. IP delay (ms) for node pairs, both axes 0 to 200 ms]
[Diagram: an application overlay mapped onto a base overlay, which is in turn mapped onto the physical topology]
[Figure: relative delay penalty (RDP) and maximum link stress over time (0 to 3840 s), with Max RDP, 95% RDP, and 90% RDP curves plotted against RDP = 1 and RDP = 2 reference lines; 10 nodes fail and then rejoin 900 s later]
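The relative delay penalty (RDP) in these plots is the ratio of overlay-path delay to direct IP delay for a node pair; the Max, 95%, and 90% curves summarize that ratio over all pairs. A small sketch of the summary computation, using made-up sample delays:

```python
# Sketch: relative delay penalty (RDP) = overlay delay / direct IP delay,
# summarized as max and percentiles over all node pairs.
# The sample delays below are made up for illustration.
def rdp_summary(pairs):
    """pairs: list of (overlay_delay_ms, ip_delay_ms) per node pair."""
    rdps = sorted(o / ip for o, ip in pairs)
    def pct(p):
        # nearest-rank percentile over the sorted RDP values
        return rdps[min(len(rdps) - 1, int(p / 100 * len(rdps)))]
    return {"max": rdps[-1], "p95": pct(95), "p90": pct(90)}

sample = [(30, 20), (50, 25), (80, 40), (45, 45), (120, 60)]
print(rdp_summary(sample))
```

An RDP near 1 means the overlay routes almost as directly as IP, which is why the RDP = 1 and RDP = 2 reference lines appear in the figure.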
Problem-Driven, Collaborative Research Methodology
[Roadmap diagram repeated: Design → Build → Deploy → Apply → Analyze]
Open Standards & Software
Standardized & interoperable mechanisms for secure & reliable:
Authentication, authorization, policy, …
Representation & management of state
Initiation & management of computation
Data access & movement
Communication & notification
Good-quality open source implementations to accelerate adoption & development, e.g., the Globus Toolkit
Evolution of Open Grid Standards and Software
[Timeline, 1990 to 2010, showing increasing functionality & standardization: custom solutions; Internet standards; the Globus Toolkit (de facto standard, single implementation); Web services, etc.; the Open Grid Services Architecture (real standards, multiple implementations); and managed shared virtual systems (research)]
WS Core Enables Frameworks: E.g., Resource Management
Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)
WS-Resource Framework & WS-Notification (resource identity, lifetime, inspection, subscription, …)
WS-Agreement (agreement negotiation)
WS Distributed Management (lifecycle, monitoring, …)
Applications of the framework (compute, network, storage provisioning; job reservation & submission; data management; application service QoS; …)
WSRF & WS-Notification
Naming and bindings (basis for virtualization): every resource can be uniquely referenced, and has one or more associated services for interacting with it
Lifecycle (basis for fault-resilient state management): resources created by services following the factory pattern; resources destroyed immediately or on a schedule
Information model (basis for monitoring, discovery): resource properties associated with resources; operations for querying and setting this info; asynchronous notification of changes to properties
Service groups (basis for registries, collective services): group membership rules & membership management
Base Fault type
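The factory, lifetime, and resource-property patterns listed above can be caricatured in a few lines of Python. Everything here, class and method names included, is invented for illustration; it is a toy sketch of the patterns, not a WSRF implementation or API:

```python
import itertools
import time

# Toy sketch of WSRF-style patterns (invented names; not a WSRF API):
# a factory creates uniquely referenced resources with a bounded lifetime
# and queryable resource properties.
class Resource:
    def __init__(self, ref, lifetime_s, properties):
        self.ref = ref                           # unique reference (naming)
        self.expires = time.time() + lifetime_s  # scheduled destruction
        self.properties = properties             # information model

class Factory:
    _ids = itertools.count(1)

    def __init__(self):
        self.resources = {}

    def create(self, lifetime_s, **props):       # factory pattern
        ref = f"resource-{next(self._ids)}"
        self.resources[ref] = Resource(ref, lifetime_s, props)
        return ref

    def query(self, ref, prop):                  # resource-property query
        return self.resources[ref].properties[prop]

    def sweep(self):                             # destroy expired resources
        now = time.time()
        self.resources = {r: res for r, res in self.resources.items()
                          if res.expires > now}

factory = Factory()
ref = factory.create(lifetime_s=3600, utilization=0.2)
print(factory.query(ref, "utilization"))  # 0.2
```

The scheduled-destruction sweep is what makes the state management fault-resilient: a client that crashes without cleaning up simply lets its resources expire.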
Bringing it All Together. Scenario: Resource management & scheduling
[Diagram: blade, storage, and network resources reporting to a Grid scheduler via notification]
A WS-Resource is used to “model” physical processor resources; WS-Resource Properties “project” processor status (like utilization)
The local processor manager is “front-ended” with a Web service interface
Other kinds of resources (storage, network) are also “modeled” as WS-Resources
WS-Notification can be used to “inform” the scheduler when processor utilization changes
Grid “jobs” and “tasks” are also modeled using WS-Resources and resource properties
The Grid scheduler is a Web service
A service level agreement is modeled as a WS-Resource, with the lifetime of the SLA resource tied to the duration of the agreement
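The scenario reduces to a publish/subscribe loop: resources notify the scheduler when a projected property changes, and the scheduler dispatches work accordingly. A schematic sketch with invented names (not the WS-Notification wire protocol):

```python
# Schematic sketch of the notification-driven scheduling scenario
# (invented names; not the WS-Notification wire protocol).
class Processor:
    def __init__(self, name):
        self.name, self.utilization, self.subscribers = name, 0.0, []

    def set_utilization(self, u):
        # A resource-property change triggers notification of subscribers.
        self.utilization = u
        for callback in self.subscribers:
            callback(self)

class GridScheduler:
    def __init__(self, processors):
        self.processors = processors
        for p in processors:
            p.subscribers.append(self.on_change)  # subscribe to changes

    def on_change(self, proc):
        pass  # a real scheduler might re-plan queued work here

    def submit(self, job):
        # Dispatch to the least-utilized processor, then bump its load.
        target = min(self.processors, key=lambda p: p.utilization)
        target.set_utilization(target.utilization + 0.1)
        return (job, target.name)

procs = [Processor("blade1"), Processor("blade2")]
sched = GridScheduler(procs)
procs[0].set_utilization(0.9)
print(sched.submit("job-1"))  # routed to the less-loaded blade2
```

The push model matters at grid scale: the scheduler learns of load changes as they happen instead of polling every front-ended processor manager.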
The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC)
An international partnership dedicated to creating & disseminating high-quality open source Grid technology, the Globus Toolkit: design, engineering, support, governance
Academic Affiliates make major contributions. EU: CERN, Imperial, MPI, Poznan; AP: AIST, TIT, Monash; US: NCSA, SDSC, TACC, UCSB, UW, etc.
Significant industrial contributions; 1000s of users worldwide, many of whom contribute
Globus Toolkit History: An Unreliable Memoir
[Chart: Globus Toolkit downloads/month from Globus.Org, 0 to 30,000, over 1997 to 2002; counts Globus.Org only, not downloads from NMI, UK eScience, EU DataGrid, IBM, Platform, etc.]
Milestones: DARPA & NSF begin funding Grid work; NASA initiates the Information Power Grid; the Globus Project wins the Global Information Infrastructure Award; MPICH-G released; The Grid: Blueprint for a New Computing Infrastructure published; GT 1.0.0 released; early application successes reported; GT 1.1.1, 1.1.2, and 1.1.3 released; NSF & the European Commission initiate many new Grid projects; GT 1.1.4 and MPICH-G2 released; Anatomy of the Grid paper released; first EuroGlobus conference held in Lecce; significant commercial interest in Grids; NSF GRIDS Center initiated; GT 2.0 beta released; Physiology of the Grid paper released; GT 2.0 released; GT 2.2 released
Globus Toolkit Contributors Include
Grid Packaging Technology (GPT): NCSA
Persistent GRAM Jobmanager: Condor
GSI/Kerberos interchangeability: Sandia
Documentation: NASA, NCSA
Ports: IBM, HP, Sun, SDSC, …
MDS stress testing: EU DataGrid
Support: IBM, Platform, UK eScience
Testing and patches: Many
Interoperable tools: Many
Replica location service: EU DataGrid
Python hosting environment: LBNL
Data access & integration: UK eScience
Data mediation services: SDSC
Tooling, Xindice, JMS: IBM
Brokering framework: Platform
Management framework: HP
$$: DARPA, DOE, NSF, NASA, Microsoft, EU
GT-Based Grid Tools & Solutions
Built on the Globus Toolkit: Virtual Data Toolkit, Platform Globus, NSF Middleware Initiative, Butterfly Grid, EU DataGrid, IBM Grid Toolbox, MPICH-G2, Access Grid, Earth System Grid, FusionGrid, BIRN Biomedical Grid, TeraGrid, NEESgrid, UK eScience Grid, …
Problem-Driven, Collaborative Research Methodology
[Roadmap diagram repeated: Design → Build → Deploy → Apply → Analyze]
Infrastructure
Broadly deployed services in support of virtual organization formation and operation: authentication, authorization, discovery, …
Services, software, and policies enabling on-demand access to important resources: computers, databases, networks, storage, software services, …
Operational support for 24x7 availability
Integration with campus infrastructures
Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds
Infrastructure Status
Many infrastructure deployments worldwide: community-specific & general-purpose, from campus to international scale, most based on GT technology
U.S. examples: TeraGrid, Grid2003, NEESgrid, Earth System Grid, BIRN
Major open issues include practical aspects of operations and federation
Scalability issues (number of users, sites, resources, files, jobs, etc.) are also arising
NSF Network for Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes
NEESgrid User Perspective
Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web browser!)
How it Really Happens (with the Globus Toolkit)
[Diagram: users work with client applications (Web browser, data viewer tool, CHEF chat teamlet); application services (CHEF, MyProxy, certificate authority) organize VOs & enable access to other services; collective services (Globus Index Service, Globus MCS/RLS) aggregate &/or virtualize resources; resources (compute servers via Globus GRAM, database services via Globus DAI, simulation tool, cameras, telepresence monitor) implement standard access & management interfaces]
Component origins: Application Developer, 2; Off the Shelf, 9; Globus Toolkit, 4; Grid Community, 4
Grid2003: An Operational Grid
28 sites (2100-2800 CPUs) & growing, including Korea
400-1300 concurrent jobs
7 substantial applications + CS experiments
Running since October 2003
http://www.ivdgl.org/grid2003
Open Science Grid Components
Computers & storage at 28 sites (to date), 2800+ CPUs
Uniform service environment at each site: the Globus Toolkit provides basic authentication, execution management, & data movement; the Pacman installation system enables installation of numerous other VDT and application services
Global & virtual organization services: certification & registration authorities, VO membership services, monitoring services
Client-side tools for data access & analysis: virtual data, execution planning, DAG management, execution management, monitoring
IGOC: iVDGL Grid Operations Center
DOE Earth System Grid
www.earthsystemgrid.org
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models
Problem-Driven, Collaborative Research Methodology
[Roadmap diagram repeated: Design → Build → Deploy → Apply → Analyze]
NEESgrid Multi-site Online Simulation Test
[Diagram: an NCSA computational model (all computational models written in Matlab) coupled with experimental models at UIUC and U. Colorado, exchanging the quantities gx, m1, f1, and f2]
NEESgrid Multisite Online Simulation Test (July 2003)
[Chart: number of participants (0 to 70) at the Illinois, Colorado, and Illinois (simulation) sites from 8:00 to 18:30]
MOST: A Grid Perspective
[Diagram: the UIUC and U. Colorado experimental models and the NCSA computational model, each fronted by an NTCP server, coordinated by a simulation coordinator; the quantities gx, m1, f1, and f2 are exchanged among sites]
Grid2003 Applications To Date
CMS proton-proton collision simulation
ATLAS proton-proton collision simulation
LIGO gravitational wave search
SDSS galaxy cluster detection
ATLAS interactive analysis
BTeV proton-antiproton collision simulation
SnB biomolecular analysis
GADU/Gnare genome analysis
Various computer science experiments
www.ivdgl.org/grid2003/applications
Example Grid2003 Workflows
[Workflow diagrams: genome sequence analysis, physics data analysis, Sloan Digital Sky Survey]
Example Grid3 Application: NVO Mosaic Construction
NVO/NASA Montage: a small (1200-node) workflow
Construct custom mosaics on demand from multiple data sources
User specifies projection, coordinates, size, rotation, spatial sampling
Work by Ewa Deelman et al., USC/ISI and Caltech
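Workflows like the Montage mosaic are directed acyclic graphs of tasks, and execution planning amounts to running each task once its inputs exist. A minimal topological-execution sketch (the task names below are invented stand-ins, not the real 1200-node Montage workflow):

```python
# Minimal DAG-workflow sketch: run each task once its dependencies are done.
# Task names are invented stand-ins for a mosaic-style workflow.
workflow = {
    "reproject_a": [],
    "reproject_b": [],
    "background_fit": ["reproject_a", "reproject_b"],
    "coadd_mosaic": ["background_fit"],
}

def run(dag):
    """Return an execution order respecting the dependency edges."""
    done, order = set(), []
    while len(done) < len(dag):
        ready = [t for t, deps in dag.items()
                 if t not in done and all(d in done for d in deps)]
        if not ready:
            raise ValueError("cycle in workflow")
        for t in ready:            # on a real grid, each wave runs in parallel
            order.append(t)
            done.add(t)
    return order

print(run(workflow))
```

Each pass through the loop finds a wave of independent tasks; a grid planner would farm each wave out across sites, which is where the concurrency of a 1200-node workflow comes from.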
Concluding Remarks
[Roadmap diagram: Design → Build → Deploy → Apply → Analyze, linking Computer Science, Software & Standards, Infrastructure, Discipline Advances, and a Global Community]
eScience & Grid: 6 Theses
1. Scientific progress depends increasingly on large-scale distributed collaborative work
2. Such distributed collaborative work raises challenging problems of broad importance
3. Any effective attack on those problems must involve close engagement with applications
4. Open software & standards are key to producing & disseminating required solutions
5. Shared software & service infrastructure are essential application enablers
6. A cross-disciplinary community of technology producers & consumers is needed
Utility Computing is One of Several Commercial Drivers
(Based on a slide from HP)
[Diagram: value increases from clusters (OpenVMS clusters, TruCluster, MC ServiceGuard) to grid-enabled systems (Tru64, HP-UX, Linux) to the programmable data center (switch fabric, compute, storage) to the virtual data center (UDC) to a computing utility or Grid, with shared, traded resources]
Today: utility computing, on-demand, service-orientation, virtualization
Significant Challenges Remain
Scaling in multiple dimensions: ambition and complexity of applications; number of users, datasets, services, …; from technologies to solutions
The need for persistent infrastructure: software and people as well as hardware; currently no long-term commitment
Institutionalizing the multidisciplinary approach: understand the implications for the practice of computer science research
Thanks, in particular, to:
Carl Kesselman and Steve Tuecke, my long-time Globus co-conspirators
Gregor von Laszewski, Kate Keahey, Jennifer Schopf, Mike Wilde, Argonne colleagues
Globus Alliance members at Argonne, U.Chicago, USC/ISI, Edinburgh, PDC
Miron Livny, U.Wisconsin Condor project; Rick Stevens, Argonne & U.Chicago
Other partners in Grid technology, application, & infrastructure projects
DOE, NSF, NASA, IBM for generous support
For More Information
Globus Alliance: www.globus.org
Global Grid Forum: www.ggf.org
Open Science Grid: www.opensciencegrid.org
Background information: www.mcs.anl.gov/~foster
GlobusWORLD 2005: Feb 7-11, Boston
The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition: www.mkp.com/grid2