The Grid as Infrastructure and Application Enabler
Ian Foster
Mathematics and Computer Science Division
Argonne National Laboratory
and
Department of Computer Science
The University of Chicago
http://www.mcs.anl.gov/~foster
[email protected] ARGONNE CHICAGO
The Grid
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
Grid Infrastructure
Resources
– Computing, storage, data
Communities
– Operational procedures, …
Services
– Authentication, discovery, …
Connectivity
– Reduce tyranny of distance
Technologies
– Build services & applications
The Grid World: Current Status
Dozens of major Grid projects in scientific & technical computing/research & education
– Compute-intensive, data-intensive, remote instrumentation, collaboration, …
Open source Globus Toolkit™ a de facto standard for major protocols & services
– Supporting many tools & applications in data-intensive and collaboration-intensive science
Major investments in physical infrastructure
Global Grid Forum: community & standards
Examples of Emerging Grid Infrastructure
iVDGL: Data-intensive infrastructure
– Building an (international) community
Data Grid middleware
– Chimera virtual data system
Open Grid Services Architecture
– Future service & technology infrastructure
iVDGL: A Global Grid Laboratory
International Virtual-Data Grid Laboratory
– A global Grid laboratory (US, Europe, Asia, South America, …)
– A place to conduct Data Grid tests “at scale”
– A mechanism to create common Grid infrastructure
– A laboratory for other disciplines to perform Data Grid tests
– A focus of outreach efforts to small institutions
U.S. part funded by NSF (2001-2006)
– $13.7M (NSF) + $2M (matching)
“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.”
From NSF proposal, 2001
Initial US-iVDGL Data Grid
[Map of U.S. sites. Legend: Tier1 (FNAL), Proto-Tier2, Tier3 university. Sites labeled: UCSD, Florida, Wisconsin, Fermilab, BNL, Indiana, BU, SKC, Brownsville, Hampton, PSU, JHU, Caltech.]
Other sites to be added in 2002
iVDGL: International Virtual Data Grid Laboratory
[World map of iVDGL sites and network links. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link.]
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay (www.ivdgl.org)
Grid Evolution: Open Grid Services Architecture
Refactor Globus protocol suite to enable common base and expose key capabilities
– Secure, reliable invocation; service info; notification; soft state lifetime mgmt; …
Service orientation to virtualize resources and unify resources/services/information
– Standard IDL for encapsulation
Embrace key Web services technologies: WSDL as IDL, leverage commercial efforts
– And WS Security, WS Routing, etc.
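Soft state lifetime management, listed among the key capabilities above, is commonly understood as state that expires on its own unless the client renews it. The sketch below illustrates that general pattern only. The class and method names (`SoftStateRegistry`, `register`, `keep_alive`, `sweep`) are hypothetical and are not taken from the Grid Service Specification or the Globus Toolkit; time is passed in explicitly so the behavior is easy to follow.

```python
class SoftStateRegistry:
    """Hypothetical registry: entries expire unless renewed before their deadline."""

    def __init__(self):
        self._expiry = {}  # entry name -> absolute expiry time, in seconds

    def register(self, name, now, lifetime):
        """Create an entry that lives for `lifetime` seconds from `now`."""
        self._expiry[name] = now + lifetime

    def is_alive(self, name, now):
        """An entry is alive only while `now` is before its expiry time."""
        return name in self._expiry and now < self._expiry[name]

    def keep_alive(self, name, now, lifetime):
        """Extend the lifetime of an entry that has not yet expired."""
        if not self.is_alive(name, now):
            raise KeyError(f"{name} has expired or was never registered")
        self._expiry[name] = now + lifetime

    def sweep(self, now):
        """Remove and return the names of all entries that have expired."""
        expired = sorted(n for n, t in self._expiry.items() if t <= now)
        for n in expired:
            del self._expiry[n]
        return expired
```

The point of the pattern is that no explicit "destroy" message is needed: if a client disappears, its state is reclaimed at the next sweep after the deadline.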
OGSA Structure
A standard substrate: the Grid service
– Standard interfaces and behaviors that address key distributed system issues
– The “Grid Service Specification”
… supports standard service specifications
– Resource management, databases, workflow, security, diagnostics, etc.
– Target of current & planned GGF efforts
… and arbitrary application-specific services based on these & other definitions
OGSA Status
Grid service spec near completion in GGF
– Globus Toolkit implementation available
– IBM & Fujitsu implementations underway
– Other companies committed to support it
Various higher-level services underway
– OGSI-based Globus Toolkit v3 (GT3), will support GT2 interfaces by end of 2002
– Database services (UK eScience program)
– Resource information & management (CIM)
– Etc.
Programs as Community Resources: Data Derivation and Provenance
Most [scientific] data are not simple “measurements”; essentially all are:
– Computationally corrected/reconstructed
– And/or produced by numerical simulation
And thus, as data and computers become ever larger and more expensive:
– Programs are significant community resources
– So are the executions of those programs
A virtual data system provides a unified view of data, programs, and executions
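One way to picture a unified view of data, programs, and executions is a catalog that records which program, run on which inputs, produced each result, and that runs the program only when no such record exists. The following is a minimal sketch of that idea, not Chimera's actual schema or API; `Derivation`, `VirtualDataCatalog`, and `materialize` are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One execution of a transformation (program) on specific inputs."""
    transformation: str
    inputs: tuple  # inputs must be hashable so the derivation can be a dict key


class VirtualDataCatalog:
    """Hypothetical catalog mapping each derivation to the data it produced."""

    def __init__(self, transformations):
        self._transformations = transformations  # name -> callable
        self._results = {}                       # Derivation -> stored result
        self.executions = 0                      # how many programs were actually run

    def materialize(self, transformation, *inputs):
        """Return the requested data, running the program only if needed."""
        key = Derivation(transformation, inputs)
        if key not in self._results:
            program = self._transformations[transformation]
            self._results[key] = program(*inputs)
            self.executions += 1
        return self._results[key]
```

Requesting the same derivation twice runs the program once; the second request is answered from the catalog.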
Virtual Data
[Diagram: three entities, Transformation, Derivation, and Data, connected by the relationships created-by, execution-of, and consumed-by/generated-by.]
“I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
“I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
“I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
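The calibration-error question above (which derived data to recompute) amounts to walking the consumed-by/generated-by relationships forward from the affected dataset. Below is a minimal sketch of that traversal under assumed data: the record layout, the dataset names, and the function `downstream_of` are all hypothetical and do not reflect the Chimera catalog format.

```python
from collections import defaultdict, deque

# Hypothetical provenance records: each derivation is one execution of a
# transformation, listing the datasets it consumed and generated.
DERIVATIONS = [
    {"transformation": "calibrate", "inputs": ["raw.fits"], "outputs": ["calibrated.fits"]},
    {"transformation": "find_clusters", "inputs": ["calibrated.fits"], "outputs": ["clusters.cat"]},
    {"transformation": "histogram", "inputs": ["clusters.cat"], "outputs": ["sizes.hist"]},
    {"transformation": "calibrate", "inputs": ["other_raw.fits"], "outputs": ["other_calibrated.fits"]},
]


def downstream_of(dataset, derivations):
    """Return every dataset derived, directly or indirectly, from `dataset`."""
    consumers = defaultdict(list)  # dataset name -> derivations that consume it
    for d in derivations:
        for name in d["inputs"]:
            consumers[name].append(d)

    stale = set()
    queue = deque([dataset])
    while queue:  # breadth-first walk along generated-by edges
        current = queue.popleft()
        for d in consumers[current]:
            for out in d["outputs"]:
                if out not in stale:
                    stale.add(out)
                    queue.append(out)
    return stale
```

With these records, a problem in `raw.fits` marks `calibrated.fits`, `clusters.cat`, and `sizes.hist` for recomputation and leaves the unrelated `other_calibrated.fits` alone.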
Chimera Virtual Data System (www.griphyn.org/chimera)
Joint work with Jens Vöckler, Mike Wilde, Yong Zhao
Virtual data catalog
– Transformations, derivations, data
Virtual data language
– Data definition + query
Applications include browsers and data analysis applications
[Architecture diagram. Components: Virtual Data Applications; Virtual Data Language (definition and query); VDL Interpreter (manipulate derivations and transformations); Virtual Data Catalog (implements Chimera Virtual Data Schema); Task Graphs (compute and data movement tasks, with dependencies); Data Grid Resources (distributed execution and data management). Other labels: SQL; Chimera; GriPhyN VDT: Replica catalog, DAGMan, Globus Toolkit, etc.]
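The task graphs named above (compute and data movement tasks, with dependencies) can be executed by repeatedly running every task whose dependencies are complete. The sketch below shows that scheduling idea using Python's standard `graphlib` module (Python 3.9+). It is not how DAGMan or Chimera is implemented, and the task names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASK_GRAPH = {
    "stage_in_a": set(),
    "stage_in_b": set(),
    "analyze": {"stage_in_a", "stage_in_b"},
    "stage_out": {"analyze"},
}


def parallel_batches(graph):
    """Group tasks into batches; tasks within a batch have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)                 # mark them complete to unlock successors
    return batches
```

Here the two data movement tasks form the first batch and could run concurrently, followed by `analyze` and then `stage_out`.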
Chimera Application: Sloan Digital Sky Survey Analysis
Joint work with Jim Annis, Steve Kent, FNAL
Size distribution of galaxy clusters?
Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)
[Plot: Galaxy cluster size distribution. x-axis: Number of Galaxies (1 to 100); y-axis: Number of Clusters (1 to 100000); both axes logarithmic.]
Summary
“Resource sharing, coordinated problem solving in dynamic, multi-institutional virtual orgs”
– Adoption in eScience, transitioning to industry
Emerging physical infrastructure
– TeraGrid, iVDGL, DOE Science Grid, …
Open Grid Services Architecture
– Integrated treatment of major Grid issues
– Uniform treatment of resources, data, services
Chimera virtual data system
– New abstractions for application development
For More Information
The Globus Project™
– www.globus.org
Technical articles
– www.mcs.anl.gov/~foster
TeraGrid, iVDGL
– www.teragrid.org, www.ivdgl.org
Open Grid Services Arch.
– www.globus.org/ogsa
Chimera
– www.griphyn.org/chimera
Global Grid Forum
– www.gridforum.org