The Grid as Infrastructure and Application Enabler

The Grid as Infrastructure and Application Enabler Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer Science The University of Chicago http://www.mcs.anl.gov/~foster


Page 1: The Grid as Infrastructure and Application Enabler

The Grid as Infrastructure and Application Enabler

Ian Foster

Mathematics and Computer Science Division

Argonne National Laboratory

and

Department of Computer Science

The University of Chicago

http://www.mcs.anl.gov/~foster

Page 2: The Grid as Infrastructure and Application Enabler


[email protected] ARGONNE CHICAGO

The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

Page 3: The Grid as Infrastructure and Application Enabler


Grid Infrastructure

Resources
– Computing, storage, data

Communities
– Operational procedures, …

Services
– Authentication, discovery, …

Connectivity
– Reduce tyranny of distance

Technologies
– Build services & applications

Page 4: The Grid as Infrastructure and Application Enabler


The Grid World: Current Status

Dozens of major Grid projects in scientific & technical computing/research & education
– Compute-intensive, data-intensive, remote instrumentation, collaboration, …

Open source Globus Toolkit™ a de facto standard for major protocols & services
– Supporting many tools & applications in data-intensive and collaboration-intensive science

Major investments in physical infrastructure

Global Grid Forum: community & standards

Page 5: The Grid as Infrastructure and Application Enabler


Examples of Emerging Grid Infrastructure

iVDGL: Data-intensive infrastructure
– Building an (international) community

Data Grid middleware
– Chimera virtual data system

Open Grid Services Architecture
– Future service & technology infrastructure

Page 6: The Grid as Infrastructure and Application Enabler


iVDGL: A Global Grid Laboratory

International Virtual-Data Grid Laboratory
– A global Grid laboratory (US, Europe, Asia, South America, …)
– A place to conduct Data Grid tests “at scale”
– A mechanism to create common Grid infrastructure
– A laboratory for other disciplines to perform Data Grid tests
– A focus of outreach efforts to small institutions

U.S. part funded by NSF (2001-2006)
– $13.7M (NSF) + $2M (matching)

“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.”

From NSF proposal, 2001

Page 7: The Grid as Infrastructure and Application Enabler


Initial US-iVDGL Data Grid

[Map of U.S. sites. Legend: Tier1 (FNAL), Proto-Tier2, Tier3 university]

Sites shown: UCSD, Florida, Wisconsin, Fermilab, BNL, Indiana, BU, SKC, Brownsville, Hampton, PSU, JHU, Caltech

Other sites to be added in 2002

Page 8: The Grid as Infrastructure and Application Enabler

U.S. PIs: Avery, Foster, Gardner, Newman, Szalay
www.ivdgl.org

iVDGL: International Virtual Data Grid Laboratory

[World map of iVDGL sites and network links. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link]
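To give a feel for what the link rates in the legend mean for data-intensive work, the following is a rough arithmetic sketch. The 1 TB dataset size is an assumption chosen only for illustration, and the calculation is an idealized lower bound that ignores protocol overhead, contention, and storage speed.

```python
# Link rates taken from the map legend, in bits per second.
LINKS_BPS = {
    "10 Gbps link": 10e9,
    "2.5 Gbps link": 2.5e9,
    "622 Mbps link": 622e6,
}

# Assumed dataset size (1 TB); not a figure from the slides.
DATASET_BYTES = 1e12


def transfer_seconds(size_bytes, link_bps):
    """Ideal transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bps


times = {name: transfer_seconds(DATASET_BYTES, bps) for name, bps in LINKS_BPS.items()}

for name, seconds in times.items():
    print(f"{name}: {seconds / 60:.1f} minutes")
```

Under these assumptions the same dataset takes roughly sixteen times longer on the slowest listed link than on the fastest, which is one reason tiered sites are connected at different rates.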

Page 9: The Grid as Infrastructure and Application Enabler


Grid Evolution: Open Grid Services Architecture

Refactor Globus protocol suite to enable common base and expose key capabilities
– Secure, reliable invocation; service info; notification; soft state lifetime management; …

Service orientation to virtualize resources and unify resources/services/information
– Standard IDL for encapsulation

Embrace key Web services technologies: WSDL as IDL, leverage commercial efforts
– And WS Security, WS Routing, etc.
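Soft state lifetime management, one of the capabilities listed above, can be illustrated with a minimal sketch. It assumes a hypothetical `SoftStateRegistry` class that is not part of the Globus Toolkit or of any OGSA specification: each service instance is created with a termination time, an interested client must periodically extend that lifetime, and instances whose lifetime lapses are reclaimed without an explicit destroy call.

```python
class SoftStateRegistry:
    """Hypothetical registry; illustrates soft-state lifetimes only."""

    def __init__(self):
        self._expiry = {}  # instance name -> termination time (seconds)

    def create(self, name, now, lifetime):
        """Register an instance that expires `lifetime` seconds after `now`."""
        self._expiry[name] = now + lifetime

    def keepalive(self, name, now, lifetime):
        """Extend a live instance; return False if it has already expired."""
        if self._expiry.get(name, float("-inf")) <= now:
            self._expiry.pop(name, None)
            return False
        self._expiry[name] = now + lifetime
        return True

    def sweep(self, now):
        """Reclaim every instance whose termination time has passed."""
        expired = sorted(n for n, t in self._expiry.items() if t <= now)
        for n in expired:
            del self._expiry[n]
        return expired

    def alive(self):
        return sorted(self._expiry)


registry = SoftStateRegistry()
registry.create("transfer-1", now=0, lifetime=30)
registry.create("transfer-2", now=0, lifetime=30)
registry.keepalive("transfer-1", now=20, lifetime=30)  # client still interested
reclaimed = registry.sweep(now=40)  # transfer-2 was never refreshed
```

The design point is that a crashed or disconnected client needs no cleanup protocol: its instances simply time out.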

Page 10: The Grid as Infrastructure and Application Enabler


OGSA Structure

A standard substrate: the Grid service
– Standard interfaces and behaviors that address key distributed system issues
– The “Grid Service Specification”

… supports standard service specifications
– Resource management, databases, workflow, security, diagnostics, etc., etc.
– Target of current & planned GGF efforts

… and arbitrary application-specific services based on these & other definitions

Page 11: The Grid as Infrastructure and Application Enabler


OGSA Status

Grid service spec near completion in GGF
– Globus Toolkit implementation available
– IBM & Fujitsu implementations underway
– Other companies committed to support it

Various higher-level services underway
– OGSI-based Globus Toolkit v3 (GT3), will support GT2 interfaces by end of 2002
– Database services (UK eScience program)
– Resource information & management (CIM)
– Etc., etc.

Page 12: The Grid as Infrastructure and Application Enabler


Programs as Community Resources: Data Derivation and Provenance

Most [scientific] data are not simple “measurements”; essentially all are:
– Computationally corrected/reconstructed
– And/or produced by numerical simulation

And thus, as data and computers become ever larger and more expensive:
– Programs are significant community resources
– So are the executions of those programs

A virtual data system provides a unified view of data, programs, and executions

Page 13: The Grid as Infrastructure and Application Enabler


Virtual Data

[Diagram: the entities Transformation, Derivation, and Data, linked by the relationships created-by, execution-of, and consumed-by/generated-by]

“I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”

“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”

“I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”

“I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
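Two of the questions quoted on this slide (which derived data to recompute after a calibration error, and what corrections were applied to a data item) reduce to walking the recorded relationships between data, transformations, and derivations. The following is a minimal sketch under stated assumptions: the `Derivation` record, the catalog entries, and the file names are all hypothetical, it does not use Chimera's actual schema or language, and it assumes the recorded derivations contain no cycles.

```python
from collections import namedtuple

# A derivation records one execution of a transformation (program),
# the data it consumed, and the data it generated.
Derivation = namedtuple("Derivation", "name transformation inputs outputs")

# Hypothetical catalog entries, invented for illustration.
CATALOG = [
    Derivation("d1", "calibrate", ["raw.dat"], ["calibrated.dat"]),
    Derivation("d2", "reconstruct", ["calibrated.dat"], ["events.dat"]),
    Derivation("d3", "histogram", ["events.dat"], ["plot.dat"]),
    Derivation("d4", "simulate", ["config.txt"], ["simulated.dat"]),
]


def affected_data(catalog, bad_inputs):
    """Return every data item derived, directly or indirectly, from bad_inputs."""
    tainted = set(bad_inputs)
    changed = True
    while changed:  # repeat until no new tainted outputs appear
        changed = False
        for d in catalog:
            if tainted.intersection(d.inputs) and not tainted.issuperset(d.outputs):
                tainted.update(d.outputs)
                changed = True
    return sorted(tainted - set(bad_inputs))


def provenance(catalog, item):
    """Return the transformations that led to item, most recent first."""
    for d in catalog:
        if item in d.outputs:
            chain = [d.transformation]
            for src in d.inputs:
                chain.extend(provenance(catalog, src))
            return chain
    return []  # primary data: no recorded derivation


to_recompute = affected_data(CATALOG, ["raw.dat"])
history = provenance(CATALOG, "plot.dat")
```

Here `to_recompute` answers the calibration-error question for the assumed catalog, and `history` lists the processing steps behind one data item; the unrelated simulation output is left untouched.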

Page 14: The Grid as Infrastructure and Application Enabler


Chimera Virtual Data System (www.griphyn.org/chimera)

Virtual data catalog
– Transformations, derivations, data

Virtual data language
– Data definition + query

Applications include browsers and data analysis applications

[Architecture diagram. Components:
– Virtual Data Applications
– Virtual Data Language (definition and query)
– Chimera: VDL Interpreter (manipulate derivations and transformations); Virtual Data Catalog (implements Chimera Virtual Data Schema), accessed via SQL
– Task Graphs (compute and data movement tasks, with dependencies)
– Data Grid Resources (distributed execution and data management)
– GriPhyN VDT: Replica catalog, DAGMan, Globus Toolkit, etc.]

Joint work with Jens Vöckler, Mike Wilde, Yong Zhao
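The step from catalog entries to task graphs with dependencies can be sketched as a depth-first walk over the derivations needed for a requested data item, skipping anything that already exists. This is an illustrative sketch only: the `DERIVATIONS` table, the file names, and the `plan` function are hypothetical, and it is not the VDL interpreter or DAGMan. It assumes the derivations are acyclic.

```python
# Hypothetical derivations: output -> (transformation, inputs).
DERIVATIONS = {
    "fields.dat": ("extract_fields", ["survey.dat"]),
    "bright.dat": ("find_bright_galaxies", ["fields.dat"]),
    "clusters.dat": ("find_clusters", ["fields.dat", "bright.dat"]),
}


def plan(target, derivations, existing):
    """Return the transformations to run, in dependency order, to produce target."""
    steps = []
    planned = set()

    def visit(item):
        if item in existing or item in planned:
            return  # already materialized, or already scheduled
        if item not in derivations:
            raise KeyError(f"no derivation known for {item}")
        transformation, inputs = derivations[item]
        for source in inputs:
            visit(source)  # schedule prerequisites first
        planned.add(item)
        steps.append(transformation)

    visit(target)
    return steps


# Only the primary survey data exists: every step must run.
full_plan = plan("clusters.dat", DERIVATIONS, existing={"survey.dat"})

# Intermediate results already exist: only the final step is needed.
short_plan = plan(
    "clusters.dat",
    DERIVATIONS,
    existing={"survey.dat", "fields.dat", "bright.dat"},
)
```

Reusing existing intermediate data is what shortens the second plan to a single task.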

Page 15: The Grid as Infrastructure and Application Enabler

Joint work with Jim Annis, Steve Kent, FNAL

Chimera Application: Sloan Digital Sky Survey Analysis

Size distribution of galaxy clusters?

Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)

[Plot: Galaxy cluster size distribution. Number of Clusters (axis ticks 1 to 100000) against Number of Galaxies (axis ticks 1 to 100)]

Page 16: The Grid as Infrastructure and Application Enabler


Summary

“Resource sharing, coordinated problem solving in dynamic, multi-institutional virtual organizations”
– Adoption in eScience, transitioning to industry

Emerging physical infrastructure
– TeraGrid, iVDGL, DOE Science Grid, …, …

Open Grid Services Architecture
– Integrated treatment of major Grid issues
– Uniform treatment of resources, data, services

Chimera virtual data system
– New abstractions for application development

Page 17: The Grid as Infrastructure and Application Enabler


For More Information

The Globus Project™
– www.globus.org

Technical articles
– www.mcs.anl.gov/~foster

TeraGrid, iVDGL
– www.teragrid.org, www.ivdgl.org

Open Grid Services Architecture
– www.globus.org/ogsa

Chimera
– www.griphyn.org/chimera

Global Grid Forum
– www.gridforum.org