Grid Projects In The US July 2008

Post on 10-May-2015


A talk given at the HPC 2008 meeting in Cetraro, Italy


Ian Foster

Computation Institute

Argonne National Lab & University of Chicago

Grid Projects in the US (an inevitably incomplete view)

2

Grid Projects in the US

[Diagram: a Resources layer, supplied by several Resource Providers]

3

Grid Projects in the US

[Diagram: a Services layer, supplied by several Service Providers, shown with the Resources layer and its Resource Provider]

4

Grid Projects in the US

[Diagram: Content (Community), Services (Service Provider), Resources (Resource Provider), with Software Providers alongside; several Communities shown]

5

Grid Projects in the US

[Diagram: Content (Community), Services (Service Provider), Resources (Resource Provider), with Software Providers alongside]

6

Resource Providers

• Campus and regional grids
  Purdue, Wisc, UCLA, …, …
  TIGRE, UC system, …
• Open Science Grid
  43,000 CPUs, 6 PB disk, 15,000 CPU days/day
  Allocations on basis of MOUs
• TeraGrid
  ~1.2 Pflop/s
  National Allocation Committee
• Amazon, Microsoft, IBM, etc.
  ?? CPUs, ?? storage
  Fee for service
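
The Open Science Grid figures above imply a rough average utilization. The sketch below is only back-of-the-envelope arithmetic: it assumes (the slide does not say this) that all 43,000 CPUs are available around the clock, so the ceiling is one CPU day per CPU per day.

```python
# Figures quoted on the slide for Open Science Grid.
osg_cpus = 43_000
cpu_days_delivered_per_day = 15_000

# Assumption (not from the slide): every CPU is available 24 hours a day,
# so the theoretical maximum is one CPU day per CPU per day.
utilization = cpu_days_delivered_per_day / osg_cpus

print(f"Implied average utilization: {utilization:.1%}")
```

Under that assumption roughly a third of the nominal capacity is delivered as accounted CPU days; the real figure depends on how many CPUs are actually offered to the grid at any time.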

7

Open Science Grid Sites (5/4/08)

+3 in Brazil; 2 in Mexico; 2 in Taiwan; 1 in the UK. Grows by 10-20 per year.

8

Use by Community

[Chart: weekly usage by community: CMS, ATLAS, CDF, D0, and local usage & bugs (unmapped to VO). Axis marks at 1,000,000 a week and 2,000,000 a week.]

9

TeraGrid Participants

10

Growing User Community

[Chart: TeraGrid Users, Dec-03 to Dec-07, vertical axis 0 to 4,500. Series: Current Accounts, Active Users, New Accounts, Gateway Users, Target. Labeled values: 4,277; 3,702; 1,807; 575.]

Source: TeraGrid Central Database

11

Growing Usage

Source: TeraGrid Central Database

3.95B NUs delivered in CY2007

12

CY2007 Usage by Discipline

3.95B NUs delivered in CY2007

Molecular Biosciences: 31%
Chemistry: 17%
Physics: 17%
Astronomical Sciences: 12%
Materials Research: 6%
Chemical, Thermal Systems: 5%
All 19 Others: 4%
Earth Sciences: 3%
Atmospheric Sciences: 3%
Advanced Scientific Computing: 2%
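
To put the percentages in absolute terms, the shares can be multiplied by the 3.95B NUs total. This is a minimal sketch using only the numbers on the slide; the variable names are illustrative, and the results inherit the rounding of the published percentages.

```python
# Total usage and per-discipline shares as quoted on the slide.
TOTAL_NUS = 3.95e9

shares_percent = {
    "Molecular Biosciences": 31,
    "Chemistry": 17,
    "Physics": 17,
    "Astronomical Sciences": 12,
    "Materials Research": 6,
    "Chemical, Thermal Systems": 5,
    "All 19 Others": 4,
    "Earth Sciences": 3,
    "Atmospheric Sciences": 3,
    "Advanced Scientific Computing": 2,
}

# The rounded shares on the slide add up to exactly 100%.
total_percent = sum(shares_percent.values())

# Approximate NUs per discipline (limited by the rounding of the shares).
nus_by_discipline = {
    name: TOTAL_NUS * pct / 100 for name, pct in shares_percent.items()
}

for name, nus in nus_by_discipline.items():
    print(f"{name:32s} {nus / 1e9:5.2f}B NUs")
```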

13

Grid Projects in the US

[Diagram repeated: Content (Community), Services (Service Provider), Resources (Resource Provider), Software Providers]

Service Provider

For example:
• Build and test service (Wisc)
• Certificate Authorities
• Cancer Biology Informatics Grid
• LIGO Data Grid

14

caBIG: sharing of infrastructure, applications, and data.

Data Integration!

Services & Cancer Biology

Globus

15

caBIG Under the Covers

[Diagram labels: NCICB; Research Center (twice); Grid Portal; Grid-Enabled Client; Grid Data Service; Analytical Service; Tool 1, Tool 2, Tool 3, Tool 4; caArray; Microarray; Gene Database; Protein Database; Image. All sit on a Grid Services Infrastructure (Metadata, Registry, Query, Invocation, Security, etc.).]

Globus

16

LIGO Data Grid

LIGO Gravitational Wave Observatory

• Replicating >1 Terabyte/day to 8 sites
• 770 TB replicated to date: >120 million replicas
• MTBF = 1 month

[Map labels: Birmingham, Cardiff, AEI/Golm]

Ann Chervenak et al., ISI; Scott Koranda et al., LIGO

Globus
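
The LIGO replication figures can be turned into two rough bounds. This is a sketch under stated assumptions: decimal terabytes (1 TB = 10^12 bytes), and the slide's ">" values taken at their lower limits, so both results are upper bounds rather than measurements.

```python
TB = 10**12  # assumption: decimal terabytes

replicated_bytes = 770 * TB
replica_count = 120_000_000   # slide says ">120 million": a lower bound
daily_rate_bytes = 1 * TB     # slide says ">1 Terabyte/day": a lower bound

# More replicas than the lower bound would make the average smaller,
# so this is an upper bound on the mean replica size.
avg_replica_mb = replicated_bytes / replica_count / 10**6

# A faster daily rate would shorten this, so it is an upper bound on
# the time needed to replicate 770 TB at a steady rate.
days_at_min_rate = replicated_bytes / daily_rate_bytes

print(f"Mean replica size: at most about {avg_replica_mb:.1f} MB")
print(f"Time to replicate 770 TB: at most about {days_at_min_rate:.0f} days")
```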

17

Grid Projects in the US

[Diagram repeated: Content (Community), Services (Service Provider), Resources (Resource Provider), Software Providers]

Community

For example:
• Earth System Grid
• Children’s Oncology Grid
• Southern California Earthquake Center (SCEC)
• Science gateways

18

Earth System Grid

Main ESG Portal
• 198 TB of data at four locations
• 1,150 datasets
• 1,032,000 files
• Includes the past 6 years of joint DOE/NSF climate modeling experiments
• 8,000 registered users
• Downloads to date: 49 TB, 176,000 files

CMIP3 (IPCC AR4) ESG Portal
• 35 TB of data at one location
• 74,700 files
• Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
• Data from 13 countries, representing 25 models
• 1,900 registered projects
• Downloads to date: 387 TB, 1,300,000 files, 500 GB/day (average)
• 400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data

[Figure: ESG usage: over 500 sites worldwide]

[Figure: ESG monthly download volumes]

Globus
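
One way to read the ESG numbers is to compare the volume downloaded with the volume held. The sketch below assumes the two-column reading of the slide in which the 198 TB archive and 49 TB of downloads belong to the main ESG portal, and the 35 TB archive and 387 TB of downloads belong to the CMIP3 (IPCC AR4) portal; the dictionary layout is illustrative.

```python
# (archive size in TB, downloads to date in TB), as read from the slide.
# Assumption: figures are paired with portals in left-to-right column order.
portals = {
    "Main ESG Portal": (198, 49),
    "CMIP3 (IPCC AR4) ESG Portal": (35, 387),
}

# Ratio of data downloaded to data held: values above 1 mean the archive
# has, on average, been downloaded more than once over.
download_ratio = {
    name: downloaded / held for name, (held, downloaded) in portals.items()
}

for name, ratio in download_ratio.items():
    print(f"{name}: downloaded {ratio:.2f}x its archive size")
```

On this reading the CMIP3 archive has been downloaded about eleven times over, while the main portal has served about a quarter of its holdings.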

19

SCEC Community Modeling Environment

A collaboratory for system-level earthquake science

[Diagram, reconstructed from its labels:]

KNOWLEDGE REPRESENTATION & REASONING
• Knowledge Server: knowledge base access, inference
• Translation Services: syntactic & semantic translation

KNOWLEDGE ACQUISITION
• Acquisition Interfaces: dialog planning, pathway construction strategies

Knowledge Base
• Ontologies: curated taxonomies, relations & constraints
• Pathway Models: pathway templates, models of simulation codes

Pathway Assembly: template instantiation, resource selection, constraint checking

Pathway Instantiations

DIGITAL LIBRARIES
• Navigation & Queries: versioning, replication
• Mediated Collections: federated access

GRID
• Pathway Execution: policy, data ingest, repository access
• Grid Services: compute & storage management, security

Other labels: Code Repositories; Data & Simulation Products; Data Collections; FSM, RDM, AWM, SRM; Storage; Computing; Users

Globus

20

Seismic Hazard Analysis

Defn: Max. intensity of shaking expected at a site during a fixed time interval

Example: National seismic hazard maps (http://geohazards.cr.usgs.gov/eq/)
• Intensity measure: peak ground acceleration
• Interval: 50 yrs
• Probability of exceedance: 2%

Globus
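
A 2% probability of exceedance in 50 years is often quoted as an equivalent return period. The sketch below shows that conversion under a Poisson (constant-rate) occurrence model, which is a common convention in hazard analysis but is an assumption here, not something stated on the slide; the function name is illustrative.

```python
import math


def return_period_years(prob_exceedance: float, interval_years: float) -> float:
    """Return period implied by an exceedance probability over an interval.

    Assumes exceedances follow a Poisson process with constant annual rate r,
    so P(at least one exceedance in t years) = 1 - exp(-r * t).
    """
    annual_rate = -math.log(1.0 - prob_exceedance) / interval_years
    return 1.0 / annual_rate


# Parameters from the slide: 2% probability of exceedance in 50 years.
period = return_period_years(0.02, 50)
print(f"Equivalent return period: about {period:.0f} years")
```

Under this model the slide's parameters correspond to shaking levels expected to be exceeded roughly once every 2,475 years on average.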

21

SCEC Computations & Grid

[Diagram labels: SDSC, USC, SCEC, PSC, TeraGrid, ISI; 12 CPUs, 1,700 CPUs, 1,200 CPUs, 1 CPU, 4 CPUs]

• Prepare input to Pathway2 wave propagation code
• Pathway2PGV converts output into hazard map
• Map is visualized

Globus

22

Children’s Oncology Grid and MEDICUS

Globus

23

Grid Projects in the US

[Diagram repeated: Content (Community), Services (Service Provider), Resources (Resource Provider), Software Providers]

24

Software Providers

• Globus [GT4.2 released July 2, 2008]
  GRAM, GridFTP, MDS, RLS, DRS, …
  GSI, GridShib, MyProxy, …
  GridWay (Spain), OGSA-DAI (UK), Introduce, …
• Condor
• MPI-G, Swift, Pegasus, Taverna (UK), Kepler
• caBIG: e.g., Introduce
• Virtual Data Toolkit (includes VOMS [Italy], …)
• SRB, iRODS, MyCluster, …
• …

Globus

25

Virtual Data Toolkit (VDT) Software Release Process

VDT components over time: built for 15 Linux Versions

Development & testing

Globus

26

Creating Services: Introduce and gRAVI

Ohio State University and Argonne/U.Chicago

• Introduce
  Define service
  Create skeleton
  Discover types
  Add operations
  Configure security
• Grid Remote Application Virtualization Infrastructure
  Wrap executables

[Diagram labels: Introduce; Create; Store; Repository Service; Advertize; Index service; Discover; Transfer GAR; Deploy; Container; Appln Service; Invoke; get results]

Globus
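
The advertise, discover, and invoke steps named on this slide can be illustrated with a toy in-memory registry. This is only a conceptual sketch: it does not use or imitate the Introduce, gRAVI, or Globus APIs, nothing is deployed or executed remotely, and every class, function, and service name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IndexService:
    """Toy stand-in for an index service: maps service names to operations."""

    entries: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def advertise(self, name: str, operation: Callable[[str], str]) -> None:
        # Record the operation under a name so that clients can find it.
        self.entries[name] = operation

    def discover(self, name: str) -> Callable[[str], str]:
        # Look up a previously advertised operation by name.
        return self.entries[name]


def wrap_executable_stub(command_name: str) -> Callable[[str], str]:
    """Pretend to wrap an executable as a service operation.

    No process is started; the stub only echoes what it would have run.
    """

    def operation(argument: str) -> str:
        return f"{command_name}({argument})"

    return operation


# Provider side: create a service and advertise it.
index = IndexService()
index.advertise("hypothetical-analysis", wrap_executable_stub("analyze"))

# Client side: discover the service, invoke it, and get the result.
service = index.discover("hypothetical-analysis")
result = service("sample.dat")
print(result)
```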

27

Composing Services

Globus

28

Service Discovery: Registries

Globus

29

Challenges

[Diagram repeated: Content (Community), Services (Service Provider), Resources (Resource Provider), Software Providers; several Communities shown]

• Conflicting missions
• Sustainability
• Discipline science pull

30

The Future

• NSF eXtreme Digital (XD) solicitation, aka “TeraGrid III”
• DOE, NIH, etc.: what do they want?
• International cooperation