Welcome, Opening Remarks, Goals and Agenda
Transcript of Welcome, Opening Remarks, Goals and Agenda
Towards a Grid-enabled Analysis Environment
Harvey B. Newman, California Institute of Technology
Grid-enabled Analysis Environment Workshop, June 23, 2003
Welcome to Caltech and Our Workshop
Caltech
Logistics
Agenda
Our Staff
GAE Workshop Agenda Overview
Monday Until 3:15 PM: Presentations on Existing Work, Ideas (Lunch at Noon Next to Lauritsen; Breaks at 10:30, 3:15)
3:15 - 6:00  GAE Demonstrations
Tuesday 9:00 – 9:15  PDA JAS Client (A. Anjum, Pakistan)
9:15 – 10:30  Discussion of Workshop Goals and Plan
10:30 – 11:00  BREAK
11:00 – 12:45  Discussion of GAE Architectures for LHC Global Analysis and Remote Working
12:45 – 2:00  LUNCH
2:00 – 4:00  Discussion of Simulating GAE Architectures
4:00 – 4:30  BREAK
4:30 – 5:30  Existing Analysis Software, Grid Prod. Systems; Integration in Candidate Arch.
6:30  Workshop Dinner at the Athenaeum
GAE Workshop Agenda Overview (Cont’d)
Wednesday 9:00 – 10:30  Existing Analysis Software, Grid Prod. Systems; Integration in Arch. (Cont’d)
10:30 – 11:00  BREAK
11:00 – 12:15  Future Activities, Relationships with LHC Experiments, LCG, US-Grid Projects, CrossGrid, etc.
12:15  Conclusions, Comments, Workshop Wrapup
12:30  ADJOURN
LHC Data Grid Hierarchy
[Diagram: a tiered architecture. The Experiment and Online System feed CERN (Tier 0+1: 700k SI95; ~1 PB disk; tape robot) at ~PByte/sec from the detector and ~100-1500 MBytes/sec into the center. Tier 1 centers (FNAL, IN2P3, INFN, RAL) connect at 2.5-10 Gbps; Tier 2 centers connect at ~2.5-10 Gbps; Tier 3 institutes (~0.25 TIPS each, with a physics data cache) at 0.1 to 10 Gbps; Tier 4 workstations below them. Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later. CERN/Outside resource ratio ~1:2; Tier0/Tier1/Tier2 ~1:1:1.]
Emerging Vision: A Richly Structured, Global Dynamic System
Issues For Grid Enabled Analysis (GEA): Introduction (1)
The Problem: How can [experiments with] ~1000 physicists at ~100 Institutes in ~30 countries do “their analysis” effectively: A Balance
Efficient use of resources versus turnaround time
Central control/sharing of distributed resources, versus use of local and regional resources under group and regional control
Proximity of the Jobs to the Data (if enough CPU+Priority), versus data transport to, and use of, more local resources
And many other related issues…
The Problem, and its solution, also apply to “Production”, but the problem is most evident and severe for GEA
A large and diverse community of users
A wide range of tasks with a wide range of priorities; representing large and small groups, and individuals
But all the work has to get done
Which Approach: from Centralized Computing to Peer-to-Peer
For example: Managed, Structured P2P?
Computing Model Progress: CMS Internal Review of Software and Computing
Example Tag (JetMET) Web Services
CAIGEE Draft Architecture
NSF ITR: “Private” Grids & P2P Sub-Communities in Global HEP
L. Bauerdick, FNAL
Develop and build Dynamic Workspaces
Construct Autonomous Communities Within Global Collaborations
Build Private Grids to support scientific analysis communities
e.g. Using Agent-Based Peer-to-peer Web Services
NSF ITR: Globally Enabled Analysis Communities
HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps

Year   Production             Experimental             Remarks
2001   0.155                  0.622-2.5                SONET/SDH
2002   0.622                  2.5                      SONET/SDH; DWDM; GigE Integ.
2003   To 2.5                 To 10                    DWDM; 1 + 10 GigE Integration
2005   10                     2-4 X 10                 Switch; Provisioning
2007   2-4 X 10               ~10 X 10; 40 Gbps        1st Gen. Grids
2009   ~10 X 10 or 1-2 X 40   ~5 X 40 or ~20-50 X 10   40 Gbps Switching
2011   ~5 X 40 or ~20 X 10    ~25 X 40 or ~100 X 10    2nd Gen Grids; Terabit Networks
2013   ~Terabit               ~MultiTbps               ~Fill One Fiber

Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade
GRIDS WILL BECOME MORE DYNAMIC
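The “~1000 times per decade” trend can be sanity-checked against the roadmap’s own Production column. A small calculation (values copied from the table above; the 2013 “~Terabit” entry is read as 1000 Gbps):

```python
# Check the roadmap's "~1000x per decade" trend against the table endpoints.
# Production-column values in Gbps, from the roadmap above.
production = {2001: 0.155, 2003: 2.5, 2013: 1000.0}  # 2013: ~Terabit

def growth_per_decade(y0, y1):
    """Extrapolate the growth factor between two roadmap years to 10 years."""
    factor = production[y1] / production[y0]
    years = y1 - y0
    return factor ** (10.0 / years)

# 2003 -> 2013 spans exactly one decade: 1000 / 2.5 = 400x
print(round(growth_per_decade(2003, 2013)))   # 400
# 2001 -> 2013 extrapolates to over 1000x per decade, matching the claim
print(round(growth_per_decade(2001, 2013)))
```

So the middle of the table grows a bit slower than 1000x per decade, while the early years grow faster; the claim describes the overall trend.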
[email protected]  ARGONNE / CHICAGO
Grid Architecture Layers
Application
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource: “Sharing single resources”: negotiating access, controlling use
Connectivity: “Talking to things”: communication (Internet protocols) & security
Fabric: “Controlling things locally”: access to, & control of, resources
[Shown beside the Internet Protocol Architecture: Application, Transport, Internet, Link]
More info: www.globus.org/research/papers/anatomy.pdf
HENP Data Grids Versus Classical Grids
The original Computational and Data Grid concepts are largely stateless, open systems: known to be scalable
Analogous to the Web
The classical Grid architecture has a number of implicit assumptions
The ability to locate and schedule suitable resources, within a tolerably short time (i.e. resource richness)
Short transactions with relatively simple failure modes
HEP Grids are Data Intensive, and Resource-Constrained
1000s of users competing for resources at 100s of sites
Resource usage governed by local and global policies
Long transactions; some long queues
Need Realtime Monitoring and Tracking
Distributed failure modes
Strategic task management
Upcoming HEP Grid Challenges: Workflow Management and Optimization
Maintaining a Global View of Resources and System State
End-to-end System Monitoring
Adaptive Learning: new paradigms for optimization, problem resolution (eventually automated, in part)
Workflow Management, Balancing Policy Versus Moment-to-moment Capability to Complete Tasks
Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs
Realtime Error Detection, Propagation; Recovery
Handling User-Grid Interactions: Guidelines
Higher Level Services, and an Integrated User Environment for the Above
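One simple way to picture the usage-versus-turnaround balance described above (purely illustrative, not a mechanism from the talk) is a scheduler with priority aging: low-priority bulk jobs keep the limited resources fully used, but their effective priority rises with waiting time, so urgent jobs still jump ahead without starving the rest.

```python
import itertools

# Illustrative aging scheduler: effective priority = base + rate * wait time.
class AgingScheduler:
    def __init__(self, aging_rate=0.1):
        self.aging_rate = aging_rate  # priority gained per unit waiting time
        self.jobs = []                # (submit_time, priority, tiebreak, name)
        self._tie = itertools.count()

    def submit(self, name, priority, now):
        self.jobs.append((now, priority, next(self._tie), name))

    def next_job(self, now):
        """Pick the job with the highest effective priority right now."""
        best = max(self.jobs,
                   key=lambda j: j[1] + self.aging_rate * (now - j[0]))
        self.jobs.remove(best)
        return best[3]

sched = AgingScheduler(aging_rate=0.1)
sched.submit("bulk-reco", priority=1, now=0)    # low priority, submitted early
sched.submit("urgent-fit", priority=5, now=30)  # high priority, submitted late
print(sched.next_job(now=31))   # urgent-fit: 5 + 0.1 beats 1 + 3.1
print(sched.next_job(now=31))   # bulk-reco runs next, its wait not wasted
```

The aging rate is the policy knob: at zero, priority jobs always win; made large, the system approaches fair first-come-first-served.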
HENP Grid Architecture: Layers Above the Collective Layer
Physicists’ Applications: Reconstruction, Calibration, Analysis; Code Development
Experiments’ Software Framework Layer: Modular and Grid-aware; architecture able to interact effectively with the lower layers
Grid Applications Layer: Time-Dependent Metrics and Methods (parameters and algorithms that govern system operations): policy and priority; workflow management; task placement; overall performance: STEERING methods
Global End-to-End System Services Layer: Mechanisms: monitoring and tracking; workflow monitoring; error recovery and redirection; system self-monitoring, evaluation and optimisation
The Move to OGSA and Stateful, Managed Systems
[Diagram: evolution over time, with increasing functionality and standardization:
Custom solutions, built on X.509, LDAP, FTP, …
Globus Toolkit: de facto standards; GGF: GridFTP, GSI
Web services + Open Grid Services Arch; GGF: OGSI, … (+ OASIS, W3C); multiple implementations, including Globus Toolkit
App-specific Services / ~Integrated Systems: Stateful; Managed]
OGSA Example: Reliable File Transfer Service
[Diagram: a Grid service whose service data elements expose the transfer’s Performance, Policy, Faults, Pending and Internal State; a Notification Source, policy interfaces, and Fault and Performance Monitors surround the File Transfer state. Clients query and/or subscribe to the service data, and request & manage file transfer operations; the service carries out the data transfer operations.]
A standard substrate: the Grid service
Standard interfaces & behaviors, to address key distributed system issues
Refactoring, extension of Globus Toolkit protocol suite
Example “COG” Computing Environment: Basic, Advanced and Commodity Services
G. Von Laszewski et al., GC2002: Session and Execution Environment Concepts
Building an Architecture Requires:
Components that Communicate and Interwork
Defining How the Components are Integrated
Interfaces
Cooperative Behaviors
A System Concept
Analysis Desktop; Applications & Frameworks
Advanced Grid Service Sessions
Managed Grid Services
Basic Grid Services
Dynamic Distributed Services Architecture (DDSA): Caltech/Romania/Pakistan
“Station Server” services-engines at sites host “Dynamic Services”
Auto-discovering, Collaborative Service Agents: Goal-Oriented, Autonomous, Adaptive
Servers interconnect dynamically; form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks
Event notification, subscription
Adaptable to Web services: OGSA; many platforms. Also mobile working environments
[Diagram: Station Servers interconnected by Proxy Exchange and Remote Notification links; each registers with Lookup Services through Registration and Service Listeners, and finds peers via a Lookup Discovery Service.]
By I. Legrand et al. Deployed on US CMS Grid, + an increasing number of sites
Agent-based dynamic information / resource discovery mechanism
Talks with other monitoring systems
Implemented in Java/Jini; SNMP; WSDL / SOAP with UDDI
Part of a Global Grid Control Room Service
MonALISA: A Globally Scalable Grid Monitoring System
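The registration/lookup cycle in the diagram follows the Jini pattern: servers register self-describing service entries with a lookup service, and peers discover each other by attribute matching. A minimal Python sketch (MonALISA itself is Java/Jini; `LookupService` and the attribute scheme here are illustrative):

```python
# Illustrative Jini-style registration and attribute-based discovery.
class LookupService:
    def __init__(self):
        self.registry = []   # registered, self-describing service entries

    def register(self, description):
        self.registry.append(description)

    def discover(self, **attrs):
        """Return services whose description matches all requested attributes."""
        return [d for d in self.registry
                if all(d.get(k) == v for k, v in attrs.items())]

lookup = LookupService()
# Each Station Server registers the dynamic services it hosts.
lookup.register({"site": "Caltech", "type": "monitor", "proto": "WSDL/SOAP"})
lookup.register({"site": "CERN",    "type": "monitor", "proto": "SNMP"})
lookup.register({"site": "NUST",    "type": "executor"})

monitors = lookup.discover(type="monitor")
print([m["site"] for m in monitors])   # ['Caltech', 'CERN']
```

In the real fabric the lookup services are themselves replicated and discovered by multicast, so no single registry is a point of failure.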
http://www.naradabrokering.org  gcf,spallick,[email protected]
P2P Narada Broker Network
[Diagram: (P2P) communities of clients, plus Database and Resource nodes, connected through a network of cooperating Brokers; software multicast carries the message/events service.]
G. Fox et al., GC2002, Ch. 22
Services-Oriented Architecture: Service Contracts
A Grid Market Economy: Procure and/or Trade Resources; Service Contracts
Service Owners: Institutional Context
Service Contracts Negotiated in Marketplaces, Each with Its Own Rules, Set by Marketplace Owners (VOs)
Renegotiation Possible at Contract Creation Time
Track Service: Oversee Satisfaction of Contract
D. De Roure et al., The Semantic Grid, GC 2002 Ch. 17
GAE Issues: A Computing Model; Key Role of Simulation and Prototyping
Need a “Computing Model”: Common Aspects; By Experiment
A complete picture of what should happen: All Tasks, Policies and Priorities, Performance Targets, Corrective Actions
Answer these Questions (Among Others):
1. “How many jobs using how much CPU and how much data are accessed by how many physicists at how many locations, how often, and how?” [Specify the Tasks and Their Profiles]
2. “What Performance Can Users Expect? What’s Normal?” [Specify Turnaround Time Profiles]
3. “What happens when it doesn’t work; what do you (or the system) do then?” [Specify Corrective Actions: Strategies & Methodology]
Therefore: A Key Role of Modeling and Simulation
MONARC/SONN: 3 Regional Centres Learning to Export Jobs (Day 9)
[Diagram: three regional centres (NUST, 20 CPUs; CERN, 30 CPUs; CALTECH, 25 CPUs) linked by 1 MB/s (150 ms RTT), 1.2 MB/s (150 ms RTT) and 0.8 MB/s (200 ms RTT) paths. At Day 9 the efficiencies are <E> = 0.73, 0.66 and 0.83.]
Model Higher Level Services
Simulations for Strategy and HLS Development
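A much-simplified sketch of what such a simulation computes: there is no self-organizing neural network here, just a least-loaded-per-CPU export rule, with the CPU counts from the figure. The routing policy and the workload are illustrative.

```python
import random

# Toy job-export simulation across three regional centres.
random.seed(9)
centres = {"NUST": 20, "CERN": 30, "CALTECH": 25}   # CPUs, as in the figure
queued = {name: 0 for name in centres}
exported = {name: 0 for name in centres}

def submit(origin):
    # Export rule: route to the centre with the fewest queued jobs per CPU.
    target = min(centres, key=lambda c: queued[c] / centres[c])
    queued[target] += 1
    if target != origin:
        exported[origin] += 1

# Jobs arrive at random centres; the rule spreads them across the fabric.
for _ in range(300):
    submit(random.choice(list(centres)))

print(queued)   # load ends up roughly proportional to each centre's CPUs
```

The real MONARC/SONN study replaces this fixed rule with a learned one that also weighs link bandwidth and RTT, which is why the three centres reach different efficiencies.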
GAE Workshop Goals (1)
“Getting Our Arms Around” the Grid-Enabled Analysis “Problem”
Review Existing Work Towards a GAE: Components, Interfaces, System Concepts
Review Client Analysis Tools; Consider How to Integrate Them
User Interfaces: What does the GAE Desktop Look Like? (Different Flavors)
Look At Requirements, Ideas for a GAE Architecture
A Vision of the System’s Goals and Workings
Attention to Strategy and Policy
Develop (Continue) a Program of Simulations of the System
For the Computing Model, and Defining the GAE
Essential for Developing a Feasible Vision; Developing Strategies, Solving Problems and Optimizing the System
With a Complementary Program of Prototyping
GAE Collaboration Desktop Example
Four-screen Analysis Desktop: 4 Flat Panels, 5120 x 1024; RH9
Driven by a single server and single graphics card
Allows simultaneous work on:
Traditional analysis tools (e.g. ROOT)
Software development
Event displays (e.g. IGUANA)
MonALISA monitoring displays; Other “Grid Views”
Job-progress Views
Persistent collaboration (e.g. VRVS; shared windows)
Online event or detector monitoring
Web browsing, email
GAE Workshop Goals (2)
Architectural Approaches: Choose A Feasible Direction
For example a Managed Services Architecture
Be Prepared to Learn by Doing; Simulating and Prototyping
Where to Start, and the Development Strategy
Existing and Missing Parts of the System [Layers; Concepts]
When to Adapt Existing Components, Or to Re-Build Them “from Scratch”
Manpower Available to Meet the Goals; Shortfalls
Allocation of Tasks; Including Generating a Plan
Linkage Between Analysis and Grid-Enabled Production
Planning for Closer Relationship with LCG, Trillium, and the Experiments’ starting Efforts in this area
Computing Model Progress: CMS Internal Review of Software and Computing
Some Extra Slides Follow
HENP Grids: Services Architecture Design for a Global System
Self-Discovering, Cooperative
Registered Services, Lookup Services; self-describing
“Spaces” for Mobile Code and Parameters
Scalable and Robust
Multi-threaded: with a thread-pool managing engine
Loosely Coupled: errors in a thread don’t stop the task
Stateful: System State as well as task state
Rich set of “problem” situations: implies Grid Views, and User/System Dialogues on what to do
For Example: Raise Priority (Burn Quota); or Redirect Work
Eventually may be increasingly automated as we scale up and gain experience
Managed; to deal with a Complex Execution Environment
Real time higher-level supervisory services monitor, track, optimize and Revive/Restart services as needed
Policy and strategy-driven; Self-Evaluating and Optimizing
Investable with increasing intelligence
Agent Based; Evolutionary Learning Algorithms
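The “loosely coupled” property above (an error in one thread does not stop the task) can be sketched with a standard thread pool: each sub-task’s failure is captured in its Future, and a supervisory layer can retry or redirect the failed piece. The task names and failure mode are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Each chunk is processed in its own pool thread; one failure is isolated.
def process(chunk):
    if chunk == "bad":
        raise RuntimeError("site unreachable")
    return f"{chunk}: done"

chunks = ["a", "bad", "c"]
results, failures = [], []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(process, ch): ch for ch in chunks}
    for fut, ch in futures.items():
        try:
            results.append(fut.result())   # re-raises the worker's exception
        except RuntimeError:
            failures.append(ch)            # supervisor could revive/redirect

print(results)    # ['a: done', 'c: done']
print(failures)   # ['bad']
```

The failed chunk stays visible as state, which is what lets a higher-level supervisory service restart or redirect it rather than abandoning the whole task.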
Building a Computing Model and an Analysis Strategy (I)
Generate a Blueprint: A “Computing Model”
Tasks: Workload, Facilities, Priorities & GOALS
Persistency; Modes of Accessing Data (e.g. Object Collections)
What runs where; when to redirect
The User’s Working Environment
What is normal (managing expectations)?
Guidelines for dealing with problems: based on which information?
Performance and problem reporting/tracking/handling?
Known Problems: Strategies to deal with those
Set up, code a Simulation of the Model
Develop mechanisms and sub-models as needed
Set up prototypes to measure the performance parameters where not already known to sufficient precision
Building a Computing Model and an Analysis Strategy (II)
Run simulations (avatars for “actors”; agents; tasks; mechanisms)
Analyze and evaluate performance
General performance (throughput; turnaround)
Ensure “all” work is done: learn how to do this within a reasonable time, compatible with the Collaboration’s guidelines
Vary the Model to Improve Performance
Deal with bottlenecks and other problems
New strategies and/or mechanisms to manage workflow
Represent key features and behaviors, for example:
Responses to Link or Site failures
User input to redirect data or jobs
Monitoring information gathering
Monitoring and management agent actions and behaviors in a variety of situations
Validate the Model
Using Dedicated setups
Using Data Challenges (measure, evaluate, compare; fix key items)
Learn of new factors and/or behaviors to take into account
Building a Computing Model and an Analysis Strategy (III)
MAJOR Milestone: Obtain a first picture of a Model that Seems to Work
This may or may not involve changes in the computing resource requirements-estimates, or Collaboration policies and expectations
It is hard to estimate how long it will take to reach this milestone [most experiments until now have reached it after the start of data taking]
Evolve the Model to:
Distinguish what works and what does not
Incorporate evolving site hardware and network performance
Progressively incorporate new and “better” strategies, to improve throughput and/or turnarounds, or fix critical problems
Take into account experience with the actual software-system components as they develop
In parallel with the Model evolution, keep developing the overall data analysis + Grid + monitoring “system”; represent it in the simulation
And the associated strategies
NaradaBrokering: Based on a network of cooperating broker nodes
• Cluster-based architecture allows the system to scale to arbitrary size
Originally built to provide uniform software multicast supporting real-time collaboration, linked to publish-subscribe for asynchronous systems.
Now has four major core functions:
• Message transport (based on performance) in multi-link fashion
• General publish-subscribe, including JMS & JXTA
• Support for RTP-based audio/video conferencing
• Federation of multiple instances of Grid services (just starting)
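The broker’s core role reduces to topic-based publish-subscribe. A single-node toy sketch (the real system is a distributed Java broker network; the prefix-matching rule here stands in for its richer subscription formats):

```python
import time

# Toy event broker: an event is a time-stamped message routed to subscribers.
class Broker:
    def __init__(self):
        self.subscriptions = []   # (topic prefix, callback)

    def subscribe(self, topic, callback):
        self.subscriptions.append((topic, callback))

    def publish(self, topic, payload):
        event = {"topic": topic, "time": time.time(), "payload": payload}
        for sub_topic, callback in self.subscriptions:
            if topic.startswith(sub_topic):   # simple prefix matching
                callback(event)

broker = Broker()
seen = []
broker.subscribe("cms/monitor", lambda e: seen.append(e["payload"]))
broker.publish("cms/monitor/cpu", "load=0.7")   # delivered
broker.publish("atlas/monitor", "load=0.2")     # no matching subscription
print(seen)   # ['load=0.7']
```

Because routing depends only on the event and the subscription, not on the application, many applications can share the same broker fabric, which is the design point the slide makes.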
Role of Event/Message Brokers
We will use events and messages interchangeably
• An event is a time-stamped message
Our systems are built from clients, servers and “event brokers”
• These are logical functions: a given computer can have one or more of these functions
• In P2P networks, computers typically multifunction; in Grids one tends to have separate-function computers
• Event Brokers “just” provide message/event services; servers provide traditional distributed-object services as Web services
There are functionalities that depend only on the event itself and perhaps the data format; they do not depend on details of the application and can be shared among several applications
• NaradaBrokering is designed to provide these functionalities
Why P2P? Core features
• Resource Sharing & Discovery
  CPU cycles: SETI@home, Folding@HOME
  File Sharing: Napster, Gnutella
Deployments are user driven
• No dedicated management
Management of resources
• Expose resources & specify security strategy
• Replicate resources based on demand
Dynamic peer groups, fluid group memberships
Sophisticated search mechanisms
• Peers respond to queries based on their interpretations
• Responses do not conform to traditional templates
What are the downsides?
Interactions are attenuated
• Localized
• Fragmented world of multiple P2P subsystems
Routing is not very sophisticated
• Inefficient network utilization (Tragedy of the Commons)
• Simple forwarding
• Peer Traces (to eliminate echoing)
• Attenuations (to suppress propagation)
• TTLs associated with interactions
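The two damping mechanisms listed, traces to eliminate echoing and a TTL to attenuate propagation, can be shown in a toy flood-forwarding sketch (the topology and peer names are invented):

```python
# Toy P2P query flooding with a per-query trace and TTL.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def flood(origin, ttl):
    """Forward a query from origin; return every peer it reaches."""
    reached = set()
    frontier = [(origin, ttl, {origin})]
    while frontier:
        peer, hops, trace = frontier.pop()
        reached.add(peer)
        if hops == 0:
            continue   # TTL expired: attenuation suppresses propagation
        for nxt in neighbors[peer]:
            if nxt not in trace:   # trace check: don't echo back to a visited peer
                frontier.append((nxt, hops - 1, trace | {nxt}))
    return reached

print(sorted(flood("A", ttl=1)))   # ['A', 'B', 'C']
print(sorted(flood("A", ttl=2)))   # ['A', 'B', 'C', 'D']
```

Even with both mechanisms, each query still touches every peer within the TTL horizon, which is the inefficiency the slide is pointing at.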
Narada-JXTA events
[Diagram: three event layouts, for requests/responses to be part of a certain peer group:
(a) Peer Advertisement Request: Narada Headers; JXTA Interaction Type; Narada Connection Info; Peer group id; Peer Advertisement; Narada Event Distribution Traces.
(b) Peer Advertisement Response: Narada Headers; JXTA Interaction Type; Narada Connection Info; Peer group id; Peer id; Peer Advertisement; Narada Event Distribution Traces.
(c) After Peer Advertisement Response: Narada Headers; JXTA Interaction Type (Subscription); Peer group id; Peer id; Narada Event Distribution Traces.]
NaradaBrokering Results (G. Fox et al.)
[Plot: mean transit delay (0-450 milliseconds) between publisher and subscriber, versus publish rate (0-1000 events/sec) and event size (0-500 bytes), for a network of 22 brokers and 102 clients at match rates of 100%, 50% and 10%.]
Services-Oriented Architecture: Key Functions of Components
D. De Roure et al., The Semantic Grid, GC 2002 Ch. 17
Grid Enabled Analysis (GEA) CrossGrid “Componentology”Grid Enabled Analysis (GEA) CrossGrid “Componentology”
Specify, Study, Iterate: Evolve with Specify, Study, Iterate: Evolve with Experience and Advancing Technologies Experience and Advancing Technologies
Workload: Actors (Tasks) Workload: Actors (Tasks) Job & Data Transport Profiles, Frequency Job & Data Transport Profiles, Frequency Site components and architectures: Performance vs. Load; Failure ModesSite components and architectures: Performance vs. Load; Failure Modes Data Structures, Streams, and Access Methods: (Sub-collections)Data Structures, Streams, and Access Methods: (Sub-collections) Networks: scale, operations, main behaviorsNetworks: scale, operations, main behaviors Operational Modes (Develop a Common Understanding ?)Operational Modes (Develop a Common Understanding ?)
e.g. What are the guidelines and steps that make up the data access/processing/analysis policy and strategy (the GEA)?
e.g. What should the user do under different situations? If we have automated services to help users: what do they do?
What are the technical goals + emphasis of the system? How is it intended to be used by the Collaboration?
High-Level Software Services architecture: adaptive, partly autonomous, e.g. agent-based. How, and how much, should they steer the system to get the work done?
Note: Common services among experiments imply some similar operational modes.
The LHC Distributed Computing Model: Getting Started
2001 Transatlantic Net WG Bandwidth Requirements [*] (Mbps)

Experiment     2001    2002    2003    2004    2005    2006
CMS             100     200     300     600     800    2500
ATLAS            50     100     300     600     800    2500
BaBar           300     600    1100    1600    2300    3000
CDF             100     300     400    2000    3000    6000
D0              400    1600    2400    3200    6400    8000
BTeV             20      40     100     200     300     500
DESY            100     180     210     240     270     300
CERN BW     155-310     622    2500    5000   10000   20000
[*] See http://gate.hep.anl.gov/lprice/TAN. The 2001 LHC requirements outlook now looks very conservative in 2003.
FAST TCP: Aggregate Throughput

Flows                   1      2      7      9     10
Average utilization    95%    92%    90%    90%    88%

Measurements with standard packet size; utilization averaged over > 1 hr on a 3000 km path.
RTT estimation: fine-grain timer
Fast convergence to equilibrium
Delay monitoring in equilibrium
Pacing: reducing burstiness
Now working towards 10 Gbps in ~2 Flows
On Feb. 27-28, a Terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech between the Level3 PoP in Sunnyvale near SLAC and CERN, through the TeraGrid router at StarLight, from memory to memory, as a single TCP/IP stream at an average rate of 2.38 Gbps (using large windows and 9 kB "Jumbo frames"). This beat the former record by a factor of ~2.5, and used the US-CERN link at 99% efficiency.
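A quick back-of-envelope check of the quoted rate (a sketch, not from the talk; it assumes the "Terabyte" was a binary terabyte, 2^40 bytes):

```python
# Sanity check of the Feb. 2003 record rate.
# Assumption (not stated in the slide): 1 Terabyte = 2**40 bytes.
bytes_moved = 2**40               # ~1.1e12 bytes
seconds = 3700
gbps = bytes_moved * 8 / seconds / 1e9
print(f"{gbps:.2f} Gbps")         # ~2.38 Gbps, consistent with the quoted average
```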
10GigE Data Transfer Trial
European Commission
10GigE NIC

UltraLight: An Ultra-scale Optical Network Laboratory for Next Generation Science
Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack, CERN, UERJ (Rio), NLR, CENIC, UCAID, Translight, UKLight, Netherlight, UvA, UCLondon, KEK, Taiwan, Cisco, Level(3)
[Figure: UltraLight layered architecture. Flagship Applications (HENP, VLBI, Oncology, ...) sit atop Application Frameworks and Grid Middleware; below them, Grid/Storage Management, Network Protocols & Bandwidth Management, Distributed CPU & Storage, and the Network Fabric. End-to-end Monitoring and Intelligent Agents span the layers.]
http://ultralight.caltech.edu
GECSR Proposal: A Grid-Enabled Collaboratory for Scientific Research
Create a Persistent, Data-Intensive Collaboratory for Analysis by Global HENP collaborations; built to be applicable to a wide range of other large-scale science projects
"Giving scientists from all world regions the means to function as full partners in the process of search and discovery"
Customization for discipline- and project-specific applications
Community-Specific Knowledge Environments for Research and Ed. (collaboratory, grid community, e-science community, virtual community):
(1) High-performance computation services
(2) Data, information and knowledge management services
(3) Observation, measurement and fabrication services
(4) Interfaces and visualization services
(5) Collaborative services
Built on: Networking, Operating systems, Middleware
Base Technology: computation, storage, communication
CMS/MONARC Analysis Model
Hierarchy of Processes (Experiment, Analysis Groups, Individuals)
Selection: iterative selection, once per month. ~20 Groups' Activity (10^9 -> 10^7 events); trigger-based and physics-based refinements. 25 SI95 sec/event; ~20 jobs per month.

Analysis: different physics cuts & MC comparison, ~once per day. ~25 Individuals per Group Activity (10^6 - 10^8 events); algorithms applied to data to get results. 10 SI95 sec/event; ~500 jobs per day.

Monte Carlo: 5k SI95 sec/event.

RAW Data Reconstruction: Experiment-wide activity (10^9 events); new detector calibrations or understanding. Reconstruction: 3000 SI95 sec/event, 1 job per year. Re-processing 3 times per year: 3000 SI95 sec/event, 3 jobs per year.
CMS CPU and Storage Total
TOTAL Active tape for CMS: Tier0/1 CERN + 5 Tier1 = 1540 + 5 x 590 = 4490 TB
TOTAL Archive tape for CMS: Tier0/1 CERN + 5 Tier1 + 25 Tier2 = 2632 + 5 x 433 + 25 x 50 = 6047 TB
TOTAL Tape for CMS: Tier0/1 CERN + 5 Tier1 + 25 Tier2 = 4172 + 5 x 1023 + 25 x 50 = 10537 TB
TOTAL Disk for CMS: Tier0/1 CERN + 5 Tier1 + 25 Tier2 = 796 + 5 x 313 + 25 x 70 = 4111 TB
TOTAL CPU for CMS: Tier0/1 CERN + 5 Tier1 + 25 Tier2 = 615 + 5 x 167 + 25 x 32 = 2250 kSI95
[Where 1 PC (ca. 2000) ~ 25 SI95]
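The totals follow directly from the per-tier figures; a minimal sketch that re-checks the arithmetic, with the tier counts and per-tier numbers taken from the slide above:

```python
# Re-check the CMS resource totals: Tier0/1 at CERN + 5 Tier1 (+ 25 Tier2).
def cms_total(cern, tier1, tier2=0):
    return cern + 5 * tier1 + 25 * tier2

assert cms_total(1540, 590) == 4490        # active tape, TB
assert cms_total(2632, 433, 50) == 6047    # archive tape, TB
assert cms_total(4172, 1023, 50) == 10537  # total tape, TB
assert cms_total(796, 313, 70) == 4111     # disk, TB
assert cms_total(615, 167, 32) == 2250     # CPU, kSI95
print("all totals check out")
```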
HENP Lambda Grids: Fibers for Physics
Problem: Extract "Small" Data Subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte Data Stores
Survivability of the HENP Global Grid System, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time.
Example: take 800 secs to complete the transaction. Then:

Transaction Size (TB)    Net Throughput (Gbps)
          1                       10
         10                      100
        100                     1000 (capacity of fiber today)

Summary: providing switching of 10 Gbps wavelengths within ~3-5 years, and Terabit switching within 5-8 years, would enable "Petascale Grids with Terabyte transactions", as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
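The throughput figures in the table above are just transaction size divided by the 800-second window; a small sketch, assuming decimal terabytes (1 TB = 10^12 bytes):

```python
# Net throughput (Gbps) needed to move a data subset of size_tb terabytes
# within a fixed transaction window (default 800 seconds, as in the slide).
def required_gbps(size_tb, window_s=800):
    return size_tb * 1e12 * 8 / window_s / 1e9

for size in (1, 10, 100):
    print(f"{size:>4} TB -> {required_gbps(size):,.0f} Gbps")
# 1 TB -> 10 Gbps; 10 TB -> 100 Gbps; 100 TB -> 1,000 Gbps (a full fiber today)
```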
PPDG Past, Present and Future: Outline
HENP Challenges: Science Drivers of Data-Intensive Grid Systems
Progress in Grids for Physics: 1999-2003
Key Roles of the Particle Physics Data Grid: Mission, Focus, Accomplishments
The Coming Generation Change
HENP Grids: Global End-to-end Managed System Architecture
OGSA: Transition to a stateful services architecture, appropriate for systems of this complexity
Rapid Advances in Networks
Future Vision: Dynamic, PetaScale Grids with Terabyte transactions
Computing Model Progress: CMS Internal Review of Software and Computing
Layered Grid Architecture
Application
Collective: "Coordinating multiple resources" (ubiquitous infrastructure services, app-specific distributed services)
Resource: "Sharing single resources" (negotiating access, controlling use)
Connectivity: "Talking to things" (communication via Internet protocols, and security)
Fabric: "Controlling things locally" (access to, and control of, resources)

[Shown alongside the Internet Protocol Architecture: Application; Internet/Transport; Link]
“The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Foster, Kesselman, Tuecke, Intl J. High Performance Computing Applications, 15(3), 2001.