
The AppLeS Project: Harvesting the Grid

Francine Berman

U. C. San Diego

Computing Today

data archives

networks

visualization

instruments

MPPs

clusters

PCs

Workstations

Wireless

The Computational Grid

• The Computational Grid

– ensemble of heterogeneous, distributed resources

– emerging platform for high-performance computing

• The Grid is the ultimate “supercomputer”

– Grids aggregate an enormous number of resources

– Grids support applications that cannot be executed at any single site

Moreover,

– Grids provide increased access to each kind of resource (MPPs, clusters, etc.)

– The Grid can be accessed at any time

Computational Grids

• Computational Grids you know and love:

– Your computer lab

– The Internet

– The PACI partnership resources

– Your home computer, the CS machines, and all the other resources you have access to

• You are already using the Computational Grid …

Programming the Grid I

• Basics

– Need way to login, authenticate in different domains, transfer files, coordinate execution, etc.

[Diagram: the application development environment sits on top of Grid infrastructure (Globus, Legion, Condor, NetSolve, PVM), which in turn runs over the underlying resources.]

Programming the Grid II

• Performance-oriented programming

– Need way to develop and execute performance-efficient programs

– Program must achieve performance in the context of
  • heterogeneity
  • dynamic behavior
  • multiple administrative domains

This can be extremely challenging.

– What approaches are most effective to achieve Grid application execution performance?

Performance-oriented Execution

• Careful scheduling required to achieve application performance in parallel and distributed environments

• For the Grid, adaptivity is the right approach to leverage deliverable resource capacities

[Figure: NWS predictions of file transfer time to SC’99 in Portland from UCSD, UTK, and OGI (Alan Su).]

• Fundamental components:

– Application-centric performance model

• Provides quantifiable measure of Grid resources in terms of their potential impact on the application

– Prediction of deliverable resource performance at execution time

– User’s preferences and performance criteria

These components form the basis for AppLeS.

Adaptive Application Scheduling

What is AppLeS?

• AppLeS = Application Level Scheduler

– Joint project with Rich Wolski (U. of Tenn.)

• AppLeS is a methodology

– Project has investigated adaptive application scheduling using dynamic information, application-specific performance models, and user preferences.

– AppLeS approach based on real-world scheduling.

• AppLeS is software

– Have developed multiple AppLeS-enabled applications which demonstrate the importance and usefulness of adaptive scheduling on the Grid.

How Does AppLeS Work?

• Select resources

• For each feasible resource set, plan a schedule

– For each schedule, predict application performance at execution time

– consider both the prediction and its qualitative attributes

• Deploy the “best” of the schedules with respect to the user’s performance criteria
  – execution time
  – convergence
  – turnaround time

[Diagram: AppLeS + application (resource selection, schedule planning, deployment) layered over the Grid infrastructure and NWS, which run over the resources.]
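A minimal sketch of this select-plan-predict-deploy loop, with the planner, predictor, criterion, and deployment hooks passed in as generic callables (these names are illustrative placeholders, not AppLeS APIs):

```python
def apples_schedule(feasible_sets, plan, predict, criterion, deploy):
    """Plan a schedule per feasible resource set, predict its performance,
    and deploy the best one with respect to the user's criterion."""
    candidates = []
    for resource_set in feasible_sets:
        schedule = plan(resource_set)                     # schedule planning
        candidates.append((predict(schedule), schedule))  # execution-time prediction
    _, best = min(candidates, key=lambda c: criterion(c[0]))
    deploy(best)                                          # deployment
    return best
```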

Network Weather Service (Wolski, U. Tenn.)

• The NWS provides dynamic resource information for AppLeS

• NWS is stand-alone system

• NWS – monitors current system state

– provides best forecast of resource load from multiple models

[Diagram: NWS architecture. A sensor interface feeds a forecaster that combines Model 1, Model 2, and Model 3; forecasts are exposed through the reporting interface.]
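As a rough illustration of the forecaster idea (a toy sketch, not the NWS implementation or API): keep several simple predictors, track each one’s error on past measurements, and report the forecast of whichever model has erred least so far.

```python
import statistics

# Toy forecaster: each model predicts the next measurement; the model with the
# smallest cumulative error so far supplies the reported forecast.
MODELS = {
    "last_value": lambda history: history[-1],
    "mean":       lambda history: statistics.fmean(history),
    "median":     lambda history: statistics.median(history),
}

def forecast(history, errors):
    """Return (best forecast, all model predictions) for the next measurement."""
    predictions = {name: model(history) for name, model in MODELS.items()}
    best = min(MODELS, key=lambda name: errors.get(name, 0.0))
    return predictions[best], predictions

def record(actual, predictions, errors):
    """Fold the observed value into each model's cumulative absolute error."""
    for name, predicted in predictions.items():
        errors[name] = errors.get(name, 0.0) + abs(actual - predicted)
```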

AppLeS Example: Jacobi2D

• Jacobi2D is a regular, iterative SPMD application

• Local communication using a 5-point stencil

• AppLeS written in KeLP/MPI by Wolski

• Partitioning approach = strip decomposition

• Goal is to minimize execution time
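For reference, the 5-point stencil update at the heart of Jacobi2D looks like the serial sketch below; the AppLeS-scheduled application is a strip-decomposed KeLP/MPI version, not this.

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi iteration on a 2D grid: each interior point becomes the
    average of its north, south, east, and west neighbours (5-point stencil)."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:])
    return new
```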

Jacobi2D AppLeS Resource Selector

• Feasible resources determined according to application-specific “desirability” metrics

– memory

– application-specific benchmark

• Desirability metric used to sort resources

• Feasible resource sets formed from initial subset of sorted desirability list

– Next step is to plan a schedule for each feasible resource set

– Scheduler will choose schedule with best predicted execution time
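A small sketch of this selection step, assuming “desirability” is summarized by a per-machine benchmark time and a memory floor (the field names below are illustrative, not AppLeS data structures):

```python
def rank_by_desirability(machines, benchmark, min_memory_mb):
    """Drop machines without enough memory; sort the rest, most desirable
    (smallest benchmark time) first."""
    feasible = [m for m in machines if m["memory_mb"] >= min_memory_mb]
    return sorted(feasible, key=benchmark)

def candidate_resource_sets(ranked):
    """Feasible resource sets are prefixes of the sorted list:
    {best}, {best, 2nd}, {best, 2nd, 3rd}, ..."""
    return [ranked[: k + 1] for k in range(len(ranked))]
```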

Jacobi2D Performance Model and Schedule Planning

• Execution time for the ith strip:

  T_i = Area_i × (Oper × Comp_i(unloaded) / load_i) + Comm_i

  where
  – load_i = predicted percentage of CPU time available (NWS)
  – Comm_i = time to send and receive messages, factored by predicted bandwidth (NWS)

• AppLeS uses time-balancing to determine the best partition on a given set of resources:

  Solve  T_1 = T_2 = … = T_p  for {Area_i}

[Figure: strips of sizes Area_i assigned to processors P1, P2, P3.]
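Leaving out the communication term for simplicity, the time-balancing step reduces to giving each host a strip whose area is proportional to its effective speed. A back-of-the-envelope sketch (the constants in the example are made up):

```python
def balanced_areas(total_area, oper, comp_unloaded, loads):
    """Split total_area so the predicted compute times T_i are equal.
    Effective speed of host i is load_i / (Oper * Comp_i(unloaded))."""
    speeds = [load / (oper * comp) for load, comp in zip(loads, comp_unloaded)]
    return [total_area * s / sum(speeds) for s in speeds]

# Example: a 1000 x 1000 grid split over three hosts with NWS-predicted loads.
areas = balanced_areas(total_area=1000 * 1000, oper=5,
                       comp_unloaded=[2e-8, 2e-8, 4e-8],
                       loads=[0.9, 0.5, 0.8])
```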

Jacobi2D Experiments

• Experiments compare

– Compile-time block [HPF] partitioning

– Compile-time irregular strip partitioning [no NWS, no resource selection]

– Run-time strip AppLeS partitioning

• Runs for different partitioning methods performed back-to-back on production systems

• Average execution time recorded

• Distributed UCSD/SDSC platform: Sparcs, RS6000, Alpha Farm, SP-2

Jacobi2D AppLeS Experiments

• Representative Jacobi2D AppLeS experiment

• Adaptive scheduling leverages deliverable performance of a contended system

• Spike occurs when a gateway between PCL and SDSC goes down

• Subsequent AppLeS experiments avoid slow link

[Chart: Comparison of Execution Times. Execution time in seconds (0–7) versus problem size (1000–2000) for compile-time blocked, compile-time irregular strip, and runtime (AppLeS) partitioning.]

Work-in-Progress: AppLeS Templates

• AppLeS-enabled applications perform well in multi-user environments.

• Methodology is right on target but …

– AppLeS must be integrated with the application, which is labor-intensive and time-intensive

– You generally can’t just take an AppLeS and plug in a new application

Templates

• Current thrust is to develop AppLeS templates which

– target structurally similar classes of applications

– can be instantiated in a user-friendly timeframe

– provide good application performance

AppLeS Template

[Diagram: an AppLeS template consists of an Application Module, a Performance Module, a Scheduling Module, and a Deployment Module connected through APIs, with resource information supplied by the Network Weather Service.]

Case Study: Parameter Sweep Template

• Parameter Sweeps = class of applications which are structured as multiple instances of an “experiment” with distinct parameter sets

• Independent experiments may share input files

• Examples:

– MCell

– INS2D

Application Model
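As a toy illustration of this application model, a parameter sweep is simply the same “experiment” run once per point in a cross-product of parameter values; the experiment and parameter names here are hypothetical, not MCell or INS2D inputs.

```python
from itertools import product

def sweep(experiment, **parameter_ranges):
    """Yield one independent task result per combination of parameter values."""
    names = list(parameter_ranges)
    for values in product(*(parameter_ranges[name] for name in names)):
        yield experiment(**dict(zip(names, values)))

# e.g. 3 x 4 = 12 independent experiments, possibly sharing common input files:
# results = list(sweep(run_simulation, seed=[1, 2, 3], rate=[0.1, 0.2, 0.5, 1.0]))
```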

Example Parameter Sweep Application: MCell

• MCell = General simulator for cellular microphysiology

• Uses Monte Carlo diffusion and chemical reaction algorithm in 3D to simulate complex biochemical interactions of molecules

– Molecular environment represented as a 3D space in which the trajectories of ligands against cell membranes are tracked

• Researchers plan huge runs which will make it possible to model entire cells at the molecular level.

– 100,000s of tasks

– 10s of Gbytes of output data

– Would like to perform execution-time computational steering, data analysis, and visualization

PST AppLeS

• Template being developed by Henri Casanova and Graziano Obertelli

• Resource Selection:

– For small parameter sweeps, can dynamically select a performance-efficient number of target processors [Gary Shao]

– For large parameter sweeps, can assume that all resources may be used

Platform Model

[Diagram: the user’s host and storage connected by network links to clusters and MPPs, each with its own storage.]

• Contingency Scheduling: Allocation developed by dynamically generating a Gantt chart for scheduling unassigned tasks

• Basic skeleton

1. Compute the next scheduling event

2. Create a Gantt Chart G

3. For each computation and file transfer currently underway, compute an estimate of its completion time and fill in the corresponding slots in G

4. Select a subset T of the tasks that have not started execution

5. Until each host has been assigned enough work, heuristically assign tasks to hosts, filling in slots in G

6. Implement schedule
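A condensed sketch of this skeleton, with the Gantt chart modelled as a per-host “time already committed” table and the placement heuristic passed in as a function (all names here are illustrative, not the template’s actual implementation):

```python
def schedule_event(hosts, running, pending, estimate, pick, batch_size):
    """Build the allocation for one scheduling event.

    running:  {host: [(task, remaining_time), ...]} work already underway
    estimate: estimate(task, host) -> predicted time (computation + transfers)
    pick:     heuristic choosing the next (task, host) pair to place
    """
    # Steps 2-3: create G and fill in slots for work currently underway.
    gantt = {h: sum(rem for _, rem in running.get(h, [])) for h in hosts}
    todo = list(pending)[:batch_size]        # step 4: subset T of unstarted tasks
    assignments = []
    while todo:                              # step 5: heuristic placement
        task, host = pick(todo, gantt, estimate)      # e.g. min-min, sufferage, ...
        assignments.append((task, host, gantt[host])) # start time in G
        gantt[host] += estimate(task, host)
        todo.remove(task)
    return assignments                       # step 6: hand off for deployment
```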

Scheduling Parameter Sweeps

[Figure: Gantt chart G. Resources (network links, hosts in Cluster 1, hosts in Cluster 2) on one axis and computation time on the other; slots are filled in between successive scheduling events.]

Parameter Sweep Heuristics

• Currently studying scheduling heuristics useful for parameter sweeps in Grid environments

• HCW 2000 paper compares several heuristics

– Min-Min [task/resource that can complete the earliest is assigned first]

– Max-Min [longest of task/earliest resource times assigned first]

– Sufferage [task that would “suffer” most if given a poor schedule assigned first, as computed by max - second max completion times]

– Extended Sufferage [minimal completion times computed for task on each cluster, sufferage heuristic applied to these]

– Workqueue [randomly chosen task assigned first]

• Criteria for evaluation:

– How sensitive are heuristics to location of shared input files and cost of data transmission?

– How sensitive are heuristics to inaccurate performance information?
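For concreteness, here are Min-Min and Sufferage written as `pick` functions in the style of the scheduling-event sketch earlier; this is a simplified rendering of the heuristics compared in the HCW 2000 paper, not the paper’s code.

```python
def min_min(todo, gantt, estimate):
    """Pick the task/host pair that can complete the earliest."""
    return min(((t, h) for t in todo for h in gantt),
               key=lambda pair: gantt[pair[1]] + estimate(*pair))

def sufferage(todo, gantt, estimate):
    """Pick the task that would 'suffer' most if denied its best host
    (second-best minus best completion time), and place it on its best host."""
    def completions(task):
        return sorted(gantt[h] + estimate(task, h) for h in gantt)
    def suffer(task):
        times = completions(task)
        return times[1] - times[0] if len(times) > 1 else float("inf")
    task = max(todo, key=suffer)
    host = min(gantt, key=lambda h: gantt[h] + estimate(task, h))
    return task, host
```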

Preliminary PST/MCell Results

• Comparison of the performance of scheduling heuristics when it is up to 40 times more expensive to send a shared file across the network than it is to compute a task

• “Extended sufferage” scheduling heuristic takes advantage of file sharing to achieve good application performance

[Chart: application execution time under the Max-min, Workqueue, XSufferage, Sufferage, and Min-min heuristics.]

Preliminary PST/MCell Results with “Quality of Information”

AppLeS in Context

• Application Scheduling

– Mars, Prophet/Gallop, MSHN, etc.

• Grid Programming Environments
  – GrADS, Programmers’ Playground, VDCE, Nile, etc.

• Scheduling Services
  – Globus GRAM, Legion Scheduler/Collection/Enactor

• Resource and High-Throughput Schedulers
  – PBS, LSF, Maui Scheduler, Condor, etc.

• PSEs
  – Nimrod, NEOS, NetSolve, Ninf

• Performance Monitoring, Prediction and Steering
  – Autopilot, SciRun, Network Weather Service, Remos/Remulac, Cumulus

• AppLeS project contributes careful and useful study of adaptive scheduling for Grid environments

New Directions

• Quality of Information

– Adaptive Scheduling

– AppLePilot / GrADS

• Resource Economies
  – Bushel of AppLeS

– UCSD Active Web

• Application Flexibility
  – Computational Steering

– Co-allocation

– Target-less computing

Quality of Information

• How can we deal with imperfect or imprecise predictive information?

• Quantitative measures of qualitative performance attributes can improve scheduling and execution
  – lifetime

– cost, overhead

– accuracy

– penalty


Using Quality of Information

• Stochastic Scheduling: information about the variability of the target resources can be used by the scheduler to determine allocation

– Resources with more performance variability assigned slightly less work

– Preliminary experiments show that resulting schedule performs well and can be more predictable

SOR Experiment [Jenny Schopf]
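One way to read this idea as code (a sketch of the general principle, not Schopf’s exact formulation): weight each host’s share of the work by a conservative speed estimate, the mean speed minus some multiple of its standard deviation, so more variable hosts receive slightly less.

```python
def stochastic_shares(mean_speeds, stddevs, total_work, risk=1.0):
    """Split total_work across hosts, discounting the more variable ones."""
    conservative = [max(m - risk * s, 1e-9) for m, s in zip(mean_speeds, stddevs)]
    return [total_work * c / sum(conservative) for c in conservative]

# Example: two hosts with equal mean speed, but the second is twice as
# variable, so it receives the smaller share of the work.
shares = stochastic_shares(mean_speeds=[10.0, 10.0], stddevs=[1.0, 2.0],
                           total_work=1000)
```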

Current “Quality of Information” Projects

• “AppLePilot”

– Combines AppLeS adaptive scheduling methodology with “fuzzy logic” decision making mechanism from Autopilot

– Provides a framework in which to negotiate Grid services and promote application performance

– Collaboration with Reed, Aydt, Wolski

– Builds on the software being developed for GrADS

GrADS: Building an Adaptive Grid Application Development and Execution Environment

• System being developed as a performance economy

– Performance contracts used to exchange information, bind resources, renegotiate execution environment

• Joint project with large team of researchers

Ken Kennedy, Andrew Chien, Rich Wolski, Ian Foster, Carl Kesselman, Jack Dongarra, Dennis Gannon, Dan Reed, Lennart Johnsson, Fran Berman

[Diagram: Grid Application Development System. A source application and libraries pass through a whole-program compiler (via the PSE) to produce a configurable object program; a scheduler/service negotiator, real-time performance monitor, and dynamic optimizer interact with the Grid runtime system (Globus) through negotiation, with performance feedback and performance problems fed back to the software components.]

In Summary ...

• Development of AppLeS methodology, applications, templates, and models provides a careful investigation of adaptivity for emerging Grid environments

• Goal of current projects is to use real-world strategies to promote performance
  – dynamic scheduling
  – qualitative and quantitative modeling
  – multi-agent environments
  – resource economies

• AppLeS Corps:
  – Fran Berman, UCSD
  – Rich Wolski, U. Tenn.
  – Jeff Brown
  – Henri Casanova
  – Walfredo Cirne
  – Holly Dail
  – Marcio Faerman
  – Jim Hayes
  – Graziano Obertelli
  – Gary Shao
  – Otto Sievert
  – Shava Smallen
  – Alan Su

• Thanks to NSF, NASA, NPACI, DARPA, DoD

• AppLeS Home Page: http://apples.ucsd.edu