Application Scheduling on Distributed Resources

Transcript of Application Scheduling on Distributed Resources

Page 1: Application Scheduling on Distributed Resources

Application Scheduling on Distributed Resources

Francine Berman

U. C. San Diego

and

NPACI

Page 2: Application Scheduling on Distributed Resources

The Computational Grid

• Computational Grid becoming increasingly prevalent as a computational platform

• Focus is on using distributed resources as an ensemble
  – clusters of workstations
  – MPPs
  – remote instruments
  – visualization sites
  – storage archives

Page 3: Application Scheduling on Distributed Resources

Programming the Grid

• How do we write Grid programs?

• How do we achieve program performance?

• First try: extend MPP programs ...

Page 4: Application Scheduling on Distributed Resources

Programming the Grid

MPP Programming Model
  – processors, network are uniform
  – single administrative domain
  – “machine” is typically dedicated to user

Grid Programming Model
  – resources are distributed, heterogeneous
  – grid may comprise multiple administrative domains
  – resources are shared by multiple users

Page 5: Application Scheduling on Distributed Resources

Achieving Program Performance

MPP programs achieve performance by
  – dedicating resources
  – careful staging of computation and data
  – considerable coordination

Computational Grids are dynamic
  – load and availability of resources vary with time
  – both system and application behavior are hard to predict

Grid Programming Challenge: How can programs leverage the deliverable performance of the Grid at execution time?

Page 6: Application Scheduling on Distributed Resources

Scheduling

• Scheduling is fundamental to performance

• On the Computational Grid, scheduling mechanism must

– perceive the performance impact of system resources on the application

– adapt to dynamic conditions

– optimize application schedule for Grid at execution time

Page 7: Application Scheduling on Distributed Resources

Whose Job Is It?

• Application scheduling can be performed by many entities
  – Resource scheduler, job scheduler, program developer, system administrator, user, application scheduler

[Diagram: Grid Application Development System, comprising a PSE, configurable object program, whole-program compiler, source application, libraries, real-time performance monitor, dynamic optimizer, Grid runtime system, negotiation, software components, service negotiator, scheduler, performance feedback, and performance problem detection]

Page 8: Application Scheduling on Distributed Resources

Scheduling and Performance

• Achieving application performance can conflict with system performance goals

– Resource Scheduler -- perf measure is utilization

– Job Scheduler -- perf measure is throughput

– System Administrator -- focuses on system perf

• Goal of application scheduling is to promote the application’s performance over the performance of other applications and system components

– Application Scheduler -- perf measure is app.-specific

Page 9: Application Scheduling on Distributed Resources

Self-Centered Scheduling

• Everything in the system is evaluated in terms of its impact on the application.

• Performance of each system component can be considered as a measurable quantity.

• Forecasts of quantities relevant to the application can be manipulated to determine the schedule.

• This simple paradigm forms the basis for AppLeS.

Page 10: Application Scheduling on Distributed Resources

AppLeS

Joint project with Rich Wolski

• AppLeS = Application-Level Scheduler

• Each application has its own self-centered AppLeS agent.

• Custom application schedule achieved through (see the sketch after this list)
  – selection of potentially efficient resource sets
  – performance estimation of dynamic system parameters and application performance for the execution time frame
  – adaptation to perceived dynamic conditions
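A minimal Python sketch of this cycle, select candidate resource sets, estimate performance from forecasts, and pick the best, follows; the function names, host names, and forecast values are illustrative assumptions rather than the actual AppLeS interface.

```python
# Hypothetical sketch of the AppLeS scheduling cycle described above.
# Names and numbers are illustrative, not the actual AppLeS API.
from itertools import combinations

def candidate_resource_sets(hosts, max_size=3):
    """Enumerate potentially efficient resource sets (here: all small subsets)."""
    for k in range(1, max_size + 1):
        for subset in combinations(hosts, k):
            yield subset

def predicted_runtime(resource_set, work, forecasts):
    """Estimate runtime on a resource set from forecasted deliverable speeds."""
    total_speed = sum(forecasts[h] for h in resource_set)
    return work / total_speed

def schedule(hosts, work, forecasts):
    """Pick the resource set with the best predicted application performance."""
    return min(candidate_resource_sets(hosts),
               key=lambda rs: predicted_runtime(rs, work, forecasts))

# Forecasts would come from a dynamic source such as the NWS (assumed MFLOP/s).
forecasts = {"hostA": 120.0, "hostB": 45.0, "hostC": 80.0}
print("selected resource set:", schedule(list(forecasts), 10_000.0, forecasts))
```

A real agent would also fold communication costs into the estimate and re-plan as forecasts change.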

Page 11: Application Scheduling on Distributed Resources

AppLeS Architecture

• AppLeS incorporates
  – application-specific information
  – dynamic information
  – prediction

• Each AppLeS schedule is customized for its application and environment

• AppLeS scheduler promotes performance as defined by the user
  – execution time
  – convergence
  – turnaround time

[Diagram: AppLeS agent, showing the NWS (Wolski), user preferences, application performance model, resource selector, planner, and actuator, acting on the application over Grid/cluster resources and infrastructure]

Page 12: Application Scheduling on Distributed Resources

Network Weather Service (Wolski)

• The NWS provides dynamic resource information for AppLeS

• NWS is a stand-alone system

• NWS
  – monitors current system state
  – provides best forecast of resource load from multiple models (see the sketch below)

[Diagram: NWS components, a sensor interface, a reporting interface, and a forecaster backed by multiple forecasting models]
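The forecasting idea can be pictured with a small sketch: keep several simple predictors, track each one’s error on past measurements, and report the prediction of the model that has been most accurate so far. The two models and the class interface below are assumptions for illustration, not the NWS implementation.

```python
# Sketch of multi-model forecasting: report the prediction of whichever
# model has accumulated the smallest squared error on past measurements.
class Forecaster:
    def __init__(self):
        self.history = []
        self.errors = {"last_value": 0.0, "running_mean": 0.0}

    def _models(self):
        last = self.history[-1]
        mean = sum(self.history) / len(self.history)
        return {"last_value": last, "running_mean": mean}

    def update(self, measurement):
        """Record a new measurement and accumulate each model's squared error."""
        if self.history:
            for name, prediction in self._models().items():
                self.errors[name] += (measurement - prediction) ** 2
        self.history.append(measurement)

    def forecast(self):
        """Return the prediction of the model with the lowest accumulated error."""
        best = min(self.errors, key=self.errors.get)
        return self._models()[best]

f = Forecaster()
for bw in [42.0, 40.5, 47.0, 39.0, 41.2]:   # bandwidth measurements (Mbit/s)
    f.update(bw)
print("forecast:", f.forecast())
```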

Page 13: Application Scheduling on Distributed Resources

The Role of Prediction

• Is monitoring enough for scheduling?

[Figure: Fast Ethernet bandwidth measurements at SDSC, in Megabits per Second, plotted against time of day from Tuesday through the following Tuesday]

Page 14: Application Scheduling on Distributed Resources

Monitoring vs. Forecasting

• Monitored data provides a snapshot of what has happened; forecasting tells us what will happen.

• The last value is not always the best predictor (see the sketch below) ...

[Figure: Mean square error performance on SDSC Ethernet bandwidth, comparing predictors against the monitored data]
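The point of the plot can be reproduced with a toy comparison: on an oscillating bandwidth series, a last-value predictor does much worse than a simple exponential smoother. The series and smoothing factor below are invented for illustration.

```python
# Compare mean square error of two predictors on a noisy bandwidth series.
def mse_last_value(series):
    errs = [(series[i] - series[i - 1]) ** 2 for i in range(1, len(series))]
    return sum(errs) / len(errs)

def mse_exp_smoothing(series, alpha=0.3):
    pred, errs = series[0], []
    for x in series[1:]:
        errs.append((x - pred) ** 2)
        pred = alpha * x + (1 - alpha) * pred   # update the smoothed estimate
    return sum(errs) / len(errs)

bandwidth = [55, 20, 52, 18, 57, 22, 50, 19]   # oscillating measurements (Mbit/s)
print("last-value MSE:     ", mse_last_value(bandwidth))
print("exp. smoothing MSE: ", mse_exp_smoothing(bandwidth))
```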

Page 15: Application Scheduling on Distributed Resources

Using Forecasting in Scheduling

• How much work should each processor be given?

• The Jacobi2D AppLeS solves equations for the area Area_i assigned to each processor (P1, P2, P3) of the N x N grid (see the sketch below)

[Figure: Fast Ethernet bandwidth at SDSC, Megabits per Second vs. time of day from Tuesday through Tuesday, with exponential smoothing predictions overlaid on the measurements]
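A hedged sketch of the time-balancing idea: give each processor an area proportional to its forecasted speed so that the predicted per-iteration times come out equal. The speeds below are invented, and the real AppLeS formulation also accounts for communication.

```python
# Split an N x N Jacobi grid so that Area_i / speed_i is equal for all i.
def balance_areas(total_area, predicted_speeds):
    """Return Area_i proportional to each processor's forecasted speed."""
    total_speed = sum(predicted_speeds)
    return [total_area * s / total_speed for s in predicted_speeds]

N = 1200                                   # Jacobi grid is N x N points
speeds = [90.0, 30.0, 60.0]                # forecasted points/ms for P1, P2, P3 (assumed)
areas = balance_areas(N * N, speeds)
for p, (s, a) in enumerate(zip(speeds, areas), start=1):
    print(f"P{p}: area {a:,.0f} points, predicted time {a / s:,.1f} ms")
```

With these assumed speeds every processor finishes its strip in the same predicted time, which is the load-balancing goal of the runtime schedule.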

Page 16: Application Scheduling on Distributed Resources

Good Predictions Promote Good Schedules

• Jacobi2D experiments

[Figure: Comparison of execution times (seconds) for Jacobi2D at problem sizes 1000 to 2000, for compile-time blocked partitioning, compile-time irregular strip partitioning, and runtime (AppLeS) partitioning]

Page 17: Application Scheduling on Distributed Resources

SARA: An AppLeS-in-Progress

• SARA = Synthetic Aperture Radar Atlas
  – application developed at JPL and SDSC

• Goal: Assemble/process files for user’s desired image
  – thumbnail image shown to user
  – user selects desired bounding box for more detailed viewing
  – SARA provides detailed image in a variety of formats

Page 18: Application Scheduling on Distributed Resources

Simple SARA

• Simple SARA focuses on obtaining remote data quickly

• Code developed by Alan Su

[Diagram: a compute server connected to several data servers over a network shared by a variable number of users]

• Computation servers and data servers are logical entities, not necessarily different nodes

• Computation assumed to be done at compute servers

Page 19: Application Scheduling on Distributed Resources

Simple SARA AppLeS

• Focus on the resource selection problem: Which site can deliver the data fastest? (a selection sketch follows this list)
  – Data for image accessed over shared networks
  – Data sets 1.4 - 3 megabytes, representative of SARA file sizes
  – Servers used for experiments (reached via vBNS or via the general Internet)
    • lolland.cc.gatech.edu
    • sitar.cs.uiuc
    • perigee.chpc.utah.edu
    • mead2.uwashington.edu
    • spin.cacr.caltech.edu
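A minimal sketch of the selection step, assuming transfer time is predicted as latency plus file size over forecasted bandwidth (for example from the NWS); the bandwidth and latency numbers below are invented.

```python
# Pick the data server with the smallest predicted transfer time.
def predicted_transfer_time(file_bytes, bandwidth_mbps, latency_s):
    return latency_s + (file_bytes * 8) / (bandwidth_mbps * 1e6)

def select_server(file_bytes, forecasts):
    """forecasts maps server name -> (forecast bandwidth in Mbit/s, latency in s)."""
    return min(forecasts,
               key=lambda srv: predicted_transfer_time(file_bytes, *forecasts[srv]))

forecasts = {
    "lolland.cc.gatech.edu": (8.2, 0.04),   # assumed forecast values
    "perigee.chpc.utah.edu": (3.1, 0.06),
    "spin.cacr.caltech.edu": (6.5, 0.03),
}
print("fetch from:", select_server(3_000_000, forecasts))   # ~3 MB SARA file
```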

Page 20: Application Scheduling on Distributed Resources

Which is “Closer”?

• Sites on the east coast or sites on the west coast?

• Sites on the vBNS or sites on the general Internet?

• Consistently the same site or different sites at different times?

Page 21: Application Scheduling on Distributed Resources

Which is “Closer”?

• Sites on the east coast or sites on the west coast?

• Sites on the vBNS or sites on the general Internet?

• Consistently the same site or different sites at different times?

Depends a lot on traffic ...

Page 22: Application Scheduling on Distributed Resources

Simple SARA Experiments

• Ran back-to-back experiments from remote sites to UCSD/PCL

• Wolski’s Network Weather Service provides forecasts of network load and availability

• Experiments run during normal business hours mid-week

Page 23: Application Scheduling on Distributed Resources

Preliminary Results

• Experiment with larger data set (3 Mbytes)

• During this time frame, the general Internet provides data mostly faster than the vBNS

Page 24: Application Scheduling on Distributed Resources

More Preliminary Results

• Experiment with smaller data set (1.4 Mbytes)

• During this time frame, east coast sites provide data mostly faster than west coast sites

Page 25: Application Scheduling on Distributed Resources

9/21/98 Experiments

• Clinton Grand Jury webcast commenced at trial 62

Page 26: Application Scheduling on Distributed Resources

Distributed Data Applications

• SARA representative of larger class of distributed data applications

• Simple SARA template being extended to accommodate
  – replicated data sources
  – multiple files per image
  – parallel data acquisition
  – intermediate compute sites
  – web interface, etc.

Page 27: Application Scheduling on Distributed Resources

Distributed Data Applications

[Diagram: a client connected to multiple compute servers and multiple data servers]

• Move the computation or move the data? (a toy decision rule follows below)

• Which compute servers to use?

• Which servers to use for multiple files?
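For the first question, a toy decision rule is to compare the predicted cost of shipping the data against processing it where it lives; all quantities in the sketch below are assumptions.

```python
# Decide whether to compute at the data server or ship the data elsewhere.
def plan(data_mb, work_units, local_speed, remote_speed, bandwidth_mbps):
    """Return the cheaper option and its predicted cost in seconds."""
    stay = work_units / local_speed                                    # compute where the data is
    move = (data_mb * 8) / bandwidth_mbps + work_units / remote_speed  # ship data, then compute
    return ("compute at data server", stay) if stay <= move else \
           ("move data to compute server", move)

choice, seconds = plan(data_mb=300, work_units=5_000, local_speed=50,
                       remote_speed=400, bandwidth_mbps=10)
print(f"{choice}: ~{seconds:.0f} s predicted")
```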

Page 28: Application Scheduling on Distributed Resources

A Bushel of AppLeS … almost

• During the first “phase” of the project, we’ve focused on developing AppLeS applications

– Jacobi2D

– DOT

– SRB

– Simple SARA

– Genetic Algorithm

– CompLib

– INS2D

– Tomography, ...

• What have we learned?

Page 29: Application Scheduling on Distributed Resources

Lessons Learned From AppLeS

• Dynamic information is critical.

[Figure: Compile-time blocked partitioning compared with run-time AppLeS non-uniform strip partitioning]

Page 30: Application Scheduling on Distributed Resources

Lessons Learned from AppLeS

• Program execution and parameters may exhibit a range of performance

Page 31: Application Scheduling on Distributed Resources

Lessons Learned from AppLeS

• Knowing something about the “goodness” of performance predictions can improve scheduling

[Figure: Execution time (s) vs. problem size (Small, Medium, Large) for SOR in CompLib, comparing SuperAppLeS, AppLeS, and Mentat schedules]

Page 32: Application Scheduling on Distributed Resources

Lessons Learned from AppLeS

• Performance of application sensitive to scheduling policy, data, and system characteristics

Page 33: Application Scheduling on Distributed Resources

Achieving Performance on the Computational Grid

Adaptivity is a fundamental paradigm for achieving performance on the Grid.

• AppLeS uses adaptivity to leverage deliverable resource performance

• Performance impact of all components considered

• AppLeS agents target dynamic, multi-user distributed environments

Page 34: Application Scheduling on Distributed Resources

Related Work

• Application Schedulers
  – Mars, Prophet/Gallop, VDCE

• Scheduling Services
  – Globus GRAM

• Resource Allocators
  – I-Soft, PBS, LSF, Maui Scheduler, Nile

• PSEs
  – Nimrod, NEOS, NetSolve, Ninf

• High-Throughput Schedulers
  – Condor

• Performance Steering
  – Autopilot, SciRun

Page 35: Application Scheduling on Distributed Resources

Current AppLeS Projects

• AppLeS Templates
  – distributed data applications
  – parameter sweeps
  – master/slave applications
  – data parallel stencil applications

• Performance Prediction Engineering
  – scheduling with quality of information (see the sketch below)
    • accuracy
    • lifetime
    • overhead
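One way to picture scheduling with quality of information: rank resources by a conservative forecast that is discounted by its reported error and discarded once it outlives its stated lifetime. The structure and numbers below are assumptions for illustration, not the AppLeS design.

```python
# Rank hosts by accuracy-discounted, still-fresh forecasts.
import time

def usable(forecast, now):
    return now - forecast["made_at"] <= forecast["lifetime_s"]

def conservative_value(forecast):
    # Subtract the reported error to avoid over-promising on a noisy resource.
    return forecast["value"] - forecast["error"]

def pick_host(forecasts, now=None):
    now = time.time() if now is None else now
    live = {h: f for h, f in forecasts.items() if usable(f, now)}
    return max(live, key=lambda h: conservative_value(live[h]))

now = time.time()
forecasts = {
    "hostA": {"value": 95.0, "error": 30.0, "made_at": now - 5,   "lifetime_s": 60},
    "hostB": {"value": 80.0, "error": 5.0,  "made_at": now - 10,  "lifetime_s": 60},
    "hostC": {"value": 99.0, "error": 2.0,  "made_at": now - 900, "lifetime_s": 60},  # stale
}
print("schedule onto:", pick_host(forecasts, now))   # hostB: accurate and fresh
```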

Page 36: Application Scheduling on Distributed Resources

AppLeS Projects

• Real World Scheduling
  – Contingency Scheduling
    • scheduling during execution
  – Imperfect Scheduling
    • scheduling with
      – partial information
      – poor information
      – dynamically changing information
  – Multischeduling
    • resource economies
    • scheduling “social structure”

Page 37: Application Scheduling on Distributed Resources

The Brave New World

• “Grid-aware” programming will require comprehensive development and execution environment
  – Adaptation will be fundamental paradigm

[Diagram: Grid Application Development System, comprising a PSE, configurable object program, whole-program compiler, source application, libraries, real-time performance monitor, dynamic optimizer, Grid runtime system, negotiation, software components, service negotiator, scheduler, performance feedback, and performance problem detection]

Page 38: Application Scheduling on Distributed Resources

Project Information

• Thanks to NSF, NPACI, DARPA, DoD, NASA

• AppLeS Corps:
  – Francine Berman
  – Rich Wolski
  – Walfredo Cirne
  – Henri Casanova
  – Marcio Faerman
  – Markus Fischer
  – Jaime Frey
  – Jim Hayes
  – Graziano Obertelli
  – Jenny Schopf
  – Gary Shao
  – Shava Smallen
  – Alan Su
  – Dmitrii Zagorodnov

• AppLeS Home Page: http://www-cse.ucsd.edu/groups/hpcl/apples.html