8/19/2019 Scheduling Algorithms for Grid Computing
1/45
Technical Report No. 2006-504
Scheduling Algorithms for Grid Computing:
State of the Art and Open Problems
Fangpeng Dong and Selim G. Akl
School of Computing,
Queen's University
Kingston, Ontario, January 2006
Abstract:
Thanks to advances in wide-area network technologies and the low cost of computing resources, Grid computing came into being and is currently an active research area. One motivation of Grid computing is to aggregate the power of widely distributed resources, and provide non-trivial services to users. To achieve this goal, an efficient Grid scheduling system is an essential part of the Grid. Rather than covering the whole Grid scheduling area, this survey provides a review of the subject mainly from the perspective of scheduling algorithms. In this review, the challenges for Grid scheduling are identified. First, the architecture of components involved in scheduling is briefly introduced to provide an intuitive image of the Grid scheduling process. Then various Grid scheduling algorithms are discussed from different points of view, such as static vs. dynamic policies, objective functions, application models, adaptation, QoS constraints, strategies dealing with dynamic behavior of resources, and so on. Based on a comprehensive understanding of the challenges and the state of the art of current research, some general issues worthy of further exploration are proposed.
1. Introduction
The popularity of the Internet and the availability of powerful computers and high-speed networks as low-cost commodity components are changing the way we use computers today. These technical opportunities have led to the possibility of using geographically distributed and multi-owner resources to solve large-scale problems in science, engineering, and commerce. Recent research on these topics has led to the emergence of a new paradigm known as Grid computing [9].
To achieve the promising potential of these tremendous distributed resources, effective and efficient scheduling algorithms are fundamentally important. Unfortunately, scheduling algorithms in traditional parallel and distributed systems, which usually run on homogeneous and dedicated resources, e.g. computer clusters, cannot work well in the new circumstances [2]. In this paper, the state of current research on scheduling algorithms for the new generation of computational environments will be surveyed and open problems will be discussed.
The remainder of this paper is organized as follows. An overview of the Grid scheduling problem is presented in Section 2 with a generalized scheduling architecture. In Section 3, the progress made to date in the design and analysis of scheduling algorithms for Grid computing is reviewed. A summary and some research opportunities are offered in Section 4.
2. Overview of the Grid Scheduling Problem
A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities [45]. It is a shared environment implemented via the deployment of a persistent, standards-based service infrastructure that supports the creation of, and resource sharing within, distributed communities. Resources can be computers, storage space, instruments, software applications, and data, all connected through the Internet and a middleware software layer that provides basic services for security, monitoring, resource management, and so forth. Resources owned by various administrative organizations are
shared under locally defined policies that specify what is shared, who is allowed to access what, and under what conditions [48]. The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations [44].
From the point of view of scheduling systems, a higher level abstraction for the Grid can be applied by ignoring some infrastructure components such as authentication, authorization, resource discovery and access control. Thus, in this paper, the following definition for the term Grid is adopted: "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" [10].
To facilitate the discussion, the following frequently used terms are defined:
• A task is an atomic unit to be scheduled by the scheduler and assigned to a resource.
• The properties of a task are parameters like CPU/memory requirement, deadline, priority, etc.
• A job (or metatask, or application) is a set of atomic tasks that will be carried out on a set of resources. Jobs can have a recursive structure, meaning that jobs are composed of sub-jobs and/or tasks, and sub-jobs can themselves be decomposed further into atomic tasks. In this paper, the terms job, application and metatask are interchangeable.
• A resource is something that is required to carry out an operation, for example: a processor for data processing, a data storage device, or a network link for data transporting.
• A site (or node) is an autonomous entity composed of one or multiple resources.
• A task scheduling is the mapping of tasks to a selected group of resources which may be distributed in multiple administrative domains.
2.1 The Grid Scheduling Process and Components
A Grid is a system of high diversity, which is rendered by various applications, middleware components, and resources. But from the point of view of functionality, we can still find a logical architecture of the task scheduling subsystem in the Grid. For example, Zhu [23] proposes a common Grid scheduling architecture. We can also generalize the scheduling process in the Grid into three stages: resource discovering and filtering, resource selecting and scheduling according to certain objectives, and job submission [94]. As a study of scheduling algorithms is our primary concern here, we focus on the second step. Based on these observations, Fig. 1 depicts a model of a Grid scheduling system in which functional components are connected by two types of data flow: resource/application information flow and task or task scheduling command flow.

Fig. 1: A logical Grid scheduling architecture: broken lines show resource or application information flows, and solid lines show task or task scheduling command flows.
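The three-stage process just described can be sketched in code. The fragment below is a minimal illustration only, not a mechanism taken from any surveyed system: the `Resource` and `Task` fields, the load-discounted capacity model, and the greedy minimum-completion-time rule are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    mips: float        # processing capacity (hypothetical units)
    load: float        # fraction of capacity already in use, 0.0-1.0
    has_app_env: bool  # stands in for arbitrary feasibility requirements

@dataclass
class Task:
    name: str
    work: float        # estimated instruction count (hypothetical units)
    needs_app_env: bool

def schedule(task, resources):
    """Map one task using the three-stage process: discover/filter feasible
    resources, select one against an objective, then hand off for submission."""
    # Stage 1: resource discovery and filtering (feasibility only).
    feasible = [r for r in resources if r.has_app_env or not task.needs_app_env]
    if not feasible:
        return None
    # Stage 2: selection against an objective, here predicted completion time
    # computed from GIS-style state information (capacity and current load).
    def predicted_time(r):
        effective = r.mips * (1.0 - r.load)  # capacity left for Grid work
        return task.work / max(effective, 1e-9)
    best = min(feasible, key=predicted_time)
    # Stage 3: job submission is left to a launching/monitoring component;
    # here we only return the chosen mapping.
    return (task.name, best.name)
```

Under this toy model, a task with no special requirements is mapped to a lightly loaded slower machine rather than a heavily loaded fast one, because the predicted completion time there is shorter.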
Basically, a Grid scheduler (GS) receives applications from Grid users, selects feasible resources for these applications according to information acquired from the Grid Information Service module, and finally generates application-to-resource mappings, based on certain objective functions and predicted resource performance. Unlike their counterparts in traditional parallel and distributed systems, Grid schedulers usually cannot control Grid resources directly, but work like brokers or agents [3], or are even tightly coupled with the applications, as the application-level scheduling scheme proposes [?],
[105]. They are not necessarily located in the same domain as the resources which are visible to them. Fig. 1 only shows one Grid scheduler, but in reality multiple such schedulers might be deployed, and organized to form different structures (centralized, hierarchical and decentralized [55]) according to different concerns, such as performance or scalability. Although a Grid-level scheduler (or metascheduler, as it is sometimes referred to in the literature, e.g., in [?]) is not an indispensable component in the Grid infrastructure (e.g., it is not included in the Globus Toolkit [25], the de facto standard in the Grid computing community), there is no doubt that such a scheduling component is crucial for harnessing the potential of Grids as they are expanding quickly, incorporating resources from supercomputers to desktops. Our discussion of scheduling algorithms is based on the assumption that there are such schedulers in a Grid.
Information about the status of available resources is very important for a Grid scheduler to make a proper schedule, especially when the heterogeneous and dynamic nature of the Grid is taken into account. The role of the Grid Information Service (GIS) is to provide such information to Grid schedulers. GIS is responsible for collecting and predicting resource state information, such as CPU capacities, memory size, network bandwidth, software availabilities and the load of a site in a particular period. GIS can answer queries for resource information or push information to subscribers. The Globus Monitoring and Discovery System (MDS) [33] is an example of GIS.
Besides raw resource information from GIS, application properties (e.g., approximate instruction quantity, memory and storage requirements, subtask dependency in a job, and communication volumes) and the performance of a resource for different application species are also necessary for making a feasible schedule. Application profiling (AP) is used to extract the properties of applications, while analogical benchmarking (AB) provides a measure of how well a resource can perform a given type of job [6] [96]. On the basis of knowledge from AP and AB, and following a certain performance model [4], cost estimation computes the cost of candidate schedules, from which the scheduler chooses those that can optimize the objective functions.
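As an illustration of this pipeline, the sketch below combines a hypothetical AP output (an application class plus a work estimate) with a hypothetical AB table (measured rates per resource and application class) to rank candidate schedules. The dictionary layouts, field names and units are assumptions made for the example, not the interfaces of any real AP/AB tool.

```python
def estimate_cost(app_profile, benchmarks, resource):
    """Estimated execution time of an application on one resource.

    app_profile: output of application profiling (AP); here just an
        application class and a work estimate in Mops (assumed fields).
    benchmarks: output of analogical benchmarking (AB); the measured rate of
        each resource on each application class, in Mops/s (assumed units).
    """
    rate = benchmarks[(resource, app_profile["class"])]
    return app_profile["mops"] / rate

def choose_resource(app_profile, benchmarks, resources):
    """Pick the candidate mapping that minimizes the estimated cost."""
    return min(resources, key=lambda r: estimate_cost(app_profile, benchmarks, r))
```

A resource that benchmarks twice as fast on the application's class yields half the estimated cost and is therefore chosen.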
The Launching and Monitoring (LM) module (also known as the […]

[…] massively parallel processor computers (MPP) and clusters of workstations (COW). Looking back at such efforts, we find that scheduling algorithms are evolving with the architecture of parallel and distributed systems. Table 1 captures some important features of parallel and distributed systems and the typical scheduling algorithms they adopt.
Table 1: Evolution of scheduling algorithms with parallel and distributed computing systems

  Typical Architecture            DSM, MPP         COW                  Grid
  Chronology                      Late 1970s       Late 1980s           Mid 1990s
  Typical System Interconnect     Bus, Switch      Commercial LAN, ATM  WAN/Internet
  […]                             Very Low         Low                  Usually Not High
  Cost of Interconnection         Negligible       Negligible           Not Negligible
  Interconnection Heterogeneity   None             Low                  High
  Node Heterogeneity              None             Low                  High
  Single System Image             Yes              Yes                  No
  Resource Pool                   Predetermined    Predetermined        Not Predetermined
                                  and Static       and Static           and Dynamic
  Resource Management Policy      Monotone         Monotone             Diverse
  […]                             Homogeneous      […]                  Heterogeneous
  Typical Scheduling Algorithms   […] Scheduling   […] Scheduling       Grid Scheduling
                                  Algorithms       Algorithms           Algorithms

Although we can look for inspiration in previous research, traditional scheduling
models generally produce poor Grid schedules in practice. The reason can be found by going through the assumptions underlying traditional systems [4]:
• All resources reside within a single administrative domain.
• To provide a single system image, the scheduler controls all of the resources.
• The resource pool is invariant.
• Contention caused by incoming applications can be managed by the scheduler according to some policies, so that its impact on the performance that the site can provide to each application can be well predicted.
• Computations and their data reside in the same site, or data staging is a highly predictable process, usually from a predetermined source to a predetermined destination, which can be viewed as a constant overhead.
Unfortunately, these assumptions do not hold in Grid circumstances. In Grid computing, many unique characteristics make the design of scheduling algorithms more challenging [23], as explained in what follows.
• Heterogeneity and Autonomy
Although heterogeneity was not new to scheduling algorithms even before the emergence of Grid computing, it is still far from fully addressed and remains a big challenge for scheduling algorithm design and analysis. In Grid computing, because resources are distributed in multiple domains in the Internet, not only the computational and storage nodes but also the underlying networks connecting them are heterogeneous. The heterogeneity results in different capabilities for job processing and data access.
In traditional parallel and distributed systems, the computational resources are usually managed by a single control point. The scheduler not only has full information about all running/pending tasks and resource utilization, but also manages the task queue and the resource pool. Thus it can easily predict the behaviour of resources, and is able to assign tasks to resources according to certain performance requirements. In a Grid, however, resources are usually autonomous and the Grid scheduler does not have full control of the resources. It cannot violate the local policies of resources, which makes it hard for the Grid scheduler to estimate the exact cost of executing a task on different sites. The autonomy also results in diversity in local resource management and access control policies, such as, for example, the priority settings for different applications and the resource reservation methods. Thus, a Grid scheduler is required to be adaptive to different local policies. The heterogeneity and autonomy on the Grid user side are represented by various parameters, including application types, resource requirements, performance models, and optimization objectives. In this situation, new concepts such as application-level scheduling and Grid economy [20] are proposed and applied to Grid scheduling.
• Performance Dynamism*
Making a feasible schedule usually depends on an estimate of the performance that candidate resources can provide, especially when the algorithms are static. Grid schedulers work in a dynamic environment where the performance of available resources is constantly changing. The change comes from site autonomy and the competition by applications for resources. Because of resource autonomy, Grid resources are usually not dedicated to a single Grid application. For example, a Grid job submitted remotely to a computer cluster might be interrupted by a cluster-internal job which has a higher priority; new resources may join which can provide better services, or other resources may become unavailable. The same problem happens to the networks connecting Grid resources: the available bandwidth can be heavily affected by Internet traffic flows which are not relevant to Grid jobs. For a Grid application, this kind of contention results in performance fluctuation, which makes it hard to evaluate Grid scheduling performance under classic performance models. From the point of view of job scheduling, performance fluctuation might be the most important characteristic of Grid computing compared with traditional systems. A feasible scheduling algorithm should be able to adapt to such dynamic behaviors. Some other measures are also provided to mitigate the impact of this problem, such as QoS negotiation, resource reservation (provided by the underlying resource management system) and rescheduling. We discuss algorithms related to these mechanisms in Section 3.
• Resource Selection and Computation-Data Separation
In traditional systems, the executable codes of applications and their input/output data are usually at the same site, or the input sources and output destinations are determined before the application is submitted. Thus the cost of data staging can be neglected, or is a constant determined before execution, and scheduling algorithms need not consider it. But in a Grid, which consists of a large number of heterogeneous computing sites (from supercomputers to desktops) and storage sites connected via wide-area networks, the computation sites of an application are usually selected by the Grid scheduler according to resource status and certain performance models. Additionally, in a Grid the communication bandwidth of the underlying network is limited and shared by a host of background loads, so the inter-domain communication cost cannot be neglected. Further, many Grid applications are data intensive, so the data staging cost is considerable. This situation brings about the computation-data separation problem: the advantage brought by selecting a computational resource that can provide low computational cost may be neutralized by its high access cost to the storage site.
These challenges depict unique characteristics of Grid computing, and put significant obstacles in the way of designing and implementing efficient and effective Grid scheduling systems. It is believed, however, that research achievements on traditional scheduling problems can still provide stepping-stones when a new generation of scheduling systems is being constructed [8].

* We use the term Dynamism in this paper to refer to the dynamic change in Grid resource performance provided to a Grid application.
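A toy calculation shows why picking the computationally cheapest site can backfire once staging cost is counted. The site tuple layout and the units below are assumptions made purely for the example.

```python
def best_site(sites, data_size_gb):
    """Choose a computation site by total cost = compute time + staging time.

    sites: list of (name, compute_time_s, bandwidth_to_storage_gb_per_s)
    tuples; field names and units are illustrative assumptions."""
    def total_cost(site):
        _name, compute_s, bw_gb_s = site
        # Staging time is data volume divided by the site's bandwidth to the
        # storage site; it can dominate the compute time for data-intensive jobs.
        return compute_s + data_size_gb / bw_gb_s
    return min(sites, key=total_cost)[0]
```

Here a site that computes three times slower still wins, because its link to the storage site is a hundred times faster: 300 s + 10 s beats 100 s + 1000 s.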
3. Grid Scheduling Algorithms: State of the Art
It is well known that the complexity of the general scheduling problem is NP-Complete [42]. As mentioned in Section 1, the scheduling problem becomes more challenging because of some unique characteristics of Grid computing. In this section, we provide a survey of scheduling algorithms in Grid computing, which will form a basis for the discussion of open issues in the next section.
3.1 A Taxonomy of Grid Scheduling Algorithms
In [24], Casavant et al. propose a hierarchical taxonomy for scheduling algorithms in general-purpose parallel and distributed computing systems. Since the Grid is a special kind of such system, scheduling algorithms in the Grid fall into a subset of this taxonomy. From top to bottom, this subset can be identified as follows.
• Local vs. Global
At the highest level, a distinction is drawn between local and global scheduling. The local scheduling discipline determines how the processes resident on a single CPU are allocated and executed; a global scheduling policy uses information about the system to allocate processes to multiple processors to optimize a system-wide performance objective. Obviously, Grid scheduling falls into the global scheduling branch.
• Static vs. Dynamic
The next level in the hierarchy (under global scheduling) is a choice between static and dynamic scheduling. This choice indicates the time at which the scheduling decisions are made. In the case of static scheduling, information regarding all resources in the Grid as well as all the tasks in an application is assumed to be available by the time the application is scheduled. By contrast, in the case of dynamic scheduling, the basic idea is to perform task allocation on the fly as the application executes. This is useful when it is impossible to determine the execution time, the direction of branches and the number of iterations in a loop, as well as in the case where jobs arrive in a real-time mode. These variances introduce forms of non-determinism into the running program [42]. Both static and dynamic scheduling are widely adopted in Grid computing. For example, static scheduling algorithms are studied in [?], [23] and [8], and dynamic scheduling algorithms are presented in [68], [106], [28] and [9].
Fig. 2: A hierarchical taxonomy for scheduling algorithms. Branches covered by Grid scheduling algorithms to date are denoted in italics. Examples of each covered branch are shown at the leaves.
o Static Scheduling
In the static mode, every task comprising the job is assigned once to a resource. Thus, the placement of an application is static, and a firm estimate of the cost of the computation can be made in advance of the actual execution. One of the major benefits of the static model is that it is easier to program from a scheduler's point of view. The assignment of tasks is fixed a priori, and estimating the cost of jobs is also simplified. The static model allows a […] quite possible, and beyond the capability of a traditional scheduler running static scheduling policies. To alleviate this problem, auxiliary mechanisms such as rescheduling [3] are introduced, at the cost of overhead for task migration. Another side-effect of introducing these measures is that the gap between static scheduling and dynamic scheduling becomes less important [52].
o Dynamic Scheduling
Dynamic scheduling is usually applied when it is difficult to estimate the cost of applications, or when jobs arrive online dynamically (in this case, it is also called online scheduling). A good example of these scenarios is the job queue management in metacomputing systems like Condor [2] and Legion [26]. Dynamic task scheduling has two major components [8]: system state estimation (rather than the cost estimation of static scheduling) and decision making. System state estimation involves collecting state information throughout the Grid and constructing an estimate. On the basis of the estimate,
decisions are made to assign a task to a selected resource. Since the cost of an assignment is not available, a natural way to keep the whole system healthy is to balance the loads of all resources. The advantage of dynamic load balancing over static scheduling is that the system need not be aware of the run-time behavior of the application before execution. It is particularly useful in a system where the primary performance goal is maximizing resource utilization, rather than minimizing the runtime of individual jobs [64]. If a resource is assigned too many tasks, it may invoke a balancing policy to decide whether to transfer some tasks to other resources, and which tasks to transfer. According to who initiates the balancing process, there are two different approaches: sender-initiated, where a node that receives a new task but does not want to run it initiates the transfer, and receiver-initiated, where a node that is willing to receive a new task initiates the process [95]. According to how the dynamic load balancing is achieved, there are four basic approaches [42]:
• Unconstrained First-In-First-Out (FIFO, also known as First-Come-First-Served)
• Balance-constrained techniques
• Cost-constrained techniques
• Hybrids of static and dynamic techniques
Unconstrained FIFO: In the unconstrained FIFO approach, the resource with the currently shortest waiting queue or the smallest waiting queue time is selected for the incoming task. This policy is also called opportunistic load balancing (OLB) [6], or the myopic algorithm. The major advantage of this approach is its simplicity, but it is often far from optimal.
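The OLB policy can be captured in a few lines. The queue representation below (a dict of per-resource waiting lists) and the first-wins tie-breaking are illustrative assumptions for the sketch.

```python
def olb_assign(queues, tasks):
    """Opportunistic load balancing (unconstrained FIFO): each arriving task
    goes to the resource whose waiting queue is currently shortest. The policy
    is myopic: it ignores task size and resource speed entirely.

    queues: dict mapping resource name -> list of queued tasks (mutated)."""
    for task in tasks:
        shortest = min(queues, key=lambda name: len(queues[name]))
        queues[shortest].append(task)
    return queues
```

The simplicity is visible: the only state consulted is queue length, which is exactly why the resulting schedule is often far from optimal on heterogeneous resources.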
Balance-constrained: The balance-constrained approach attempts to rebalance the loads on all resources by periodically shifting waiting tasks from one waiting queue to another. In a massive system such as the Grid, this could be very costly due to the considerable communication delay, so an adaptive local rebalancing heuristic can be applied: tasks are initially distributed to all resources, and then, instead of computing a global rebalance, rebalancing only happens inside a […] quickly distributed to all resources and started quickly; the rebalancing process is distributed and scalable; and the communication delay of rebalancing can be reduced, since task shifting only happens among the resources that are […]
Cost-constrained: […]quired resources for pending jobs.
Hybrid: A further improvement is static-dynamic hybrid scheduling. The main idea behind hybrid techniques is to take the advantages of a static schedule and at the same time capture the uncertain behaviors of applications and resources. For the scenario of an application with uncertain behavior, static scheduling is applied to those parts that always execute. At run time, scheduling is done using statically computed estimates that reduce run-time overhead. That is, static scheduling is done on the always-executed tasks, and dynamic scheduling on the others. For example, in cases where some tasks have special QoS requirements, the static phase can be used to map those tasks with QoS requirements, and dynamic scheduling can be used for the remaining tasks. For the scenario of poorly predictable resource behaviors, static scheduling is used to initiate task assignment at the beginning, and dynamic balancing is activated when the performance estimate on which the static scheduling is based fails. Spring et al. show an example of this scenario in [100].
Some other dynamic online scheduling algorithms, such as those in [2] and [?], consider the case of resource reservation, which is popular in Grid computing as a way to obtain a degree of certainty in resource performance. The algorithms in these two examples aim to minimize the makespan of incoming jobs which consist of sets of tasks. Mateescu [?] uses a resource selector to find a co-reservation for jobs requiring multiple resources. The job queue is managed in a FIFO fashion with dynamic priority correction: if co-reservation fails for a job in a scheduling cycle, the job's priority is promoted in the queue for the next scheduling round. The resource selector ranks candidate resources by their number of processors and memory size. Aggarwal's method [2] is introduced in Subsection 3.3.
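The FIFO-with-priority-correction idea can be sketched as follows. The queue representation and the `try_co_reserve` callback are hypothetical stand-ins for the resource selector described above, not Mateescu's actual interface.

```python
def scheduling_cycle(queue, try_co_reserve):
    """One pass over a priority-ordered job queue.

    queue: list of (priority, job) pairs; try_co_reserve(job) -> bool stands
    in for the resource selector's co-reservation attempt. Jobs whose
    co-reservation fails are promoted, so they rank higher next round."""
    started, waiting = [], []
    for priority, job in sorted(queue, reverse=True):  # highest priority first
        if try_co_reserve(job):
            started.append(job)
        else:
            waiting.append((priority + 1, job))        # dynamic priority correction
    return started, waiting
```

Repeatedly feeding `waiting` back in as the next cycle's queue models how a repeatedly rejected job eventually overtakes newer arrivals.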
• Optimal vs. Suboptimal
In the case that all information regarding the state of resources and the jobs is known, an optimal assignment could be made based on some criterion function, such as minimum makespan or maximum resource utilization. But due to the NP-Complete nature of scheduling problems, and the difficulty in Grid scenarios of making the reasonable assumptions usually required to prove the optimality of an algorithm, current research tries to find suboptimal solutions, which can be further divided into the following two general categories.
• Approximate vs. Heuristic
Approximate algorithms use formal computational models, but instead of searching the entire solution space for an optimal solution, they are satisfied when a solution that is sufficiently "good" is found. This approach can be used to decrease the time taken to find an acceptable schedule. The factors which determine whether this approach is worthy of pursuit include [24]:
• Availability of a function to evaluate a solution.
• The time required to evaluate a solution.
• The ability to judge the value of an optimal solution according to some metric.
• Availability of a mechanism for intelligently pruning the solution space.
If traditional evaluation metrics are used for task scheduling in Grid computing, e.g., makespan, the dynamic nature of Grid computing violates the above conditions (see 3.2), so that there are no such approximation algorithms known to date. The only approximate algorithms in Grid scheduling at the time of this writing are based on a newly proposed objective function: Total Processor Cycle Consumption [50] [5].
The other branch of the suboptimal category is the heuristic one. This branch represents the class of algorithms which make the most realistic assumptions about a priori knowledge concerning process and system loading characteristics. It also represents solutions to the scheduling problem which cannot give optimal answers, but only require a reasonable amount of cost and other system resources to perform their function. The evaluation of this kind of solution is usually based on experiments in the real world or on simulation. Not restricted by formal assumptions, heuristic algorithms are more adaptive to Grid scenarios where both resources and applications are highly diverse and dynamic, so most of the algorithms discussed in the following are heuristics.
• Distributed vs. Centralized
In dynamic scheduling scenarios, the responsibility for making global scheduling decisions may lie with one centralized scheduler, or be shared by multiple distributed schedulers. In a computational Grid, many applications might be submitted, or require rescheduling, simultaneously. The centralized strategy has the advantage of ease of implementation, but suffers from a lack of scalability and fault tolerance, and from the possibility of becoming a performance bottleneck. For example, Sabin et al. [88] propose a centralized metascheduler which uses backfill to schedule parallel jobs in multiple heterogeneous sites. Arora et al. [6], by contrast, present a completely decentralized, dynamic and sender-initiated scheduling and load balancing algorithm for the Grid environment. A property of this algorithm is that it uses a smart search strategy to find partner nodes to which tasks can migrate. It also overlaps this decision-making process with the actual execution of ready jobs, thereby saving precious processor cycles.
• Cooperative vs. Non-cooperative
If a distributed scheduling algorithm is adopted, the next issue to consider is whether the nodes involved in job scheduling work cooperatively or independently (non-cooperatively). In the non-cooperative case, individual schedulers act alone as autonomous entities, and arrive at decisions regarding their own optimal objectives independently of the effects of those decisions on the rest of the system. Good examples of such schedulers in the Grid are application-level schedulers, which are tightly coupled with particular applications and optimize their private individual objectives.
In the cooperative case, each Grid scheduler carries out its own portion of the scheduling task, but all schedulers work toward a common system-wide goal. Each Grid scheduler's local policy is concerned with making decisions in concert with the other Grid schedulers in order to achieve some global goal, instead of making decisions which only affect local performance or the performance of a particular job. An example of cooperative Grid scheduling is presented in [95], where the efficiency of sender-initiated and receiver-initiated algorithms adopted by distributed Grid schedulers is compared with that of centralized scheduling and local scheduling.
The hierarchical taxonomy classifies scheduling algorithms mainly from the system's point of view: dynamic or static, distributed or centralized. There are still many other important aspects of a scheduling algorithm that cannot be covered by this method. Casavant et al. [24] call them flat classification characteristics. In this paper, we discuss the following properties, and related examples, which current scheduling algorithms exhibit when confronted with the new challenges of the Grid computing scenario: What is the goal of scheduling? Is the algorithm adaptive? Is there dependency among the tasks in an application? How are large volumes of input and output data dealt with during scheduling? How do QoS requirements influence the scheduling product? How does the scheduler fight against dynamism in the Grid? Finally, what new methodologies are applied to the Grid scheduling problem?
3.2 Objective Functions
Fig. 3: Objective functions covered in this survey.
The two major parties in Grid computing, namely resource consumers, who submit various applications, and resource providers, who share their resources, usually have different motivations when they join the Grid. These incentives are represented by objective functions in scheduling. Currently, most objective functions in Grid computing are inherited from traditional parallel and distributed systems. Grid users are basically concerned with the performance of their applications, for example the total cost of running a particular application, while resource providers usually pay more attention to the performance of their resources, for example the resource utilization in a particular period. Thus objective functions can be classified into two categories: application-centric and resource-centric [23]. Fig. 3 shows the objective functions we will meet in the following discussion.
• Application-Centric
Scheduling algorithms adopting an application-centric objective function aim to optimize the performance of each individual application, as application-level schedulers do. Most current Grid applications' concerns are about time, for example the makespan, which is the time from the beginning of the first task in a job to the end of the last task of the job. Makespan is one of the most popular measurements of scheduling algorithms, and many examples given in the following discussion adopt it. As economic models [8] [9] [43] [24] are introduced into Grid computing, the economic cost that an application needs to pay for resource utilization becomes a concern of some Grid users. This objective function is widely adopted by Grid economic models, which are mainly discussed in Subsection 3.6. Besides these simple functions, many applications use compound objective functions; for example, some want both shorter execution time and lower economic cost. The primary difficulty facing the adoption of this kind of objective function lies in the normalization of two different measurements: time and money. Such situations make scheduling in the Grid much more complicated, and require that Grid schedulers be adaptive enough to deal with such compound missions. At the same time, the development of the Grid infrastructure has shown a service-oriented tendency [49], so quality of service (QoS) becomes a big concern of many Grid applications in such a non-dedicated, dynamic environment. The meaning of QoS is highly dependent on the particular application, ranging from hardware capacity to software existence. Usually, QoS is a constraint imposed on the scheduling process rather than the final objective function. The involvement of QoS usually affects the resource selection step in the scheduling process, and then influences the final objective optimization. Such scenarios will be discussed later in Section 3.
• Resource-Centric
Scheduling algorithms adopting resource-centric scheduling objective functions aim to optimize the performance of the resources. Resource-centric objectives are usually related to resource utilization, for example, throughput, which is the ability of a resource to process a certain number of jobs in a given period, and utilization, which is the percentage of time a resource is busy. Low utilization means a resource is idle and wasted. For a multiprocessor resource, utilization differences among processors also describe the load balance of the system, and imbalance decreases the throughput. Condor is a well-known system adopting throughput as the scheduling objective [2]. As economic models are introduced into Grid computing, economic profit (the economic benefit resource providers can obtain by attracting Grid users to submit applications to their resources) also comes under the purview of resource management policies.
In Grid computing environments, due to the autonomy of both Grid users and resource providers, application-centric objectives and resource-centric objectives are often at odds. Legion [26] provides a methodology allowing each group to express their desires, and acts as a mediator to find a resource allocation that is acceptable to both parties, through a flexible, modular approach to scheduling support.
The objective functions mentioned above were widely adopted before the emergence of Grid computing, and many efforts have been made to approach an approximation [2] [30] [57] [10] or to get an optimal solution. Most of these results, however, rely on a common
assumption, namely, that the resources are dedicated so that they can provide constant performance to an application. But as we have emphasized in Section 2, this assumption does not hold in Grid computing. This violation weakens the previous results. For example, assume an optimal schedule with makespan OPT can be found if the resources involved are stable. If the Grid resources suddenly slow down at OPT due to some reason (interruption by resources' local jobs, network contention, or whatever) and the slow-speed situation continues for a long period, then the makespan of the actual schedule is far from OPT and cannot be bounded by scheduling algorithms that cannot predict the performance change. So, if the objective function of a schedule is makespan, and there is no bound either on resource performance or on the time period of the change (in other words, if we cannot predict the performance fluctuations), there is no makespan approximation algorithm in general that applies to a Grid [5].
To escape from this predicament, a novel scheduling criterion for Grid computing is proposed in [5]: total processor cycle consumption (TPCC), which is the total number of instructions the Grid could compute from the starting time of executing the schedule to its completion time. TPCC represents the total computing power consumed by an application. In this new criterion, the length of a task is the number of instructions in the task; the speed of a processor is the number of instructions computed per unit time; and processors in a Grid are heterogeneous, so they have various speeds. In addition, the speed of each processor varies over time, due to the contention expected in an open system. Let s_{p,t} be the speed of processor p during the time interval [t, t+1), where t is a non-negative integer. Without loss of generality, it can be assumed that the speed of each processor does not vary during the interval [t, t+1) for every t, by adopting an interval as short as the unit time. It is also assumed that the value of any s_{p,t} cannot be known in advance. Fig. 4(a) shows an example of a set of tasks. Fig. 4(b) shows an example of a processor speed distribution.
Fig. 4: A new Grid scheduling criterion: TPCC [50].
Let T be a set of n independent tasks with the same length L. Let S be a schedule of T in a Grid with m processors. Let M be the makespan of S. The speed of processor p during the time interval [t, t+1) is s_{p,t}. Then, the TPCC of S is defined as:

    TPCC(S) = Σ_{p=1..m} Σ_{t=0..⌊M⌋-1} s_{p,t} + Σ_{p=1..m} (M - ⌊M⌋) s_{p,⌊M⌋}

that is, every complete unit interval before ⌊M⌋ contributes a processor's full per-interval speed, while the final fractional interval contributes the fraction (M - ⌊M⌋) of the speed during that interval. The TPCC of the example schedule in Fig. 4(c) is obtained by evaluating this sum over the speed distribution of Fig. 4(b).
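As a concrete illustration, the definition above can be evaluated directly from a table of observed per-interval speeds. The following sketch is our own (not code from [5] or [50]); it assumes speeds are given as one list per processor:

```python
# Sketch: computing TPCC from observed per-interval speeds and the makespan.
# speeds[p][t] = instructions per unit time of processor p in interval [t, t+1).
import math

def tpcc(speeds, makespan):
    whole = math.floor(makespan)
    total = 0.0
    for s_p in speeds:
        total += sum(s_p[t] for t in range(whole))   # full unit intervals
        total += (makespan - whole) * s_p[whole]     # fractional last interval
    return total

# Two processors observed over three unit intervals, makespan 2.5:
speeds = [[4, 4, 2], [3, 1, 2]]
print(tpcc(speeds, 2.5))  # (4+4) + 0.5*2 + (3+1) + 0.5*2 = 14.0
```

Note how a mid-schedule slowdown simply shows up as smaller s_{p,t} entries; the criterion itself stays well defined.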
The advantage of TPCC is that it is little affected by the variance of resource performance, yet it is still related to the makespan. Since the total number of instructions needed to run a job is constant, approximation algorithms based on this criterion can be developed. In [50], a (1 + m(log_e(m) + 1)/n)-approximation algorithm is given for coarse-grained independent tasks in the Grid. A (1 + m(log_e(m) + 1)Lcp(n)/n)-approximation
algorithm for the scheduling of coarse-grained tasks with precedence orders is described in [5], where Lcp(n) is the length of the critical path of the task graph.
3.3 Adaptive Scheduling
An adaptive solution to the scheduling problem is one in which the algorithms and parameters used to make scheduling decisions change dynamically according to the previous, current, and/or future resource status [24]. In Grid computing, the demand for scheduling adaptation comes from three sources: the heterogeneity of candidate resources, the dynamism of resource performance, and the diversity of applications, as Fig. 5 shows. Corresponding to these three sources, we can find three kinds of examples as well.
Fig. 5: Taxonomy of adaptive scheduling algorithms in Grid computing.
Resource Adaptation: Because of resource heterogeneity and application diversity, discovering available resources and selecting an application-appropriate subset of those resources are very important for achieving high performance or reducing cost. For example, Su et al. [102] show how the selection of a data storage site affects the network transmission delay. Dail et al. [34] propose a resource selection algorithm in which available resources are first grouped into disjoint subsets according to the network delays between the subsets. Then, inside each subset, resources are ranked according to their memory size and computational power. Finally, an appropriately-sized resource group is selected from the sorted lists. An upper bound for this exhaustive search procedure is given and claimed acceptable in the computational Grid circumstance. Subhlok et al. [104] show algorithms that jointly analyze computation and communication resources for different application demands, and a framework for automatic node selection. The algorithms are adaptive to demands such as selecting a set of nodes that maximizes the minimum available bandwidth between any pair of nodes, or selecting a set of nodes that maximizes the minimum available fractional compute and communication capacities. The complexity of these algorithms is also analyzed, and the results show it is insignificant in comparison with the execution time of the applications to which they are applied.
Dynamic Performance Adaptation: Adaptation to the dynamic performance of resources is mainly exhibited in: (i) changing scheduling policies or rescheduling [100] [8] (for example, switching between static scheduling algorithms, which use predicted resource information, and dynamic ones, which balance the static scheduling results), (ii) distributing workload according to application-specific performance models [1], and (iii) finding a proper number of resources to be used [22] [5]. Applications to which these adaptive strategies are applied usually adopt some kind of divide-and-conquer approach to solve a certain problem [8]. In the divide-and-conquer approach, the initial problem can be recursively divided into sub-problems which can be solved more easily. As a special case of the divide-and-conquer approach, a model for applications following the manager/worker model is shown in Fig. 6 [63]. From an initial task (node A in Fig. 6), a number of tasks (nodes B, C, D and E) are launched to execute on pre-selected or dynamically assigned resources. Each task may receive a discrete set of data, fulfil its computational task independently, and deliver its output (node F). Examples of such applications include parameter sweep applications [22] [23] [60] and data stripe processing [1] [100]. In contrast with the manager/worker model (where the manager is in charge of the behavior of its workers), an active adaptation method named Cluster-aware Random Stealing (CRS) for Grid computing systems is proposed in [8], based on the traditional Random Stealing (RS) algorithm. CRS allows an idle resource to steal jobs not only from the local cluster but also from remote ones, with a very limited amount of wide-area communication. Thus, load balancing among nodes running a divide-and-conquer application is achieved. In reviewing experiences gained with application-level scheduling in Grid computing, Berman et al. [3] note that via schedule adaptation, it is possible to use sophisticated scheduling heuristics, like list-scheduling
approaches, which are sensitive to performance prediction errors, in Grid environments in which resource availability changes over time.
Fig. 6: The parallel workflow of a divide-and-conquer application.
Application Adaptation: To achieve high performance, application-level schedulers in the Grid (e.g. AppLeS [3]) are usually tightly integrated with the application itself, and are not easily applied to other applications. As a result, each scheduler is application-specific. Noticing this limitation, Dail et al. [34] explicitly decouple the scheduler core (the searching procedure introduced at the beginning of this subsection) from the application-specific (e.g. performance models) and platform-specific (e.g. resource information collection) components used by the core. The key feature in implementing the decoupling (while still keeping awareness of application characteristics) is that application characteristics are recorded and/or discovered by components such as a specialized compiler and Grid-enabled libraries. These application characteristics are communicated via well-defined interfaces to schedulers, so that schedulers can be general-purpose while still providing services that are appropriate to the application at hand. Aggarwal et al. [2] consider another case that applications in Grid computing often meet, namely, resource reservation, and develop a generalized Grid scheduling algorithm that can efficiently assign jobs having arbitrary inter-dependency constraints and arbitrary processing durations to resources having prior reservations. Their algorithm also takes into account arbitrary delays in the transfer of data from parent tasks to child tasks. In fact, this is a heuristic list algorithm, which we will discuss in the next subsection. In [3], Wu et al. give a very good example of how a self-adaptive scheduling algorithm cooperates with long-term resource performance prediction [54] [105]. Their algorithm is adaptive to indivisible single sequential jobs, jobs that can be partitioned into independent parallel tasks, and jobs that have a set of indivisible tasks. When the prediction error of the system utilization reaches a threshold, the scheduler will try to reallocate tasks.
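The trigger condition behind such self-adaptive schemes can be sketched as follows. This is our own illustration, not Wu et al.'s code: the utilization-based error measure and the 20% threshold are assumptions.

```python
# Sketch of a threshold-triggered rescheduling test: compare predicted vs.
# observed system utilization and reallocate when the relative prediction
# error grows too large. The 0.2 threshold is an arbitrary example value.
def should_reschedule(predicted_util, observed_util, threshold=0.2):
    if predicted_util == 0:
        return observed_util > 0
    error = abs(observed_util - predicted_util) / predicted_util
    return error > threshold

print(should_reschedule(0.50, 0.55))  # -> False: prediction still holds
print(should_reschedule(0.50, 0.75))  # -> True: off by 50%, reallocate tasks
```

A real scheduler would evaluate this periodically against its long-term performance predictor and re-run task placement only when the test fires, so adaptation cost is paid only when predictions have drifted.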
3.4 Task Dependency of an Application
When the relations among tasks in a Grid application are considered, a common dichotomy used is dependency vs. independency. Usually, dependency means there are precedence orders among tasks, that is, a task cannot start until all its parents are done. Dependency has a crucial impact on the design of scheduling algorithms, so in this subsection, algorithms are discussed following the same dichotomy, as shown in Fig. 7.
Fig. 7: Task dependency taxonomy of Grid scheduling algorithms.
3.4.1 Independent Task Scheduling
As a set of independent tasks arrives, from the system's point of view, a common strategy is to assign them according to the load of resources in order to achieve high system throughput. This approach was discussed under the dynamic branch in Subsection 3.1. From the point of view of applications, some static heuristic algorithms based on execution cost estimates can be applied [1].
• Examples of Static Algorithms with Performance Estimate
MET (Minimum Execution Time): MET assigns each task to the resource with the best expected execution time for that task, no matter whether this resource is available or not at the present time. The motivation behind MET is to give each task its best machine. This
can cause a severe load imbalance among machines. Even worse, this heuristic is not applicable to heterogeneous computing environments where resources and tasks are characterized as consistent, which means a machine that can run one task faster will run all the other tasks faster.
MCT (Minimum Completion Time): MCT assigns each task, in an arbitrary order, to the resource with the minimum expected completion time for that task. This causes some tasks to be assigned to machines that do not have the minimum execution time for them. The intuition behind MCT is to combine the benefits of opportunistic load balancing (OLB) and MET, while avoiding the circumstances in which OLB and MET perform poorly.
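The contrast between the two heuristics can be made concrete with a small sketch (ours, not from the survey). It assumes an expected-time-to-compute matrix etc[task][machine] and per-machine ready times, which are common bookkeeping conventions rather than anything the text prescribes:

```python
# MET vs. MCT over an ETC (expected time to compute) matrix etc[t][m].

def met(etc, n_machines):
    """Assign each task to the machine with minimum execution time,
    ignoring machine availability."""
    return [min(range(n_machines), key=lambda m: row[m]) for row in etc]

def mct(etc, n_machines):
    """Assign each task (in arrival order) to the machine that would
    complete it earliest, tracking machine ready times."""
    ready = [0.0] * n_machines
    assignment = []
    for row in etc:
        m = min(range(n_machines), key=lambda m: ready[m] + row[m])
        assignment.append(m)
        ready[m] += row[m]
    return assignment

etc = [[2, 5], [3, 9], [4, 8]]   # machine 0 is fastest for every task
print(met(etc, 2))  # -> [0, 0, 0]: everything piles onto machine 0
print(mct(etc, 2))  # -> [0, 0, 1]: the growing queue pushes task 2 away
```

The example shows exactly the load-imbalance failure mode of MET described above, and how MCT's completion-time view avoids it.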
Min-min: The Min-min heuristic begins with the set U of all unmapped tasks. Then, the set M of minimum completion times, one for each task in U, is found. Next, the task with the overall minimum completion time in M is selected and assigned to the corresponding machine (hence the name Min-min). Last, the newly mapped task is removed from U, and the process repeats until all tasks are mapped (i.e., U is empty). Min-min is based on the minimum completion time, as is MCT. However, Min-min considers all unmapped tasks during each mapping decision, while MCT only considers one task at a time. Min-min maps tasks in the order that changes the machine availability status by the least amount that any assignment could. Let t_i be the first task mapped by Min-min onto an empty system. The machine that finishes t_i the earliest, say m_j, is also the machine that executes t_i the fastest. For every task that Min-min maps after t_i, the Min-min heuristic changes the availability status of m_j by the least possible amount for every assignment. Therefore, the percentage of tasks assigned to their first choice (on the basis of execution time) is likely to be higher for Min-min than for Max-min (see below). The expectation is that a smaller makespan can be obtained if more tasks are assigned to the machines that complete them the earliest and also execute them the fastest.
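This loop can be sketched compactly (our illustration, with the same assumed ETC-matrix model as above); note that swapping the task-selection rule from min to max yields the Max-min variant described next:

```python
# Sketch of Min-min; pass pick=max to obtain Max-min instead.
def min_min(etc, n_machines, pick=min):
    unmapped = set(range(len(etc)))
    ready = [0.0] * n_machines
    assignment = {}
    while unmapped:
        # Best (completion time, machine) pair for each unmapped task.
        best = {t: min((ready[m] + etc[t][m], m) for m in range(n_machines))
                for t in unmapped}
        # Min-min takes the task whose best completion time is smallest.
        t = pick(best, key=lambda t: best[t][0])
        ct, m = best[t]
        assignment[t] = m
        ready[m] = ct
        unmapped.remove(t)
    return assignment

etc = [[2, 5], [3, 9], [4, 8]]
print(min_min(etc, 2))        # Min-min: shortest tasks placed first
print(min_min(etc, 2, max))   # Max-min: the longest task placed first
```

Recomputing `best` each round is what distinguishes this from MCT: every remaining task competes for the next slot, at the cost of an extra factor of |U| per decision.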
Max-min: The Max-min heuristic is very similar to Min-min. It also begins with the set U of all unmapped tasks. Then, the set M of minimum completion times is found. Next, the task with the overall maximum in M is selected and assigned to the corresponding machine (hence the name Max-min). Last, the newly mapped task is removed from U, and the process repeats until all tasks are mapped (i.e., U is empty). Intuitively, Max-min attempts to minimize the penalties incurred by tasks with longer execution times. Assume, for example, that the job being mapped has many tasks with very short execution times and one task with a very long execution time. Mapping the task with the longer execution time to its best machine first allows this task to be executed concurrently with the remaining tasks (with shorter execution times). For this case, this would be a better mapping than a Min-min mapping, where all of the shorter tasks would execute first, and then the long-running task would be executed while several machines sit idle. Thus, in cases similar to this example, the Max-min heuristic may give a mapping with a more balanced load across machines and a better makespan.
Min-min and Max-min are simple algorithms and can be easily amended to adapt to
different scenarios. For example, in [5], a QoS Guided Min-min heuristic is presented which can guarantee the QoS requirements of particular tasks and minimize the makespan at the same time. Wu, Shu and Zhang [5] give a Segmented Min-min algorithm, in which tasks are first ordered by their expected completion times (which could be the maximum ECT, minimum ECT or average ECT over all of the resources), then the ordered sequence is segmented, and finally Min-min is applied to each segment. The segmentation improves the performance of typical Min-min when the lengths of the tasks are dramatically different, by giving longer tasks a chance to be executed earlier than they would be under typical Min-min.
XSuffrage: Another popular heuristic for independent scheduling is called Suffrage
[6]. The rationale behind Suffrage is that a task should be assigned to a certain host, and if
it does not go to that host, it will suffer the most. For each task, its suffrage value is defined as the difference between its best MCT and its second-best MCT. Tasks with high suffrage values take precedence. But when there is input and output data for the tasks, and resources are clustered, the conventional Suffrage algorithm may have problems. In this case, intuitively, tasks should be assigned to resources as near as possible to the data source, to reduce the makespan. But if the resources are clustered, and nodes in the same cluster have nearly identical performance, then the best and second-best MCTs are also nearly identical, which makes the suffrage value close to zero and gives the tasks low priority. Other tasks might be assigned to these nodes, so that the task might be pushed away from its data source. To fix this problem, Casanova et al. give an improvement called XSuffrage in [23], which gives a cluster-level suffrage value to each task. Experiments show that XSuffrage outperforms the conventional Suffrage not only in the case where large data files are needed, but also when the resource information cannot be predicted very accurately.
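The suffrage computation itself is simple; one selection step of the conventional heuristic might look like the following sketch (our own illustration, reusing the assumed ETC/ready-time model from the earlier examples):

```python
# One step of Suffrage: the task scheduled next is the one whose
# (second-best MCT - best MCT) gap is largest, i.e. the task that would
# "suffer" most if denied its best machine.
def suffrage_step(etc, ready, unmapped):
    best_task, best_machine, best_suffrage = None, None, -1.0
    for t in unmapped:
        cts = sorted(ready[m] + etc[t][m] for m in range(len(ready)))
        suffrage = cts[1] - cts[0]
        if suffrage > best_suffrage:
            m = min(range(len(ready)), key=lambda m: ready[m] + etc[t][m])
            best_task, best_machine, best_suffrage = t, m, suffrage
    return best_task, best_machine

etc = [[2, 3], [4, 10]]       # task 1 suffers badly off machine 0
ready = [0.0, 0.0]
print(suffrage_step(etc, ready, {0, 1}))  # -> (1, 0)
```

The cluster problem described above is visible here: if two machines are near-identical, `cts[1] - cts[0]` approaches zero regardless of data placement, which is exactly what XSuffrage's cluster-level suffrage value corrects.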
Task Grouping: The above algorithms are usually used to schedule applications that consist of a set of independent, coarse-grained, compute-intensive tasks. This is the ideal case for which the computational Grid was designed. But there are other cases, in which applications consist of a large number of lightweight jobs. The overall processing of these applications involves a high overhead cost in terms of scheduling and transmission to or from Grid resources. Muthuvelu et al. [7] propose a dynamic task grouping scheduling algorithm to deal with these cases. Once a set of fine-grained tasks is received, the scheduler groups them according to their computation requirements (measured in number of instructions) and the processing capability that a Grid resource can provide in a certain time period. All tasks in the same group are submitted to the same resource, which can finish them all in the given time. By this means, the overhead of scheduling and job launching is reduced, and resource utilization is increased.
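The grouping step can be sketched as a simple packing loop. This is our simplification, not the authors' algorithm: task lengths in millions of instructions (MI) and a per-period resource capability in MI are assumed units, and greedy packing of consecutive tasks is an assumed policy.

```python
# Sketch of dynamic task grouping: pack consecutive fine-grained tasks into
# groups whose total length fits what one resource can process in a period.
def group_tasks(task_lengths_mi, resource_capability_mi):
    groups, current, current_mi = [], [], 0.0
    for t, mi in enumerate(task_lengths_mi):
        if current and current_mi + mi > resource_capability_mi:
            groups.append(current)       # close the full group
            current, current_mi = [], 0.0
        current.append(t)
        current_mi += mi
    if current:
        groups.append(current)
    return groups

# Eight lightweight tasks packed for a resource handling 100 MI per period:
print(group_tasks([30, 20, 40, 25, 25, 50, 10, 10], 100))
# -> [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Each inner list is then submitted as a single job, so eight scheduling and launch overheads shrink to three.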
The Problem of Heterogeneity: In heterogeneous environments, the performance of the above algorithms is also affected by the degree of heterogeneity of the tasks and the resources, as well as by the consistency of the tasks' estimated completion times on different machines. The study in [6] shows that no single algorithm has a permanent advantage in all scenarios. This result clearly leads to the conclusion that if the highest possible performance is wanted in a computational Grid, the scheduler should have the ability to adapt to different application/resource heterogeneities.
• Algorithms without Performance Estimate
The algorithms introduced above use predicted performance to make task assignments. In [103] and [7], two algorithms are proposed that do not use performance estimates, but instead adopt the idea of duplication, which is feasible in the Grid environment, where computational resources are usually abundant but mutable.
Subramani et al. [103] introduce a simple distributed duplication scheme for independent job scheduling in the Grid. A Grid scheduler distributes each job to the K least loaded sites. Each of these sites schedules the job locally. When a job is able to start at any of the sites, that site informs the scheduler at the job-originating site, which in turn contacts the other K-1 sites to cancel the job from their respective queues. By placing each job in the queue at multiple sites, the expectations are improved system utilization and reduced average job makespan. The parameter K can be varied depending upon the scalability required. Silva et al. [7] propose a resource-information-free algorithm called Workqueue with Replication (WQR) for independent job scheduling in the Grid. The WQR algorithm uses task replication to cope with the heterogeneity of hosts and tasks, and also with the dynamic variation of resource availability due to load generated by other users in the Grid. Unlike the scheme in [103], where duplicated tasks never actually run, in WQR, an idle resource will replicate tasks that are still running on other resources. Tasks
are replicated until a predefined maximum number of replicas is reached. When a task replica finishes, the other replicas are cancelled. In this approach, performance is increased in situations where tasks are assigned to slow or busy hosts, because when a task is replicated, there is a greater chance that a replica is assigned to a fast or idle host. Another advantage of this scheme is increased immunity to performance changes, since the probability that all sites are changing is much smaller than for one site.
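The bookkeeping behind this replication policy might look like the sketch below. It is entirely our own construction for illustration: the class, method names, and the replica cap of two are hypothetical, not taken from the WQR paper.

```python
# WQR-style bookkeeping: idle hosts take unstarted work first, then
# replicate running tasks up to a cap; the first finisher cancels the rest.
class WQR:
    def __init__(self, tasks, max_replicas=2):
        self.pending = list(tasks)     # tasks not yet started anywhere
        self.running = {}              # task -> set of hosts running it
        self.max_replicas = max_replicas

    def assign(self, host):
        if self.pending:
            task = self.pending.pop(0)
        else:
            # No fresh work: replicate the least-replicated running task.
            candidates = [t for t, hosts in self.running.items()
                          if len(hosts) < self.max_replicas]
            if not candidates:
                return None
            task = min(candidates, key=lambda t: len(self.running[t]))
        self.running.setdefault(task, set()).add(host)
        return task

    def finish(self, task):
        """First replica to finish wins; return the hosts to cancel/free."""
        return self.running.pop(task, set())

wq = WQR(["t1", "t2"])
print(wq.assign("hostA"), wq.assign("hostB"), wq.assign("hostC"))
```

Note that no performance estimate appears anywhere: the scheme converts spare capacity directly into insurance against slow hosts.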
3.4.2 Dependent Task Scheduling
When the tasks composing a job have precedence orders, a popular model is the directed acyclic graph (DAG), in which a node represents a task and a directed edge denotes the precedence order between its two vertices. In some cases, weights can be added to nodes and edges to express computational costs and communication costs, respectively. As Grid computing infrastructures become more and more mature and powerful, support is provided for complicated workflow applications, which can usually be modeled by DAGs. Tools of this kind include Condor DAGMan [2], CoG [4], Pegasus [3], GridFlow [2] and ASKALON [10]. A comprehensive survey of these systems is given in [2], while we continue to focus on their scheduling algorithm components in what follows.
• Grid Systems Supporting Dependent Task Scheduling
To run a workflow in a Grid, we need to consider two problems: 1) how the tasks in the workflow are scheduled, and 2) how to submit the scheduled tasks to Grid resources without violating the structure of the original workflow. Grid workflow generators address the first problem, and Grid workflow engines deal with the second.
o Grid Workflow Engines
Grid workflow engines are responsible for submitting scheduled tasks to Grid resources. CoG is a set of APIs which can be used to submit concrete workflows to the Grid, where a concrete workflow means that the tasks in a DAG are already mapped to the resource locations where they are to be executed; thus CoG itself does not consider the optimization problem of workflows. DAGMan works similarly to CoG. It accepts DAG description files representing workflows and then, following the order of tasks and the dependency constraints in the description files, submits tasks to Condor-G, which schedules them onto the best machines available in a FIFO strategy, without any long-term optimization, just as it does with common Condor tasks.
o Grid Workflow Generators
Pegasus provides a bridge between Grid users and workflow execution engines like DAGMan. In Pegasus, there are two kinds of workflows: abstract workflows, which are composed of tasks (referred to as application components in Pegasus) and dependencies reflecting the data dependencies among the tasks, and concrete workflows, which are the mappings of abstract workflows to Grid resources. Pegasus' main concern is to generate these two kinds of workflows according to users' demands for certain data products. It does so by searching for available application components which can produce the required data products, and for available input and intermediate data replicas in the Grid. To this end, it provides a Concrete Workflow Generator (CWG) [36]. CWG performs the mapping from an abstract workflow to a concrete workflow and generates the corresponding DAGMan submit files. It automatically identifies physical locations for both application components and data, finds appropriate resources to execute the components (relying on GIS), and generates an executable workflow that can be submitted to Condor-G through DAGMan. When there are multiple appropriate resources available, CWG supports a few standard selection algorithms: random, round-robin and min-min [38]. Resource selection algorithms are pluggable components in Pegasus, so third-party algorithms can be applied according to different concerns. As an example, Blythe et al. [6] present a multiple-round mixed min-min and max-min algorithm for resource selection, in which the final mapping
selected is the one with the minimal makespan. Considering the dynamism of the Grid, instead of submitting the whole task graph at once, Pegasus applies a workflow partition method that submits layer-partitioned subgraphs iteratively. But, as shown in the discussion below, layered partitioning may not exploit the advantages of locality and task independency and, as a result, may produce bad schedules, especially when the DAG is unbalanced. This weakness is also demonstrated in [10].
Similarly to Pegasus, ICENI [8] also adopts pluggable algorithms for the mapping of abstract workflows to concrete workflows, and in [7], random, best-of-n-random, simulated annealing and game theory algorithms are tested. The latter two algorithms will be discussed in Subsection 3.6.
In GridFlow [2], workflow scheduling is conducted hierarchically by a global Grid workflow manager and local Grid sub-workflow schedulers. The global Grid workflow manager receives requests from users with the workflow described in XML, and then simulates workflow execution to find a near-optimal schedule in terms of makespan. The simulation is done by polling local Grid schedulers, which can estimate the finish times of sub-workflows on their local sites. A fuzzy timing technique is used to obtain the estimates, and the possibility of conflict on a shared resource among tasks from different sub-workflows is considered. The advantage of fuzzy functions is that they can be computed very fast and are thus suitable for scheduling time-critical Grid applications, although they do not necessarily provide the best scheduling solution. GridFlow also provides rescheduling functionality for when the real execution falls too far behind the estimate.
We see from the above discussion that most current efforts have been directed towards supporting workflows at the programming level, thus providing potential opportunities for algorithm designers (as they allow scheduling algorithms to be plugged in). As Grid computing inherits problems from traditional systems, a natural question to ask is: what can be learned from the extensive studies on DAG scheduling algorithms in heterogeneous computing? A complete survey of these algorithms is beyond the scope of this paper, but some ideas and common examples are discussed below to show the problems we are still confronted with in the Grid.
• Taxonomy of Algorithms for Dependent Task Scheduling
Considering communication delays when making scheduling decisions introduces a big challenge: the trade-off between taking advantage of maximal parallelism and minimizing communication delay. This problem is also called the max-min problem [42]. High parallelism means dispatching more tasks simultaneously to different resources, thus increasing the communication cost, especially when the communication delay is very high. However, clustering tasks on only a few resources means low resource utilization. To deal with this problem in heterogeneous computing systems, three kinds of heuristic algorithms have previously been proposed.
o List Heuristics
In general, list scheduling is a class of scheduling heuristics in which tasks are assigned priorities and placed in a list ordered by decreasing priority. Whenever tasks contend for processing, the selection of the tasks to be processed immediately is done on the basis of priority, with higher-priority tasks being assigned resources first [42]. The differences among the various list heuristics lie mainly in how the priority is defined and when a task is considered ready for assignment.
An important issue in DAG scheduling is how to rank (or weigh) the nodes and edges (when communication delay is considered). The rank of a node is used as its priority in the scheduling. Once the nodes and edges are ranked, a task-to-resource assignment can be found by considering the following two problems to minimize the makespan: how to
parallelize those tasks having no precedence orders in the graph, and how to make the time cost along the critical path in the DAG as small as possible. Many list heuristics have been invented; some new proposals, along with comparisons to older algorithms, can be found in [10], [7] and [83].
Two Classic Examples
HEFT: Topcuoglu et al. [10] present a heuristic called the Heterogeneous Earliest-Finish-Time (HEFT) algorithm. The HEFT algorithm selects the task with the highest upward rank (the upward rank of a node is defined as the maximum distance from the current node to the exit node, including the computational costs and communication costs) at each step. The selected task is then assigned to the processor which minimizes its earliest finish time, with an insertion-based approach which considers the possible insertion of a task in the earliest idle time slot between two already-scheduled tasks on the same resource. The time complexity of HEFT is O(e × p), where e is the number of edges and p is the number of resources.
HEFT might be one of the most frequently cited list algorithms aiming to reduce the makespan of tasks in a DAG. For example, it is tested in ASKALON and compared with a genetic algorithm and a myopic algorithm [10], and the results show its effectiveness in Grid scenarios, especially when the task graph is unbalanced.
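The upward-rank computation at the heart of HEFT is a simple backward recursion over the DAG; the sketch below is our illustration (mean computation costs per task and mean communication costs per edge are the assumed inputs, as in the averaging variant of the algorithm):

```python
# Upward rank: rank_u(i) = w[i] + max over successors j of (c[i,j] + rank_u(j));
# exit tasks have rank_u = w. Tasks are then scheduled in decreasing rank.
from functools import lru_cache

def upward_ranks(w, c, succ):
    @lru_cache(maxsize=None)
    def rank(i):
        if not succ[i]:                      # exit task
            return w[i]
        return w[i] + max(c[(i, j)] + rank(j) for j in succ[i])
    return {i: rank(i) for i in w}

w = {"A": 3, "B": 4, "C": 2, "D": 1}                      # computation costs
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}  # DAG edges
c = {("A", "B"): 2, ("A", "C"): 1, ("B", "D"): 3, ("C", "D"): 1}
ranks = upward_ranks(w, c, succ)
print(sorted(ranks, key=ranks.get, reverse=True))  # -> ['A', 'B', 'C', 'D']
```

Since rank_u of a node is the length of its longest path to the exit, sorting by decreasing rank always yields a valid topological order, which is why the priority list respects the precedence constraints for free.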
FCP: Radulescu et al. [83] present another list heuristic called Fast Critical Path (FCP), intended to reduce the complexity of list heuristics while maintaining their scheduling performance. The motivation for FCP is based on the following observation regarding the complexity of list heuristics. Basically, a list heuristic has the following phases: an O(e + v)-time ranking phase, an O(v log v)-time ordering phase, and finally an O((e + v)p)-time resource selection phase, where e is the number of edges, v is the number of tasks and p is the number of resources. Usually the third term is larger than the second. The FCP algorithm does not sort all the tasks at the beginning, but maintains only a limited number of sorted tasks at any given time. Instead of considering all processors as possible targets for a given task, the choice is restricted to either the processor from which the last message to the given task arrives, or the processor which becomes idle the earliest. As a result, the time complexity is reduced to O(v log p + e).
The Problem of Heterogeneity:
A critical issue in list heuristics for DAGs is how to compute a node's rank. In a heterogeneous environment, the execution time of the same task differs across resources, as does the communication cost across different network links. So for a particular node, the rank will also differ depending on which resource the node is assigned to. The problem is how to choose the proper value on which to base the ordering decision. These values could be the mean value (as in the original HEFT [10]), the median value [62], the worst value, the best value and so on. But Zhao et al. [22] have shown that different choices can affect the performance of list heuristics such as HEFT dramatically (the makespan can change by 47.2% for certain graphs). Motivated by this observation, Sakellariou et al. [7] gave a hybrid algorithm which is less sensitive to the approach chosen for ranking nodes. In this algorithm, tasks are ranked upward and sorted in decreasing order. The sorted tasks are then grouped along the sorted sequence such that the tasks within each group are independent of one another. Finally, each group can be assigned to resources using heuristics for independent tasks.
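The sensitivity of the ordering to the ranking value shows up even on a two-task, two-processor toy example (all numbers below are invented for illustration):

```python
# Aggregating per-processor execution times into one rank value changes
# the scheduling order on heterogeneous resources.
w = {'t1': [2, 9], 't2': [6, 6]}     # exec times on two heterogeneous processors

aggregators = {
    'mean':  lambda xs: sum(xs) / len(xs),
    'best':  min,
    'worst': max,
}
# Sort tasks by decreasing aggregated value, once per aggregation choice.
orders = {name: sorted(w, key=lambda t: -f(w[t]))
          for name, f in aggregators.items()}
# 'worst' puts t1 first (9 > 6), while 'mean' and 'best' put t2 first.
```

A scheduler that is robust across these choices, such as the hybrid algorithm above, avoids committing to any single aggregation.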
The above algorithms have explored how the heterogeneity of resources and tasks impacts the scheduling algorithm, but they only consider the heterogeneity of computational resources, and miss the heterogeneity of communication links.
Instances of List Heuristics in Grid Computing
Previous research on DAG scheduling algorithms is very helpful when we consider the same problem in the Grid scenario. For example, a list scheduling algorithm is proposed in [6] which is similar to the HEFT algorithm, but changes the method used to compute the level of a task node by not only including its longest path to an exit node, but also taking the incoming communication cost from its parents into account. In [5], Ma et al. propose a new list algorithm called Extended Dynamic Critical Path (xDCP), which is a Grid-enabled version of the Dynamic Critical Path (DCP) algorithm, originally applied in a homogeneous environment. The idea behind DCP is to continuously shorten the critical path in the task graph by scheduling tasks on the current CP to a resource where a task on the critical path can start earlier. The xDCP algorithm was proposed for scheduling parameter-sweep applications in a heterogeneous Grid. The improvements include: (i) initially shuffling tasks to multiple resources when the scheduling begins, instead of keeping them on one node; (ii) using the finish time instead of the start time to rank task nodes, to adapt to heterogeneous resources; and (iii) multiple rounds of scheduling to improve the current schedule, instead of scheduling only once. The complexity of xDCP is O(v³), which is the same as that of DCP.
o Duplication Based Algorithms
An alternative way to shorten the makespan is to duplicate tasks on different resources. The main idea behind duplication based scheduling is to utilize resource idle time to duplicate predecessor tasks. This may avoid the transfer of results from a predecessor to a successor, thus reducing the communication cost. So duplication can solve the max-min problem.
Duplication based algorithms differ according to their task selection strategies for duplication. Originally, algorithms in this group were usually designed for an unbounded number of identical processors, such as distributed memory multiprocessor systems. They also have higher complexity than the algorithms discussed above. For example, Darbha et al. [35] present such an algorithm, named TDS (Task Duplication-based Scheduling Algorithm), for distributed-memory machines, with a complexity of O(v²). (Note that this complexity is for homogeneous environments.)
TDS: In [35], for each node in a DAG, its earliest start time (est), earliest completion time (ect), latest allowable start time (last), latest allowable completion time (lact), and favorite predecessor (fpred) are computed. The last is the latest time at which a task must start; otherwise, successors of this task will be delayed (that is, their est will be violated). The favorite predecessors of a node i are those predecessors of i such that, if i is assigned to the same processor on which they run, est(i) is minimized. The level value of a node (which denotes the length of the longest path from that node to an exit node (also known as a sink node), ignoring the communication cost along that path) is used as the priority to determine the processing order of each task. To compute these values, the whole DAG of the job is traversed, and the complexity of this step is O(e + v). Based on these values, task clusters are created iteratively. The clustering step is like a depth-first search from an unassigned node having the lowest level value back to an entry node. Once an entry node is reached, a cluster is generated, and tasks in the same cluster are assigned to the same resource. In this step, the last and lact values are used to determine whether duplication is needed. For example, if j is a favorite predecessor of i and (last(i) − lact(j)) < c(j,i), where c(j,i) is the communication cost between j and i, then i will be assigned to the same processor as j, and if j has already been assigned to another processor, it will be duplicated onto i's processor. In the clustering step, the DAG is traversed similarly to a depth-first search from the exit node, and the complexity of this step is the same as that of a general search algorithm, which is also O(v + e). So the overall complexity is O(v + e). In a dense DAG, the number of edges is proportional to v², which gives the worst case complexity of the duplication algorithm. Note that in the clustering step, the number of resources available is always assumed to be no smaller than required; that is, the number of resources is unbounded.
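The duplication test at the heart of the clustering step can be stated compactly. The sketch below uses the text's last/lact/c notation; the helper name and example numbers are assumptions for illustration.

```python
def needs_duplication(i, j, last, lact, c):
    """TDS-style test: task i must run on the processor of its favorite
    predecessor j (duplicating j there if j was assigned elsewhere) when
    j's result cannot arrive in time over the network."""
    return last[i] - lact[j] < c[(j, i)]

# j finishes by lact(j); its data reaches another processor at lact(j) + c(j,i).
# If that exceeds last(i), i cannot start on time remotely, so i and j must
# share a processor:
last, lact = {'i': 10}, {'j': 8}
assert needs_duplication('i', 'j', last, lact, {('j', 'i'): 3}) is True
assert needs_duplication('i', 'j', last, lact, {('j', 'i'): 2}) is False
```

The predicate is evaluated once per favorite-predecessor edge during the depth-first clustering pass, which is why the step stays within O(v + e).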
TANH: To exploit the duplication idea in heterogeneous environments, a new algorithm called TANH (Task duplication-based scheduling Algorithm for Network of Heterogeneous systems) is presented in [64] and [7]. Compared to the version for homogeneous resources, the heterogeneous version has higher complexity, namely O(v²p). This is reasonable, since the execution time of a task differs on different resources. A new parameter is introduced for each task: the favorite processor (fp), which can complete the task earliest. The other parameters of a task are computed based on the value of fp. In the clustering step, the initial task of a cluster is assigned to its first fp, and if the first fp has already been assigned, then to the second, and so on. A processor reduction algorithm is also provided in [7], which is used to merge clusters when the number of processors is less than the number of clusters generated.
Duplication based algorithms are very useful in Grid environments. The computational Grid usually has abundant computational resources (recall that the number of resources is unbounded in some duplication algorithms), but high communication costs. This makes task duplication very cost effective. Duplication has already received some attention (see, for example, [103] and [7]), but current duplication based scheduling algorithms in the Grid only deal with independent jobs. There are opportunities to create new algorithms for scheduling complicated DAGs in an environment that is not only heterogeneous, but also dynamic.
o Clustering Heuristics
In parallel and distributed systems, clustering is an efficient way to reduce communication delay in DAGs by grouping heavily communicating tasks into the same labeled clusters and then assigning the tasks in a cluster to the same resource. In general, clustering algorithms have two phases: the task clustering phase, which partitions the original task graph into clusters, and a post-clustering phase, which can refine the clusters produced in the previous phase and produce the final task-to-resource map.
Fig. 6: (a) A DAG with computation and communication costs. (b) A linear clustering. (c) A nonlinear clustering [53].
Algorithms for Task Clustering
At the beginning, each node in a task graph is an independent cluster. In each iteration, the previous clusters are refined by merging some of them. Generally, clustering algorithms map the tasks in a given DAG to an unlimited number of resources. In practice, an additional cluster merging step is needed after the clusters are generated, so that the number of clusters can be made equal to the number of processors. A task cluster can be linear or nonlinear. A clustering is called nonlinear if two independent tasks are mapped to the same cluster; otherwise it is called linear. Fig. 6 shows a DAG with computation and communication costs (Fig. 6(a)), a linear clustering with three clusters {n1, n2, n7}, {n3, n4, n6}, {n5} (Fig. 6(b)), and a nonlinear clustering with clusters {n1, n2}, {n3, n4, n5, n6}, and {n7} (Fig. 6(c)) [53]. The problem of obtaining an optimal clustering of a general task graph is NP-complete, so heuristics have been designed to deal with this problem ([53], [1], [11], [2]).
DSC: Yang et al. [1] propose a clustering heuristic called the Dominant Sequence Clustering (DSC) algorithm. The critical path of a scheduled DAG is called the Dominant Sequence (DS), to distinguish it from the critical path of the clustered DAG. The critical path of a clustered graph is the longest path in that graph, including both the non-zero communication edge costs and the task weights along that path. The makespan in executing a clustered DAG is determined by the Dominant Sequence, not by the critical path of the clustered DAG. Fig. 7(a) shows the critical path of a clustered graph, which consists of {n1, n2, n7} with a length of 7. Fig. 7(b) is a schedule of this clustered graph, and Fig. 7(c) gives the DS of the scheduled task graph, which consists of {n1, n3, n4, n5, n6, n7} with a length of 10 [53].
Fig. 7: (a) The clustered DAG and its CP shown in thick arrows. (b) The Gantt chart of a schedule. (c) The scheduled DAG and the DS shown in thick arrows [53].
In the DSC algorithm, task priorities are dynamically computed as the sum of their top level and bottom level. The top level and bottom level are the sums of the computation and communication costs along the longest path from the given task to an entry task and to an exit task, respectively. While the bottom level is statically computed at the beginning, the top level is computed incrementally during the scheduling process. Tasks are scheduled in order of their priorities; the current node is the unassigned node with the highest priority. Since the entry node always has the longest path to the exit node, clustering always begins with the entry node. The current node is merged with the cluster of one of its predecessors so that the top level value of this node is minimized. If every possible merging would increase the top level value, the current node remains in its own cluster. After the current node is clustered, the priorities of all its successors are updated. The time complexity of DSC is O((e + v) log v), in which the O(log v) factor comes from priority updating at each step using a binary heap, and (e + v) accounts for graph traversal in the clustering iterations. So for a dense task graph, the complexity is roughly O(v² log v).
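The core merging rule can be sketched in a few lines. This is a deliberately simplified illustration, not the full DSC: nodes are processed in plain topological order, whereas the real algorithm orders them by tlevel + blevel priority and maintains the priorities in a heap.

```python
def dsc_like(tasks, pred, w, c):
    """Simplified sketch of DSC's merge step: in topological order, a node
    joins the predecessor cluster that most reduces its top level (the edge
    into that cluster is zeroed); otherwise it stays in its own cluster."""
    cluster = {t: t for t in tasks}     # each task starts in its own cluster
    tlevel = {}
    for t in tasks:
        # arrival time of data from each predecessor over the network
        arrive = {u: tlevel[u] + w[u] + c[(u, t)] for u in pred.get(t, [])}
        tlevel[t] = max(arrive.values(), default=0)
        for u in pred.get(t, []):       # try zeroing the edge from u
            alt = max([tlevel[u] + w[u]] +
                      [v for x, v in arrive.items() if x != u])
            if alt < tlevel[t]:
                tlevel[t], cluster[t] = alt, cluster[u]
    return cluster, tlevel
```

For a join node with one expensive incoming edge, zeroing that edge pulls the node into the heavy predecessor's cluster and shrinks its top level, which is exactly the effect DSC exploits along the Dominant Sequence.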
CASS-II: Liou et al. [11] present a clustering algorithm called CASS-II, which employs a two-step approach. In the first step, CASS-II computes for each node v a value s(v), which is the length of the longest path from an entry node to v (excluding the execution time of v). Thus, s(v) is the start time of v before clustering, and s(v) is 0 if v is an entry node. The second step is the clustering step. Just like DSC, it consists of a sequence of refinement steps, where each refinement step creates a new cluster or "grows" an existing cluster. Unlike DSC, CASS-II constructs the clusters bottom-up, i.e., starting from the exit nodes. To construct the clusters, the algorithm computes for each node v a value f(v), which is the length of the longest path from v to an exit node in the current partially clustered DAG. Let l(v) = f(v) + s(v). The algorithm uses l(v) to determine whether the node v can be considered for clustering at the current refinement step. The clustering begins by placing every exit node in its own cluster, and then goes through a sequence of iterations. In each iteration, it considers for clustering every node v whose immediate successors have all been clustered. Node v is merged into the cluster of the successor which gives it the minimum l(v) value, provided the merge does not increase the l(v) value. CASS-II does not re-compute the critical path in each refinement step. Therefore, the algorithm has a complexity of O(e + v log v), and was shown to be 3 to 5 times faster than DSC in experiments [11].
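One bottom-up refinement step can be sketched as follows; the function name and data layout are assumptions for illustration, not the paper's code.

```python
def cass2_step(v, succs, s, f, w, c, cluster):
    """CASS-II-style refinement for a node v whose successors are all
    clustered: v joins the successor cluster that minimizes f(v) -- zeroing
    the edge to that successor -- unless staying alone is no worse."""
    # f(v) if v stays in its own cluster: every edge pays communication cost
    stay = w[v] + max((c[(v, x)] + f[x] for x in succs), default=0)
    best_f, best_succ = stay, None
    for x in succs:                     # candidate: merge into x's cluster
        cand = w[v] + max([f[x]] + [c[(v, y)] + f[y] for y in succs if y != x])
        if cand < best_f:
            best_f, best_succ = cand, x
    f[v] = best_f
    cluster[v] = cluster[best_succ] if best_succ is not None else v
    return s[v] + f[v]                  # l(v), guiding the next refinement
```

Because each edge of v is examined a constant number of times and no critical path is recomputed, accumulating these steps over the whole DAG stays within the O(e + v log v) bound quoted above.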
Algorithms for the Post-clustering Phase
In [2], the steps after task clustering are studied, which include cluster merging, processor assignment and task ordering on local processors. For cluster merging, three strategies are compared, namely: load balancing (LB), communication traffic minimization (CTM), and random (RAND).
o LB: Define the (computational) workload of a cluster as the sum of the execution times of the tasks in the cluster. At each merging step, choose a cluster, C1, that has a minimum workload among all clusters, and find a cluster, C2, that has a minimum workload among those clusters which have a communication edge between themselves and C1. Then the pair of clusters C1 and C2 are merged.
o CTM: Define the (communication) traffic of a pair of clusters (C1, C2) as the sum of the communication times of the edges from C1 to C2 and from C2 to C1. At each merging step, merge the pair of clusters which have the most traffic.
o RAND: At each merging step, merge a random pair of clusters.
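An LB merging step as described above can be sketched directly; the cluster names and data layout are illustrative assumptions.

```python
def lb_merge_step(workload, edges):
    """One LB merging step: merge the globally lightest cluster with its
    lightest communication neighbor. workload: {cluster: total exec time};
    edges: set of frozenset({c1, c2}) communication links."""
    c1 = min(workload, key=workload.get)
    neighbors = [c for c in workload
                 if c != c1 and frozenset((c, c1)) in edges]
    c2 = min(neighbors, key=workload.get)
    workload[c1] += workload.pop(c2)            # fold c2's work into c1
    # rewire c2's remaining edges onto c1, dropping the merged edge itself
    edges = {e if c2 not in e else frozenset((e - {c2}) | {c1})
             for e in edges if e != frozenset((c1, c2))}
    return c1, workload, edges
```

Repeating this step until the cluster count equals the processor count yields the one-cluster-per-processor input that the assignment heuristic below expects.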
For processor assignment, a simple heuristic is applied to find a one-to-one mapping between clusters and processors: (1) assign the cluster with the largest total communication traffic with all other clusters to a processor; (2) choose an unassigned cluster having the largest communication traffic with an assigned cluster and place it on a processor closest to its communicating partner; (3) repeat (2) until all clusters have been assigned to processors.
Experimental results in [11] and [2] indicate that the performance of clustering heuristics, evaluated by makespan, depends on the granularity of the tasks in a graph. The granularity of a task is the ratio of its execution time to the overhead incurred when communicating with other tasks. This result suggests that adaptive ability will help the scheduler provide higher scheduling quality when jobs are highly diverse.
The Problem of Heterogeneity
According to the basic idea of task clustering, clustering heuristics need not consider the heterogeneity of resources in the clustering phase. But in the subsequent cluster merging and resource assignment phases, heterogeneity will definitely affect the final performance. The research in [2] does not consider this problem, and to our knowledge no other research on this problem has been performed. Clustering heuristics have not yet been adapted for Grid computing either, where communication is usually costly and the performance of resources varies over time. Therefore this remains an interesting topic for research in the Grid computing environment. Another value of the clustering heuristic for Grid scheduling is its multi-phase nature, which gives the Grid scheduler more flexibility to employ different strategies according to the configuration and organization of the underlying resources.
o Algorithms Considering the Dynamism of the Grid
However, there is an important issue for Grid computing which has not yet been discussed: resource performance dynamism. All the algorithms mentioned in this subsection schedule whole task graphs on the basis of static resource performance estimates, which could be jeopardized by resource performance changes during the execution period. Usually, performance dynamism results from competition among jobs sharing the same resource. This problem can be mitigated by considering the possibility of such conflicts when the scheduling decision is made. He et al. [56] show an example of this approach.
Their algorithm considers the optimization of DAG makespan on multiclusters which have their own local schedulers and queues shared by other background workloads, which arrive as a linear function of time. The motivation is to map as many tasks as possible to the same cluster in order to fully utilize the parallel processing capability, and at the same time reduce inter-cluster communication. The schedulers have a hierarchical structure: the global scheduler is responsible for mapping tasks to different clusters according to their latest finish times, in order to minimize the excess over the length of the critical path. The local scheduler on each cluster provides the estimated finish time of a particular task on that cluster, reports it to the global scheduler upon query, and manages its local queue in a FIFO way. The time complexity of the global mapping algorithm is O(p(n+1)n² + e), where p is the number of multiclusters.
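The global/local interaction can be sketched minimally; function names and the FIFO-drain estimate are illustrative assumptions, not the paper's protocol.

```python
def local_estimate(busy_until, exec_time):
    # A FIFO local queue: the new task starts only once the queue drains.
    return busy_until + exec_time

def global_map(task, clusters, busy, w):
    """Global scheduler step: query each cluster's local scheduler for an
    estimated finish time, then map the task to the minimizing cluster.
    busy[cl]: time cluster cl's queue drains; w[(task, cl)]: exec time."""
    return min(clusters, key=lambda cl: local_estimate(busy[cl], w[(task, cl)]))
```

The key point is that the global scheduler never inspects the local queues directly; it only consumes per-cluster finish-time estimates, which is what makes the hierarchy robust to background workloads.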
Another approach to dealing with the dynamism problem is to apply dynamic algorithms. In [5], the authors propose a pM-S algorithm which extends the traditional dynamic Master-Slave scheduling model. In the pM-S algorithm, two queues are used by the master: the unscheduled queue and the ready queue. Only tasks in the ready queue can be dispatched directly to slave nodes, and a task in the unscheduled queue can only be put into the ready queue when all of its parents are in the ready queue or have been dispatched. The dispatching order in the ready queue is based on the tasks' priorities. When a task finishes, the priorities of all of its children's ancestors are dynamically promoted.
In [62], another dynamic algorithm is proposed for scheduling DAGs in a shared heterogeneous distributed system. Unlike the previous works, in which a unique global scheduler exists, in this work the authors consider multiple independent