SCHEDULING ALGORITHMS FOR ENERGY MINIMIZATION
By
JAEYEON KANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2008
© 2008 Jaeyeon Kang
To my beloved Mom and Dad
ACKNOWLEDGMENTS
First and foremost, I would like to thank my advisor, Sanjay Ranka, for his constant
support and guidance. He taught me the passion, patience, and devotion necessary for a
true researcher. I would also like to thank my co-advisor, Sartaj Sahni, for his helpful advice and
guidance. He taught me thoroughness and the right attitude towards research, and helped me think more
broadly about my work. My grateful thanks also go to my committee members, Jih-Kwon Peir,
Jose Fortes, and Paul Avery, for their valuable insights and comments.
I am grateful to all my colleagues for being good friends and collaborators. They have
been very helpful and supportive, both academically and personally, and they have made my journey
memorable. I wish to give special thanks to all of my friends in Korea for listening to me and
helping me find peace.
Finally, none of this would have happened without the full support of my beloved family. I
would like to thank my mom, who is in heaven, for always believing in me and supporting me.
She helped me overcome many difficulties throughout my PhD program. I would like to thank
my dad for motivating me to start this journey and encouraging me to continue it. He has served
as an excellent role model in my life. I would also like to thank my brothers for their sincere
support and encouragement. My deepest gratitude goes to my beloved husband, Hyuckchul, for
being with me. Words are not enough to express my gratitude for everything he has done for me.
I love him and hope that I will be there for him when he needs me. And I thank my
eight-month-old daughter, Katherine (Hyunseung), for coming into my life. She has made me the
happiest person in the whole world. I love her and promise that I will always be on her side.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
  1.1 Introduction
  1.2 Preliminaries
    1.2.1 Energy Model
    1.2.2 Application Model
    1.2.3 Dynamic Environments
      1.2.3.1 Overestimation
      1.2.3.2 Underestimation
  1.3 Scheduling for Energy Minimization
    1.3.1 Static Assignment
      1.3.1.1 Assignment to minimize total finish time
      1.3.1.2 Assignment to minimize total energy consumption
    1.3.2 Static Slack Allocation
    1.3.3 Dynamic Assignment
    1.3.4 Dynamic Slack Allocation
  1.4 Contributions
    1.4.1 Static Assignment to Minimize Total Finish Time
    1.4.2 Static Assignment to Minimize Total Energy Consumption
    1.4.3 Static Slack Allocation to Minimize Total Energy Consumption
    1.4.4 Dynamic Slack Allocation to Minimize Total Energy Consumption
    1.4.5 Dynamic Assignment to Minimize Total Energy Consumption
  1.5 Document Layout
2 RELATED WORK
  2.1 Static Slack Allocation
    2.1.1 Non-optimal Slack Allocation
    2.1.2 Near-optimal Slack Allocation
  2.2 Dynamic Slack Allocation
  2.3 Static Assignment
  2.4 Dynamic Assignment
3 STATIC SLACK ALLOCATION
  3.1 Proposed Slack Allocation
  3.2 Unit Slack Allocation
    3.2.1 Maximum Available Slack for a Task
    3.2.2 Compatible Task Matrix
    3.2.3 Search Space Reduction
      3.2.3.1 Fully independent tasks
      3.2.3.2 Fully dependent tasks
      3.2.3.3 Compressible tasks
    3.2.4 Branch and Bound Search
    3.2.5 Estimating the Lower Bound to Reduce the Search Space
  3.3 Experimental Results
    3.3.1 Simulation Methodology
      3.3.1.1 The DAG generation
      3.3.1.2 Performance measures
    3.3.2 Memory Requirements
    3.3.3 Determining the Size of Unit Slack and the Number of Intervals
    3.3.4 Homogeneous Environments
      3.3.4.1 Comparison of energy requirements
      3.3.4.2 Comparison of time requirements
    3.3.5 Heterogeneous Environments
      3.3.5.1 Comparison of energy requirements
      3.3.5.2 Comparison of time requirements
    3.3.6 Effect of Search Space Reduction Techniques for PathDVS
4 DYNAMIC SLACK ALLOCATION
  4.1 Proposed Dynamic Slack Allocation
    4.1.1 Choosing a Subset of Tasks for Slack Reallocation
      4.1.1.1 Greedy approach
      4.1.1.2 The k time lookahead approach
      4.1.1.3 The k descendent lookahead approach
    4.1.2 Time Range for Selected Tasks
  4.2 Experimental Results
    4.2.1 Simulation Methodology
      4.2.1.1 The DAG generation
      4.2.1.2 Dynamic environments generation
      4.2.1.3 Performance measures
    4.2.2 Overestimation
      4.2.2.1 Comparison of energy requirements
      4.2.2.2 Comparison of time requirements
    4.2.3 Underestimation
      4.2.3.1 Comparison of deadline requirements
      4.2.3.2 Comparison of energy requirements
      4.2.3.3 Comparison of time requirements
5 STATIC ASSIGNMENT
  5.1 Overall Scheduling Process
  5.2 Proposed Static Assignment to Minimize Finish Time
    5.2.1 Task Selection
    5.2.2 Processor Selection
    5.2.3 Iterative Scheduling
  5.3 Proposed Static Assignment to Minimize Energy
    5.3.1 Task Prioritization
    5.3.2 Estimated Deadline for a Task
    5.3.3 Processor Selection
      5.3.3.1 Greedy approach for the computation of expected energy
      5.3.3.2 Example for assignment
  5.4 Experimental Results for Assignment Algorithms that Minimize Finish Time
    5.4.1 Simulation Methodology
      5.4.1.1 The DAG generation
      5.4.1.2 Performance measures
    5.4.2 Comparison of Assignment Algorithms Using Different DVS Algorithms
    5.4.3 Comparison between CPS (Used in Prior Scheduling for Energy Minimization) and ICP
  5.5 Experimental Results for Assignment Algorithms that Minimize Energy
    5.5.1 Simulation Methodology
      5.5.1.1 The DAG generation
      5.5.1.2 Performance measures
      5.5.1.3 Variations of our algorithms
      5.5.1.4 Variations of GA based algorithms
    5.5.2 DVS Schemes to Compute Expected Energy in Processor Selection Step
    5.5.3 Independence between Time and Energy Requirements
      5.5.3.1 Comparison of energy requirements of proposed algorithms
      5.5.3.2 Comparison of energy requirements with GA based algorithms
      5.5.3.3 Comparison of time requirements
    5.5.4 Dependence between Time and Energy Requirements
6 DYNAMIC ASSIGNMENT
  6.1 Proposed Dynamic Assignment
    6.1.1 Choosing a Subset of Tasks for Rescheduling
    6.1.2 Time Range for Selected Tasks
    6.1.3 Estimated Deadline and Energy
    6.1.4 Processor Selection
  6.2 Experimental Results
    6.2.1 System Methodology
      6.2.1.1 The DAG generation
      6.2.1.2 Dynamic environments generation
      6.2.1.3 Performance measures
    6.2.2 Comparison of Energy Requirements
    6.2.3 Comparison of Time Requirements
7 CONCLUSION AND FUTURE WORK
  7.1 Static Slack Allocation
  7.2 Dynamic Slack Allocation
  7.3 Static Assignment
  7.4 Dynamic Assignment
  7.5 Future Work
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
3-1 Results for 100 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-2 Results for 200 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-3 Results for 300 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-4 Results for 400 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-5 Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in homogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)
3-6 Runtime ratio of LPDVS to PathDVS for no deadline extension in homogeneous environments
3-7 Results for 100 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-8 Results for 200 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-9 Results for 300 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-10 Results for 400 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-11 Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in heterogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)
3-12 Runtime ratio of LPDVS to PathDVS for no deadline extension in heterogeneous environments
3-13 Number of tasks participating in search with respect to different number of tasks and processors
3-14 Depth of search tree with respect to different number of tasks and processors
3-15 Number of nodes explored in search with respect to different number of tasks and processors
4-1 Normalized energy consumption of k time lookahead and k descendent lookahead algorithms with different k values with respect to different early finished task rates and time decrease rates for no deadline extension
4-2 Deadline miss ratio of k time lookahead and k descendent lookahead algorithms with different k values with respect to different late finished task rates and time increase rates for 0.05 deadline extension rate
5-1 Results for 50 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-2 Results for 50 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-3 Results for 50 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-4 Results for 100 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-5 Results for 100 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-6 Results for 100 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
LIST OF FIGURES
1-1 Example of DAG and assignment DAG
1-2 Overall process of scheduling for energy minimization
3-1 Example of a DAG and assignment on two processors
3-2 Compatible task matrix and lists for an example in Figure 1-1
3-3 Compression of assignment DAG
3-4 Compression of compatible task lists
3-5 Reduced compatible task lists and search graph
3-6 Runtime of PathDVS with respect to different size of DAGs (unit: ms)
3-7 Normalized energy consumption of PathDVS with respect to different unit slack rates for different number of tasks
3-8 Normalized energy consumption of LPDVS with respect to different interval rates for different number of tasks
3-9 Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different number of tasks
3-10 Runtime to execute algorithms with respect to different deadline extension rates for different number of tasks in homogeneous environments (unit: ms)
3-11 Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different number of tasks in heterogeneous environments
3-12 Runtime to execute algorithms with respect to different deadline extension rates for different number of tasks in heterogeneous environments (unit: ms)
4-1 Tasks selected for slack reallocation in an assignment DAG depending on dynamic slack allocation algorithms
4-2 Overestimation: Time range for selected slack allocable tasks using k-time lookahead approach and k-descendent lookahead approach
4-3 Underestimation: Time range for selected slack allocable tasks using k-time lookahead approach and k-descendent lookahead approach
4-4 Normalized energy consumption of Greedy, dPathDVS, and kallDescendent with respect to different early finished task rates and time decrease rates for no deadline extension
4-5 Normalized energy consumption for no deadline extension
4-6 Normalized energy consumption for 0.01 deadline extension rate
4-7 Normalized energy consumption for 0.02 deadline extension rate
4-8 Normalized energy consumption for 0.05 deadline extension rate
4-9 Normalized energy consumption for 0.1 deadline extension rate
4-10 Normalized energy consumption for 0.2 deadline extension rate
4-11 Computational time to readjust the schedule from an early finished task with respect to different time decrease rates for no deadline extension (unit: ns, logarithmic scale)
4-12 Results for variable deadline extension rates: Computational time to readjust the schedule from one early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
4-13 Deadline miss ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
4-14 Deadline miss ratio for no deadline extension
4-15 Deadline miss ratio for 0.01 deadline extension rate
4-16 Deadline miss ratio for 0.02 deadline extension rate
4-17 Deadline miss ratio for 0.05 deadline extension rate
4-18 Deadline miss ratio for 0.1 deadline extension rate
4-19 Deadline miss ratio for 0.2 deadline extension rate
4-20 Energy increase ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
4-21 Energy increase ratio for no deadline extension
4-22 Energy increase ratio for 0.01 deadline extension rate
4-23 Energy increase ratio for 0.02 deadline extension rate
4-24 Energy increase ratio for 0.05 deadline extension rate
4-25 Energy increase ratio for 0.1 deadline extension rate
4-26 Energy increase ratio for 0.2 deadline extension rate
4-27 Computational time to readjust the schedule from a late finished task with respect to different time increase rates for no deadline extension (unit: ns, logarithmic scale)
4-28 Results for variable deadline extension rates: Computational time to readjust the schedule from one late finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
5-1 A high level description of proposed scheduling approach
5-2 The ICP procedure
5-3 The DVSbasedAssignment procedure
5-4 Example of assignment to minimize finish time and assignment to minimize DVS based energy
5-5 Normalized energy consumption of ICP and CPS using PathDVS with respect to different deadline extension rates for different number of tasks and processors
5-6 Comparison between optimal scheme and greedy scheme for processor selection of A0 for 50 tasks on 4 and 8 processors
5-7 Results for 50 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors
5-8 Results for 100 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors
5-9 Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) with respect to different number of processors for variable deadline extension rates (unit: percentage)
5-10 Normalized energy consumption of GARandNonOptimal and our algorithms for different number of tasks and processors
5-11 Normalized energy consumption of GARandOptimal and our algorithms for different number of tasks and processors
5-12 Normalized energy consumption of GASolNonOptimal and our algorithms with respect to different extension rates for different number of tasks and processors
5-13 Normalized energy consumption of GASolNonOptimal and our algorithms
5-14 Normalized energy consumption of GASolOptimal and our algorithms
5-15 Runtime to execute our algorithms with respect to variable deadline extension rates for different number of tasks (unit: ms)
5-16 Runtime to execute GA algorithms and our algorithm with respect to different number of tasks for 1.0 deadline extension rate (unit: ms, logarithmic scale)
5-17 Results for 4 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage)
5-18 Results for 8 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage)
6-1 The DynamicDVSbasedAssignment procedure
6-2 Results for 4 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-3 Results for 8 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-4 Results for 16 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-5 Results for 32 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-6 Computational time to readjust the schedule from an early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
SCHEDULING ALGORITHMS FOR ENERGY MINIMIZATION
By
Jaeyeon Kang
August 2008
Chair: Sanjay Ranka
Cochair: Sartaj Sahni
Major: Computer Engineering
Energy consumption is a critical issue in parallel and distributed embedded systems. We
present novel algorithms for energy efficient scheduling of DAG (Directed Acyclic Graph) based
applications on DVS (Dynamic Voltage Scaling) enabled systems. The proposed scheduling
algorithms mainly consist of two components: assignment and slack allocation. All of the
proposed assignment and slack allocation schemes effectively minimize energy consumption
while meeting the deadline constraints in static or dynamic environments, and they are equally
applicable to homogeneous and heterogeneous parallel machines. Experimental results show
that the proposed algorithms provide very good performance for energy minimization while
requiring only a small amount of computational time.
CHAPTER 1
INTRODUCTION
1.1 Introduction
Computers account for a significant and growing share of overall energy consumption. Roughly 8%
of the electricity in the US is now being consumed by computers [1]. A study by Dataquest [15]
reported that the world-wide total power dissipation of processors in PCs was 160MW in 1992,
and by 2001 it had grown to 9000MW. It is now widely recognized that power-aware computing
is no longer an issue confined to mobile and real-time computing environments, but is also
important for desktop and conventional computing. In particular, high-performance
parallel and distributed systems, data centers, supercomputers, clusters, embedded systems,
servers, and networks consume considerable amounts of energy. In addition to the expenses related to
energy consumption of computers, significant additional costs have to be borne for cooling the
facility. Thus, reducing the energy requirements of executing an application is very important
both for large scale systems that consume considerable amounts of energy and for embedded
systems that rely on batteries for power.
More recently, industry and researchers are eyeing multi-core processors, which can attain
higher performance by running multiple threads in parallel [18, 19, 36, 39, 40, 58, 67, 68]. By
integrating multiple cores on a chip, designers hope to sustain performance growth while
depending less on raw circuit speed and decreasing the power requirements per unit of
performance. These workhorses of the next generation of supercomputers and wireless devices
are poised to alter the horizon of high-performance computing. However, proper scheduling and
allocation of applications on these architectures is required [17].
Most effective energy minimization techniques are based on Dynamic Voltage Scaling
(DVS). The DVS technique assigns differential voltages to individual tasks to minimize the energy
requirements of an application [20, 63, 66]. Assigning differential voltages is equivalent to
allocating additional time, or slack, to a task. This technique has been found to be a very effective
method for reducing energy on DVS enabled processors. Scheduling algorithms without the DVS
technique, such as Energy Aware Scheduling [27, 28] and several heuristics in [61], do not
perform as well in DVS enabled systems.
There is considerable research on DVS scheduling algorithms for independent tasks in a
single processor real time system [3, 4, 5, 11, 12, 21, 23, 25, 26, 33, 34, 35, 38, 43, 49, 50, 52,
59, 60, 70, 72, 73, 76, 78, 79]. Recently, several DVS based algorithms for slack allocation have
been proposed for tasks with precedence relationships in a multiprocessor real time system [6,
13, 22, 29, 31, 45, 46, 47, 48, 51, 55, 56, 57, 75, 77]. The precedence relationships are
represented as a Directed Acyclic Graph (DAG) consisting of nodes that represent computations
and edges that represent the dependency between the nodes. DAGs have been shown to be
representative of a large number of applications.
We explore novel scheduling algorithms for DVS based energy minimization of DAG
based applications on parallel and distributed machines. The proposed schemes are equally
applicable to homogeneous and heterogeneous parallel machines. The scheduling of DAG based
applications with the goal of DVS based energy minimization broadly consists of two steps:
assignment and slack allocation.
• Assignment: This step determines the order in which tasks execute and the mapping of tasks to processors based on the computation time at the maximum voltage level. Note that the finish time of the DAG at the maximum voltage has to be less than or equal to the deadline for any feasible schedule.
• Slack allocation: Once the assignment of each task is known, this step allocates a variable amount of slack to each task so that the total energy consumption is minimized while the DAG still completes within the given deadline.
A scheduling algorithm can be classified as either a static scheduling algorithm (i.e., an offline
algorithm) or a dynamic scheduling algorithm (i.e., an online algorithm). The static scheduling
algorithms for DAG execution use the estimated execution time of tasks. However, the estimated
execution time (ET) of tasks may be different from their actual execution time (AET) at runtime.
The dynamic environments can be divided into two broad categories based on whether the actual
execution time is less than or more than the estimated time: overestimation (AET < ET) and
underestimation (AET > ET). These dynamic environments may either provide an opportunity to
further reduce energy requirements or cause deadline constraints to be missed. The dynamic
scheduling algorithms address these problems at runtime with the goals of minimizing energy
consumption and satisfying deadline constraints.
In this thesis, we present novel scheduling algorithms for energy minimization in both
static and dynamic environments. The algorithms can be mainly divided into four categories:
static slack allocation, dynamic slack allocation, static assignment, and dynamic assignment.
Algorithms for each of the four categories will be presented in Chapters 3, 4, 5, and 6,
respectively.
1.2 Preliminaries
In this section, we briefly describe the energy model, the application model, and the
dynamic environments used in this thesis.
1.2.1 Energy Model
The Dynamic Voltage Scaling (DVS) technique reduces the dynamic power dissipation by
dynamically scaling the supply voltage and the clock frequency of processors. The power
dissipation is given by $P_d = C_{ef} \cdot V_{dd}^2 \cdot f$, where $C_{ef}$ is the switched
capacitance, $V_{dd}$ is the supply voltage, and $f$ is the operating frequency [9, 10]. The
relationship between the supply voltage and the frequency is
$f = k \cdot (V_{dd} - V_t)^2 / V_{dd}$, where $k$ is a circuit constant and $V_t$ is the
threshold voltage. The energy consumed to execute task $\tau_i$ is
$E_i = C_{ef} \cdot V_{dd}^2 \cdot c_i$, where $c_i$ is the number of cycles required to execute
the task. The supply voltage can be reduced by decreasing the processor speed, which also
reduces the energy consumption of the task. Here we use each task's execution time at the
maximum supply voltage during assignment to guarantee the deadline constraints, given as
$compTime_i = c_i / f_{max}$.
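To make the model concrete, the following is a minimal runnable sketch of these formulas; the constant values ($C_{ef}$, $k$, $V_t$, the voltages, and the cycle count) are illustrative assumptions, not values taken from this dissertation.

```python
# Minimal sketch of the DVS energy model of Section 1.2.1.
# All constants below are illustrative assumptions.

def frequency(v_dd, k=1.0, v_t=0.5):
    """Operating frequency: f = k * (V_dd - V_t)^2 / V_dd."""
    return k * (v_dd - v_t) ** 2 / v_dd

def dynamic_power(c_ef, v_dd, f):
    """Dynamic power dissipation: P_d = C_ef * V_dd^2 * f."""
    return c_ef * v_dd ** 2 * f

def task_energy(c_ef, v_dd, cycles):
    """Energy to execute a task of `cycles` cycles: E_i = C_ef * V_dd^2 * c_i."""
    return c_ef * v_dd ** 2 * cycles

# Lowering the supply voltage reduces per-task energy quadratically,
# at the price of a lower frequency and hence a longer execution time.
for v_dd in (1.8, 1.2):
    f = frequency(v_dd)
    print(f"V_dd={v_dd:.1f}V  f={f:.3f}  "
          f"E={task_energy(1e-9, v_dd, 1e6):.2e}J  time={1e6 / f:.2e}s")
```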
1.2.2 Application Model
The Directed Acyclic Graph (DAG) represents the workflow among tasks. In the DAG
shown in Figure 1-1 (a), a node represents a task and a directed edge between nodes represents
the precedence relationship between tasks. Given a DAG, the assignment of its tasks to
appropriate processors in a parallel architecture is done through an assignment algorithm.
Figure 1-1 (b) depicts the assignment for the DAG of Figure 1-1 (a). The assignment varies
depending on the mapping method, as long as it satisfies the given deadline of the DAG. Figure 1-1 (c)
represents the assignment DAG, which is the direct workflow among tasks generated after the
assignment. The direct precedence relationships among tasks may differ from those in the original
DAG, depending on the given assignment. For instance, task τ1 and task τ4 have a direct
dependency in the original DAG, but in the assignment DAG they have no direct dependency.
Furthermore, if task τ2 finishes at time 5, task τ5 no longer has a direct dependency on task τ2,
although the dependency is indirectly present in the assignment DAG. Also, there may be
additional dependencies in the assignment DAG due to scheduling constraints within a
processor. For example, task τ3 and task τ4 have a dependency relationship in the assignment
DAG.
Figure 1-1. Example of DAG and assignment DAG: (a) DAG, (b) Assignment on two processors, (c) Assignment DAG
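As an illustration of how an assignment induces new dependencies, the following Python sketch derives an assignment DAG from an original DAG and a per-processor execution order. The seven-task graph and the per-processor orders loosely mirror Figure 1-1 but are assumptions for illustration, and the transitive reduction that would drop newly redundant direct edges is omitted for brevity.

```python
# Sketch: deriving an assignment DAG from a DAG plus a processor assignment.
# Graph and per-processor orders loosely mirror Figure 1-1 (illustrative only).

dag = {1: [2, 3, 4], 2: [5], 3: [6], 4: [6], 5: [7], 6: [7], 7: []}

# For each processor, the tasks in their scheduled execution order.
assignment = {"P0": [1, 2, 5, 7], "P1": [3, 4, 6]}

def assignment_dag(dag, assignment):
    edges = {t: set(s) for t, s in dag.items()}
    # Back-to-back tasks on one processor gain a dependency even if none
    # existed in the original DAG (e.g., tasks 3 and 4 above).
    for order in assignment.values():
        for pred, succ in zip(order, order[1:]):
            edges[pred].add(succ)
    # A full construction would also remove direct edges made redundant by
    # transitivity (e.g., 1 -> 4 here); that reduction is omitted for brevity.
    return edges

print(assignment_dag(dag, assignment))
```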
1.2.3 Dynamic Environments
The actual execution time (AET) of tasks may be different from their estimated execution
time (ET) used in static scheduling. We divide the tasks into two broad categories based on
whether the actual execution time is less than or more than the estimated time: overestimation
(i.e., AET < ET) and underestimation (i.e., AET > ET).
1.2.3.1 Overestimation
For most real time applications, an upper worst case bound on the actual execution time of
each task is used to guarantee that the application completes in a given time bound. Many such
tasks may complete earlier than expected during the actual execution. Also when historical data
is used to estimate the time requirements, the actual execution time of each task may be less than
its estimated execution time. This allows for dependent tasks to potentially begin at an earlier
time than what was envisioned during the static scheduling. The extra available slack can then be
allocated to tasks that have not yet begun execution with the goal of reducing the total energy
requirements while still meeting the deadline constraints.
1.2.3.2 Underestimation
For many applications that do not use the worst case execution time for estimation, the
actual execution time of a task may be larger than its estimated execution time. In this case, it
cannot be guaranteed that the deadline constraints will always be satisfied. However, slack can
be removed from future tasks with the hope of satisfying the deadline constraints as closely as
possible while preserving as much of the energy reduction as possible.
1.3 Scheduling for Energy Minimization
Figure 1-2 shows the overall process of the scheduling algorithm for energy minimization. The
following four-step process for scheduling tasks in a DAG for energy minimization is broadly
required:
• Static assignment
• Static slack allocation
• Dynamic assignment
• Dynamic slack allocation

1.3.1 Static Assignment
The static assignment process determines the order in which tasks execute and the mapping of
tasks to processors based on the computation time at the maximum voltage level. The schedule
generated by this process is not yet complete, because there may be slack remaining before the
deadline. The assignment can be performed with two different objectives: assignment to minimize
total finish time and assignment to minimize total energy consumption.
1.3.1.1 Assignment to minimize total finish time
The assignment is performed in order to minimize total finish time of a DAG. The deadline
has to be greater than or equal to the total finish time for a feasible solution. An important side
effect of minimizing the total finish time is that for a given deadline, the total amount of
available slack is increased. In general, higher slack should lead to lower energy after the
application of slack allocation algorithms.
1.3.1.2 Assignment to minimize total energy consumption
The assignment is performed in order to minimize total energy consumption after slack
allocation (i.e., DVS based energy) while still meeting the deadline constraints. It can be done by
considering the energy consumption while determining the execution order of tasks, and the
expected energy after slack allocation while mapping tasks to processors. In general,
incorporating energy minimization during the assignment process should lead to better
performance in terms of reducing energy requirements.
1.3.2 Static Slack Allocation
The static slack allocation process allocates slack to tasks to minimize energy consumption
while meeting deadline constraints at compile time. The initial static schedule is generated after
static assignment and static slack allocation (i.e., static scheduling). The problem of slack
allocation can be posed as the following: Allocate a variable amount of slack to each task so that
the total energy consumption is minimized while the deadlines are met.
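Stated more formally, with notation assumed here for illustration ($t_i$ is the execution time of task $\tau_i$ at the maximum voltage, $s_i$ the slack allocated to it, $E_i(\cdot)$ its energy as a decreasing function of its stretched duration, and $D$ the deadline), the problem can be written as:

```latex
\begin{aligned}
\min_{s_1,\ldots,s_n}\quad & \sum_{i=1}^{n} E_i(t_i + s_i)\\
\text{subject to}\quad & \sum_{\tau_i \in \pi} (t_i + s_i) \le D
    \quad \text{for every path } \pi \text{ in the assignment DAG},\\
& s_i \ge 0 \quad \text{for every task } \tau_i .
\end{aligned}
```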
1.3.3 Dynamic Assignment
The dynamic assignment process reassigns tasks to processors whenever a task finishes
earlier or later than expected based on the current schedule (i.e., the initial static schedule or the
previous schedule updated at runtime) at runtime. The reassignment is performed to minimize
DVS based energy. However, if the deadline constraints are not satisfied, the reassignment is
ignored and the current assignment is kept. Once the reassignment is determined, slack is
reallocated to tasks (i.e., dynamic slack allocation) to minimize energy consumption while still
meeting the deadline constraints.
1.3.4 Dynamic Slack Allocation
The dynamic slack allocation process reallocates slack to tasks whenever a task finishes
earlier or later than expected based on the current schedule (i.e., the initial static schedule or the
previous schedule updated at runtime) at runtime. The current schedule is initialized to the static
schedule and updated whenever dynamic scheduling is triggered by the occurrence of early or
late finishing tasks at runtime. The assignment is not changed during slack reallocation. The main
goal of the dynamic slack allocation algorithm differs slightly depending on the dynamic
environments (i.e., whether the estimated execution time of a task is overestimated or
underestimated). For overestimation, the dynamic slack allocation algorithm minimizes energy
consumption while guaranteeing that the deadline constraints are always met. For
underestimation, it tries to reduce the possibility of the DAG not completing by the required
deadline while trying to preserve as much energy reduction as possible.
Figure 1-2. Overall process of scheduling for energy minimization
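The control flow of Figure 1-2 can be summarized by the following runnable Python skeleton. Every function body here is a hypothetical no-op stand-in for the actual algorithms of Chapters 3 through 6, so only the ordering of the four steps is meaningful.

```python
# Skeleton of the four-step process in Figure 1-2. All bodies are stubs.

def static_assignment(dag, processors):
    """Chapter 5: order tasks and map them to processors (stub)."""
    return {"dag": dag, "map": {t: processors[0] for t in dag}, "slack": {}}

def static_slack_allocation(plan, deadline):
    """Chapter 3: distribute slack to minimize energy (stub)."""
    plan["slack"] = {t: 0.0 for t in plan["dag"]}
    return plan

def dynamic_assignment(plan, event, deadline):
    """Chapter 6: remap not-yet-started tasks after an early/late finish (stub)."""
    return plan

def dynamic_slack_allocation(plan, event, deadline):
    """Chapter 4: redistribute slack after an early/late finish (stub)."""
    return plan

def schedule(dag, processors, deadline, runtime_events):
    plan = static_assignment(dag, processors)          # static scheduling
    plan = static_slack_allocation(plan, deadline)
    for event in runtime_events:                       # dynamic scheduling
        plan = dynamic_assignment(plan, event, deadline)
        plan = dynamic_slack_allocation(plan, event, deadline)
    return plan

print(schedule({1: [2], 2: []}, ["P0", "P1"], 10.0, [("early", 1)]))
```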
1.4 Contributions
In this section, we present the main contributions of the proposed scheduling algorithms
presented in this thesis.
1.4.1 Static Assignment to Minimize Total Finish Time
While most prior research on scheduling for energy minimization of DAGs has not
concentrated on the assignment process, we show that the assignment itself is as important for
minimizing energy requirements as the slack allocation process. In general, minimizing
the time (i.e., the scheduling length of a DAG) and minimizing the energy are regarded as
conflicting goals. However, when using DVS techniques under a specified deadline, we show
that minimizing total finish time can lead to lower energy requirements due to the increase in the
total amount of available slack. The main features of the proposed static assignment algorithm
for minimizing finish time are as follows:
• Assign multiple independent ready tasks simultaneously: The computation of priority of a task depends on estimating the execution path from this task to the last task of the DAG representing the workflow. Since the mapping of tasks yet to be scheduled is unknown and the cost of task execution depends on the processor that is assigned, the priority has to be approximated during scheduling. Hence, it is difficult to explicitly distinguish the execution order of tasks with similar priorities. Using this intuition, the proposed algorithm forms independent ready tasks whose priorities are similar into a group and finds an optimal solution (e.g., resource assignment) for this subset of tasks simultaneously. Here the set of ready tasks that can be assigned consists of tasks for which all the predecessors have already been assigned.
• Iteratively refine the scheduling: The scheduling is iteratively refined by using the cost of the critical path based on the assignment generated in the previous iteration. Here the critical path is defined by the length of the longest path from a task to an exit task, and it is used to determine the priority of the task (a sketch of this priority computation follows this list). Assuming that the mappings of the previous iteration are good, this provides a better estimate of the cost of the critical path than using the average or median computation and communication time as the estimate, as is done in the first iteration.
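A minimal sketch of the critical-path priority is given below; the DAG, the per-task average costs, and the function names are illustrative assumptions. In later iterations, the average costs would be replaced by the costs implied by the previous iteration's mapping.

```python
# Sketch: priority of a task = length of the longest path from it to an exit
# task. Graph and costs are illustrative.
from functools import lru_cache

dag = {1: [2, 3], 2: [4], 3: [4], 4: []}
avg_cost = {1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}   # avg computation time per task

@lru_cache(maxsize=None)
def priority(task):
    succs = dag[task]
    if not succs:                              # exit task
        return avg_cost[task]
    return avg_cost[task] + max(priority(s) for s in succs)

# Ready tasks with similar priorities would be grouped and assigned together.
print(sorted(dag, key=priority, reverse=True))  # [1, 2, 3, 4]
```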
1.4.2 Static Assignment to Minimize Total Energy Consumption
Most of the prior research on scheduling for energy minimization of DAGs is based on
a simple list based assignment algorithm. An assignment that minimizes total finish time may be
a reasonable approach, as minimizing time generally leaves more slack to be allocated and thus
reduces the energy requirements during the slack allocation step. However, this approach
cannot incorporate the differential energy and time requirements of each task of the workflow on
different processors. Our assignment algorithms mitigate this problem by considering the
expected effect of slack allocation during the assignment process. They significantly outperform
other existing algorithms in terms of energy consumption. Furthermore, they require little
computational time. The main features of the proposed static assignment algorithms for minimizing
energy consumption are as follows:
• Utilize expected DVS based energy information during assignment: Our algorithm assigns the appropriate processor for each task such that the total energy expected after slack allocation is minimized. The expected energy after slack allocation (i.e., the expected DVS based energy) for each task is computed by using an estimated deadline for each task so that the overall DAG can be executed within the deadline of the DAG (a sketch follows this list). This leads to good performance in terms of energy minimization.
• Consider multiple task prioritizations: We test multiple assignments using multiple task prioritizations based on tradeoffs between energy and time for each task. This leads to good performance in terms of energy minimization. Furthermore, the execution of these assignments can be potentially done in parallel to minimize the computational time (i.e., runtime overhead).
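As a rough illustration of the first feature, the sketch below evaluates the energy a task would consume if slowed just enough to meet its estimated deadline. The linear voltage-frequency scaling and all numeric values are simplifying assumptions for illustration, not the actual scheme of Chapter 5.

```python
# Sketch: expected DVS-based energy of one task on one processor, assuming
# (for simplicity) that V_dd scales linearly with frequency.

def expected_dvs_energy(cycles, f_max, v_max, c_ef, est_deadline):
    f = min(f_max, cycles / est_deadline)   # slowest frequency meeting deadline
    v = v_max * f / f_max                   # simplifying linear V-f assumption
    return c_ef * v ** 2 * cycles           # E_i = C_ef * V_dd^2 * c_i

# Processor selection would pick, per task, the processor minimizing this value.
procs = {"P0": (1.0e9, 1.8, 1.0e-9), "P1": (2.0e9, 2.5, 1.0e-9)}
for name, (f_max, v_max, c_ef) in procs.items():
    print(name, expected_dvs_energy(1.0e8, f_max, v_max, c_ef, est_deadline=0.2))
```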
1.4.3 Static Slack Allocation to Minimize Total Energy Consumption
The proposed scheduling algorithm, the Path based DVS (PathDVS) algorithm, finds the best task set that
can efficiently use unit slack for minimizing energy consumption. It incorporates assignment
based dependency relationships among tasks as well as the different energy profiles of tasks on
different processors. It provides near optimal solutions for energy minimization with
considerably less computational time and memory than an existing algorithm that also provides
near optimal solutions (i.e., a linear programming based approach). The main features of the
proposed static slack allocation algorithm are as follows, particularly from the perspective of
requiring little computation time:
• Utilize a compatible task matrix: The compatible task matrix represents, for each task, the list of tasks that can share unit slack (i.e., a minimum indivisible unit of slack) with it; a sketch of one such construction follows this list. The matrix is constructed based on the following two characteristics: First, each assignment-based path, which consists of tasks with precedence relationships in an assignment DAG, cannot have more than one unit slack. Second, this unit slack cannot be allocated to more than one task on each assignment-based path. Using the matrix, the branch and bound search method can be applied efficiently.
• Apply search space reduction techniques: In general, the branch and bound search method requires large computational time. Thus, to reduce the search space (which in turn reduces the computational time), we check whether each task is a fully independent task, a fully dependent task, or a compressible task. Only one representative of the compressible tasks participates in the search. This dramatically reduces the search space without reducing the quality of the energy results.
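Under the two characteristics just described, two tasks can share a unit of slack only if no assignment-based path contains both, i.e., neither is reachable from the other in the assignment DAG. The following sketch computes such compatibility lists for an illustrative graph; it is a reconstruction from the description above, not the Chapter 3 code.

```python
# Sketch: compatibility lists (tasks that can share a unit of slack) computed
# from reachability in an illustrative assignment DAG.

adag = {1: [2, 3], 2: [4], 3: [4], 4: []}

def reachable(graph, src):
    seen, stack = set(), [src]
    while stack:
        for succ in graph[stack.pop()]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

reach = {t: reachable(adag, t) for t in adag}
compatible = {t: [u for u in adag
                  if u != t and u not in reach[t] and t not in reach[u]]
              for t in adag}
print(compatible)   # tasks 2 and 3 are compatible: no path joins them
```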
1.4.4 Dynamic Slack Allocation to Minimize Total Energy Consumption
Prior dynamic slack allocation algorithms for DAGs are based on using a simple greedy
approach that allocates the slack to the next ready task on the same processor where the task that
completes earlier than expected was executed. This slack forwarding based approach, although
fast, is shown not to perform well in our experiments in terms of energy reduction. A simple
option for adjusting slack at runtime is to reapply the static slack allocation algorithms for the
unexecuted tasks when a task finishes early or late. It can be expected to be close to the best that
can be achieved for energy minimization, particularly when applying near optimal static slack
allocation algorithms. However, the time requirements of static algorithms are large and they
may not be practical for many runtime scenarios. The proposed dynamic slack allocation
algorithms effectively reallocate the slack to unexecuted tasks to reduce more energy and/or
meet a given deadline at runtime. They are comparable to static algorithms applied at runtime in
terms of reducing energy and/or meeting a given deadline, but require considerably less
computational time. They are also effective when the estimated execution time of tasks
is underestimated or overestimated. The main features of the proposed dynamic slack allocation
algorithms are as follows:
• Select the subset of tasks for slack reallocation: The potentially rescheduled tasks via the dynamic slack allocation algorithm are tasks which have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. The dynamic slack allocation (i.e., rescheduling) is applied to the subset of tasks that depends on the algorithm. The main reason to limit the potentially rescheduled tasks is to minimize the overhead of reallocating the slack during runtime. Clearly, this should be done so that the other goal of energy reduction is also met simultaneously.
• Determine the time range for the selected tasks: The time range of the selected tasks has to be changed, as some of the tasks have completed earlier or later than expected. Based on the computation times in the current schedule and the assignment-based dependency relationships among tasks, we recompute the time range (i.e., the earliest start time and latest finish time) within which the selected tasks should be executed; a sketch follows this list. Slack is allocated to the selected tasks within this time range in order to try to meet the deadline constraints.
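A minimal sketch of this time-range computation follows; the graph, the durations, and the variable names are illustrative assumptions. Earliest start times are propagated forward from predecessors, and latest finish times backward from the deadline.

```python
# Sketch: earliest start time (EST) and latest finish time (LFT) for each
# not-yet-started task in an illustrative assignment DAG.

adag = {1: [2, 3], 2: [4], 3: [4], 4: []}
preds = {t: [u for u in adag if t in adag[u]] for t in adag}
duration = {1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}   # times in the current schedule
deadline, now = 14.0, 0.0                     # `now`: moment rescheduling runs

est, lft = {}, {}
for t in sorted(adag):                        # a topological order here
    est[t] = max([now] + [est[p] + duration[p] for p in preds[t]])
for t in sorted(adag, reverse=True):          # reverse topological order
    lft[t] = min([deadline] + [lft[s] - duration[s] for s in adag[t]])

# Slack for each selected task is then allocated inside [est[t], lft[t]].
print({t: (est[t], lft[t]) for t in adag})
```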
1.4.5 Dynamic Assignment to Minimize Total Energy Consumption
There is very little research on the dynamic scheduling for DAGs with the goal of energy
minimization. We have shown that reallocating the slack at runtime (i.e., dynamic slack
allocation) leads to better energy minimization. However, it may not be enough to improve
energy requirements at runtime. We show that reassignment of tasks along with reallocation of
slack during runtime can lead to better performance in terms of energy minimization as
compared to only reallocating the slack at runtime. For an approach to be effective and useful at
runtime, its computational time (i.e., runtime overhead) must also be small. The main features of the
proposed dynamic assignment algorithm are as follows:
• Select the subset of tasks for reassignment: As in dynamic slack allocation, the tasks potentially rescheduled by the dynamic assignment algorithm are those which have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. The dynamic reassignment is applied to a subset of these tasks. The tasks considered for rescheduling are limited in order to minimize the overhead of reassigning processors during runtime.
• Determine the time range for the selected tasks: The time range of the selected tasks has to be determined in order to meet the deadline constraints. Based on the computation time in the current schedule and assignment-based dependency relationships among tasks, we recompute the time range where the selected tasks should be executed. While the time range is defined for the selected tasks given an assignment in the dynamic slack reallocation (i.e., earliest start time and latest finish time for the selected tasks on their assigned processors), for reassignment, it is defined over each processor for the selected tasks (i.e., available earliest start time and latest finish time for the selected tasks on each processor). The reassignment for the selected tasks is performed within this determined time range.
• Utilize expected DVS based energy information during reassignment: Our algorithm reassigns the appropriate processor for each selected task such that the total energy expected after slack allocation is minimized. The expected DVS based energy for each selected task is computed by using the estimated deadline for each task so that the selected tasks can be executed within the time range. This leads to good performance in terms of energy minimization while meeting deadline constraints.
1.5 Document Layout
The remainder of this document is organized as follows. Chapter 2 presents the related
work on scheduling for energy minimization. Chapter 3 presents the static slack allocation
algorithm to minimize total energy consumption under the deadline constraints. Chapter 4
presents the dynamic slack allocation to minimize total energy consumption under the deadline
constraints at runtime. Chapter 5 presents the static assignment algorithms. Chapter 6 presents
the dynamic assignment algorithm to minimize total energy consumption. In Chapter 7,
conclusion and future work are described.
CHAPTER 2 RELATED WORK
There has been significant interest in the development of energy aware scheduling
algorithms, as energy is an important concern in many systems. The energy
aware scheduling algorithms can be divided depending on their goal: scheduling to minimize
overall energy consumption, scheduling to balance energy consumption for each processor, and
so on. The scheduling with the goal of balancing energy is usually applicable in wireless sensor
networks [74]. For most other cases, the scheduling is done with the goal of energy minimization
and is the focus of this dissertation.
The scheduling algorithms for energy minimization can be broadly divided depending on:
• Whether Dynamic Voltage Scaling (DVS) technique is used or not?
• Whether it is for independent tasks or dependent tasks (i.e., tasks with precedence relationships)?
• Whether it is for single processor systems or multiprocessor systems?
• Whether it is for homogeneous systems or heterogeneous systems?
• Whether it is applied at compile time or runtime?
In the following, we briefly describe the current work that addresses the above issues.
Several algorithms have been developed to minimize energy consumption without the DVS
technique [27, 28, 61]. However, they do not perform well in DVS-enabled systems. Also, the
DVS technique has been found to be a very effective method for reducing energy in DVS
enabled processors. The proposed scheduling algorithms in this thesis focus on the DVS
technique.
The scheduling algorithms for energy minimization can be divided depending on the
characteristics of the tasks comprising the target applications: scheduling for independent tasks and
scheduling for dependent tasks (i.e., tasks with precedence relationships). The precedence
relationships are represented as a Directed Acyclic Graph (DAG) consisting of nodes that
represent computations and edges that represent the dependency between the nodes. There is
considerable research on DVS scheduling algorithms for independent tasks [3, 4, 5, 11, 12, 21,
23, 25, 26, 33, 34, 35, 38, 43, 49, 50, 52, 59, 60, 70, 72, 73, 76, 78, 79]. However, many
applications are represented by DAGs. The proposed scheduling algorithms in this thesis are
focused on DAG based applications.
The scheduling algorithms for energy minimization can also be categorized based on
whether the target system is a single processor system or a multiprocessor system. There is
considerable research on DVS scheduling algorithms in a single processor real time system [3, 4,
5, 25, 49, 50, 52, 72]. However, in practice, a multiprocessor real time system is used to execute
many applications. The proposed scheduling algorithms in this thesis focus on a multiprocessor
system. In addition, the multiprocessor system can be divided into a homogeneous
multiprocessor system and a heterogeneous multiprocessor system. While several prior
scheduling algorithms in a multiprocessor system can only be applied to a homogeneous system, the
proposed scheduling algorithms in this thesis are applicable for both homogeneous and
heterogeneous systems.
Finally, the scheduling algorithms for energy minimization can also be divided depending
on whether it is applied at compile time (i.e., static algorithms) or at runtime (i.e., dynamic
algorithms). Several runtime approaches have been studied in the literature [4, 5, 21, 23, 33, 35,
47, 51, 59, 60, 76, 77, 78, 79]. However, most of these approaches have been developed for
independent tasks [4, 5, 21, 23, 33, 35, 59, 60, 76, 79]. The proposed scheduling algorithms in
this thesis focus on the dynamic algorithms for DAG based applications in a multiprocessor
system as well as the static algorithms.
As described in Chapter 1, the scheduling algorithm for energy minimization broadly
consists of two steps: assignment and then slack allocation. Most of the prior research on the
scheduling for energy minimization of DAGs on parallel machines has not focused on the
assignment process, but more on the slack allocation process. However, the assignment process
is very important to minimize energy consumption in addition to the slack allocation process.
The proposed scheduling algorithms in this thesis focus on both the assignment algorithms and
the slack allocation algorithms for energy minimization.
In the following sections, we present related work for static slack allocation, dynamic slack
allocation, static assignment, and dynamic assignment, for the scheduling of DAG based
applications on homogeneous and heterogeneous parallel processors, in detail.
2.1 Static Slack Allocation
There is considerable research on DVS scheduling algorithms for independent tasks [3, 11,
12, 25, 26, 34, 38, 43, 49, 50, 52, 59, 70, 72, 73]. Recently, several DVS based algorithms for
slack allocation have been proposed for tasks with precedence relationships in a multiprocessor
real time system [6, 13, 22, 29, 45, 46, 48, 55, 57, 75]. The slack allocation algorithms (i.e., DVS
scheme) can be mainly divided into two categories: non-optimal slack allocation and near-
optimal slack allocation.
2.1.1 Non-optimal Slack Allocation
In these algorithms, the slack is greedily allocated to tasks in decreasing or increasing order of their
finish time [13], or allocated evenly to all possible tasks [45]. In [22], the scheduling algorithm
iteratively assigns slack based on dynamic recalculation of priorities. The algorithms in [13, 22,
45] ignore the various energy profiles of tasks on different processors during slack allocation and
lead to poor energy reduction. Using these energy profiles can improve the potential
energy saving [48, 55]. The static slack allocation algorithms described in [48, 55] work as
follows:
• Divide the total slack available into equal partitions called “unit slack”
• Iteratively execute the following till all the available slack is used: Allocate the unit slack to a task(s) that leads to maximum reduction in energy
However, because of the dependency relationships among tasks in an assignment, the sum of
energy reduction of several tasks (i.e., tasks executed in parallel) may be higher than the highest
energy reduction of a single task(s). In this case, the allocation of slack to a single task(s) with
the highest energy reduction one at a time as used in [48, 55] leads to suboptimal slack
allocation. Our scheme effectively exploits this fact to determine a set of multiple independent
tasks which cumulatively have the maximum energy reduction.
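To make the difference concrete, the following minimal Python sketch (our illustration, not the exact procedure of [48, 55]; the task names and energy reductions are hypothetical) contrasts the per-task greedy choice with the set-based choice:

def greedy_pick(reduction):
    # [48, 55]-style step: give the unit slack to the single best task.
    return max(reduction, key=reduction.get)

def set_pick(reduction, compatible_sets):
    # Set-based step: among sets of mutually independent tasks that can
    # share the same unit slack, pick the set with the largest cumulative
    # energy reduction.
    return max(compatible_sets, key=lambda s: sum(reduction[t] for t in s))

reduction = {"t7": 8.0, "t2": 6.0, "t3": 5.0}   # hypothetical reductions
compatible_sets = [{"t7"}, {"t2", "t3"}]        # t2 and t3 run in parallel
print(greedy_pick(reduction))                   # 't7' (reduction 8.0)
print(set_pick(reduction, compatible_sets))     # {'t2', 't3'} (reduction 11.0)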
2.1.2 Near-optimal Slack Allocation
As a near-optimal slack allocation algorithm, a Linear Programming (LP) based approach
has been developed [75]. The continuous voltage case in [75] is formulated as an LP problem
whose objective is the minimization of total energy consumption. The constraints include
deadline constraints for each task, the relationships among tasks from the original DAG, and the
relationships among tasks on the same processor after assignment. Since the formulation in [75]
does not consider the communication time among tasks, we extend it by incorporating the
communication time when representing precedence relationships among tasks. The LP
formulation for the continuous voltage
case is as follows:
\[
\begin{aligned}
\text{Minimize} \quad & \sum_{\tau_i \in \Gamma} f(x_i) \\
\text{subject to} \quad & startTime_j - startTime_i - commTime_{ij} - x_i \ge 0, \quad \forall \tau_j \in \Gamma,\ \tau_i \in pred_j \cup \{pPred_j\} \\
& startTime_{source} \le startTime_i, \qquad startTime_i + x_i \le startTime_{sink} \le deadline, \quad \forall \tau_i \in \Gamma \\
& 0 \le startTime_i, \qquad compTime_i \le x_i \le deadline_i - startTime_i
\end{aligned}
\]
where xi is the computation time of task τi that can be slowed, f(xi) is the energy model
depending on computation time, startTimei is the start time of task τi on its assigned processor,
predi is the set of direct predecessors of task τi in a DAG, pPredi is the task assigned prior to task
τi on the same assigned processor, compTimei is the computation time of task τi on its assigned
processor, and commTimeij is the communication time between task τi and task τj on their
assigned processors. The source and sink nodes are dummy nodes representing the start and end
of a DAG, respectively. Their computation times, and the communication times on edges
connected to them, are zero.
The function f(x) is, in general, a nonlinear function. As an effective approximation, the
convex objective function that minimizes energy can be formulated as a piecewise linear
function. The accuracy of this approximation increases with a larger number of intervals (or a
smaller interval length). This effectively leads to choices that are more energy efficient.
Convex optimization problems for the target application with linear constraints and an objective
function that is the sum of convex functions of independent variables can be solved in polynomial
time [2, 24, 37]. This is based on using a piecewise linear approximation of the energy functions
for each variable. In [24], the number of intervals for the piecewise linear function is
proportional to 8n (in our case, n is the number of tasks). This process has to be repeated
multiple times to achieve the required level of accuracy. In practice, we found that significantly
fewer intervals and a single iteration are sufficient to achieve an acceptable level of
accuracy (i.e., the level after which the reduction in energy plateaus).
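As an illustration of the approximation, the following Python sketch samples a convex function at interval endpoints to obtain its piecewise linear form; the energy function used here is an assumed illustrative model, not the thesis energy model:

def piecewise_linear(f, x_min, x_max, num_intervals):
    # Sample f at the interval endpoints; the breakpoints and slopes define
    # the piecewise linear approximation used inside the LP.
    xs = [x_min + i * (x_max - x_min) / num_intervals
          for i in range(num_intervals + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(num_intervals)]
    return xs, ys, slopes

# Assumed convex model for illustration: energy falls as the computation
# time x is stretched.
energy = lambda x: 100.0 / (x * x)
xs, ys, slopes = piecewise_linear(energy, 10.0, 40.0, 8)
print(slopes)   # nondecreasing for a convex f, so an LP that treats the
                # pieces as independent variables fills the cheapest first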
The LP based algorithm provides near optimal solutions but has large time and
memory requirements. Our scheme addresses these problems (i.e., time and memory) by
combining compatible task matrix, search space reduction techniques, and lower bound while
providing near optimal solutions.
2.2 Dynamic Slack Allocation
Several runtime approaches for slack allocation have been studied in the literature [4, 5,
21, 23, 33, 35, 47, 51, 59, 60, 76, 77, 78, 79]. Most of these approaches have been developed for
independent tasks [4, 5, 21, 23, 33, 35, 59, 60, 76, 79]. For tasks with precedence relationships in
a multiprocessor real time system, the algorithm in [51] uses a greedy technique (i.e., slack
forwarding) that allocates the generated slack to the next ready task on the same processor where
an early finished task was executed. Although the time requirement of the greedy approach is
small, the performance in terms of reducing energy is significantly lower than applying the static
methods at runtime. Our methods show that the use of more intelligent methods can lead to
improved reduction in energy requirements.
2.3 Static Assignment
The assignment algorithms used in the scheduling for energy minimization can be mainly
classified into the following two broad categories: assignment to minimize finish time and
assignment to minimize energy.
• Assignment to minimize finish time: The goal of this assignment is to minimize total finish time of a DAG. If the deadline constraints are met, appropriate slack is allocated in the second phase to tasks to minimize energy.
• Assignment to minimize energy: This method tries to make assignments that lead to lower energy (before slack allocation) but may not meet deadline constraints. Furthermore, even if they minimize total energy consumption before slack allocation, they may not minimize the energy consumption after slack allocation. This is because the energy after slack allocation depends on the execution time, available slack, and energy profiles of the tasks.
Most prior scheduling algorithms for energy minimization use simple list assignment
algorithms. The parallel computing literature contains a variety of algorithms that minimize the
finish time of DAG on a parallel machine. Prior research on task scheduling in DAGs to
minimize total finish time has mainly focused on algorithms for a homogeneous environment
[16, 41, 42, 54, 69, 71]. Scheduling algorithms such as Dynamic Critical Path (DCP) algorithm
[41] that give good performance in a homogeneous environment may not be efficient for a
heterogeneous environment as the computation time of a task may be dependent on the processor
to which the task is mapped. Several scheduling algorithms for a heterogeneous environment
have been recently proposed [8, 32, 44, 62, 64]. Most of them are based on static list scheduling
heuristics to minimize the finish time of DAGs, for example, Dynamic Level Scheduling (DLS)
[62], Heterogeneous Earliest Finish Time (HEFT) [64], and Iterative List Scheduling (ILS) [44].
The DLS algorithm selects a task to schedule and a processor where the task will be executed at
each step. It has two features that can have an adverse impact on its performance. First, it uses
the earliest start time to select a processor for a task to be scheduled. This may not be effective
for a heterogeneous environment as the completion of the task may depend on the processor
where the task is assigned. Second, it uses the average of computation time across all the
processors for a given task to determine a critical task. This can cause an inaccurate estimation of
a task's priority.
The HEFT algorithm reduces the cost of scheduling by using pre-calculated priorities of
tasks in scheduling and uses the earliest finish time for the selection of a processor. This can, in
general, provide better performance as compared to the DLS algorithm. However, since the
algorithm uses the average of computation time across all the processors for a given task to
determine tasks' priorities, it may lead to an inaccurate ordering for executing tasks. To address the
problem, the ILS algorithm generates an initial schedule by using HEFT and iteratively improves
it by updating priorities of tasks. While it has been shown to have good performance [44], we
show that the determination of a task's priority can be improved by using group based assignment.
This is because the calculated priorities of tasks have a degree of inaccuracy in a heterogeneous
environment as the assignment of future tasks is unknown.
Most existing algorithms for energy minimization are based on one execution of
assignment and slack allocation. To improve performance in terms of energy, an iterative
execution of assignment and slack allocation based on genetic algorithms or simulated annealing
has been proposed. These approaches are based on trying out several assignments (or iteratively refining the
assignment). Each assignment is followed by a slack allocation algorithm to determine the
energy requirements. The Genetic Algorithm (GA) based approach in [56, 57] consists of two
interleaved steps:
• Processor selection for tasks based on GA
• For each processor selection, derive the best scheduling which includes the execution ordering of tasks using another GA
Each GA evolves the solutions via two point crossover and mutation from randomly
generated initial solutions and explores the large search space to find better solutions. Given
each schedule from the processor selection and the task ordering, a DVS based slack allocation
scheme is applied. This approach was shown to outperform existing algorithms in terms of
energy consumption based on their experimental results. However, the assignment itself still
does not consider the energy consumption after slack allocation. Also, the testing of energy
requirements of multiple solutions each corresponding to a different assignment requires
considerable computational time.
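The following Python sketch shows only the general structure of such a GA-based search (processor-selection chromosomes evolved by two point crossover and mutation); the operators, parameters, and the placeholder fitness are our assumptions, not the exact design of [56, 57]:

import random

NUM_TASKS, NUM_PROCS = 10, 4

def evolve(population, fitness, generations=50, mut_rate=0.1):
    # Each chromosome maps task index -> processor; lower fitness (energy)
    # is better. An inner step would order the tasks per processor and run
    # DVS slack allocation to obtain the actual fitness.
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: len(population) // 2]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            i, j = sorted(random.sample(range(NUM_TASKS), 2))
            child = a[:i] + b[i:j] + a[j:]                    # two point crossover
            if random.random() < mut_rate:                    # mutation
                child[random.randrange(NUM_TASKS)] = random.randrange(NUM_PROCS)
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

population = [[random.randrange(NUM_PROCS) for _ in range(NUM_TASKS)]
              for _ in range(20)]
best = evolve(population, fitness=sum)   # 'sum' is only a placeholder fitness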
2.4 Dynamic Assignment
There is little research on the dynamic scheduling for DAGs with the goal of energy
minimization. Furthermore, the existing dynamic scheduling algorithms have concentrated only
on dynamic slack reallocation. However, as shown in this thesis, reassignment of tasks (i.e.,
dynamic assignment) along with reallocation of slack during runtime can be expected to lead to
better performance in terms of energy minimization.
CHAPTER 3 STATIC SLACK ALLOCATION
The slack allocation algorithms assume that an assignment of tasks to processors has
already been made. The problem of slack allocation can be posed as the following:
Allocate variable amount of slack to each task so that the total energy is minimized while the deadlines can still be met.
Most prior slack allocation algorithms provide non-optimal solutions for energy
minimization. They ignore the various energy profiles of tasks on different processors during
slack allocation. While some algorithms use the energy profiles for better energy
minimization, they still ignore the dependency relationships among tasks in an assignment.
Both limitations lead to poor energy reduction. To address these problems, our slack allocation
algorithm incorporates assignment based dependency relationships among tasks as well as
different energy profiles of tasks. Unlike most algorithms, a Linear Programming (LP) based
approach provides near optimal solutions for energy minimization. However, it requires large
computational time and memory. We introduce a slack allocation algorithm which provides close
to optimal solutions for energy minimization but requires less computational time and memory
compared to LP based approach.
3.1 Proposed Slack Allocation
The Path based algorithm, our novel approach for energy minimization, is an iterative
approach that allocates a small amount of slack (called unit slack) in each iteration and asks the
following question:
Find the subset of tasks that can be allocated this unit slack so that the total energy consumption is minimized while the deadline constraint is also met.
The above process is iteratively applied till all the slack is used. We show that each iteration of
the problem can be reduced to finding a maximum weighted independent set of tasks, where the
weight is given by the amount of energy reduction by allocating unit slack.
The dependency relationships in an assignment DAG constrain the total slack which can be
allocated to the different tasks. For instance, in Figure 1-1, consider an example in which one
unit of slack can be allocated (i.e., the deadline is 12 units). The total number of unit slacks that
can be allocated from this one unit of slack is one or two:
• If task τ7 (or τ1) is allocated the slack, no other task can use this slack in order to satisfy the deadline constraints.
• Tasks τ2 and τ3 (or τ2 & τ4, τ2 & τ6, τ4 & τ5, τ5 & τ6) can use this slack concurrently as they are not dependent on each other and both can be slowed down.
The appropriate option to choose between the two choices depends on the energy reduction in
task τ7 versus the sum of energy reduction for tasks τ2 and τ3.
Our slack allocation algorithm considers the overall assignment-based dependency
relationships among tasks, while most existing algorithms ignore them. We define two
phases:
• Phase 1: Slack allocation from start time to total finish time based on a given assignment - in this case the slack can be allocated to only a subset of tasks that are not on the critical path.
• Phase 2: Slack allocation from total finish time to deadline - in this case the slack can potentially be allocated to all the tasks.
For instance, while, in Figure 1-1, there is no slack from start time to total finish time, in
Figure 3-1, the slack of time 5 to 6 is considered for the slack allocation from start time to total
finish time. The slack can be allocated only to task τ2. However, the slack of time 8 to 9 at Phase
2 can be allocated to a subset of tasks (e.g., τ1, τ2 & τ3, or τ4).
The execution of Phase 1 precedes the execution of Phase 2 in order to achieve more energy saving
by reducing the possibility of redundant slack allocation to the same tasks. In the example of
Figure 3-1, assume that the energy of tasks τ1, τ2, τ3, and τ4 reduced by allocating one time unit of
slack is 1, 10, 1, and 10, respectively and the energy model follows a quadratic function. The
total energy saving is 20 by allocating slack to task τ2 at Phase 1 and then task τ4 at Phase 2.
Meanwhile, when allocating slack to tasks τ2 and τ3 at Phase 2 and then task τ2 at Phase 1, the
total energy saving is 16.6. It gives a difference of 17%.
For each of the two phases, our algorithm iteratively allocates one unit of slack (the size of
this unit, called unitSlack, is a parameter). For Phase 1, at each iteration over unitSlack, only tasks
with the maximum available slack are considered because of the limited number of slack
allocable tasks and the different amount of available slack for each task. Thus tasks considered at
each iteration may be changed. For instance, consider an example where only three tasks have
available slack of 5, 4, and 3 respectively. In the first iteration, only one task with a slack of 5
will be considered. In the next iteration, two tasks will be considered as both of them have a
slack of 4. This process is iteratively executed till there is no task which can use slack until total
finish time. Meanwhile, at Phase 2, all tasks are considered for slack allocation at each iteration.
The number of iterations at Phase 2 is equal to totalSlack divided by unitSlack, where totalSlack
is defined by the difference of actual deadline and total finish time. At each iteration, one
unitSlack is allocated to one or more tasks that lead to maximum sum of energy reduction over
the full use of the unitSlack. The characteristic that each task is allocated the entire unitSlack or
no slack during each iteration allows for the use of branch and bound techniques to find the
optimal slack allocation. The size of the unitSlack can be reduced to a level where further
reducing it does not significantly improve the energy requirements.
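The following Python sketch shows the shape of this iterative loop; it is our illustration only, with the exhaustive subset enumeration standing in for the branch and bound search of Section 3.2 and an assumed diminishing-returns update standing in for recomputing reductions from the energy model:

from itertools import combinations

def best_compatible_set(tasks, reduction, compatible):
    # Find the set of mutually compatible tasks whose cumulative energy
    # reduction for one unitSlack is maximum (Section 3.2 replaces this
    # enumeration with a branch and bound search).
    best, best_gain = (), 0.0
    for r in range(1, len(tasks) + 1):
        for subset in combinations(tasks, r):
            if all(b in compatible[a] for a in subset for b in subset if a != b):
                gain = sum(reduction[t] for t in subset)
                if gain > best_gain:
                    best, best_gain = subset, gain
    return best

tasks = ["t1", "t2", "t3"]
compatible = {"t1": set(), "t2": {"t3"}, "t3": {"t2"}}   # t1 lies on every path
reduction = {"t1": 8.0, "t2": 6.0, "t3": 5.0}            # hypothetical values
total_slack, unit_slack = 4.0, 1.0
for _ in range(int(total_slack / unit_slack)):           # one unitSlack per pass
    chosen = best_compatible_set(tasks, reduction, compatible)
    for t in chosen:
        reduction[t] *= 0.8   # assumed: reductions shrink as a task slows down
    print(chosen)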
Figure 3-1. Example of a DAG and assignment on two processors
3.2 Unit Slack Allocation
In this section, we present our slack allocation algorithm over a minimum indivisible unit
slack, called unitSlack, which finds the best task set that can efficiently use unitSlack for
minimizing energy consumption. A key requirement of the slack allocation algorithm is to
incorporate assignment-based dependency relationships among tasks as well as different energy
profiles of tasks on different processors.
The slack allocation algorithm is motivated by the characteristic that each assignment-
based path which consists of tasks with precedence relationships in an assignment DAG cannot
have more than one unitSlack. Furthermore, this slack cannot be allocated to more than one task
on each path. In Figure 1-1, there are three assignment-based paths: τ1-τ2-τ5-τ7 (Path1), τ1-τ3-τ5-τ7
(Path2), and τ1-τ3-τ4-τ6-τ7 (Path3). The maximum amount of unitSlack that can be allocated to
tasks is the number of paths and only one task along each of these three paths can be allocated
the unitSlack. An implication of the above is that two tasks on the same path of an assignment
DAG cannot both be allocated unitSlack. Using a matrix which represents tasks that can share
slack for given tasks, the branch and bound search method is efficiently applied.
3.2.1 Maximum Available Slack for a Task
Each task has a different amount of maximum available slack. This is due to the fact that
the assignment algorithm has to maintain the precedence relationships among tasks in an original
DAG. This slack is divided by unitSlack for normalization, i.e., the maximum number of
unitSlack’s that can be allocated to a task is equal to maximum available slack divided by
unitSlack. The maximum available slack of task τi, slacki, is defined by the difference of the
latest start time of τi, LSTi, and the earliest start time of τi, ESTi. The latest start time of task τi,
the earliest start time of task τi, and the slack of task τi are respectively defined by
\[
LST_i = \min\!\Big(deadline_i,\ \min_{\tau_j \in succ_i}\big(LST_j - commTime_{ij}\big),\ LST_{pSucc_i}\Big) - compTime_i
\]
\[
EST_i = \max\!\Big(start_i,\ \max_{\tau_j \in pred_i}\big(EST_j + compTime_j + commTime_{ji}\big),\ EST_{pPred_i} + compTime_{pPred_i}\Big)
\]
\[
slack_i = LST_i - EST_i
\]
where deadlinei is the deadline of task τi, starti is the start time of τi, succi is the set of direct
successors of τi in a DAG, pSucci is the task assigned next to τi on the same assigned processor,
predi is the set of direct predecessors of τi in a DAG, and pPredi is the task assigned prior to τi on
the same assigned processor. Note that at Phase 1 the deadline is assumed to be equal to total
finish time unless the specified deadline of a task is earlier than the total finish time.
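A minimal Python sketch of these three definitions follows; the Task structure and the assumption that the input list is in a topological order respecting both DAG edges and processor-order edges are our illustrative choices:

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    comp: float                      # compTime_i on the assigned processor
    start: float = 0.0               # start_i
    deadline: float = float("inf")   # deadline_i
    pred: list = field(default_factory=list)   # (predecessor, commTime) pairs
    succ: list = field(default_factory=list)   # (successor, commTime) pairs
    p_pred: "Task" = None            # pPred_i on the same processor
    p_succ: "Task" = None            # pSucc_i on the same processor

def compute_slacks(topo_order):
    est, lst = {}, {}
    for t in topo_order:                          # forward pass: EST_i
        e = t.start
        for p, comm in t.pred:
            e = max(e, est[p.name] + p.comp + comm)
        if t.p_pred is not None:
            e = max(e, est[t.p_pred.name] + t.p_pred.comp)
        est[t.name] = e
    for t in reversed(topo_order):                # backward pass: LST_i
        l = t.deadline
        for s, comm in t.succ:
            l = min(l, lst[s.name] - comm)
        if t.p_succ is not None:
            l = min(l, lst[t.p_succ.name])
        lst[t.name] = l - t.comp
    return {t.name: lst[t.name] - est[t.name] for t in topo_order}  # slack_i

a = Task("a", comp=10.0)
b = Task("b", comp=10.0, deadline=30.0)
a.succ = [(b, 2.0)]; b.pred = [(a, 2.0)]
print(compute_slacks([a, b]))                     # {'a': 8.0, 'b': 8.0}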
3.2.2 Compatible Task Matrix
The matrix represents, for each task, the list of tasks which can share unitSlack with it.
If task τi and task τj are in the same assignment-based path, elements mij and mji in
compatible task matrix M are set to zero. Otherwise, the elements are set to one. The elements
related to the same task (i.e., mij where i = j) are set to zero. If the value of element indicating the
relationship of two tasks is equal to one, the two tasks can share unitSlack together because they
are independently (or in parallel) executed. However, if the value is equal to zero, the two tasks
cannot share unitSlack because only one task in each assignment-based path can have unitSlack.
The assignment-based dependency relationships among tasks may be changed after slack
allocation over unitSlack. When the assignment-based dependency relationships change, the
compatible task matrix must be modified accordingly. The compatible task matrix M is defined by
\[
M = \begin{bmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & & \vdots \\
m_{n1} & m_{n2} & \cdots & m_{nn}
\end{bmatrix},
\qquad
m_{ij} = \begin{cases} 1, & \text{if } \Pi_i \cap \Pi_j = \varnothing \\ 0, & \text{otherwise} \end{cases}
\]
where mij indicates whether task τi and task τj can share slack and Πi is the set of
assignment-based paths including task τi. While n (matrix size: n by n) is the total number of
tasks at Phase 2, it is the number of tasks whose maximum available slack is the greatest size at
Phase 1. This matrix can be easily generated by performing a transitive closure on the
assignment DAG and then taking the complement of that matrix. The DAG structure can also be
used to derive a list of ancestors for each task. This list can be updated by performing a level
wise search of the DAG.
In most cases, it generates a sparse matrix. This can be effectively represented by an array
of lists (one for each task). The compatible task list of task τi consists of tasks not on the same
paths as task τi. Thus the tasks included in compatibleTaski are ones which can share unitSlack
together with task τi. The compatible task list of task τi, compatibleTaski, is defined by
\[
compatibleTask_i = \{\, \tau_k \mid \Pi_i \cap \Pi_k = \varnothing,\ \tau_i, \tau_k \in \Gamma \,\}
\]
where Γ is the set of all tasks in a DAG.
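For illustration, the sketch below derives the compatible task lists by transitive closure, using the three assignment-based paths of Figure 1-1 quoted above (task τi is encoded as index i-1):

def compatible_lists(n, edges):
    # Two tasks can share unitSlack iff neither reaches the other in the
    # assignment DAG, i.e., they lie on no common assignment-based path.
    reach = [[False] * n for _ in range(n)]
    for i, j in edges:
        reach[i][j] = True
    for k in range(n):                           # transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {i: [j for j in range(n)
                if j != i and not reach[i][j] and not reach[j][i]]
            for i in range(n)}

# Edges of the assignment DAG in Figure 1-1, read off its three paths
# t1-t2-t5-t7, t1-t3-t5-t7, and t1-t3-t4-t6-t7.
edges = [(0, 1), (0, 2), (1, 4), (2, 4), (2, 3), (3, 5), (4, 6), (5, 6)]
print(compatible_lists(7, edges))
# e.g., compatibleTask for t2 (index 1) is [2, 3, 5], i.e., {t3, t4, t6}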
Figure 3-2 shows the compatible task matrix and lists for the example in Figure 1-1. Using the
compatible task matrix/lists, the set of tasks which can share unitSlack together is found such that
the sum of energy reduction of tasks is maximized. It corresponds to the maximum weighted
independent set (MWIS) problem which is known to be NP-hard [7, 53, 65]. Our approach on
task scheduling for energy minimization addresses this problem using a branch and bound search
and demonstrates its efficiency.
Figure 3-2. Compatible task matrix and lists for an example in Figure 1-1
3.2.3 Search Space Reduction
We reduce the search space by performing the following checks for each task: fully
independent tasks, fully dependent tasks, and compressible tasks. The rule to distinguish task τi
using the compatible task matrix is as follows:
If m_ij = 1 for all j, j ≠ i, then allocate unitSlack to τi;
Else if m_ij = 0 for all j, then consider τi as an allocable candidate;
Else, consider τi in the search.
The rule to distinguish task τi using compatible task lists is as follows:
If N(compatibleTaski) = N(Γ) − 1, then allocate unitSlack to τi;
Else if N(compatibleTaski) = 0, then consider τi as an allocable candidate;
Else, consider τi in the search.
where N(compatibleTaski) is the number of tasks in compatibleTaski and N(Γ) is the number of
tasks.
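The list-based rule can be applied directly, as in the short Python sketch below (the compatible lists are taken from the Figure 1-1 example; the function name and result strings are ours):

def classify(task, compatible, num_tasks):
    # Apply the compatible-list rule: fully independent tasks get unitSlack
    # outright, fully dependent tasks become standalone candidates, and the
    # rest participate in the branch and bound search.
    if len(compatible[task]) == num_tasks - 1:
        return "fully independent: allocate unitSlack"
    if len(compatible[task]) == 0:
        return "fully dependent: allocable candidate"
    return "consider in search"

compatible = {"t1": [], "t2": ["t3", "t4", "t6"], "t3": ["t2"],
              "t4": ["t2", "t5"], "t5": ["t4", "t6"], "t6": ["t2", "t5"],
              "t7": []}
for t in sorted(compatible):
    print(t, "->", classify(t, compatible, num_tasks=7))
# t1 and t7 are fully dependent; t2..t6 participate in the search.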
3.2.3.1 Fully independent tasks
If a task is included only in an assignment-based path consisting of the task alone (i.e.,
it is independent of all other tasks), unitSlack is allocated to the task without
search, regardless of the slack allocation of other tasks.
3.2.3.2 Fully dependent tasks
If a task is in all assignment-based paths (i.e., it is dependent on all other tasks), the task is
itself a candidate task set to which unitSlack can be allocated. Thus, the energy reduction of the
task is compared with those of other candidates without including it in the search. In
Figure 3-2, tasks τ1 and τ7 are the examples of fully dependent tasks.
3.2.3.3 Compressible tasks
Tasks on exactly the same assignment-based paths can be represented by a single task for the
purpose of slack allocation. The representative of a set of compressible tasks is the task with the
maximum energy reduction among them. This can lead to a substantial reduction in runtime
without decreasing energy performance. In the assignment DAG of Figure 3-3 (a), tasks τ3, τ5,
and τ11 can be compressed and represented by a single representative task (e.g., τ3) since the
paths where they are included are all the same. Using compatible task lists, we
can check whether tasks can be compressed instead of inspecting the assignment-based paths
including them.
The rule of compression from a compatible task list is as follows:
\[
\tau_k^c \leftarrow \tau_i, \quad \text{if } compatibleTask_i = compatibleTask_j \text{ and } er_i > er_j
\]
where τkc is the k-th compressed task, represented by the task τi retained from the compressed
set, and eri is the energy reduction of task τi.
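A minimal Python sketch of this compression rule follows (the grouping by identical compatible lists and the illustrative reduction values are ours):

def compress(compatible, reduction):
    # Group tasks with identical compatible task lists (they lie on exactly
    # the same assignment-based paths) and keep, as representative, the
    # member with the maximum energy reduction.
    groups = {}
    for task, lst in compatible.items():
        groups.setdefault(frozenset(lst), []).append(task)
    return {max(members, key=lambda t: reduction[t]): members
            for members in groups.values()}

# Illustrative fragment: t3, t5, t11 share identical lists, as in Figure 3-3.
compatible = {"t3": ["t2"], "t5": ["t2"], "t11": ["t2"],
              "t2": ["t3", "t5", "t11"]}
reduction = {"t3": 4.0, "t5": 2.0, "t11": 1.0, "t2": 9.0}
print(compress(compatible, reduction))
# {'t3': ['t3', 't5', 't11'], 't2': ['t2']} -- t3 represents (t3, t5, t11)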
Figure 3-3 illustrates an initial assignment DAG and its compressed DAG for a given
application with 12 tasks. In Figure 3-4, the compression process of compatible task lists for the
example of Figure 3-3 is illustrated. In other words, (a) and (b) in Figure 3-4 represent the
assignment DAG of (a) and the compressed assignment DAG of (b) in Figure 3-3, respectively,
using compatible task lists. From the initial compatible task lists, the following tasks are
compressed: (τ1, τ12), (τ2, τ9, τ10), (τ3, τ5, τ11), and (τ4, τ8). Each compressible task list is
represented by one task with the maximum energy reduction (e.g., τ1, τ2, τ3, τ8). The second
column in Figure 3-4 (b) shows compatible task lists after the compression. Any fully
independent task (e.g., τ2) is automatically allocated unitSlack and excluded from the search by
removing the task from the compatible task lists of other tasks and itself. Once the fully
independent task is removed from the compatible task lists, task τ1 is identified as a fully
dependent one and is also excluded from the search. It is considered as a feasible solution without
any search. The remaining tasks except for the fully independent tasks and the fully dependent
tasks participate in search.
Tasks that can be effectively merged with other tasks are removed (i.e., tasks with greater
index are removed) from compatible task lists to avoid redundant traversal in search. For
instance, task τ7 has task τ6 in the compressed compatible task list of task τ7, but the task τ6 is
removed since the compatible task list of task τ6 includes task τ7. The third column in Figure 3-4
(b) shows the reduced compatible task lists after compression. The search is finally performed
with tasks τ3, τ6, τ7, τ8 based on the reduced compatible task lists. Through the search, two
solutions, {τ3, τ8} and {τ6, τ7, τ8}, are considered in addition to fully dependent tasks (e.g., τ1) as
feasible solutions.
Figure 3-3. Compression of assignment DAG: (a) Assignment DAG, (b) Compressed assignment DAG
Figure 3-4. Compression of compatible task lists: (a) Compatible task lists in a given assignment, (b) Compressed and reduced compatible task lists
3.2.4 Branch and Bound Search
The energy reduction of a task is defined by the difference of its original energy and its
energy expected after allocating a unitSlack to the task. A branch and bound algorithm is used to
search all the possible compatible solutions to determine the one that has the maximum energy
reduction. The feasible states in the state space consist of all the compatible subsets of tasks. We
use a Depth First Search (DFS) to effectively search through all possible subsets of compatible
tasks. The advantage of using a DFS is that, during the search, it stores only one path, which
represents a candidate task set to which unitSlack can be allocated. By maintaining a running
lower bound from the energy reduction of traversed search paths so far, we apply bounding
heuristics that eliminate search spaces where a better solution cannot be found.
At any given node of the state space tree, the set of possible search options is limited to the
list of available tasks corresponding to the intersection of all the lists of tasks from the root to that
particular node. Each node in a search graph has its own explorable task list indicating tasks
which can be explored as child nodes of the node. The explorable task list of node νi including
task τk, explorableTaski, is defined by
\[
explorableTask_i =
\begin{cases}
compatibleTask_k, & \text{if } parent_i = \varnothing \\
explorableTask_{parent_i} \cap compatibleTask_k, & \text{if } parent_i \neq \varnothing
\end{cases}
\]
The cost of node νx, c(x), is defined by c(x) = f(x) + g(x), where f(x) is the sum of energy
reduction of tasks from the root to node νx and g(x) is the estimate on the sum of energy
reduction of tasks of child nodes from node νx. g(x) is obtained as the sum of energy reduction of
tasks in the explorable task list of the node and represents an upper bound to the amount of
energy reduction of tasks of child nodes. Thus, when exploring nodes in search, if c(x) is lower
than the lower bound, the node νx is pruned, otherwise, it is expanded. The cost value c(x) on leaf
node νx indicates the actual sum of energy reduction of tasks in the search path. If c(x) on leaf
node νx is greater than the lower bound, the lower bound is updated to c(x) and the search path
becomes a candidate solution. The optimal task set over unitSlack is finally found. Figure 3-5
illustrates the reduction of compatible task list in Figure 3-2 and its application to explore a
search graph. Through the search, five solutions, {τ2, τ3}, {τ2, τ4}, {τ2, τ6}, {τ4, τ5}, and {τ5, τ6},
are considered in addition to fully dependent tasks (e.g., τ1, τ7) as feasible solutions which
unitSlack can be allocated to.
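The following Python sketch is our simplified rendering of this search, with the compatible lists of the Figure 1-1 example and hypothetical energy reductions; it implements the DFS with explorable-list intersection and c(x)-based pruning:

def branch_and_bound(tasks, compatible, reduction, lower_bound=0.0):
    best_set, best_gain = [], lower_bound

    def dfs(path, gain, explorable):
        nonlocal best_set, best_gain
        if gain > best_gain:                       # improved feasible solution
            best_set, best_gain = list(path), gain
        for idx, t in enumerate(explorable):
            g = sum(reduction[u] for u in explorable[idx:])  # g(x) upper bound
            if gain + g <= best_gain:              # c(x) = f(x) + g(x) pruned
                break
            child = [u for u in explorable[idx + 1:] if u in compatible[t]]
            dfs(path + [t], gain + reduction[t], child)

    dfs([], 0.0, tasks)
    return best_set, best_gain

compatible = {"t2": {"t3", "t4", "t6"}, "t3": {"t2"}, "t4": {"t2", "t5"},
              "t5": {"t4", "t6"}, "t6": {"t2", "t5"}}
reduction = {"t2": 3.0, "t3": 1.0, "t4": 2.0, "t5": 4.0, "t6": 2.5}
print(branch_and_bound(["t2", "t3", "t4", "t5", "t6"], compatible, reduction))
# (['t5', 't6'], 6.5): the compatible set with maximum cumulative reduction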
Figure 3-5. Reduced compatible task lists and search graph
3.2.5 Estimating the Lower Bound to Reduce the Search Space
Finding the set of tasks which maximizes the sum of energy reduction by allocating
unitSlack corresponds to the maximum weighted independent set (MWIS) problem.
The authors in [53] showed that simple greedy algorithms for the MWIS guarantee to find a task
set whose weight is at least \(\sum_{v \in V(G)} W(v)/(d(v)+1)\), where W(v) is the weight of vertex v in a
graph G and d(v) is the degree of vertex v. We modify the guaranteed minimum weight for the
MWIS problem to apply it to our problem as an initial lower bound. The lower bound,
lowerbound, is initialized as follows:
\[
lowerbound = \sum_{\tau_i \in \Gamma_s} \frac{er_i}{N(\Gamma_s) - N(compatibleTask_i) + 1}
\]
where Γs is the set of tasks participating in the search, N(Γs) is the number of tasks
participating in the search, N(compatibleTaski) is the number of tasks included in
compatibleTaski, and eri is the energy reduced by allocating unitSlack to task τi.
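A sketch of this initialization in Python, mirroring the formula above (the reduction values are hypothetical):

def initial_lower_bound(search_tasks, compatible, reduction):
    # lowerbound = sum over tasks in the search of
    #   er_i / (N(Gamma_s) - N(compatibleTask_i) + 1)
    n = len(search_tasks)
    return sum(reduction[t] / (n - len(compatible[t]) + 1)
               for t in search_tasks)

compatible = {"t2": ["t3", "t4", "t6"], "t3": ["t2"], "t4": ["t2", "t5"],
              "t5": ["t4", "t6"], "t6": ["t2", "t5"]}
reduction = {"t2": 3.0, "t3": 1.0, "t4": 2.0, "t5": 4.0, "t6": 2.5}
print(initial_lower_bound(list(compatible), compatible, reduction))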
If the set of fully dependent tasks is nonempty, the lower bound is compared with the
energy reduction of each fully dependent task. In the example of Figure 3-2, before the search,
the lower bound is updated by the maximum energy reduction among fully dependent tasks τ1
and τ7 if the initial lower bound is lower. Then the fully dependent task (τ1 or τ7) with the
maximum energy reduction becomes a feasible solution for slack allocation. Furthermore, at
each iteration, unless the assignment-based dependency relationships among tasks are changed
from the previous step, the energy reduction of the solution of the previous step (i.e., the sum of
energy reduction of tasks which unitSlack is allocated to at the previous step) can be used as the
lower bound for the next unit slack allocation.
3.3 Experimental Results
We compare the performance of our DVS algorithm (i.e., PathDVS), the DVS algorithm that
allocates slack to the task(s) with the highest energy reduction [48, 55] (i.e., EProfileDVS), and
the greedy slack allocation based DVS algorithm in [13] (i.e., GreedyDVS). All the DVS algorithms
assume that the assignment of tasks to processors is already completed. The following two
different assignment strategies are used: ICP which assigns based on the earliest finish time
(presented in Chapter 5) and CPS which assigns based on the earliest possible start time [48]. We
also compare the performance of PathDVS and LPDVS, an extension to the formulation in [75]
to incorporate communication costs. PathDVS and LPDVS algorithms provide close to the
optimal solution and are controlled by the size of the unitSlack and the number of intervals,
respectively. The size of unitSlack and the number of intervals are set to the best values obtained
empirically in these experiments. For LPDVS, CPLEX v.10.0 [14] was used to solve the LP
problem, using a piecewise linear function for the convex objective function.
3.3.1 Simulation Methodology
In this section, we describe the DAG generation and performance measures used in our
experiments.
3.3.1.1 The DAG generation
In order to show the performance of the proposed static slack allocation algorithm in both
heterogeneous and homogeneous environments, we randomly generated a large number of
synthetic graphs with 100, 200, 300, and 400 tasks. For heterogeneous systems, the execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units. The
communication time between a task and its child task for a pair of processors is varied from 1 to
4 units. For homogeneous systems, within a similar range, the execution time of each task on
all processors at the maximum voltage is varied from 10 to 40 units and all of the communication
time among tasks on different processors is set to 2 units. The energy consumed to execute each
task is varied from 10 to 80. The execution of graphs is performed on 4, 8, and 16 processors.
For each combination of values of number of tasks and processors, 20 different synthetic graphs
are generated.
3.3.1.2 Performance measures
The performance is measured in terms of normalized total energy consumption, that is,
total energy normalized by the energy obtained from an assignment algorithm without a DVS
scheme. The deadline is determined by: deadline = (1 + deadline extension rate) * maximum
total finish time from assignments without DVS scheme. We provide experimental results for
deadline extension rate equal to 0.0 (no deadline extension), 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, and
0.4.
3.3.2 Memory Requirements
The size of the compatible task matrix is O(n^2). Generally this matrix is sparse and can be
reduced to O(kn) using lists, where n is the number of tasks and k is a constant representing
the number of compatible tasks. At every level, a list of explorable tasks of size bounded by
O(n) is stored; the list shrinks at each level and becomes empty at a leaf node. Our
branch and bound method uses DFS and only stores one path whose length is the number of
tasks that can be allocated slack together and should be O(min(n,p)), where p is the number of
processors. Thus the number of variables stored during search is O(n) and the overall memory
requirement of our algorithm is O(kn+n) – it can be reduced by using search space reduction
techniques. The number of variables required for LPDVS is proportional to O(n * number of
intervals) and its memory requirement depends on the actual implementation of linear
programming. Using CPLEX on a machine with 2 Gigabyte memory, the maximum number of
tasks that LPDVS can reliably execute was around 200 for 0.4 deadline extension rate (i.e. 400
piecewise linear intervals per task). Meanwhile, we were able to execute DAGs of size 1000
using PathDVS as shown in Figure 3-6.
Figure 3-6. Runtime of PathDVS with respect to different sizes of DAGs, on 8 processors with 0.4 deadline extension rate (unit: ms; x-axis: number of tasks)
3.3.3 Determining the Size of Unit Slack and the Number of Intervals
Figure 3-7 shows the results of comparison of energy consumption of PathDVS with
respect to different sizes of unitSlack. The size of unitSlack is determined by the rate of total
finish time (i.e., unitSlack = totalFinishTime * unitSlackRate). The performance of our slack
allocation algorithm in terms of energy depends on the size of unitSlack. In general, a smaller
unitSlack leads to more energy saving while increasing the runtime. However, the size
of the unitSlack can be limited to a level where further reducing it does not significantly improve
energy requirements. Based on the results, the size of unitSlack corresponding to 0.0005 unit
slack rate does not give significant improvement on energy. While there is 7-10% improvement
of energy with 0.001 unitSlackRate over 0.01 unitSlackRate, there is less than 0.3% difference of
energy between 0.001 and 0.0005 unitSlackRates. Thus the size of unitSlack corresponding to
0.001 unitSlackRate is a reasonable choice.
Figure 3-7. Normalized energy consumption of PathDVS with respect to different unit slack rates for different numbers of tasks: (a) 100 tasks and (b) 200 tasks (one curve per deadline extension rate from 0 to 0.4)
The authors in [24] suggest that the LP problem with a convex objective function and
linear constraints can be optimally solved using 8n intervals for the piecewise linear function that
approximates the convex function, where n is the number of tasks. However, we found that,
in practice, a smaller number of intervals is sufficient for our target applications. The total time
to be divided into intervals for the piecewise linear function of each task is equal
to the total maximum available slack (i.e., deadline extension rate * total finish time +
slack available until total finish time, or the available slack based on minimum voltage);
dividing this time further is unnecessary and requires more computational time. The total slack
available to each task can be approximately bounded by the total available slack (i.e., deadline - total
finish time before slack allocation). The number of intervals is proportional to the deadline
extension rate divided by the interval rate (i.e., the number of intervals ∝ deadline extension rate
/ intervalRate). Figure 3-8 shows the result of comparison of energy consumption of LPDVS
with respect to different interval rates by which the objective function is divided. Based on the
results, the length of interval corresponding to 0.0005 intervalRate does not give significant
improvement on energy compared to 0.001 intervalRate. However, there is 2-8% improvement
of energy with 0.001 intervalRate over 0.01 intervalRate while there is 0.05% difference of
energy between 0.001 and 0.0005 intervalRates. Thus the length of interval corresponding to
0.001 intervalRate is a reasonable choice.
Figure 3-8. Normalized energy consumption of LPDVS with respect to different interval rates for different numbers of tasks: (a) 100 tasks and (b) 200 tasks (one curve per deadline extension rate from 0 to 0.4)
3.3.4 Homogeneous Environments
In this section, we show the performance of the proposed static slack allocation algorithm
in homogeneous environments where the computation time of each task and the communication
time among tasks on all processors are the same.
3.3.4.1 Comparison of energy requirements
Tables 3-1, 3-2, 3-3, and 3-4 show the improvement of PathDVS over EProfileDVS and
GreedyDVS with respect to different assignments, different numbers of processors, and
different numbers of tasks in homogeneous environments.
PathDVS considerably outperforms the other existing DVS algorithms regardless of the
assignment algorithm used. For instance, given ICP assignment, PathDVS improves by 12-29% over
EProfileDVS and 60-70% over GreedyDVS with 0.4 deadline extension rate. The results show
that the performance improvement of PathDVS is higher for larger number of processors.
Table 3-1. Results for 100 tasks in homogeneous environments: Improvement of PathDVS over
EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.26   1.16   1.82   3.55   5.86   8.10   9.03   9.81
                ICP   Greedy            2.18   5.96   9.34   17.69  28.19  42.41  51.90  59.08
                CPS   EProfile          0.08   1.02   1.88   3.67   5.53   7.60   9.02   9.94
                CPS   Greedy            2.19   6.08   9.53   18.04  28.49  42.60  52.24  59.41
8 Processors    ICP   EProfile          1.00   2.59   3.97   7.24   11.24  16.06  18.97  20.64
                ICP   Greedy            4.97   9.40   13.35  22.80  34.35  49.20  58.70  65.34
                CPS   EProfile          0.20   1.98   3.47   7.37   11.31  15.77  18.25  19.86
                CPS   Greedy            3.32   8.11   12.26  22.06  33.59  48.34  57.75  64.44
16 Processors   ICP   EProfile          1.67   4.99   7.54   13.55  19.59  24.12  26.24  27.37
                ICP   Greedy            7.15   13.43  18.78  30.45  42.80  56.23  64.35  70.02
                CPS   EProfile          0.54   3.75   6.40   12.67  18.37  23.89  26.09  27.27
                CPS   Greedy            6.41   13.27  18.81  30.61  42.62  56.23  64.34  70.06
Table 3-2. Results for 200 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.21   1.32   2.17   4.08   6.24   8.59   10.03  11.14
                ICP   Greedy            1.30   5.84   9.59   18.36  28.95  43.07  52.65  59.84
                CPS   EProfile          0.08   1.23   2.14   4.32   6.86   10.43  12.41  14.07
                CPS   Greedy            1.23   5.78   9.53   18.46  29.23  44.00  53.77  61.06
8 Processors    ICP   EProfile          0.37   2.66   4.23   8.04   12.72  18.39  21.51  23.54
                ICP   Greedy            2.71   8.29   12.74  23.01  34.98  50.05  59.49  66.18
                CPS   EProfile          0.20   2.13   4.08   8.26   13.11  18.31  20.79  22.74
                CPS   Greedy            2.18   7.77   12.29  22.72  34.82  49.73  58.88  65.55
16 Processors   ICP   EProfile          1.25   3.42   5.37   9.93   15.58  22.37  26.04  28.15
                ICP   Greedy            4.88   10.50  15.21  26.04  38.52  53.85  63.17  69.44
                CPS   EProfile          0.25   2.76   4.95   10.06  15.80  22.61  26.41  28.53
                CPS   Greedy            3.81   9.80   14.60  25.52  38.06  53.52  62.88  69.16
Table 3-3. Results for 300 tasks in homogeneous environments: Improvement of PathDVS over
EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.10   0.80   1.22   2.30   3.86   6.49   8.91   10.50
                ICP   Greedy            1.36   5.58   8.98   17.14  27.36  41.88  52.17  59.63
                CPS   EProfile          0.02   0.81   1.27   2.36   3.64   5.41   6.34   7.19
                CPS   Greedy            0.94   5.30   8.77   17.02  27.10  41.23  50.91  58.24
8 Processors    ICP   EProfile          0.47   3.34   5.69   11.45  17.89  25.14  30.68  34.62
                ICP   Greedy            1.89   8.80   14.05  25.91  39.25  54.74  64.68  71.46
                CPS   EProfile          0.06   3.15   5.44   10.72  16.87  23.58  27.77  30.60
                CPS   Greedy            1.57   8.65   13.94  25.52  38.45  53.65  63.04  69.52
16 Processors   ICP   EProfile          0.68   4.31   7.01   13.23  19.72  26.67  29.95  31.73
                ICP   Greedy            3.76   11.13  16.74  28.82  41.93  56.90  65.51  71.28
                CPS   EProfile          0.62   4.53   7.02   13.15  19.64  26.58  29.73  31.54
                CPS   Greedy            3.21   10.90  16.60  28.69  41.70  56.59  65.20  71.00
Table 3-4. Results for 400 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.09   1.20   2.05   4.40   7.25   11.30  14.59  16.66
                ICP   Greedy            0.83   5.91   9.84   18.93  29.90  44.91  55.24  62.49
                CPS   EProfile          0.02   1.22   2.02   3.82   6.03   8.54   9.62   10.18
                CPS   Greedy            0.62   5.70   9.59   18.40  29.00  43.14  52.59  59.56
8 Processors    ICP   EProfile          0.22   2.20   3.69   7.06   11.32  17.02  21.29  24.70
                ICP   Greedy            1.48   7.27   11.70  21.78  33.85  49.29  59.53  66.85
                CPS   EProfile          0.04   1.90   3.23   6.90   11.33  18.70  23.77  27.56
                CPS   Greedy            1.51   7.21   11.59  21.86  33.88  50.36  60.90  68.19
16 Processors   ICP   EProfile          0.56   4.91   8.09   14.42  21.52  28.41  31.38  33.31
                ICP   Greedy            2.47   10.37  16.17  28.53  41.76  56.85  65.52  71.28
                CPS   EProfile          0.29   4.75   7.58   13.62  20.33  27.92  31.63  33.46
                CPS   Greedy            1.76   10.39  16.26  28.75  42.03  57.09  65.76  71.52
Figure 3-9 shows the energy comparison of DVS algorithms (i.e., PathDVS, EProfileDVS,
GreedyDVS) using ICP for different number of tasks. The results show that the performance
improvement of PathDVS over the other DVS algorithms generally increases as the deadline
extension rate increases.
Table 3-5 shows the energy comparison between PathDVS and LPDVS in homogeneous
environments. Note that the comparison is limited to 200 tasks as this was the largest problem
that we were able to solve using LPDVS on our workstation. The unitSlackRate for PathDVS
and the intervalRate for LPDVS are set to 0.001. These results show that the two algorithms are
comparable in energy minimization, with PathDVS slightly better in most cases.
Figure 3-9. Normalized energy consumption of slack allocation algorithms (GreedyDVS, EProfileDVS, and PathDVS) with respect to different deadline extension rates for different numbers of tasks: (a) 100 tasks, (b) 200 tasks, (c) 300 tasks, and (d) 400 tasks
Table 3-5. Normalized energy consumption of PathDVS and LPDVS with respect to different
deadline extension rates in homogeneous environments (Positive difference indicates that PathDVS performs better than LPDVS)
                      100 Tasks                             200 Tasks
Deadline Ext. Rate    LPDVS     PathDVS   Difference        LPDVS     PathDVS   Difference
0                     0.962454  0.962451  0.000003          0.978646  0.978653  -0.000007
0.01                  0.921541  0.921532  0.000009          0.924348  0.924422  -0.000074
0.02                  0.888233  0.888223  0.000010          0.883956  0.884003  -0.000047
0.05                  0.809833  0.809764  0.000069          0.793064  0.793112  -0.000048
0.1                   0.713714  0.713611  0.000103          0.686825  0.685985  0.000840
0.2                   0.579983  0.579758  0.000225          0.547527  0.543398  0.004129
0.3                   0.487378  0.487140  0.000238          0.455571  0.447699  0.007872
0.4                   0.417437  0.417173  0.000264          0.388111  0.377237  0.010874
3.3.4.2 Comparison of time requirements
Table 3-6 and Figure 3-10 show the comparison of computational time for PathDVS and
LPDVS in homogeneous environments. PathDVS requires less runtime because it
substantially reduces the search space by using compatible task lists, their compression, and the
lower bound. In particular, the time requirements of PathDVS are substantially smaller as the
deadline extension rate decreases (i.e., a tight deadline), while they increase linearly as the deadline
extension rate increases due to the iterative search over unitSlack. Many practical real time
systems have tight deadlines. Based on the results shown in Table 3-6, for no deadline
extension (i.e., deadline extension rate equal to 0), the runtime of PathDVS is one to two orders
of magnitude less than that of LPDVS.
Table 3-6. Runtime ratio of LPDVS to PathDVS for no deadline extension in homogeneous
environments
                    100 Tasks   200 Tasks
4 Processors        61.97       210.32
8 Processors        19.46       52.74
Figure 3-10. Runtime to execute LPDVS and PathDVS with respect to different deadline extension rates for different numbers of tasks in homogeneous environments (unit: ms): (a) 100 tasks and (b) 200 tasks
3.3.5 Heterogeneous Environments
In this section, we show the performance of the proposed static slack allocation algorithm
in heterogeneous environments where the computation time of each task and the communication
time among tasks are different on different processors.
3.3.5.1 Comparison of energy requirements
Tables 3-7, 3-8, 3-9, and 3-10 show the improvement of PathDVS over EProfileDVS and
GreedyDVS with respect to different assignments (i.e., ICP and CPS assignments) and different
numbers of processors (i.e., 4, 8, and 16 processors) for different numbers of tasks (i.e., 100,
200, 300, and 400 tasks) in heterogeneous environments. As in homogeneous environments,
PathDVS considerably outperforms the other existing DVS algorithms regardless of the
assignment algorithm used. For instance, given the ICP assignment, PathDVS improves by
7-36% over EProfileDVS and 80-93% over GreedyDVS with a 0.4 deadline extension rate. The
results also show that the performance improvement of PathDVS is higher for larger numbers of
processors and tasks. Figure 3-11 shows the energy comparison of the DVS algorithms (i.e.,
PathDVS, EProfileDVS, GreedyDVS) using ICP for different numbers of tasks (i.e., 100, 200,
300, and 400 tasks). In general, the performance improvement of PathDVS over the other DVS
algorithms increases as the deadline extension rate increases.
Table 3-7. Results for 100 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  6.70   6.74   6.78   6.88   6.91   7.05   7.12   7.22
4 Processors  ICP Greedy    58.63  59.44  60.27  62.63  66.09  71.68  75.97  79.35
4 Processors  CPS EProfile  0.67   1.19   1.76   2.66   3.75   5.37   6.32   6.71
4 Processors  CPS Greedy    3.45   6.86   10.02  17.62  27.67  41.63  51.21  58.35
8 Processors  ICP EProfile  19.43  19.47  19.50  19.48  19.49  19.67  19.60  19.61
8 Processors  ICP Greedy    76.32  76.78  77.26  78.61  80.60  83.82  86.29  88.24
8 Processors  CPS EProfile  0.53   2.63   4.20   8.08   12.28  16.47  18.08  18.90
8 Processors  CPS Greedy    8.28   12.66  16.56  25.88  36.92  50.65  59.23  65.38
16 Processors ICP EProfile  23.56  23.53  23.62  23.54  23.62  23.57  23.69  23.69
16 Processors ICP Greedy    84.19  84.48  84.80  85.69  87.02  89.18  90.83  92.14
16 Processors CPS EProfile  1.81   4.37   6.42   10.97  15.10  20.03  22.14  23.24
16 Processors CPS Greedy    13.04  17.37  21.38  30.40  40.71  53.99  62.40  68.33
Table 3-8. Results for 200 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  10.22  10.28  10.37  10.47  10.54  10.79  10.91  10.88
4 Processors  ICP Greedy    60.40  61.21  62.01  64.27  67.58  72.93  77.03  80.27
4 Processors  CPS EProfile  0.16   1.51   2.44   4.41   6.62   8.64   9.57   10.00
4 Processors  CPS Greedy    2.73   7.23   10.91  19.53  29.90  43.70  52.96  59.85
8 Processors  ICP EProfile  15.75  15.75  15.91  15.86  16.10  16.18  16.30  16.28
8 Processors  ICP Greedy    73.25  73.78  74.31  75.83  78.07  81.69  84.47  86.65
8 Processors  CPS EProfile  0.42   1.39   2.28   4.84   7.78   11.43  13.59  14.97
8 Processors  CPS Greedy    5.46   9.33   12.88  21.54  32.15  46.42  55.87  62.69
16 Processors ICP EProfile  26.89  26.87  26.96  26.85  26.80  26.83  26.95  26.89
16 Processors ICP Greedy    83.45  83.77  84.09  85.02  86.40  88.65  90.37  91.73
16 Processors CPS EProfile  1.21   4.43   7.15   12.29  18.45  23.50  25.46  26.45
16 Processors CPS Greedy    9.94   15.37  19.70  29.47  41.40  54.65  62.60  68.26
Table 3-9. Results for 300 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  7.75   7.79   7.86   7.93   8.07   8.22   8.31   8.34
4 Processors  ICP Greedy    58.83  59.67  60.49  62.81  66.23  71.77  76.02  79.37
4 Processors  CPS EProfile  0.00   0.83   1.35   2.82   4.42   6.40   7.33   7.85
4 Processors  CPS Greedy    1.47   5.98   9.60   18.02  28.30  42.28  51.78  58.85
8 Processors  ICP EProfile  18.96  18.91  18.92  19.06  19.12  19.18  19.20  19.26
8 Processors  ICP Greedy    74.63  75.14  75.65  77.09  79.21  82.63  85.27  87.34
8 Processors  CPS EProfile  0.08   1.84   3.03   5.99   9.54   14.04  16.38  17.78
8 Processors  CPS Greedy    4.00   8.90   12.85  22.20  33.43  48.05  57.45  64.09
16 Processors ICP EProfile  35.29  35.37  35.41  35.26  35.36  35.44  35.47  35.39
16 Processors ICP Greedy    85.29  85.59  85.89  86.72  87.96  89.96  91.50  92.71
16 Processors CPS EProfile  0.50   4.50   7.73   14.79  22.20  29.25  32.34  33.64
16 Processors CPS Greedy    10.19  16.75  21.91  33.27  45.46  59.14  66.96  72.21
Table 3-10. Results for 400 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  9.30   9.37   9.45   9.53   9.68   9.90   9.97   10.01
4 Processors  ICP Greedy    59.27  60.11  60.94  63.26  66.67  72.16  76.38  79.70
4 Processors  CPS EProfile  0.10   1.23   1.97   3.84   5.86   8.10   9.16   9.72
4 Processors  CPS Greedy    1.69   6.53   10.30  19.00  29.44  43.36  52.72  59.64
8 Processors  ICP EProfile  22.48  22.49  22.51  22.53  22.66  22.73  22.71  22.75
8 Processors  ICP Greedy    75.45  75.96  76.45  77.84  79.90  83.22  85.78  87.79
8 Processors  CPS EProfile  0.89   3.31   4.99   9.02   13.36  18.08  20.40  21.65
8 Processors  CPS Greedy    5.36   10.98  15.35  25.34  36.79  50.98  59.84  66.08
16 Processors ICP EProfile  36.18  36.09  36.09  36.05  36.07  36.05  36.14  36.05
16 Processors ICP Greedy    85.34  85.64  85.93  86.77  88.00  89.99  91.52  92.73
16 Processors CPS EProfile  1.28   4.83   8.34   16.16  23.10  29.95  32.73  34.07
16 Processors CPS Greedy    7.83   14.56  20.18  32.27  44.82  58.80  66.82  72.20
[Figure 3-11 plots: Normalized Energy vs. Deadline Extension Rate; series: GreedyDVS, EProfileDVS, PathDVS; panels: 100, 200, 300, and 400 tasks]
Figure 3-11. Normalized energy consumption of slack allocation algorithms with respect to
different deadline extension rates for different number of tasks in heterogeneous environments: (a) 100 tasks, (b) 200 tasks, (c) 300 tasks, and (d) 400 tasks
Table 3-11. Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in heterogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)

Deadline              100 Tasks                            200 Tasks
Extension    PathDVS   LPDVS     Difference     PathDVS   LPDVS     Difference
0            0.922500  0.922383  -0.000116      0.947985  0.947781  -0.000203
0.01         0.885561  0.885098  -0.000462      0.906328  0.906165  -0.000162
0.02         0.851380  0.850993  -0.000386      0.870523  0.870373  -0.000149
0.05         0.770131  0.769853  -0.000277      0.785193  0.785067  -0.000126
0.1          0.671220  0.671066  -0.000154      0.681989  0.682629   0.000639
0.2          0.537683  0.537611  -0.000072      0.543238  0.545835   0.002597
0.3          0.447022  0.447104   0.000081      0.449777  0.453937   0.004160
0.4          0.379900  0.380161   0.000261      0.382132  0.385930   0.003798
Table 3-11 shows the energy comparison between PathDVS and LPDVS in heterogeneous
environments. Note that the comparison is limited to 200 tasks as this was the largest problem
that we were able to solve using LPDVS on our workstation. The unitSlackRate for PathDVS
and the intervalRate for LPDVS are set to 0.001. Like in homogeneous environments, these
results show that the two algorithms are comparable in energy minimization.
3.3.5.2 Comparison of time requirements
Table 3-12 and Figure 3-12 show the runtime comparison between PathDVS and LPDVS.
Like in homogeneous environments, PathDVS requires less runtime because it substantially
reduces the search space by using compatible task lists, their compression, and the lower bound.
In particular, the time requirements of PathDVS are substantially smaller as the deadline
extension rate decreases (i.e., tight deadline). For instance, the runtime ratio of LPDVS to
PathDVS is 56.49 for 200 tasks on 4 processors for no deadline extension.
Table 3-12. Runtime ratio of LPDVS to PathDVS for no deadline extension in heterogeneous environments

                100 Tasks   200 Tasks
4 Processors      37.38       56.49
8 Processors      13.27       12.22
[Figure 3-12 plots: Runtime vs. Deadline Extension Rate; series: LPDVS, PathDVS; panels: 100 and 200 tasks]
Figure 3-12. Runtime to execute algorithms with respect to different deadline extension rates for
different number of tasks in heterogeneous environments (unit: ms): (a) 100 tasks and (b) 200 tasks
3.3.6 Effect of Search Space Reduction Techniques for PathDVS
The main factor that determines the cost of the PathDVS algorithm is the size of the search
space. In this section, we present the effect of the search space reduction techniques introduced
in this paper (i.e., compression, the compatible task matrix/lists, and the lower bound). The
experiments are performed on 50 different synthetic graphs for each combination of the number
of tasks and processors, with a 0.01 deadline extension rate. We present the average values of
the different metrics for Phase 2, as it is considerably more computation-intensive than Phase 1;
the cost of Phase 1 is small because the number of slack allocable tasks considered is smaller.
The size of the search space depends on the depth of the search tree and the number of tasks
participating in the search. The size of the search space is O(n^d), where n is the total number of
tasks and d is the depth of the search tree. By using compression, the size can be reduced to
O(t^d), where t is the number of tasks participating in the search.
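As a concrete illustration of this reduction, the following sketch (ours, not part of the dissertation's implementation) plugs sample values from Tables 3-13 and 3-14 into the two worst-case bounds:

def search_space_size(branching, depth):
    # Worst-case node count of a search tree that branches over
    # `branching` candidate tasks at each of `depth` levels.
    return branching ** depth

# 100 tasks on 8 processors: depth is about 8 (Table 3-14) and about
# 24.5 tasks survive compression (Table 3-13).
print(search_space_size(100, 8))    # O(n^d): 10^16 nodes
print(search_space_size(24.5, 8))   # O(t^d): roughly 1.3 * 10^11 nodes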
Table 3-13 shows the average number of tasks after compression. Note that the
compression technique classifies tasks into three categories: fully independent tasks, fully
dependent tasks, and compressible tasks, and then has only a representative of each group of
compressible tasks participate in the search. These results show that the compression methods
reduce the number of tasks significantly (58-94%), leading to a much smaller search space. The
amount of compression decreases as the number of processors increases. This is because the
compression achieved is based on the assignment-based dependency relationships among tasks
in the assignment DAG (not the actual DAG), and these relationships generally become more
complex as the number of processors increases.
Table 3-14 shows the depth of the search tree. Based on the results, the depth is proportional to
the number of processors (i.e., depth ≈ number of processors), so the size can be written as
O(t^p), where p is the number of processors. Thus, the maximum number of independent tasks
to which unitSlack can be allocated together is approximately equal to the number of processors.
Although the worst case size of the search space is O(t^p), the use of the compatible task
matrix/lists can lead to a substantially smaller number of tasks being expanded (i.e., explorable
tasks) at each level. Furthermore, the maximum level that is searched is generally much smaller
than the depth. This makes the search space significantly smaller, and it is reduced further by the
use of branch-and-bound techniques. Table 3-15 shows the number of nodes explored in the
search; it is considerably smaller than the total search space.
Table 3-13. Number of tasks participating in search with respect to different numbers of tasks and processors

Number of Tasks   Number of Processors   Tasks Participating in Search
100 Tasks         4                      11.8
100 Tasks         8                      24.5
100 Tasks         16                     42.4
200 Tasks         4                      12.1
200 Tasks         8                      24.8
200 Tasks         16                     53.6
Table 3-14. Depth of search tree with respect to different numbers of tasks and processors

Number of Tasks   Number of Processors   Depth of Search Tree
100 Tasks         4                      4
100 Tasks         8                      8.2
100 Tasks         16                     17.3
200 Tasks         4                      4
200 Tasks         8                      7.9
200 Tasks         16                     17.4
Table 3-15. Number of nodes explored in search with respect to different numbers of tasks and processors

Number of Tasks   Number of Processors   Nodes Explored in Search
100 Tasks         4                      22
100 Tasks         8                      1114
100 Tasks         16                     141342
200 Tasks         4                      27
200 Tasks         8                      1000
200 Tasks         16                     415924
CHAPTER 4 DYNAMIC SLACK ALLOCATION
Static scheduling algorithms for DAG execution use the estimated execution time. The
estimated execution time (ET) of tasks may be different from their actual execution time (AET)
at runtime. We divide the dynamic environments into two broad categories based on whether the
actual execution time is less than or more than the estimated time: overestimation (AET < ET)
and underestimation (AET > ET).
For most real time applications, an upper worst case bound on the actual execution time
(i.e., the worst case execution time) of each task is used to guarantee that the application
completes within a given time bound. This corresponds to overestimation of the actual execution
time. Therefore, many tasks may complete earlier than expected during the actual execution.
This allows assignment-based dependent tasks to potentially start earlier than was envisioned
during static scheduling. The extra available slack can then be allocated to tasks that have not
yet begun execution, with the goal of reducing the total energy requirements while still meeting
the deadline constraints.
For many applications that do not use the worst case time for estimation, historical data is
used to estimate the time requirements of each task, and the estimated execution time may be
less than the actual execution time. This corresponds to underestimation of the actual execution
time. In this case, many tasks may complete later than expected during the actual execution, so it
cannot be guaranteed that the deadline constraints will always be satisfied. However, slack can
be removed from future tasks with the hope of satisfying the deadline constraints as closely as
possible.
A simple option for adjusting slack at runtime is to reapply the static slack allocation
algorithms to the unexecuted tasks whenever a task finishes early or late. However, the time
requirements of static algorithms applied at runtime are generally large, and they may not be
practical for many runtime scenarios. We explore novel dynamic (or runtime) algorithms for
achieving these goals.
In this chapter, we present novel dynamic algorithms that lead to good performance in
terms of both computational time (i.e., runtime overhead) and energy requirements. The main
intuition behind our methods is that the slack allocation can be restricted to a small subset of
tasks so that the static slack allocation algorithms can be applied to a small subset rather than all
the tasks. There are three main contributions of our methods:
• They require significantly less computational time (i.e., runtime overhead) than applying the static algorithm at runtime for every instance when a task finishes early or late.
• The performance in terms of reducing energy and/or meeting a given deadline is comparable to applying the static algorithm at runtime.
• They are effective for cases when the estimated execution time of tasks is underestimated or overestimated.
4.1 Proposed Dynamic Slack Allocation
We assume that a static algorithm has already been applied before executing tasks and that
the schedule needs to be adjusted whenever a task finishes early or late. The dynamic slack
allocation algorithm reallocates the slack whenever a task finishes earlier or later than expected
based on the current schedule. The current schedule is initialized to the static schedule and
updated whenever dynamic slack allocation is applied in response to early or late finished tasks
at runtime. Our algorithms do not change the assignment of tasks to processors.
The requirements of the dynamic slack allocation algorithm depend on whether the execution
time is overestimated (AET < ET) or underestimated (AET > ET).
• Overestimation: The extra slack can be potentially allocated to tasks that are not yet executed. Here the goal of dynamic slack allocation algorithms is to reduce energy while still meeting deadline constraints.
• Underestimation: In this case, the primary goal of dynamic slack allocation algorithms is to reduce the slack of future tasks to try to complete the DAG within the deadline constraints or as closely as possible to the deadline. A secondary goal is to minimize the energy requirements.
Although our approach can be used in a mixed environment (i.e., an environment where
some tasks are underestimated and some tasks are overestimated), the main motivation is to
support an environment where the estimated execution time of tasks is mostly overestimated or
mostly underestimated. The main focus for underestimated tasks is to meet the deadline, while
for overestimated tasks it is to minimize energy.
The proposed dynamic slack allocation algorithms are based on choosing a subset of tasks
for which the schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not
selected for slack reallocation) is not affected. There are two steps that need to be addressed.
First, select the subset of tasks for slack reallocation. The tasks that can potentially be
rescheduled by the dynamic slack allocation algorithm are those that have not yet started when
the algorithm is applied. We assume that the voltage can be selected before a task starts
executing. Which subset of these tasks the dynamic slack allocation (i.e., rescheduling) is
applied to depends on the algorithm. The main reason to limit the potentially rescheduled tasks
is to minimize the overhead of reallocating the slack during runtime. Clearly, this should be
done so that the other goal of energy reduction is also met simultaneously. Second, determine
the time range for the selected tasks. The time range of the selected tasks has to be changed as
some of the tasks have completed earlier or later than expected. Based on the computation times
in the current schedule and the assignment-based dependency relationships among tasks, we
recompute the time range (i.e., earliest start time and latest finish time) in which the selected
tasks should be executed. Slack has to be allocated to the selected tasks within this time range in
order to try to meet the deadline constraints.
At this stage, a static slack allocation approach is applied to the subset of tasks within the
time range described above. It is worth noting that the dynamic slack allocation algorithms
presented in this section are independent of the static scheduling algorithms. Once the tasks and
their constraints are determined, any static scheduling algorithm can potentially be used at
runtime. We have used the methods providing near-optimal solutions (i.e., the LP based
approach and the Path based approach described in Chapter 3) for this purpose. The
computational overhead is kept small due to the limited number of tasks selected for slack
reallocation.
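A minimal sketch of this two-step procedure (our own illustration, not the dissertation's code; the selection, time-range, and DVS routines are passed in as callables and their names are hypothetical):

def dynamic_slack_allocation(finished_task, schedule,
                             select_subset, compute_time_range, static_dvs):
    # Step 1: choose the subset of not-yet-started tasks to reschedule.
    subset = select_subset(finished_task, schedule)
    # Step 2: recompute the time range (earliest start, latest finish)
    # for each selected task based on the current schedule.
    ranges = {task: compute_time_range(task, schedule) for task in subset}
    # Apply a static slack allocation (e.g., PathDVS) to the subset only;
    # tasks outside the subset keep their current schedule.
    updates = static_dvs(subset, ranges)
    schedule.update(updates)
    return schedule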
Before applying the dynamic slack allocation, the computation time of each selected task is
set to the estimated execution time used in the assignment algorithm (before any static slack
allocation) for calculating the slack during dynamic slack allocation. The slack is recalculated for
the selected tasks (ignoring the slack that was allocated during the static scheme). This will, in
general, lead to lower energy requirements because it accounts for the change in the
assignment-based dependency relationships among tasks caused by the early finished task. It is
based on the fact that slack allocation that considers the assignment-based dependency
relationships among tasks leads to better performance in terms of reducing energy.
4.1.1 Choosing a Subset of Tasks for Slack Reallocation
The proposed dynamic slack allocation algorithms are based on choosing a subset of tasks
for which the schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not
selected for slack reallocation) is not affected. Figure 4-1 shows the subset of tasks selected for
slack reallocation in an assignment DAG when task τ2 finishes early or late for two dynamic
slack allocation algorithms that reallocate slack: the k time lookahead approach and the k
descendent lookahead approach. These approaches are described in detail in the next
subsections. Note that the assignment DAG may change due to changes in the assignment-based
direct dependency relationships caused by the slack reallocation and the early or late finished
tasks.
4.1.1.1 Greedy approach
In the greedy approach, only the assignment-based direct successors of the early or late
finished task are considered for readjusting the schedule. In the example shown in Figure 4-1,
only the direct successors of task τ2, e.g., tasks τ4 and τ5, are considered for slack allocation. The
greedy approach uses slack forwarding [51], which allocates slack to a direct successor of the
early or late finished task on the same processor. We extend the greedy approach in [51] by
considering all assignment-based direct successors for slack allocation on any processor. This
extension is expected to reduce more energy than allocating slack to a single task.
4.1.1.2 The k time lookahead approach
In the k time lookahead approach, all tasks within a limited range of time are considered
for readjusting the schedule. The range of time is limited based on the value of k (i.e., k *
maximum computation time of tasks), where the maximum computation time is the computation
time of the task that takes the longest. In the example shown in Figure 4-1, assume, for ease of
presentation of the key concepts, that the computation time of each task is one unit, the
communication time among tasks is zero, and tasks at the same depth finish at the same time. In
this case, if k is equal to 2, the time range is 2 units (2 * one unit), and the tasks within this time
range from the finish of task τ2, e.g., τ4, τ5, τ6, τ7, τ8, τ9, and τ10, are
considered. The set of tasks selected for the slack reallocation when task τl finishes early is
defined by
defined by
73
s.t. where
},max
lll
jΓτliliiallocation
estaticFTimftimeτ
compTimek*fimeime, staticFTftimeme|staticSTi{τΓj
≠
+≤≥=∈
where staticSTimei is the start time of task τi in the static or previous schedule, staticFTimei is the
finish time of task τi in the static or previous schedule, ftimel is the actual finish time of task τl at
runtime, and compTimej is the computation time of task τj on its assigned processor, a.k.a., the
estimated execution time at the maximum voltage.
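A sketch of this selection rule (ours; attribute names are illustrative, and `tasks` holds the not-yet-started tasks):

def k_time_lookahead(tasks, ftime, k):
    # The window extends k * (maximum computation time) past the actual
    # finish time of the early or late finished task.
    horizon = ftime + k * max(t.comp_time for t in tasks)
    return [t for t in tasks
            if t.static_start >= ftime and t.static_finish <= horizon]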
The approach with the 'all' option for k (i.e., the k-all time lookahead approach) corresponds to
the static slack allocation approach without the limitation on the time range for the tasks
considered for rescheduling. Thus the k-all time lookahead approach is the same as applying the
static slack allocation to all the remaining tasks at runtime. One would expect this to be close to
the best that can be achieved, particularly when applying the near-optimal static slack allocation
algorithms (i.e., the LP based approach and the Path based approach) described in Chapter 3.
The set of tasks selected for the slack reallocation when task τl finishes early is defined by

$\Gamma_{allocation} = \{\, \tau_i \mid staticSTime_i \ge ftime_l \,\}, \quad \text{s.t. } ftime_l \ne staticFTime_l$
4.1.1.3 The k descendent lookahead approach
Unlike the k time lookahead approach, the k descendent lookahead approach considers
only tasks whose schedules are directly influenced by the early or late finished task. The main
intuition is that limiting the tasks to direct descendants reduces the scheduling time requirements
and also leads to good performance in terms of energy, since the schedule for uninfluenced or
only indirectly influenced tasks is kept. Specifically, the assignment-based direct successors of
the early or late finished task are considered; the number of tasks considered for readjusting the
schedule is limited by the value of k, and only descendants at a distance of at most k are
included. In the example of Figure 4-1, using the descendent lookahead approach with k equal
to 2, the considered tasks are the direct assignment-based successors of task τ2, e.g., tasks τ4
and τ5, and their direct successors, e.g., tasks τ7, τ8, and τ9. However, task τ9 will not be
allocated slack because no slack is available for it due to the direct dependency on task τ6.
The approach with the 'all' option for k (i.e., the k-all descendent lookahead approach)
corresponds to setting k equal to the remaining depth. The set of tasks selected for slack
reallocation is defined by

$\Gamma_{allocation} = \{\, \tau_i \mid \tau_i \in assgnSucc_l \,\}, \quad \text{s.t. } ftime_l \ne staticFTime_l,$

where, at the first step, τl is the early or late finished task and, after the first step, τl ranges over
the tasks generated at the previous step (i.e., $\tau_l \in \Gamma_{allocation}$); assgnSuccl is the
set of assignment-based direct successors of task τl.
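A sketch of this selection as a breadth-first expansion over assignment-based direct successors up to distance k (ours; pass k=None for the 'all' option):

from collections import deque

def k_descendent_lookahead(finished_task, assign_succ, k=None):
    selected = set()
    frontier = deque([(finished_task, 0)])
    while frontier:
        task, dist = frontier.popleft()
        if k is not None and dist >= k:
            continue          # stop expanding beyond distance k
        for succ in assign_succ.get(task, ()):
            if succ not in selected:
                selected.add(succ)
                frontier.append((succ, dist + 1))
    return selected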
Figure 4-1. Tasks selected for slack reallocation in an assignment DAG depending on dynamic slack allocation algorithms
[Figure 4-1 diagram: assignment DAG with tasks τ1-τ11; labeled subsets: Greedy; k-2 Time Lookahead; k-2 Descendent Lookahead; k-all Descendent Lookahead; k-all Time Lookahead (static DVS applied at runtime)]
4.1.2 Time Range for Selected Tasks
The static schedule (or the previous schedule updated at runtime) for tasks not in the set of
slack reallocable tasks (i.e., the set of tasks selected for slack reallocation) is kept the same. For
the set of slack reallocable tasks, the following quantities are changed before applying the
algorithms for slack reallocation: the computation time, start time, and finish time (or deadline)
of the tasks.
First, the minimum computation time of a task is set to its estimated time at the maximum
voltage (i.e., staticCTimei = compTimei, where τi ∈ Γallocation; here staticCTimei is the computation
time of task τi in the static or previous schedule generated by the last slack reallocation). This is
the same time that was used during the static assignment process. This effectively ensures that
maximum flexibility is available for slack reallocation. For instance, for tasks τ5 and τ8 in Figure
4-2 (c), the computation time is reset to their estimated computation time before applying the
runtime algorithm. In Figure 4-2 (d), however, their computation time is not changed, since the
reset depends on whether or not they are slack reallocable tasks. Tasks in light grey boxes
indicate slack reallocable tasks.
Next, the start time of the tasks is changed as flexibly as possible to meet the deadline
constraints as well as the finish times of the assignment-based predecessors of each task. Note
that the finish time of predecessors that have already completed or are not part of the selected
tasks is fixed. In the case of overestimation, the selected tasks may start earlier than their
currently scheduled time. For instance, in Figure 4-2 (c), due to the early finish of task τ1, tasks
τ3 and τ4 can start early, but task τ5 cannot start early because of its assignment-based direct
dependency on task τ2. Meanwhile, in the case of underestimation, the selected tasks may have
to start later than their currently scheduled time. For instance, in Figure 4-3 (c), due to the late
finish of task τ1, tasks τ3 and τ4 must start late, but task τ5 can still start early because it is not
directly influenced by the late finished task τ1.
Finally, the finish times (or deadlines) of the tasks are changed so that they can complete
as late as possible while ensuring that the deadline constraints are met (or met as closely as
possible). The constraints from successors that are not part of the selected tasks are based on the
current schedule (i.e., task τ7 in Figure 4-2 (d) and Figure 4-3 (d)). In the case of overestimation,
the deadlines of the selected tasks keep their scheduled finish times. For instance, it is acceptable
if the slack reallocable tasks τ6, τ7, and τ8 finish no later than their finish times in the static
schedule depicted in Figure 4-2 (a). Meanwhile, in the case of underestimation, the deadlines of
the selected tasks may be pushed back to ensure that each task can complete at the maximum
voltage. For instance, the deadline of task τ7 has to be increased, as there is no slack in τ4. The
deadlines of other tasks that can complete before their scheduled finish times (i.e., task τ6) are
not changed, since extending their deadlines to the maximum finish time (i.e., the finish time of
task τ7) may negatively impact the remaining tasks.
Figure 4-2 and Figure 4-3 illustrate the application of the above constraints for both the k
time lookahead approach and the k descendent lookahead approach, for the cases of
overestimation and underestimation, respectively. The dotted box shows the range of time,
consisting of the start time and the finish time (or deadline), for the slack reallocable tasks
considered for slack reallocation at runtime. For edges among tasks, a solid line represents an
assignment-based direct dependency relationship while a dotted line represents an assignment-
based indirect dependency relationship.
Using the above constraints, each slack reallocable task has a different amount of
maximum available slack for reallocation. The actual slack is computed to be within the time
range for the slack reallocable tasks. The maximum available slack of a slack reallocable task τi,
slacki, is defined as the difference between the latest start time of task τi, LSTi, and the earliest
start time of task τi, ESTi. The latest start time, the earliest start time, and the maximum available
slack of task τi are computed as follows:
$LST_i = \min\!\Big( deadline_i,\; LST_{pSucc_i},\; \min_{\tau_j \in succ_i} \big( LST_j - commTime_{ij} \big) \Big) - staticCTime_i$

$EST_i = \max\!\Big( start_i,\; EST_{pPred_i} + staticCTime_{pPred_i},\; \max_{\tau_j \in pred_i} \big( EST_j + staticCTime_j + commTime_{ij} \big) \Big)$

$slack_i = LST_i - EST_i$
where deadlinei is the deadline of task τi, starti is the start time of task τi, succi is the set of direct
successors of task τi in the DAG, pSucci is the task assigned immediately after task τi on the
same processor, predi is the set of direct predecessors of task τi in the DAG, pPredi is the task
assigned immediately before task τi on the same processor, commTimeij is the communication
time between task τi and task τj on their assigned processors, and staticCTimei is the
computation time of task τi in the static or previous schedule generated by the last slack
reallocation. The earliest start time and the latest start time of a task not included in the set of
slack reallocable tasks are equal to its start time in that schedule (i.e., $EST_j = LST_j =
staticSTime_j$, where $\tau_j \notin \Gamma_{allocation}$; here staticSTimej is the start time of
task τj in the static or previous schedule generated by the last slack reallocation).
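A sketch mirroring these formulas (ours; tasks are assumed to be visited in topological order for EST and in reverse topological order for LST, and attribute names are illustrative):

def earliest_start(task, comm):
    # EST_i = max(start_i, EST_pPred + staticCTime_pPred,
    #             max over DAG predecessors (EST_j + staticCTime_j + commTime))
    dag = max((p.est + p.static_ctime + comm[(p, task)] for p in task.pred),
              default=0)
    proc = (task.pproc_pred.est + task.pproc_pred.static_ctime
            if task.pproc_pred else 0)
    return max(task.start, proc, dag)

def latest_start(task, comm):
    # LST_i = min(deadline_i, LST_pSucc,
    #             min over DAG successors (LST_j - commTime)) - staticCTime_i
    dag = min((s.lst - comm[(task, s)] for s in task.succ),
              default=task.deadline)
    proc = task.pproc_succ.lst if task.pproc_succ else task.deadline
    return min(task.deadline, proc, dag) - task.static_ctime

def max_slack(task):
    return task.lst - task.est   # slack_i = LST_i - EST_i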
Once the time range is determined for the slack reallocable tasks, the slack is reallocated to
appropriate tasks using a slack allocation approach in order to minimize the total energy
requirements, and the schedule is then updated.
Figure 4-2. Overestimation: Time range for selected slack allocable tasks using k-time lookahead
approach and k-descendent lookahead approach: (a) Initial static schedule, (b) Schedule from the early finished task, (c) State before applying k time lookahead approach, (d) State before applying k descendent lookahead approach
Figure 4-3. Underestimation: Time range for selected slack allocable tasks using k-time
lookahead approach and k-descendent lookahead approach: (a) Initial static schedule, (b) Schedule from the late finished task, (c) State before applying k time lookahead approach, (d) State before applying k descendent lookahead approach
[Figure 4-2 and Figure 4-3 diagrams: task DAGs τ1-τ9 with start and deadline markers; legend: slack reallocable task, early/late finished task]
4.2 Experimental Results
In this section, we compare the performance of the various dynamic slack allocation
algorithms (i.e., k-Descendent, k-Time, and Greedy) with that of applying static slack allocation
in dynamic environments.
Each dynamic algorithm is applied to a static schedule generated by a known assignment
algorithm, which assigns tasks based on the earliest finish time, together with a static slack
allocation algorithm (i.e., LPDVS or PathDVS). Our previous experiments in Chapter 3 show
that the energy minimization of LPDVS is comparable to PathDVS while its time requirement is
higher. To distinguish it from the PathDVS used to generate the static schedule, we refer to
PathDVS applied at runtime as dPathDVS. The size of the unit slack for PathDVS and
dPathDVS is set to (0.001 * finish time of the DAG) based on the empirical results for static
slack described in Chapter 3.
4.2.1 Simulation Methodology
In this section, we describe the DAG generation, the dynamic environment generation, and the
performance measures used in our experiments.
4.2.1.1 The DAG generation
We randomly generated a large number of graphs with 100 and 200 tasks. Since the results
for heterogeneous environments are similar to those for homogeneous environments, we present
only the results for the latter. The execution time of each task is varied from 10 to 40 units and
the communication time among tasks is set to 2 units. The execution of graphs is performed on 4,
8, and 16 processors.
4.2.1.2 Dynamic environments generation
We simulated a number of dynamic cases to study the effectiveness of our algorithms.
Here are some of the important parameters that can be varied to create dynamic cases for
overestimation and underestimation respectively:
Overestimation
• The fraction of tasks that finish earlier than expected (i.e., tasks with AET < ET) is given by the earlyFinishedTaskRate (i.e., number of early finished tasks = earlyFinishedTaskRate * total number of tasks).
• The fractional difference between actual execution time and estimated time for each task that finishes early is given by timeDecreaseRate (i.e., amount of decrease = timeDecreaseRate * estimated execution time).
Underestimation
• The fraction of tasks that finish later than expected (i.e., tasks with AET > ET) is given by the lateFinishedTaskRate (i.e., number of late finished tasks = lateFinishedTaskRate * total number of tasks).
• The fractional difference between actual execution time and estimated time for each task that finishes late is given by timeIncreaseRate (i.e., amount of increase = timeIncreaseRate * estimated execution time).
To generate cases with overestimation, we experimented with earlyFinishedTaskRate values of
0.2, 0.4, 0.6, and 0.8 and timeDecreaseRate values of 0.1, 0.2, 0.3, and 0.4. To generate cases
with underestimation, we experimented with lateFinishedTaskRate values of 0.2, 0.4, 0.6, and
0.8 and timeIncreaseRate values of 0.05, 0.1, 0.15, and 0.2.
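A sketch of how such a case might be generated from these parameters (our illustration; the dissertation's generator is not shown, and `estimated_time` is an assumed task field):

import random

def make_overestimation_case(tasks, early_finished_task_rate,
                             time_decrease_rate, seed=0):
    # Pick earlyFinishedTaskRate * |tasks| tasks at random and shrink
    # their actual execution time by timeDecreaseRate (so AET < ET).
    rng = random.Random(seed)
    actual_time = {t: t.estimated_time for t in tasks}
    for t in rng.sample(tasks, int(early_finished_task_rate * len(tasks))):
        actual_time[t] = t.estimated_time * (1 - time_decrease_rate)
    return actual_time

The underestimation cases are analogous, scaling by (1 + timeIncreaseRate) instead.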
The deadline is determined by: deadline = (1 + deadline extension rate) * total finish time
from the assignment without the DVS scheme. This base finish time corresponds to the time
requirements of an execution schedule that minimizes execution time for the given set of
processors; the extension represents the overall slack that is available for allocation. We
experimented with deadline extension rates equal to 0.0 (no extension), 0.01, 0.02, 0.05, 0.1,
and 0.2.
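As a worked instance of this formula (our own numbers), a DVS-free finish time of 1000 time units with a 0.05 deadline extension rate gives a deadline of 1050 units:

def deadline(finish_time_without_dvs, extension_rate):
    # deadline = (1 + deadline extension rate) * DVS-free total finish time
    return (1 + extension_rate) * finish_time_without_dvs

print(deadline(1000, 0.05))   # 1050.0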
4.2.1.3 Performance measures
An important measure is the amount of computational time (i.e., runtime overhead)
required to readjust the schedule when the execution time is less than or greater than the
estimated time. The following are other important measures for the cases of overestimation and
underestimation.
For the case of overestimation, the normalized energy consumption is measured. This is
computed as the total energy required for completing the DAG divided by the total energy for
completing the DAG assuming static slack allocation (i.e., all tasks completing in exactly their
estimated time). A lower value of the normalized energy consumption is desirable.
For the case of underestimation, the deadline miss ratio and the energy increase ratio are
measured. When tasks take more time than estimated, the overall execution time may exceed the
deadline. The deadline miss ratio measures the difference between the actual execution time and
the deadline, normalized by the deadline. A lower value of the deadline miss ratio is desirable; a
value of zero implies that the deadline was not missed. The energy increase ratio is computed as
the increase in the total energy required for completing the DAG divided by the total energy for
completing the DAG assuming static slack allocation (i.e., all tasks completing in exactly their
estimated time). A lower value of the energy increase ratio is desirable.
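A sketch of these three measures (ours; inputs are total energies and times in consistent units):

def normalized_energy(actual_energy, static_energy):
    # Total energy divided by the energy under static slack allocation;
    # lower is better.
    return actual_energy / static_energy

def deadline_miss_ratio(actual_finish_time, deadline):
    # (actual execution time - deadline) / deadline; zero means the
    # deadline was not missed.
    return max(0.0, actual_finish_time - deadline) / deadline

def energy_increase_ratio(actual_energy, static_energy):
    # Increase in total energy divided by the static-allocation energy;
    # lower is better.
    return (actual_energy - static_energy) / static_energy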
4.2.2 Overestimation
In this section, we show the performance of our algorithms in the case that the execution
time of tasks is overestimated (i.e., the actual execution time of a task is less than its estimated
time).
4.2.2.1 Comparison of energy requirements
We first compared the k-all descendent algorithm with the Greedy and dPathDVS
algorithms. Figure 4-4 shows the normalized energy requirements of the kallDescendent,
Greedy, and dPathDVS algorithms with respect to different time decrease rates and early
finished task rates for no deadline extension (i.e., deadline extension rate equal to zero). The
results show that the energy requirements of kallDescendent are significantly better than those
of the greedy approach. For instance, for timeDecreaseRate equal to 0.4, kallDescendent reduces
energy by 17% and 29% compared to the Greedy algorithm with earlyFinishedTaskRates of 0.2
and 0.8, respectively. Most importantly, the energy requirements vis-a-vis dPathDVS are within
1% in almost all cases. The time requirement of kallDescendent is one to two orders of
magnitude smaller than dPathDVS, as shown in Figure 4-11. These results demonstrate that a
subset of tasks comprising only the descendants can be used for slack allocation to reduce time
requirements while keeping the energy requirements comparable to using static scheduling
algorithms at runtime.
[Figure 4-4 plots: Normalized Energy vs. Time Decrease Rate; series: Greedy, dPathDVS, kallDescendent; panels: 0.2, 0.4, 0.6, 0.8 early finished task rates]
Figure 4-4. Normalized energy consumption of Greedy, dPathDVS, and kallDescendent with
respect to different early finished task rates and time decrease rates for no deadline extension
Table 4-1 shows the energy comparison of our proposed algorithms, the k time lookahead
(i.e., kTime) and k descendent lookahead (i.e., kDescendent) algorithms, with variable k values
for each algorithm (i.e., k equal to 2 and 3 for kTime, and 4, 6, and the 'all' option for
kDescendent). These results show that the energy requirements of k3Time and k6Descendent
are comparable with those of kallDescendent: the difference between k6Descendent (or
k3Time) and kallDescendent is within 1-5%. While kallDescendent is better than k3Time and
k6Descendent when the fraction of early finished tasks is small, k6Descendent and k3Time are
better when the fraction of early finished tasks is large.
Table 4-1. Normalized energy consumption of k time lookahead and k descendent lookahead algorithms with different k values with respect to different early finished task rates and time decrease rates for no deadline extension

Early      Time      k2      k3      k4          k6          kall
Finished   Decrease  Time    Time    Descendent  Descendent  Descendent
Task Rate  Rate
0.2        0.1       0.9425  0.9367  0.9379      0.9334      0.9207
0.2        0.2       0.9108  0.9008  0.9024      0.8952      0.8753
0.2        0.3       0.8866  0.8721  0.8738      0.8639      0.8372
0.2        0.4       0.8701  0.8506  0.8515      0.8393      0.8069
0.4        0.1       0.8899  0.8826  0.8845      0.8800      0.8780
0.4        0.2       0.8307  0.8194  0.8223      0.8164      0.8153
0.4        0.3       0.7857  0.7696  0.7730      0.7657      0.7660
0.4        0.4       0.7527  0.7312  0.7348      0.7266      0.7276
0.6        0.1       0.8481  0.8426  0.8420      0.8404      0.8424
0.6        0.2       0.7699  0.7621  0.7610      0.7604      0.7657
0.6        0.3       0.7092  0.6984  0.6969      0.6974      0.7061
0.6        0.4       0.6647  0.6497  0.6480      0.6492      0.6606
0.8        0.1       0.8070  0.8023  0.8007      0.8002      0.8071
0.8        0.2       0.7111  0.7057  0.7029      0.7041      0.7186
0.8        0.3       0.6430  0.6355  0.6319      0.6343      0.6548
0.8        0.4       0.5890  0.5782  0.5744      0.5781      0.6042
Figures 4-5, 4-6, 4-7, 4-8, 4-9, and 4-10 show the energy requirements of our proposed
dynamic slack allocation algorithms, the k time lookahead (i.e., kTime) and k descendent
lookahead (i.e., kDescendent) algorithms with variable k values (i.e., k equal to 2 and 3 for
kTime, and 4, 6, and the 'all' option for kDescendent), the greedy algorithm (i.e., Greedy), and
static slack allocation applied at runtime (i.e., dPathDVS), for no deadline extension and 0.01,
0.02, 0.05, 0.1, and 0.2 deadline extension rates, respectively. The results are very similar to
those for no deadline extension described above.
[Figure 4-5 plots: Normalized Energy vs. Time Decrease Rate; series: Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time; panels: 0.2, 0.4, 0.6, 0.8 early finished task rates]
Figure 4-5. Normalized energy consumption for no deadline extension
[Figure 4-6 plots: same axes, series, and panels as Figure 4-5]
Figure 4-6. Normalized energy consumption for 0.01 deadline extension rate
[Figure 4-7 plots: same axes, series, and panels as Figure 4-5]
Figure 4-7. Normalized energy consumption for 0.02 deadline extension rate
[Figure 4-8 plots: same axes, series, and panels as Figure 4-5]
Figure 4-8. Normalized energy consumption for 0.05 deadline extension rate
[Figure 4-9 plots: same axes, series, and panels as Figure 4-5]
Figure 4-9. Normalized energy consumption for 0.1 deadline extension rate
[Figure 4-10 plots: same axes, series, and panels as Figure 4-5]
Figure 4-10. Normalized energy consumption for 0.2 deadline extension rate
4.2.2.2 Comparison of time requirements
Figure 4-11 shows the average time required to readjust the schedule due to a single
task's early finish. The computational time of k6Descendent is roughly an order of magnitude
lower than that of kallDescendent and 3-4 times lower than that of k3Time. Based on the time
and energy comparisons described above, k6Descendent provides reasonable energy
performance at substantially lower overheads.
[Figure 4-11 plot: Computational Time vs. Time Decrease Rate; series: Greedy, dPathDVS, k2Time, k3Time, k4Descendent, k6Descendent, kallDescendent]
Figure 4-11. Computational time to readjust the schedule from an early finished task with respect to different time decrease rates for no deadline extension (unit: ns - via logarithmic scale)
Figure 4-12 shows the time required to readjust the schedule due to a single task's early
finish with respect to different time decrease rates for different deadline extension rates (i.e., no
deadline extension and 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates). The results are
very similar to those for no deadline extension described above.
[Figure 4-12 plots: Computational Time vs. Time Decrease Rate; series as in Figure 4-11; panels: 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates]
Figure 4-12. Results for variable deadline extension rates: Computational time to readjust the schedule from one early finished task with respect to different time decrease rates (unit: ns – via logarithmic scale): (a) for 0.01 deadline extension rate, (b) for 0.02 deadline extension rate, (c) for 0.05 deadline extension rate, (d) for 0.1 deadline extension rate, and (e) for 0.2 deadline extension rate
4.2.3 Underestimation
In this section, we show the performance of our algorithms in the case that the execution
time of tasks is underestimated (i.e., the actual execution time of a task is greater than its
estimated time).
4.2.3.1 Comparison of deadline requirements
We first compared the k-all descendent algorithm with the Greedy and dPathDVS algorithms.
The results in Figure 4-13 show that kallDescendent is significantly better than the greedy
approach at maintaining the deadline requirements. Most importantly, the deadline miss ratio
vis-a-vis dPathDVS was within 0.1% in most cases. The time requirement of kallDescendent is
one to two orders of magnitude smaller than dPathDVS, as shown in Figure 4-27. These results
demonstrate that a subset of tasks comprising only the descendants can be used for slack
allocation to reduce time requirements while meeting the deadline as closely as the static
algorithms (executed at runtime).
[Figure 4-13 plots: Deadline Miss Ratio vs. Time Increase Rate; series: No Scheme, Greedy, dPathDVS, kallDescendent; panels: 0.2, 0.4, 0.6, 0.8 late finished task rates]
Figure 4-13. Deadline miss ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
Table 4-2 shows the deadline miss ratio of our proposed algorithms, the k time lookahead
(i.e., kTime) and k descendent lookahead (i.e., kDescendent) algorithms, with variable k values
for each algorithm (i.e., k equal to 2 and 3 for kTime, and 4, 6, and the 'all' option for
kDescendent). These results show that the deadline miss ratios of k3Time and k6Descendent are
comparable with that of kallDescendent.
Table 4-2. Deadline miss ratio of k time lookahead and k descendent lookahead algorithms with different k values with respect to different late finished task rates and time increase rates for 0.05 deadline extension rate

Late       Time      k2     k3     k4          k6          kall
Finished   Increase  Time   Time   Descendent  Descendent  Descendent
Task Rate  Rate
0.2        0.05      0.001  0.000  0.001       0.000       0.000
0.2        0.1       0.004  0.002  0.004       0.002       0.000
0.2        0.15      0.010  0.007  0.010       0.006       0.001
0.2        0.2       0.018  0.013  0.018       0.013       0.003
0.4        0.05      0.000  0.000  0.000       0.000       0.000
0.4        0.1       0.003  0.001  0.003       0.002       0.001
0.4        0.15      0.010  0.008  0.010       0.009       0.006
0.4        0.2       0.022  0.020  0.022       0.020       0.016
0.6        0.05      0.003  0.003  0.003       0.003       0.003
0.6        0.1       0.012  0.012  0.012       0.012       0.013
0.6        0.15      0.027  0.027  0.027       0.028       0.029
0.6        0.2       0.050  0.051  0.051       0.051       0.052
0.8        0.05      0.010  0.009  0.010       0.010       0.010
0.8        0.1       0.033  0.033  0.033       0.034       0.035
0.8        0.15      0.061  0.062  0.062       0.062       0.064
0.8        0.2       0.100  0.100  0.100       0.100       0.101
Figures 4-14, 4-15, 4-16, 4-17, 4-18, and 4-19 show the deadline miss ratio of our
proposed dynamic slack allocation algorithms, the k time lookahead (i.e., kTime) and k
descendent lookahead (i.e., kDescendent) algorithms with variable k values (i.e., k equal to 2
and 3 for kTime, and 4, 6, and the 'all' option for kDescendent), static scheduling without any
change at runtime (i.e., NoScheme), the greedy algorithm (i.e., Greedy), and static slack
allocation applied at runtime (i.e., dPathDVS), with respect to different time increase rates and
late finished task rates, for no deadline extension and 0.01, 0.02, 0.05, 0.1, and 0.2 deadline
extension rates, respectively. The results are very similar to those for the 0.05 deadline
extension rate described above.
[Figure 4-14 plots: Deadline Miss Ratio vs. Time Increase Rate; series: NoScheme, Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time; panels: 0.2, 0.4, 0.6, 0.8 late finished task rates]
Figure 4-14. Deadline miss ratio for no deadline extension
[Figure 4-15 plots: same axes, series, and panels as Figure 4-14]
Figure 4-15. Deadline miss ratio for 0.01 deadline extension rate
[Figure 4-16 plots: same axes, series, and panels as Figure 4-14]
Figure 4-16. Deadline miss ratio for 0.02 deadline extension rate
[Figure 4-17 plots: same axes, series, and panels as Figure 4-14]
Figure 4-17. Deadline miss ratio for 0.05 deadline extension rate
[Figure 4-18 plots: same axes, series, and panels as Figure 4-14]
Figure 4-18. Deadline miss ratio for 0.1 deadline extension rate
[Figure 4-19 plots: same axes, series, and panels as Figure 4-14]
Figure 4-19. Deadline miss ratio for 0.2 deadline extension rate
4.2.3.2 Comparison of energy requirements
Figure 4-20 shows the energy increase ratio for the three algorithms: dPathDVS, kallDescendent, and k6Descendent. The deadline extension rate is set to 0.05 (this corresponds to the case when the amount of slack is small and has the potential for a large number of deadline misses). The three algorithms were found to be comparable in the amount of energy increase. In general, the k-6 descendent lookahead algorithm is better in terms of energy when a larger number of tasks finishes late, while the static algorithm applied at runtime is better when a smaller number of tasks finishes late.
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, and k6Descendent]
Figure 4-20. Energy increase ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
Figures 4-21, 4-22, 4-23, 4-24, 4-25, and 4-26 show the energy increase ratio of our proposed dynamic slack allocation algorithms, the k time lookahead (i.e., kTime) and k descendent (i.e., kDescendent) lookahead algorithms with variable values of k for each algorithm (i.e., k equal to 2 and 3 for kTime, and 4, 6, and the all option for kDescendent), the greedy algorithm (i.e., Greedy), and static slack allocation applied at runtime (i.e., dPathDVS), with respect to different time increase rates and different late finished task rates, for no deadline extension and for 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates, respectively. The results are very similar to those for the 0.05 deadline extension rate described above.
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-21. Energy increase ratio for no deadline extension
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-22. Energy increase ratio for 0.01 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-23. Energy increase ratio for 0.02 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-24. Energy increase ratio for 0.05 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-25. Energy increase ratio for 0.1 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-26. Energy increase ratio for 0.2 deadline extension rate
4.2.3.3 Comparison of time requirements
Figure 4-27 shows the average time required to readjust the schedule per task that is underestimated. The computational time of k6Descendent is roughly an order of magnitude lower than that of kallDescendent and 3-4 times lower than that of k3Time. Based on the time, deadline miss ratio, and energy increase ratio comparisons described above, k6Descendent provides reasonable performance in deadline satisfaction and energy requirements at substantially lower overheads.
[Figure: computational time (ns, logarithmic scale) plotted against time increase rate (0.05-0.2) for Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-27. Computational time to readjust the schedule from a late finished task with respect to different time increase rates for no deadline extension (unit: ns, logarithmic scale)
Figure 4-28 shows the time requirements to readjust the schedule due to a single task's late finish with respect to different time increase rates for different deadline extension rates (i.e., no deadline extension and 0.01, 0.02, 0.1, and 0.2 deadline extension rates). The results are very similar to those for the 0.05 deadline extension rate described above.
[Figure: panels (a)-(e) plotting computational time (ns, logarithmic scale) against time increase rate (0.05-0.2) for Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-28. Results for variable deadline extension rates: Computational time to readjust the schedule from one late finished task with respect to different time increase rates (unit: ns, logarithmic scale): (a) no deadline extension, (b) 0.01 deadline extension rate, (c) 0.02 deadline extension rate, (d) 0.1 deadline extension rate, and (e) 0.2 deadline extension rate
CHAPTER 5 STATIC ASSIGNMENT
As presented in Chapter 1, the following two-step process is generally used for scheduling tasks with the goal of energy minimization while still meeting deadline constraints: assignment and then slack allocation. In this chapter, we explore the assignment process at compile time (i.e., static assignment), which determines the order in which tasks execute and the mapping of tasks to processors based on the computation time at the maximum voltage level. Note that the finish time of the DAG at the maximum voltage has to be less than or equal to the deadline for any feasible schedule.
Most of the prior research on scheduling for energy minimization of DAGs on parallel machines is based on deriving an assignment schedule that minimizes total finish time in the assignment step. Simple list based scheduling algorithms are generally used for this purpose. This is a reasonable approach, as minimizing finish time generally leads to more slack being available for allocation, ultimately reducing the energy requirements during the slack allocation step. However, this approach is not enough to minimize total energy consumption, because it cannot incorporate the differential energy and time requirements of each task of the workflow on different processors.
For the first step, we present a novel algorithm that has a lower finish time than existing algorithms in a heterogeneous environment. We show that the extra slack that this algorithm generates can lead to an overall reduction in energy after slack allocation as compared to existing algorithms.
The main thrust of this chapter is to show that incorporating energy minimization during the assignment process can lead to even better results. Genetic Algorithm (GA) based scheduling algorithms [56, 57] have tried to partially address this issue by searching through a large number of assignments. This approach was shown to outperform existing algorithms in terms of energy consumption based on their experimental results. However, the assignment itself does not consider the energy consumption after slack allocation. Furthermore, testing the energy requirements of multiple solutions, each corresponding to a different assignment, requires considerable computational time. We present novel algorithms which can achieve assignments with better energy requirements at lower computational times as compared to the Genetic Algorithm based methods.
5.1 Overall Scheduling Process
In this section, we present the overall process of our proposed scheduling approach; a high level description is illustrated in Figure 5-1.
In the first step, tasks are assigned to processors with the goal of minimizing total finish
time of a DAG to derive a Baseline Assignment. This is done for two reasons:
• Check whether the deadline constraints can be met. Note that if the deadline is shorter than the finish time of the DAG, the DAG cannot be feasibly executed in the required time.
• Generate an initial time based task prioritization to determine the scheduling order of tasks. This “minimizing time” based prioritization is called Baseline Prioritization for the rest of this chapter.
If a feasible assignment is derived, then a DVS based slack allocation scheme is applied
with the goal of energy minimization. This energy serves as Baseline Energy. In general, having
a lower finish time can lead to a larger amount of slack that can be allocated to the appropriate
tasks during the slack allocation step. This can lead to a large reduction in energy requirements
as compared to algorithms that have a larger finish time. However, incorporating DVS based
energy minimization during the assignment process can provide better solutions for energy
minimization. Thus our goal is to derive assignments that have better energy requirements than
the baseline energy.
The baseline prioritization, along with the energy requirements of each task, is used to generate multiple prioritizations. Each prioritization is based on a parameter α that weighs the importance of time versus energy for the assignment. For each such prioritization, a time minimization assignment algorithm is applied to minimize total finish time. Note that if the finish time for a given prioritization is larger than the deadline constraint, the DAG cannot be feasibly executed within the required time constraint and the prioritization is abandoned.

For all the feasible prioritizations, the following steps are applied:

• Step 1: An estimated deadline is assigned to each task. This estimated deadline is based on the criticality of the task in the schedule in order to meet the deadline constraints.

• Step 2: An assignment for the estimated DVS based energy minimization is now applied such that the estimated deadline constraints defined in Step 1 are generally met.

If the above provides a feasible assignment (i.e., one whose finish time is less than or equal to the deadline), a DVS based slack allocation scheme is applied to minimize energy. The estimated deadline assigned to each task in Step 1 is parameterized by a parameter β. A higher value of β allows for potentially lower energy requirements by providing greater flexibility in processor selection, but also a higher probability of deriving an assignment that does not meet the deadline constraints. The above steps are executed for each value of the parameter β, each potentially resulting in a different assignment. The feasible assignment with the least energy over the different values of α and β is chosen.
Figure 5-1. A high level description of the proposed scheduling approach
[Flowchart: (a) a time based task prioritization and an energy based task prioritization are combined, using the weighting factor on time, α, into a task prioritization; an assignment to minimize finish time is applied and checked for feasibility, yielding a feasible task prioritization. (b) For each feasible task prioritization, task deadlines are derived from the assignment that minimizes time, using the weighting factor on latest finish time, β; an assignment to minimize energy is applied, followed by DVS, and the feasible solutions are compared to get a schedule with the minimum energy.]
It is worth noting that the above methodology is independent of both the time minimization assignment algorithm and the DVS scheme for slack allocation. As the time minimization assignment algorithm, we use the ICP based assignment (presented in the next section), as it is shown to have superior performance over prior algorithms. As the DVS scheme, we use PathDVS (presented in Chapter 3), which provides near optimal solutions for slack allocation with smaller computational time requirements.
5.2 Proposed Static Assignment to Minimize Finish Time
Several scheduling algorithms for generating assignments that minimize the finish time of DAGs in a heterogeneous environment have been proposed recently [44, 62, 64]. Most of them are based on static list scheduling heuristics to minimize the finish time of DAGs, for example, Dynamic Level Scheduling (DLS) [62], Heterogeneous Earliest Finish Time (HEFT) [64], and Iterative List Scheduling (ILS) [44]. The DLS algorithm selects, at each step, a task to schedule and a processor on which to execute it, using an earliest-task-first policy. The HEFT algorithm reduces the cost of scheduling by using pre-calculated task priorities and uses the earliest finish time for the selection of a processor. This can in general provide better performance than the DLS algorithm. However, since HEFT uses the average computation time across all processors for a given task to determine task priorities, it may lead to an inaccurate ordering for executing tasks. To address this problem, the ILS algorithm generates an initial schedule using HEFT and iteratively improves it by updating task priorities.
Our approach is based on the fact that task prioritization can be improved by using a group based approach. There are two main features of the proposed assignment algorithm, called Iterative Critical Path (ICP). First, it assigns multiple independent ready tasks simultaneously. The computation of the priority of a task depends on estimating the execution path from this task to the last task of the DAG representing the workflow. Since the mapping of tasks yet to be scheduled is unknown and the cost of task execution depends on the processor that is assigned, the priority has to be approximated during scheduling. Hence, it is difficult to explicitly distinguish the execution order of tasks with similar priorities. Using this intuition, the proposed algorithm forms independent ready tasks whose priorities are similar into a group and finds an optimal solution (e.g., resource assignment) for this subset of tasks simultaneously. Here the set of ready tasks that can be assigned consists of tasks for which all the predecessors have already been assigned. Second, it iteratively refines the scheduling, using the cost of the critical path based on the assignment generated in the previous iteration. Assuming that the mappings of the previous iteration are good, this provides a better estimate of the cost of the critical path than using the average or median computation and communication time, as is done in the first iteration.
5.2.1 Task Selection
To determine the scheduling order of tasks in our algorithm, the priority of each task is computed using its critical path, which is the length of the longest path from the task to an exit task. The critical path of each task is computed by traversing the graph from an exit task. The critical path of task $\tau_i$, $cp_i$, is defined by

$$cp_i = \mathit{avgCompTime}_i + \max_{\tau_j \in \mathit{succ}_i}\left(\mathit{avgCommTime}_{ij} + cp_j\right)$$

where $\mathit{avgCompTime}_i$ is the average computation time of task $\tau_i$, $\mathit{avgCommTime}_{ij}$ is the average communication time between task $\tau_i$ and task $\tau_j$, and $\mathit{succ}_i$ is the set of direct successors of task $\tau_i$ in the DAG.
Using the critical path of each task, the tasks are sorted into non-increasing order of their critical path values. The task ordering list generated this way preserves the original precedence constraints among the tasks of the given DAG. During the assignment process, at each step a list of ready tasks is used as the next set of tasks that can be assigned. The list of ready tasks consists of tasks for which all the predecessors have already been assigned. ICP finds a subset of these ready tasks whose critical path values are similar but which have no precedence relationships with each other. In other words, the subset is composed of independent tasks whose predecessors are all assigned to processors. The size of the selected subset is bounded by a pre-specified threshold value.

The average values of computation time and communication time are used at the initial step. After the first assignment, the actual computation time and communication time based on the previous assignment are used for the computation of the critical path.
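As a concrete illustration, the following is a minimal, self-contained sketch of the critical path computation on a toy DAG; the dictionary-based encoding and the example numbers are ours, not taken from the experiments:

from functools import lru_cache

# Toy DAG: successors of each task, average computation times, and average
# communication times between dependent task pairs (illustrative values).
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
avg_comp = {1: 2, 2: 2, 3: 2, 4: 2}
avg_comm = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 1}

@lru_cache(maxsize=None)
def cp(i):
    """Critical path of task i: avgCompTime_i plus the maximum over
    successors j of (avgCommTime_ij + cp_j); exit tasks contribute only
    their own computation time."""
    if not succ[i]:
        return avg_comp[i]
    return avg_comp[i] + max(avg_comm[(i, j)] + cp(j) for j in succ[i])

# Tasks sorted into non-increasing order of critical path value.
order = sorted(succ, key=cp, reverse=True)
print([(i, cp(i)) for i in order])   # [(1, 8), (2, 5), (3, 5), (4, 2)]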
5.2.2 Processor Selection
ICP assigns the multiple independent ready tasks selected in the previous step simultaneously and optimally on the available processors. For a list of independent ready tasks, ICP finds the best processor for each task in the list such that either the total finish time of the selected subset of tasks is minimized (Option 1) or the sum of the finish times on the processors is minimized (Option 2). With respect to the goal of reducing the finish time of the DAG, Option 1 applies the goal directly, while Option 2 increases the possibility of minimizing the finish times of subsequent tasks by leaving more room for them. Either method, or a combination of the two, can be applied for processor selection.

The optimal solution for the processor selection of the selected subset of tasks is generated using an ILP (Integer Linear Programming) formulation. The formulation for Option 1 is as follows:
$$\text{Minimize} \quad \max_{\tau_i \in \Gamma_s,\ p_j \in P} \mathit{ftime}_{ij}$$

$$\text{subject to} \quad \mathit{ftime}_{ij} = \mathit{compTime}_{ij} + \max\left(\mathit{availableTime}_{ij},\ \max_{\tau_k \in \mathit{pred}_i}\left(\mathit{ftime}_{k,p(k)} + \mathit{commTime}_{i,j,k,p(k)}\right)\right), \quad \forall\, \tau_i \in \Gamma_s,\ p_j \in P$$

Here, $\mathit{ftime}_{ij}$ is the finish time of task $\tau_i$ on processor $p_j$, $\Gamma_s$ is the subset of ready tasks, $P$ is the set of processors, $\mathit{compTime}_{ij}$ is the computation time of task $\tau_i$ on processor $p_j$, $p(k)$ is the processor where task $\tau_k$ is assigned, $\mathit{commTime}_{i,j,k,p(k)}$ is the communication time between task $\tau_i$ on processor $p_j$ and task $\tau_k$ on processor $p(k)$, and $\mathit{pred}_i$ is the set of direct predecessors of task $\tau_i$ in the DAG. The available start time of task $\tau_i$ in the free slot of processor $p_j$ is represented by $\mathit{availableTime}_{ij}$.
In the case of Option 2, only the objective function changes; the constraints are the same as in Option 1. The formulation for Option 2 is as follows:

$$\text{Minimize} \quad \sum_{\tau_i \in \Gamma_s,\ p_j \in P} \mathit{ftime}_{ij}$$

$$\text{subject to} \quad \mathit{ftime}_{ij} = \mathit{compTime}_{ij} + \max\left(\mathit{availableTime}_{ij},\ \max_{\tau_k \in \mathit{pred}_i}\left(\mathit{ftime}_{k,p(k)} + \mathit{commTime}_{i,j,k,p(k)}\right)\right), \quad \forall\, \tau_i \in \Gamma_s,\ p_j \in P$$
We found that the schedules generated with either of these two options were comparable.
Thus, in the following, we limit ourselves to Option 1.
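Since the subset of ready tasks is small (bounded by the threshold), its optimal assignment can also be found by exhaustive enumeration, which is a convenient way to illustrate the Option 1 objective. A minimal sketch, with illustrative data of our own; the per-(task, processor) ready time folds together availableTime and predecessor finish plus communication times:

from itertools import product

ready = ["t5", "t6"]                          # independent ready tasks (illustrative)
procs = ["p1", "p2"]
comp = {("t5", "p1"): 3, ("t5", "p2"): 5,     # compTime_ij at maximum voltage
        ("t6", "p1"): 4, ("t6", "p2"): 2}
# Earliest possible start of each task on each processor (illustrative).
ready_time = {("t5", "p1"): 2, ("t5", "p2"): 0,
              ("t6", "p1"): 2, ("t6", "p2"): 1}

best = None
for choice in product(procs, repeat=len(ready)):   # all task -> processor maps
    end = {p: 0 for p in procs}                    # running end time per processor
    for task, p in zip(ready, choice):
        start = max(ready_time[(task, p)], end[p]) # wait for data and processor
        end[p] = start + comp[(task, p)]
    makespan = max(end.values())                   # Option 1: minimize the maximum
    if best is None or makespan < best[0]:
        best = (makespan, dict(zip(ready, choice)))
print(best)   # (5, {'t5': 'p1', 't6': 'p2'})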
5.2.3 Iterative Scheduling
The ICP assignment method is based on iterative scheduling in order to provide a better estimate of the cost of the critical path. Figure 5-2 presents a high level description of the ICP assignment procedure.
Figure 5-2. The ICP procedure

Initialize
1. minFinishTime = maxValue
2. Compute the average computation time and communication time for each task
3. Compute the critical path value for each task based on the average values

Procedure ICP
4. While there is a continuous improvement of performance do
5.   Generate the list of tasks, Γ, sorted by non-increasing order of the critical path values
6.   While the list of tasks is not empty do
7.     Find tasks τi ∈ Γ whose priorities are close, where τi ∉ succ_k for every τk ∈ Γs
8.     Insert them into the list of ready tasks, Γs
9.     Assign the ready tasks based on the ILP formulation
10.    If the finish time of any assigned task >= minFinishTime then
11.      Break
12.    End If
13.    Delete the tasks in Γs from Γ and empty Γs, Γs = {}
14.  End While
15.  If the total finish time is less than minFinishTime then
16.    Update minFinishTime
17.    Assign each task τi ∈ Γ to its selected processor
18.  End If
19.  Compute the critical path based on the current assignment
20.  If the total finish time has not improved at the current threshold for more than k iterations, or the critical path is the same as in a previous assignment, then
21.    Change the number of ready tasks, threshold
22.  End If
23. End While
End Procedure

In the first iteration, the estimation of the critical path is based on the average computation time across all processors for the tasks yet to be scheduled. This can result in inaccuracies in estimating the critical path. To reduce or eliminate the possibility of an inappropriate assignment due to an inaccurate critical path estimate, ICP iteratively reschedules tasks using a critical path
which is determined from the assignment produced by the previous iteration of the scheduling algorithm. In other words, the critical path of each task depends on the previous assignment: the computation time of each task used in the critical path computation is its computation time on its assigned processor, rather than the average across all processors, and the communication time between tasks is likewise the specific value determined by their assigned processors. This iterative refinement continues until the total finish time no longer decreases or a prespecified number of iterations is completed. The value of the threshold for the subset of tasks starts at a fixed value and is decremented by one if no reduction in finish time (i.e., schedule length) is seen after a few iterations. Changing the threshold value increases the possibility of further improving the finish time.
5.3 Proposed Static Assignment to Minimize Energy
As described earlier, the prior research on scheduling for energy minimization has
concentrated on the slack allocation step to minimize the energy requirements during a given
phase while using simple list based scheduling approaches to minimize total finish time for the
assignment step. Unlike these methods, our proposed assignment algorithm considers the energy
requirements based on potential slack during the assignment step.
The main features of our assignment algorithm are as follows. First, it utilizes expected DVS based energy information during assignment. Our algorithm assigns the appropriate processor to each task such that the total energy expected after slack allocation is minimized. The expected energy after slack allocation (i.e., the expected DVS based energy) for each task is computed using the estimated deadline for each task, so that the overall DAG can be executed within the deadline of the DAG. Second, it considers multiple task prioritizations. We test multiple assignments using multiple task prioritizations based on tradeoffs between energy and time for each task. These assignments can potentially be executed in parallel to minimize the computational time (i.e., the runtime of the algorithm).

The details of task prioritization, the estimated deadline for each task, and processor selection for our assignment algorithm to minimize DVS based energy are described in the subsequent subsections.
5.3.1 Task Prioritization
In the time minimization assignment methods, the priorities of tasks, which determine the scheduling order, are based only on time information, without paying any attention to energy requirements. The task prioritization in our algorithm is based on a weighted sum of the time and energy requirements. After applying an assignment algorithm to minimize finish time (i.e., the baseline assignment), the baseline prioritization (i.e., the time based prioritization) is generated and used to determine the task prioritization for reapplying an assignment algorithm. An appropriate choice of weight provides tradeoffs between energy and deadline constraints.

To compute the time based priority of each task (i.e., the baseline prioritization), we use its critical path, which is the length of the longest path from the task to an exit task. The critical path of each task is computed in the same way as for the ICP assignment presented in the previous section. The task ordering list generated from the critical path values preserves the original precedence constraints among the tasks of the given DAG. The critical path of task $\tau_i$, $cp_i$, is defined by

$$cp_i = \mathit{compTime}_i + \max_{\tau_j \in \mathit{succ}_i}\left(\mathit{commTime}_{ij} + cp_j\right)$$

where $\mathit{compTime}_i$ is the computation time of task $\tau_i$, $\mathit{commTime}_{ij}$ is the communication time between task $\tau_i$ and task $\tau_j$, and $\mathit{succ}_i$ is the set of direct successors of task $\tau_i$ in the DAG.
Given the baseline prioritization, the priority of each task used in our algorithm is recomputed by incorporating the energy information. The priority of task $\tau_i$, $\mathit{priority}_i$, is defined by

$$\mathit{priority}_i = \alpha \times \frac{cp_i}{CP} + (1-\alpha) \times \frac{\mathit{energy}_i}{\sum_{\tau_k \in \Gamma} \mathit{energy}_k}, \quad 0 \le \alpha \le 1$$

where $CP$ is the critical path of the DAG (i.e., the total finish time of the DAG), $cp_i$ is the critical path of task $\tau_i$, $\mathit{energy}_i$ is the energy consumed to execute task $\tau_i$, $\alpha$ is the weight of time, and $\Gamma$ is the set of all tasks in the DAG.
If the weighting factor α is close to zero, a task that requires higher energy to execute is assigned to an appropriate processor with higher priority than tasks with lower energy consumption. This can be expected to lead to better performance in terms of energy. However, because time information is largely ignored, the finish time of the DAG may be larger, and the deadline constraints may not even be satisfied. If the weighting factor α is close to one, the probability of a feasible assignment of the DAG is higher, but the lack of consideration of energy information may lead to worse energy performance.
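A minimal sketch of this weighted prioritization, using illustrative critical path and energy values of our own:

def priorities(cp, energy, alpha):
    """priority_i = alpha * cp_i / CP + (1 - alpha) * energy_i / total energy."""
    CP = max(cp.values())              # critical path of the whole DAG
    total = sum(energy.values())
    return {i: alpha * cp[i] / CP + (1 - alpha) * energy[i] / total
            for i in cp}

cp = {1: 8, 2: 5, 3: 5, 4: 2}          # illustrative critical path values
energy = {1: 1, 2: 5, 3: 20, 4: 2}     # illustrative per-task energies

for alpha in (0.0, 0.5, 1.0):
    pr = priorities(cp, energy, alpha)
    print(alpha, sorted(cp, key=pr.get, reverse=True))
# alpha = 0.0 ranks tasks by energy share; alpha = 1.0 ranks by critical path.
# The assignment step then reorders this list to respect precedence constraints.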
The above prioritization is modified to accommodate the precedence relationships among tasks during assignment; that is, a successor task is always assigned after its predecessor tasks. For the example of Figure 1-1, assume that the ordering of tasks based on the priority values is τ5−τ3−τ1−τ4−τ2−τ6−τ7. Due to the precedence relationships among tasks, the actual execution ordering for assignment is changed to τ1−τ3−τ2−τ5−τ4−τ6−τ7. Tasks τ1, τ3, and τ2 precede task τ5 even though their priorities are lower, and task τ2 precedes task τ4 so that task τ5 can execute ahead of task τ4, in keeping with their priorities.
5.3.2 Estimated Deadline for a Task
The goal of the assignment is to minimize the expected total energy consumption after slack allocation while still satisfying the deadline constraints. Consider a scenario where the assignment of a subset of tasks has already been completed and the next task in the prioritization list has to be assigned. The choice of processors that can be assigned to this task should be limited to the ones for which the expected finish time of the overall assignment will lead to meeting the deadline constraints (otherwise this will result in an infeasible assignment). Clearly, at the time when the assignment for a given task is being determined, there is no guarantee that the derived schedule will be feasible (i.e., will meet the deadline), because the feasibility of the schedule depends on the assignment of the remaining tasks, which is not yet determined.

The proposed algorithm calculates an estimated deadline for each task, that is, the deadline that is expected to enable a feasible schedule provided the task's finish time satisfies it. The estimated deadline of a task is a value interpolated between the earliest finish time and the latest finish time using a weighting factor β. The latest finish time of task $\tau_i$, $LFT_i$, its earliest finish time, $EFT_i$, and its estimated deadline, $d_i$, are respectively defined by
$$LFT_i = \min\left(\mathit{deadline}_i,\ LFT_{\mathit{pSucc}_i} - \mathit{compTime}_{\mathit{pSucc}_i},\ \min_{\tau_j \in \mathit{succ}_i}\left(LFT_j - \mathit{compTime}_j - \mathit{commTime}_{ij}\right)\right)$$

$$EFT_i = \max\left(\mathit{start}_i,\ EFT_{\mathit{pPred}_i},\ \max_{\tau_j \in \mathit{pred}_i}\left(EFT_j + \mathit{commTime}_{ij}\right)\right) + \mathit{compTime}_i$$

$$d_i = \beta \times LFT_i + (1-\beta) \times EFT_i, \quad 0 \le \beta \le 1$$

where $\mathit{deadline}_i$ is the deadline of task $\tau_i$, $\mathit{start}_i$ is the start time of task $\tau_i$, $\mathit{compTime}_i$ is the computation time of task $\tau_i$ on its assigned processor, $\mathit{commTime}_{ij}$ is the communication time between task $\tau_i$ and task $\tau_j$ on their assigned processors, $\mathit{succ}_i$ is the set of direct successors of task $\tau_i$ in the DAG, $\mathit{pSucc}_i$ is the task placed immediately after task $\tau_i$ on the same assigned processor, $\mathit{pred}_i$ is the set of direct predecessors of task $\tau_i$ in the DAG, $\mathit{pPred}_i$ is the task placed immediately before task $\tau_i$ on the same assigned processor, and $\beta$ is the weight of the latest finish time.
If the weighting factor β is close to one, the task is allowed more flexibility in processor assignment, as it can take longer to complete. However, the probability of a feasible assignment of the DAG may be lower. If the weighting factor β is close to zero, there is less flexibility in assigning the task to a processor, but the probability of a feasible assignment of the DAG is higher. Also, as this potentially leaves more slack after assignment, that slack can be allocated by the DVS algorithm for energy minimization.
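A minimal sketch of the interpolation step, assuming LFT and EFT have already been computed by the recurrences above; the LFT values are chosen to match the example of Section 5.3.3.2, and the EFT values are illustrative:

def estimated_deadlines(lft, eft, beta):
    """d_i = beta * LFT_i + (1 - beta) * EFT_i, with 0 <= beta <= 1."""
    assert 0.0 <= beta <= 1.0
    return {i: beta * lft[i] + (1 - beta) * eft[i] for i in lft}

lft = {1: 4, 2: 6, 3: 7, 4: 9}   # latest finish times (as in Section 5.3.3.2)
eft = {1: 2, 2: 4, 3: 6, 4: 7}   # illustrative earliest finish times
print(estimated_deadlines(lft, eft, 1.0))   # beta = 1: d_i = LFT_i (4, 6, 7, 9)
print(estimated_deadlines(lft, eft, 0.5))   # tighter estimated deadlines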
5.3.3 Processor Selection
Figure 5-3 presents a high level description of the assignment procedure for a given task prioritization. Each task is assigned to a processor such that the total energy consumption expected after applying the DVS scheme to the tasks assigned so far (including the new task being considered for assignment) is minimized, while trying to meet the estimated deadline of the task. The candidate processors for the task are those on which the task can execute within its estimated deadline. Once the candidate processors for the task have been selected, the procedure continues depending on the following conditions:

First, if no processor can satisfy the estimated deadline for the task, the processor with the earliest finish time is selected. It is possible that the schedule later becomes feasible, since the assignment is based on estimated times for future tasks whose assignment is yet to be determined. When the task finishes within the range between its earliest finish time and its latest finish time, we assume that the deadline of the DAG can be met with high probability. By selecting a processor on which the task finishes earlier, the chance of meeting the deadline increases.

Second, if there is only one candidate processor that meets the above constraint, the task is assigned to that processor. This, too, increases the chance of meeting the deadline constraints.

Finally, if there is more than one candidate processor that meets the above constraint, a processor is selected such that the total energy expected after slack allocation is minimized. The expected total energy is the sum of the expected energy of the already assigned tasks and of the task being considered for assignment. To compute the expected energy for a given processor assignment in this step, a faster heuristic based strategy (as compared to PathDVS, which provides nearly optimal solutions) is used. This procedure is described in the next subsection.

The above selection process is performed iteratively until all tasks are assigned. However, if the finish time of a task exceeds the deadline, the process stops.
5.3.3.1 Greedy approach for the computation of expected energy
The unit slack allocation used in the PathDVS algorithm (described in Chapter 3) finds the subset of tasks that maximally reduces the total energy consumption. This corresponds to the maximum weighted independent set (MWIS) problem [7, 53, 65], which is computationally intensive. Our approach requires the use of a DVS scheme during the assignment of each task in order to compute the expected DVS based energy for selecting the best processor in the processor selection step. This is an intermediate step where exact energy estimates are not as important as in the slack allocation step. To reduce the time requirements of the optimal branch and bound strategy for unit slack allocation described in Chapter 3, a greedy algorithm for the MWIS problem [53] can be used while still providing good estimates of energy. The greedy algorithm in our approach is as follows:

• Select the task with the maximum energy reduction (i.e., the energy reduced when unit slack is allocated) among all tasks (i.e., the already assigned tasks and the task being considered for assignment)

• Select the task with the maximum energy reduction among the tasks independent of the previously selected task

• Iteratively select tasks until there is no task independent of all the selected tasks

The above greedy approach for unit slack allocation is performed iteratively until there is no slack left or no task to which slack can be allocated under the estimated deadline constraints. In the proposed greedy approach, the independent tasks can easily be identified using a compatible task matrix or lists that record, for each task, the tasks with which it can share unit slack, as in PathDVS.
Figure 5-3. The DVSbasedAssignment procedure
Procedure DVSbasedAssignment
1. Compute the estimated deadline for each task
2. For each task do
3.   Find the processors on which task τi can execute within its estimated deadline di
     Condition 1: If there is no such processor
4.1.   If the finish time of the task τi > deadline
4.2.     Stop the procedure
4.3.   Else
4.4.     Select a processor such that the finish time of the task τi is minimized
4.5.   End If
     Condition 2: If there is only one such processor
4.1.   Select that processor for the task τi
     Condition 3: If there is more than one such processor
4.1.   Apply a greedy algorithm for the weighted independent task set problem to the task τi and the already assigned tasks
4.2.   Select a processor such that the total energy is minimized
5. End For
End Procedure
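The same control flow, as a runnable Python sketch; finish_time and expected_energy are caller-supplied estimate functions (hypothetical stand-ins for the estimates described above), and est_deadline maps each task to its estimated deadline d_i:

def dvs_based_assignment(tasks, procs, est_deadline, deadline,
                         finish_time, expected_energy):
    """Sketch of the DVSbasedAssignment control flow."""
    assignment = {}
    for t in tasks:
        candidates = [p for p in procs
                      if finish_time(t, p, assignment) <= est_deadline[t]]
        if not candidates:                                    # Condition 1
            p = min(procs, key=lambda q: finish_time(t, q, assignment))
            if finish_time(t, p, assignment) > deadline:
                return None                                   # infeasible: stop
        elif len(candidates) == 1:                            # Condition 2
            p = candidates[0]
        else:                                                 # Condition 3
            p = min(candidates,
                    key=lambda q: expected_energy(t, q, assignment))
        assignment[t] = p
    return assignment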
5.3.3.2 Example for assignment
In the following, we briefly illustrate, with a simple example, the benefit of considering the DVS based expected energy of tasks during the assignment process. Figures 5-4 (a) and (b) show a DAG with 4 tasks along with the execution time and the energy consumption of each task on each processor at the maximum voltage level. There is a large variation in the energy requirements of the tasks (this was done mainly to keep the example simple in terms of the number of tasks). An assignment that minimizes total finish time is presented in Figure 5-4 (c). The total finish time is 7, and the corresponding energy consumption before slack allocation is 27.
The time based task prioritization corresponding to this assignment is τ1−τ2−τ3−τ4. Suppose that the deadline to complete the execution of the DAG is 9. The prioritization is obviously feasible, since the finish time is less than this deadline. The estimated deadline for each task is determined using the assignment shown in Figure 5-4 (c). The estimated deadlines for tasks τ1, τ2, τ3, and τ4 are 4, 6, 7, and 9, respectively, based on a weighting factor of latest finish time β equal to one.

The proposed assignment method to minimize energy is now applied (note that the energy model follows a quadratic function and the unit slack is one unit). In the following, we show the assignment process based on the above prioritization order.
First, consider task τ1. If task τ1 is assigned to processor p1, there is an estimated slack of two units, since its finish time is 2 and its estimated deadline is 4. After the slack is allocated to the task, its energy consumption is 0.25. If the task is assigned to processor p2, the expected energy is 2.5 after allocating the estimated slack of two units to the task. Thus, task τ1 is assigned to processor p1.
Second, consider task τ2. If this task is assigned to processor p1, the estimated slack is two units. The entire slack can be allocated to task τ1 or τ2, or a slack of one unit can be allocated to each of tasks τ1 and τ2. The best of these is to allocate the whole slack to task τ2; the total energy for tasks τ1 and τ2 based on this assignment is then 2.25. If this task is assigned to processor p2, there is no estimated slack (since the estimated deadline for task τ2 is 6). However, the total expected energy based on this assignment is 2, so task τ2 is assigned to processor p2.

Next, consider task τ3. Processor p2 is not considered for task τ3 because the finish time of task τ3 on processor p2 exceeds its estimated deadline. Therefore, task τ3 is assigned to processor p1.

Finally, consider task τ4. If this task is assigned to processor p1, the estimated slack is three units. In this case, the entire slack is allocated to task τ3, and the total expected energy is 7.2. If this task is assigned to processor p2, then a slack of two units is allocated to task τ3 and a slack of one unit is allocated to task τ2, and the total expected energy is 7.6. Thus, task τ4 should be assigned to processor p1, even though its energy requirement on processor p2 (i.e., its energy before slack allocation) is less than that on processor p1.
Figure 5-4 (d) shows the assignment that minimizes the DVS based energy. Here the total finish time is 9 and the total energy consumption before slack allocation is 24.

Once the assignment is completed, a slack allocation algorithm is applied to minimize the total energy requirements. Let us now compare the two assignments of Figures 5-4 (c) and (d) after slack allocation. For the assignment in Figure 5-4 (c) (i.e., the assignment that minimizes finish time), a slack of two units is allocated to tasks τ2 and τ3, resulting in a total energy of 8.25. For the assignment in Figure 5-4 (d) (i.e., the assignment that minimizes energy), the total energy after slack allocation is 7.2, corresponding to a slack of three units being allocated to task τ3. This
represents a 12.7% improvement in overall energy requirements. The algorithm was able to achieve this improvement by focusing the potential slack on task τ3, which has the higher energy requirements.
Figure 5-4. Example of assignment to minimize finish time and assignment to minimize DVS based energy: (a) DAG, (b) Execution time and energy information for each task on two processors, (c) Assignment to minimize finish time, (d) Assignment to minimize DVS based energy (i.e., our assignment)
5.4 Experimental Results for Assignment Algorithms that Minimize Finish Time
In this section, we present comparisons of our algorithm with algorithms that minimize total finish time followed by slack allocation. We compare performance against ILS [44] and HEFT [64]; these two algorithms have been shown to be superior to existing algorithms for minimizing time in heterogeneous environments. We combined these algorithms with three DVS algorithms, PathDVS (presented in Chapter 3), EProfileDVS [48, 55], and GreedyDVS [13], in order to see whether the DVS algorithm makes a difference in the relative comparison of the three assignment algorithms. The size of the unit slack for PathDVS (i.e., unitSlack) is set to the best size obtained empirically in the experiments of Chapter 3: unitSlack equal to 0.001 * total finish time.
[Figure 5-4 content, reconstructed from the flattened figure: (a) a DAG in which task 1 precedes tasks 2 and 3, which precede task 4; (b) execution time and energy of each task on the two processors at the maximum voltage level:

Task            1     2     3     4
Time on P1      2     2     2     2
Time on P2      2     3     2     2
Energy on P1    1     5     20    2
Energy on P2    10    1     20    1

(c) Gantt chart of the assignment that minimizes finish time (finish time 7); (d) Gantt chart of the assignment that minimizes DVS based energy (finish time 9).]
5.4.1 Simulation Methodology
In this section, we describe the DAG generation and the performance measures used in our experiments.
5.4.1.1 The DAG generation
We randomly generated a large number of graphs with 50 and 100 tasks. The execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units (given
that we are targeting a heterogeneous environment) and the communication time between a task
and its child task for a pair of processors is varied from 1 to 4 units. The energy consumed to
execute each task on each processor is varied from 10 to 80. The execution of graphs is
performed on 4, 8, and 16 processors. For each combination of values of number of tasks and
processors, 20 different synthetic graphs are generated.
5.4.1.2 Performance measures
We used total finish time and the improvement in total energy consumption for comparing the different algorithms. The deadline extension rate is the fraction of the total finish time that is added to the deadline (i.e., deadline = (1 + deadline extension rate) * maximum total finish time over the assignments before applying DVS). We provide experimental results for deadline extension rates equal to 0 (no deadline extension), 0.2, 0.4, 0.6, 0.8, and 1.0. The total number of iterations for ICP and the number of iterations allowed without improvement at the same threshold are set to 10 and 3, respectively, and the threshold varies from 1 to 4.
5.4.2 Comparison of Assignment Algorithms Using Different DVS Algorithms
We compared our algorithm, ICP, with ILS [44] and HEFT [64], which outperform other existing algorithms in terms of total finish time. The algorithms are compared in terms of total finish time and total energy consumption after applying slack allocation, in order to show the relationship between minimizing finish time and minimizing energy consumption.

A comparison of the three algorithms shows that ICP was slightly better than ILS and considerably better than HEFT in terms of total finish time. The average total finish time of ICP is reduced by 3.95% and 9.31% compared to ILS and HEFT, respectively.
Tables 5-1, 5-2, 5-3, 5-4, 5-5, and 5-6 show the improvement of ICP-PathDVS over the combinations of the three assignment algorithms (i.e., ICP, ILS, and HEFT) with the three DVS algorithms (i.e., EProfileDVS, GreedyDVS, and PathDVS) in terms of energy consumption, with respect to different deadline extension rates, for each combination of 50 and 100 tasks on 4, 8, and 16 processors, respectively. Based on these results, our assignment algorithm, ICP, leads to lower energy requirements than the other assignment algorithms regardless of the DVS algorithm. For instance, using the PathDVS algorithm, the energy of the ICP assignment is reduced by 11-14% over ILS and 13-17% over HEFT. We believe the main reason is that having a lower finish time leads to a larger amount of slack that can be allocated optimally to the appropriate tasks during the slack allocation step. This leads to a large reduction in energy requirements as compared to an algorithm that has a larger finish time.

The results also show that PathDVS (presented in Chapter 3) outperforms the other DVS algorithms regardless of the assignment algorithm used, in terms of minimizing energy. For instance, given the ICP assignment, PathDVS improves by 4-18% over EProfileDVS and 19-84% over GreedyDVS, depending on the deadline extension rate.

Finally, the combination of ICP and PathDVS outperforms all other combinations. For instance, the combined effect of ICP along with PathDVS provides an improvement of 13-26% over the combination of ILS and EProfileDVS.
Table 5-1. Results for 50 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   2.83%    5.97%    6.75%    7.08%    7.31%    7.36%
ICP         GreedyDVS     19.82%   47.24%   61.90%   71.08%   77.29%   81.70%
ILS         PathDVS       12.15%   11.68%   11.93%   12.10%   12.28%   12.33%
ILS         EProfileDVS   13.86%   16.05%   16.81%   17.14%   17.34%   17.38%
ILS         GreedyDVS     24.98%   50.41%   64.19%   72.83%   78.66%   82.80%
HEFT        PathDVS       21.80%   17.94%   17.88%   17.98%   18.14%   18.19%
HEFT        EProfileDVS   21.88%   21.83%   22.25%   22.42%   22.62%   22.65%
HEFT        GreedyDVS     26.00%   50.08%   63.93%   72.63%   78.51%   82.68%
Table 5-2. Results for 50 tasks and 8 processors: Improvement of ICP-PathDVS in terms of
energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   3.72%    10.08%   12.08%   13.14%   13.94%   14.20%
ICP         GreedyDVS     20.40%   49.71%   64.46%   73.37%   79.29%   83.40%
ILS         PathDVS       12.20%   11.97%   12.80%   13.52%   14.17%   14.49%
ILS         EProfileDVS   14.82%   20.55%   22.29%   23.36%   23.99%   24.31%
ILS         GreedyDVS     26.64%   53.44%   67.10%   75.36%   80.85%   84.65%
HEFT        PathDVS       20.64%   17.64%   17.75%   18.26%   18.84%   19.11%
HEFT        EProfileDVS   21.06%   24.97%   26.37%   27.35%   27.95%   28.20%
HEFT        GreedyDVS     27.09%   52.62%   66.47%   74.87%   80.46%   84.34%
Table 5-3. Results for 50 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   5.04%    11.73%   13.00%   13.93%   14.35%   14.60%
ICP         GreedyDVS     20.99%   49.48%   63.85%   72.81%   78.80%   83.07%
ILS         PathDVS       13.96%   12.44%   12.40%   12.91%   13.20%   13.43%
ILS         EProfileDVS   16.26%   22.29%   23.60%   24.45%   24.88%   25.18%
ILS         GreedyDVS     24.92%   51.16%   64.97%   73.66%   79.46%   83.60%
HEFT        PathDVS       17.44%   14.93%   14.59%   14.89%   15.08%   15.24%
HEFT        EProfileDVS   18.01%   24.05%   24.96%   25.91%   26.28%   26.53%
HEFT        GreedyDVS     25.46%   50.97%   64.74%   73.45%   79.29%   83.45%
Table 5-4. Results for 100 tasks and 4 processors: Improvement of ICP-PathDVS in terms of
energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   2.92%    7.31%    9.18%    10.81%   11.48%   11.97%
ICP         GreedyDVS     16.33%   47.45%   62.65%   72.04%   78.15%   82.46%
ILS         PathDVS       9.16%    8.40%    9.16%    10.29%   10.71%   11.13%
ILS         EProfileDVS   10.63%   14.09%   15.70%   17.15%   17.82%   18.30%
ILS         GreedyDVS     19.35%   49.22%   63.90%   72.99%   78.89%   83.06%
HEFT        PathDVS       17.11%   13.48%   13.15%   14.15%   14.28%   14.57%
HEFT        EProfileDVS   17.14%   18.65%   19.91%   21.28%   21.88%   22.35%
HEFT        GreedyDVS     19.61%   48.82%   63.62%   72.78%   78.73%   82.93%
Table 5-5. Results for 100 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   4.36%    12.88%   16.61%   18.29%   18.69%   19.43%
ICP         GreedyDVS     17.30%   50.16%   65.40%   74.29%   79.91%   83.99%
ILS         PathDVS       8.86%    8.58%    10.39%   11.76%   12.02%   12.83%
ILS         EProfileDVS   11.38%   19.16%   22.67%   24.27%   24.62%   25.29%
ILS         GreedyDVS     20.15%   51.73%   66.53%   75.15%   80.59%   84.53%
HEFT        PathDVS       14.07%   11.12%   12.52%   13.82%   14.08%   14.87%
HEFT        EProfileDVS   14.31%   20.95%   24.33%   25.85%   26.19%   26.86%
HEFT        GreedyDVS     19.57%   50.94%   65.98%   74.75%   80.28%   84.30%
Table 5-6. Results for 100 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   5.06%    16.17%   18.78%   19.73%   20.22%   20.40%
ICP         GreedyDVS     19.28%   52.75%   67.13%   75.46%   80.93%   84.77%
ILS         PathDVS       9.65%    9.41%    9.88%    10.28%   10.57%   10.82%
ILS         EProfileDVS   12.85%   23.59%   26.09%   26.97%   27.44%   27.71%
ILS         GreedyDVS     23.23%   54.82%   68.55%   76.52%   81.75%   85.43%
HEFT        PathDVS       13.39%   11.41%   11.49%   11.74%   13.01%   13.91%
HEFT        EProfileDVS   14.25%   24.50%   26.75%   27.43%   28.71%   29.49%
HEFT        GreedyDVS     21.74%   53.52%   67.62%   75.82%   81.20%   84.99%
5.4.3 Comparison between CPS (Used in Prior Scheduling for Energy Minimization) and ICP
We also compared our algorithm to the CPS assignment algorithm that is typically used in the energy minimization literature [48]. Here we show the performance for a large number of graphs with 100 and 200 tasks on 4 and 8 processors. The other experimental settings (e.g., execution time, communication time, etc.) are the same as above. The performance is again measured in terms of total finish time and total energy consumption after applying slack allocation, in order to show the relationship between minimizing finish time and minimizing energy consumption.

The average ratio of the total finish time of ICP to that of CPS is 0.71 and 0.59 on 4 and 8 processors, respectively. Figure 5-5 shows the comparison of ICP and CPS, each followed by slack allocation (i.e., PathDVS), in terms of total energy consumption. The results show that the ICP assignment algorithm gives more energy savings than the CPS assignment algorithm. This is because ICP's earlier total finish time leaves more slack that can be used to save energy. For instance, the results for 100 tasks on 8 processors showed that ICP required 40% less time and 67-75% less energy as compared to CPS, and the results for 100 tasks on 4 processors showed that ICP required 29% less time and 48-56% less energy as compared to CPS. From these results, we can see that the assignment is one of the critical factors in minimizing energy consumption, because a smaller finish time yields more slack that can potentially be used for energy minimization.
[Figure: four panels plotting normalized energy against deadline extension rate (0-1.0) for ICP-PathDVS and CPS-PathDVS]
Figure 5-5. Normalized energy consumption of ICP and CPS using PathDVS with respect to different deadline extension rates for different numbers of tasks and processors: (a) 100 tasks on 4 processors, (b) 100 tasks on 8 processors, (c) 200 tasks on 4 processors, and (d) 200 tasks on 8 processors
5.5 Experimental Results for Assignment Algorithms that Minimize Energy
We have conducted a number of simulations to evaluate the benefits of our algorithm over other algorithms that do not consider energy profiles in the assignment. We also compared our proposed scheduling algorithm with GA based algorithms that consider multiple assignments [56, 57]. The performance of our energy based assignment algorithm is relatively independent of the slack allocation and time minimization assignment algorithms. Given that PathDVS and ICP perform better than other related algorithms (as presented in Chapter 3 and Section 5.4, respectively), we use these algorithms for slack allocation and time minimization assignment, respectively.

The experimental results are presented in two broad subsections. In the first subsection, we assume that the energy requirements of a task on a processor are relatively independent of the execution time requirements. In the second subsection, we assume that there is a strong correlation between the time and energy requirements of executing the task on a processor.
5.5.1 Simulation Methodology
In this section, we describe the DAG generation and the performance measures used in our experiments.
5.5.1.1 The DAG generation
We randomly generated a large number of graphs with 50 and 100 tasks. The execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units and the
communication time between a task and its child task for a pair of processors is varied from 1 to
4 units. The energy consumed to execute each task on each processor is varied from 10 to 80.
The execution of graphs is performed on 4, 8, 16, and 32 processors. For each combination of
values of number of tasks and processors, 20 different synthetic graphs are generated.
5.5.1.2 Performance measures
The performance is measured in terms of normalized total energy consumption and computational requirements (i.e., the runtime of the algorithms). The former is defined as the total energy consumption normalized by the energy consumption obtained from the assignment algorithm without a DVS scheme. We assume that the deadline is always larger than or equal to the finish time of the DAG. Here the finish time of the DAG is based on the baseline assignment (i.e., the time minimization assignment using time based prioritization). The deadline extension rate is the fraction of the total finish time that is added to the deadline (i.e., deadline = (1 + deadline extension rate) * total finish time from the assignment before applying DVS). We provide experimental results for deadline extension rates equal to 0 (no deadline extension), 0.2, 0.4, 0.6, 0.8, and 1.0.
5.5.1.3 Variations of our algorithms
We tested three variations of our algorithms to understand the impact of multiple
prioritizations (based on parameter α) and variable estimates on deadline for each task (based on
parameter β). The algorithms used in our experiments are classified into three categories: A0,
A1, and A2. First, A0 is an assignment with time based task prioritization (α = 1.0) and a deadline estimate equal to the latest finish time (β = 1.0). This is followed by a slack allocation, and it corresponds to an assignment that uses the base prioritization and allows the maximum allowable deadline for each task. Second, A1 is an assignment with the weight of time equal to one and various weights of LFT (i.e., α = 1.0; β = 1.0, 0.75, and 0.5). For each feasible assignment, a final slack allocation step is performed. This corresponds to assignments that use the base prioritization; for this prioritization, variable estimates of the deadline, given by β, are attempted. The basic idea here is that choosing the maximum allowable deadline for each task (i.e., a higher value of β) may lead to infeasible assignments, but may also lead to the best energy requirements by providing more flexibility for processor selection. Finally, A2 is an assignment with various weights of time and LFT (i.e., α = 0, 0.2, 0.4, 0.6, 0.8, and 1.0; β = 1.0, 0.75, and 0.5). For all feasible assignments, a final slack allocation step is performed. These correspond to assignments that use multiple prioritizations; for each prioritization, variable estimates of the deadline, given by β, are attempted.
The optimal values of α and β for the A1 and A2 formulations are instance dependent. For each
instance all the values are attempted and the one that results in the minimal energy is chosen. We
chose the range of values of α and β as discussed above based on initial experimentation.
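A minimal sketch of this sweep is shown below; assign() and path_dvs() are hypothetical stand-ins for our assignment step and the PathDVS slack allocation of Chapter 3, and the sweep simply keeps the feasible result with the lowest energy, as described above.

    # (alpha, beta) grids for A1 and A2 as given in the text.
    A1_PARAMS = [(1.0, b) for b in (1.0, 0.75, 0.5)]
    A2_PARAMS = [(a, b) for a in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
                 for b in (1.0, 0.75, 0.5)]

    def best_schedule(dag, deadline, params, assign, path_dvs):
        """Try every (alpha, beta) pair; keep the feasible schedule with least energy."""
        best = None
        for alpha, beta in params:
            schedule = assign(dag, deadline, alpha, beta)    # may be infeasible
            if schedule is None:
                continue
            energy, schedule = path_dvs(schedule, deadline)  # final slack allocation
            if best is None or energy < best[0]:
                best = (energy, schedule, alpha, beta)
        return best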
5.5.1.4 Variations of GA based algorithms
Genetic algorithms consist of a population of individuals that go through several
generations. The algorithms in [56, 57] use a nested set of individuals. The first set corresponds
to multiple mappings of tasks to processors. For each mapping, there is a population consisting of multiple individuals corresponding to orderings or prioritizations of tasks. Each generation is used to generate the next generation using crossover and mutation. The former combines two individuals to generate a new set of two individuals; the latter modifies a single individual. The fitness of
an individual is measured by the total energy requirements after applying a slack allocation
scheme and the satisfaction of deadline constraints. There are several parameters including the
number of individuals in the population, the crossover rate, and the mutation rate. The parameter values used in the GA are set as suggested in [56, 57]. We terminate the GA if the
improvement is less than 1% after 10 generations as suggested in [56, 57]. The performance of
GA based algorithms depends on the slack allocation method and the initial seeding of the
population. To show the comparison between our algorithms and GA based approaches, we
conducted experiments with four variations of GA based algorithms: GARandNonOptimal,
GARandOptimal, GASolNonOptimal, and GASolOptimal.
• GA using the DVS scheme in [56, 57] with randomly generated solutions for the initial population (i.e., GARandNonOptimal). This is the scheme presented in [56, 57].
• GA using PathDVS with randomly generated solutions for the initial population (i.e., GARandOptimal).
• GA using the DVS scheme in [56, 57] with a randomly generated population containing A0 as one of the solutions (i.e., GASolNonOptimal).
• GA using PathDVS with a randomly generated population containing A0 as one of the solutions (i.e., GASolOptimal).
We chose different DVS schemes because the GA requires fitness calculations (in our case, the energy required) for each solution that is generated. We wanted to find out whether a less computationally intensive DVS scheme during the GA process can lead to solutions similar to those obtained with a more computationally intensive DVS scheme.
5.5.2 DVS Schemes to Compute Expected Energy in Processor Selection Step
As discussed in the algorithm section, our approach requires the use of a DVS scheme
during the assignment of each task in order to compute expected DVS based energy to select the
best processor in the processor selection step. This is an intermediate step where exact energy
requirements are not needed. To reduce the time requirements of the optimal branch and bound
strategy for unit slack allocation as described in Chapter 3, we used a greedy strategy. To test
whether this strategy leads to inferior assignments, we compared the energy requirements using
these two methods for slack allocation during this intermediate step. Figure 5-6 shows this
comparison for different deadline extension rates. Since the performance difference in terms of
energy was not significant and the greedy scheme is one to two orders of magnitude faster, we
chose a greedy based scheme for this step.
[Figure omitted: two plots; x-axis: Deadline Extension Rate; (a) y-axis: Normalized Energy, (b) y-axis: Runtime; series: A0-Optimal and A0-Greedy]
Figure 5-6. Comparison between optimal scheme and greedy scheme for processor selection of A0 for 50 tasks on 4 and 8 processors: (a) with respect to normalized energy consumption and (b) with respect to runtime (unit: ms)
5.5.3 Independence between Time and Energy Requirements
In this section, we present the experimental results for the cases that the energy
requirement of a task on a processor is relatively independent of the execution time requirement.
5.5.3.1 Comparison of energy requirements of proposed algorithms
Figures 5-7 and 5-8 show the comparison of energy consumption for our algorithms (i.e., A0, A1, and A2) and the baseline algorithm (i.e., Base: the combination of ICP and PathDVS) with respect to different deadline extension rates for different numbers of processors (i.e., 4, 8, 16, and 32) and tasks (i.e., 50 and 100). Based on the results, all of our algorithms lead to significant energy reduction compared to the baseline algorithm. Furthermore, A2 is better than A1, while A1 is better than A0. For instance, using a 1.0 deadline extension rate for 32 processors, A0, A1, and A2 improve by 30.9%, 32.8%, and 36.8% over the baseline algorithm, respectively.
[Figure omitted: four line plots, panels (a)-(d); x-axis: Deadline Extension Rate, y-axis: Normalized Energy; series: Base, A0, A1, and A2]
Figure 5-7. Results for 50 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors: (a) 4 processors, (b) 8 processors, (c) 16 processors, and (d) 32 processors
[Figure omitted: four line plots, panels (a)-(d); x-axis: Deadline Extension Rate, y-axis: Normalized Energy; series: Base, A0, A1, and A2]
Figure 5-8. Results for 100 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors: (a) 4 processors, (b) 8 processors, (c) 16 processors, and (d) 32 processors
Figure 5-9 shows the improvement of our algorithms over the baseline algorithm (i.e., Base: ICP-PathDVS) with respect to different numbers of processors. Based on the results, as the number of processors increases, the improvement of our algorithms over the baseline algorithm grows. For instance, with a 0.4 deadline extension rate, A0 improves by 8.4%, 11.3%, 21.4%, and 31%, A1 improves by 10.6%, 12.7%, 23.1%, and 33%, and A2 improves by 16.8%, 18.9%, 27.2%, and 35.8%, for 4, 8, 16, and 32 processors, respectively, as compared to the baseline algorithm.
[Figure omitted: six bar charts, panels (a)-(f); x-axis: Number of Processors (4, 8, 16, 32), y-axis: Improvement (%); series: A0, A1, and A2]
Figure 5-9. Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) with respect to different number of processors for variable deadline extension rates (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate
5.5.3.2 Comparison of energy requirements with GA based algorithms
We found that the GA (Genetic Algorithm) based algorithms have relatively poor performance and do not always generate a feasible schedule (i.e., a schedule that completes by a given deadline), especially when the deadline is tight (i.e., small values of the deadline extension rate). Based on the results, in general, A0 was considerably better than the GA: the improvements ranged from 50% to 70% of the energy requirements of the GA. In the
following, we present the results of the comparison of our algorithms with four variations of GA
based algorithms (i.e., GARandNonOptimal, GARandOptimal, GASolNonOptimal,
GASolOptimal).
Comparison with GARandNonOptimal
Figure 5-10 shows the comparison between our algorithms and GARandNonOptimal in terms of energy consumption with respect to different numbers of tasks and processors. The GA based algorithm, which uses randomly generated initial solutions for task ordering and mapping, does not provide good performance in terms of energy consumption. Furthermore, it cannot even generate a feasible schedule (i.e., a schedule meeting the deadline), especially under tight deadlines, when using the limited initial solution pool (i.e., 25 individuals for ordering and 50 individuals for mapping) and the GA termination criterion (i.e., repeat until no improvement of at least 1% is made for 10 generations) suggested in [56, 57]. We therefore present results for a deadline extension rate of 1.0 so that energy can be compared fairly across the feasible solutions generated. Based on the results,
GARandNonOptimal gives even worse performance than the baseline algorithm (i.e., Base: ICP-PathDVS); for example, Base improves by 65% over GARandNonOptimal. Our algorithms, A0, A1, and A2, respectively improve by 68.7%, 70.0%, and 73.1% in terms of energy consumption compared to GARandNonOptimal for 8 processors. As the number of processors increases, our algorithms provide much more benefit: while A0 improves by 48.5% for 4 processors, it improves by 68.7% for 8 processors. Our algorithms also provide better performance as the number of tasks increases. For instance, A0 improves by 58.3% for 50 tasks and 61.5% for 100 tasks.
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GARandNonOptimal, A0, A1, and A2]
Figure 5-10. Normalized energy consumption of GARandNonOptimal and our algorithms for different number of tasks and processors: (a) with respect to different number of processors and (b) with respect to different number of tasks
Comparison with GARandOptimal
The performance did not significantly improve even when using a better slack allocation scheme such as PathDVS, which provides near-optimal solutions for energy minimization. Like GARandNonOptimal, due to the use of randomly generated initial solutions, the limited number of individuals, and the termination criterion, GARandOptimal does not give good performance. Figure 5-11 shows the comparison between our algorithms and GARandOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate. Based on the results, our algorithms, A0, A1, and A2, respectively improve by 69.1%, 70.5%, and 73.5% in terms of energy consumption compared to GARandOptimal for 8 processors. Also, the advantage of our algorithms over GARandOptimal grows as the number of processors and tasks increases. For instance, A0 improves by 46.5% and 69.1% for 4 and 8 processors, and 58.8% and 60.5% for 50 and 100 tasks, respectively.
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GARandOptimal, A0, A1, and A2]
Figure 5-11. Normalized energy consumption of GARandOptimal and our algorithms for different number of tasks and processors: (a) with respect to different number of processors and (b) with respect to different number of tasks
Comparison with GASolNonOptimal
Since using only randomly generated initial solutions leads to poor performance, we next tried the other approach, which seeds the population with a good solution (from A0). Figure 5-12 shows the comparison between our algorithms and GASolNonOptimal in terms of energy consumption with respect to different deadline extension rates for different numbers of tasks and processors. Although GASolNonOptimal uses one good solution from A0, no significant improvement was achieved as compared to A0. This is because the DVS scheme used in GASolNonOptimal does not provide good performance in terms of energy consumption, while our algorithms use PathDVS, a near-optimal DVS scheme. Based on the results, our algorithms, A0, A1, and A2, respectively improve by 11.1%, 15.0%, and 20.5% compared to GASolNonOptimal, for 100 tasks on 8 processors with a 1.0 deadline extension rate. Figure 5-13 shows the comparison between our algorithms and GASolNonOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate.
[Figure omitted: four line plots; x-axis: Deadline Extension Rate, y-axis: Normalized Energy; series: Base, GASolNonOptimal, A0, A1, and A2]
Figure 5-12. Normalized energy consumption of GASolNonOptimal and our algorithms with respect to different extension rates for different number of tasks and processors: (a) 50 tasks and 4 processors, (b) 50 tasks and 8 processors, (c) 100 tasks and 4 processors, and (d) 100 tasks and 8 processors
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GASolNonOptimal, A0, A1, and A2]
Figure 5-13. Normalized energy consumption of GASolNonOptimal and our algorithms: (a) with respect to different number of processors and (b) with respect to different number of tasks
Comparison with GASolOptimal
Figure 5-14 shows the comparison between our algorithms and GASolOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate. Although GASolOptimal uses one good solution from A0 and a near-optimal DVS scheme, no significant improvement was achieved as compared to A0. The performance of A0 and GASolOptimal is very similar: the fractional difference between the energy requirements of A0 and GASolOptimal was between 0.00009 and 0.002. Furthermore, our algorithms with iteration (i.e., A1 and A2) provide improved performance. Based on the results, A1 and A2 respectively improve by 4.6% and 14.3% for 8 processors, and by 5.3% and 16.3% for 100 tasks.
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GASolOptimal, A0, A1, and A2]
Figure 5-14. Normalized energy consumption of GASolOptimal and our algorithms: (a) with respect to different number of processors and (b) with respect to different number of tasks
5.5.3.3 Comparison of time requirements
Figure 5-15 shows the runtime of our algorithms with respect to different deadline extension rates. The total runtime for A1 and A2 is proportional to the number of different values of α and β times the runtime of A0. It is worth noting that, since the individual runs of A1 and A2 can effectively execute in parallel, their runtime can be reduced significantly in a parallel environment.
[Figure omitted: two line plots, panels (a) 50 tasks and (b) 100 tasks; x-axis: Deadline Extension Rate, y-axis: Runtime; series: A0, A1, and A2]
Figure 5-15. Runtime to execute our algorithms with respect to variable deadline extension rates for different number of tasks (unit: ms): (a) 50 tasks and (b) 100 tasks
Figure 5-16 shows the comparison of A0 and the GA based algorithms in terms of computational time (i.e., the runtime taken to execute the algorithms) for a 1.0 deadline extension rate with respect to different numbers of tasks. Based on the results, A0 is two orders of magnitude faster than the GA based algorithms that use a suboptimal DVS scheme (i.e., GARandNonOptimal and GASolNonOptimal). Furthermore, A0 is 2237 and 2406 times faster than GARandOptimal and GASolOptimal, respectively, which use a near-optimal DVS scheme.
[Figure omitted: bar chart on a logarithmic scale; x-axis: Number of Tasks (50, 100), y-axis: Runtime; series: A0, GARandNonOptimal, GASolNonOptimal, GARandOptimal, and GASolOptimal]
Figure 5-16. Runtime to execute GA algorithms and our algorithm with respect to different number of tasks for 1.0 deadline extension rate (unit: ms, logarithmic scale)
5.5.4 Dependence between Time and Energy Requirements
In the experimental results presented in the previous section, we assumed that the time and
energy requirements of a task were independent of each other. We also conducted experiments to
see the performance with various degrees of correlation between time and energy consumption
for tasks on a given processor. We define a parameter γ that controls this correlation (i.e.,
correlation rate). The energy of a task is proportional to its execution time multiplied by a value drawn from the interval [1 – γ, 1 + γ] (i.e., energy of each task = execution time of each task * [1 – γ, 1 + γ]). We experimented with a number of values for γ and present results for γ equal to 0, 0.4, and 0.8. We also compare the results with the independent case as defined in the previous section; this case is labeled 'rand'.
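A minimal sketch of this correlation model, with illustrative names, is:

    import random

    def correlated_energy(exec_time, gamma):
        # Energy proportional to execution time, scaled by a factor drawn
        # uniformly from [1 - gamma, 1 + gamma].
        return exec_time * random.uniform(1.0 - gamma, 1.0 + gamma)

    # gamma = 0 gives perfectly correlated energy; larger gamma weakens the link.
    energies = {g: [correlated_energy(t, g) for t in (10, 20, 30, 40)]
                for g in (0.0, 0.4, 0.8)}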
Figures 5-17 and 5-18 show the energy improvement of our algorithms over ICP-PathDVS (i.e., the baseline algorithm) for variable values of γ and different deadline extension rates, for 4 and 8 processors respectively. Based on the results, as the parameter γ increases, the relative improvement of our algorithms increases. For instance, with a 1.0 deadline extension rate, A0 improves by 4%, 5.7%, and 17.9%, A1 improves by 7.6%, 9.5%, and 20.5%, and A2 improves by 15.5%, 19.2%, and 30.3%, for γ equal to 0, 0.4, and 0.8 respectively. For the case of time-independent energy consumption (based on our experimental setting), the improvement lies between that for γ = 0.4 and that for γ = 0.8. For instance, using the 'rand' option (i.e., time-independent energy consumption), A0 improves by 12.5%, A1 improves by 16.9%, and A2 improves by 23.2%, for a 1.0 deadline extension rate.
[Figure omitted: six bar charts, panels (a)-(f); x-axis: Correlation Rate γ (0, 0.4, 0.8, rand), y-axis: Improvement (%); series: A0, A1, and A2]
Figure 5-17. Results for 4 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate
[Figure omitted: six bar charts, panels (a)-(f); x-axis: Correlation Rate γ (0, 0.4, 0.8, rand), y-axis: Improvement (%); series: A0, A1, and A2]
Figure 5-18. Results for 8 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate
CHAPTER 6 DYNAMIC ASSIGNMENT
We assume that a static scheduling algorithm has already been applied before executing
tasks and the schedule needs to be adjusted whenever a task finishes before its scheduled time.
Thus this schedule is updated whenever a dynamic scheduling is applied. When a task finishes
before its estimated time, two changes may occur for all the remaining tasks (i.e., tasks that have
not yet executed) in the schedule. A task's processor mapping may change (along with its start and end times). Also, the amount of slack (the time beyond the minimum execution time for that processor
based on executing the task at maximum voltage) may change.
Most prior research on scheduling for energy minimization does not focus on the assignment process, particularly in dynamic environments. In Chapter 4, we showed that reallocating the slack at runtime (i.e., dynamic slack allocation) leads to better energy minimization, and that applying our dynamic slack allocation method at runtime not only outperforms the existing greedy method but is also comparable, in terms of energy requirements, to static near-optimal methods applied at runtime.
In this chapter, we explore whether reassignment of tasks along with reallocation of slack
during runtime can lead to even better performance in terms of energy minimization. For an
approach to be useful at runtime, its overhead must be small. The
proposed dynamic scheduling algorithm utilizes several threads to generate a schedule:
• One set for reallocating slack while keeping the assignment in the current schedule.
• Another set for changing the assignment and then reallocating slack.
The schedule providing the minimum energy is then selected.
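A minimal sketch of this structure, collapsing each set of threads into a single worker for brevity, is given below; both worker functions are hypothetical callables that return an (energy, schedule) pair.

    from concurrent.futures import ThreadPoolExecutor

    def reschedule(schedule, reallocate_slack, reassign_then_reallocate):
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(reallocate_slack, schedule),
                       pool.submit(reassign_then_reallocate, schedule)]
            candidates = [f.result() for f in futures]
        return min(candidates, key=lambda c: c[0])  # schedule with minimum energy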
As described in Chapter 4, for the dynamic scheduling (i.e., rescheduling), there are two
steps that need to be addressed. First, select the subset of tasks for rescheduling. The tasks that may be rescheduled by the dynamic scheduling algorithm are those that have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. Dynamic scheduling is applied to a subset of these tasks; the tasks considered for rescheduling are limited in order to minimize the overhead of reassigning processors and reallocating the slack during runtime. Clearly, this should be done so that the other goal of energy reduction is also met simultaneously. Second, determine the time range for
other goal of energy reduction is also met simultaneously. Second, determine the time range for
the selected tasks. The time range of the selected tasks has to be changed as some of the tasks
have completed earlier than expected. Based on the computation time in the schedule and
assignment-based dependency relationships among tasks, we recompute the time range (i.e.,
earliest start time and latest finish time) within which the selected tasks should be executed. The time range is defined differently for reassignment and slack reallocation: a time range over processors for reassignment, and a time range for the selected tasks given an assignment for slack reallocation. However, the main concept is the same: the selected tasks have to be reassigned and have slack reallocated within this time range in order to meet the deadline constraints.
At this stage our proposed reassignment algorithm and slack reallocation approach are
applied to the subset of tasks within the time range as described above. The computational time
(i.e., runtime overhead) is kept small due to the limited number of tasks selected for
rescheduling. While several assignment methods can be applied using threads, we propose a
reassignment method based on our method described in Chapter 5. This incorporates the
expected DVS based energy information during the reassignment process. The dynamic
assignment algorithm is described in detail in the next section.
6.1 Proposed Dynamic Assignment
This section presents a novel dynamic assignment algorithm which reassigns processors
for the reschedulable tasks at runtime. The main feature of our proposed reassignment algorithm is that it considers the energy requirements based on potential slack during the assignment step. In
other words, the algorithm assigns an appropriate processor for each reschedulable task such that
the total energy expected after slack allocation is minimized. The expected energy after slack
allocation for each reschedulable task is computed by using the estimated deadline for the task so
that the overall DAG can be executed by the deadline.
6.1.1 Choosing a Subset of Tasks for Rescheduling
The proposed dynamic scheduling algorithm, the k lookahead approach, is based on choosing
a subset of tasks for which the schedule will be readjusted. The schedule for the remaining tasks
(i.e., tasks not selected for the rescheduling) is not affected. Figure 5-1 shows the subset of tasks
for rescheduling in an assignment DAG when task τ2 finishes early.
Using the k lookahead approach, all tasks within a limited range of time are considered for the readjustment of the schedule. The range of time is determined by the value of k (i.e., k * maximum
computation time of tasks). In the example of Figure 5-1, assume that the computation time of
each task is one unit, the communication time among tasks is zero, and the tasks in the same
depth finish at the same time for ease of presentation of the key concepts. In this case, if k is
equal to 2, the time range would be 2 units (2 * one unit) and then tasks within the time range
from the finish of task τ2, e.g., τ4, τ5, τ6, τ7, τ8, τ9, and τ10, are considered. The set of tasks
selected for the rescheduling is defined by
$$\Gamma_{\mathrm{allocation}} = \{\tau_i \mid \mathit{staticSTime}_i \ge \mathit{ftime}_l,\ \mathit{staticFTime}_i \le \mathit{ftime}_l + k \cdot \max_{\tau_j \in \Gamma} \mathit{compTime}_j\}, \quad \text{s.t. } \mathit{ftime}_l \ne \mathit{staticFTime}_l$$
where staticSTime_i is the start time of task τ_i in the static or previous schedule, staticFTime_i is the finish time of task τ_i in the static or previous schedule, ftime_l is the actual finish time of task τ_l at runtime, and compTime_j is the computation time of task τ_j on its assigned processor, a.k.a. the estimated execution time at the maximum voltage.
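A direct transcription of this selection rule, with illustrative field names (static_start, static_finish, comp_time), is the following Python sketch:

    def select_reschedulable(tasks, ftime_l, k):
        horizon = ftime_l + k * max(t.comp_time for t in tasks)
        return [t for t in tasks
                if t.static_start >= ftime_l      # not yet started
                and t.static_finish <= horizon]   # inside the k lookahead window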
The approach with the 'all' option for k (i.e., the k-all lookahead approach) corresponds to the static scheduling approach without the limitation on the time range for tasks considered for rescheduling. Thus, the k-all lookahead approach is the same as applying the static scheduling
rescheduling. Thus, the k-all lookahead approach is same as applying the static scheduling
algorithm to all the remaining tasks at runtime. One would expect this to be close to the best that
can be achieved. The set of tasks selected for the rescheduling is defined by
$$\Gamma_{\mathrm{allocation}} = \{\tau_i \mid \mathit{staticSTime}_i \ge \mathit{ftime}_l\}, \quad \text{s.t. } \mathit{ftime}_l \ne \mathit{staticFTime}_l$$
6.1.2 Time Range for Selected Tasks
The schedule for tasks not in the set of reschedulable tasks is kept the same (it is based on the static schedule or the schedule generated by the last rescheduling). For the set of reschedulable tasks, the range of time in which they may execute is defined so that the dynamic scheduling algorithm produces feasible solutions. The time range is defined differently for reassignment and slack reallocation: a time range over each processor for reassignment, and a time range for the set of reschedulable tasks given an assignment for slack reallocation. This is because reassignment can map a task to any processor. Even when there is a processor to which no reschedulable task is assigned, the time range over that processor for reassignment may be limited by the assignment of tasks not in the set of reschedulable tasks. Meanwhile, for slack reallocation, there is no need to define the time range for all processors, but only for the set of selected tasks, because the slack is reallocated based on a given assignment. For reassignment, the time range of each processor is defined as follows.
First, the minimum computation time of a task is set to its estimated time at the maximum voltage (i.e., staticCTime_i = compTime_i for τ_i ∈ Γ_allocation, where staticCTime_i is the computation time of task τ_i in the static schedule or the previous schedule generated by the last rescheduling). This is the same time that was used during the static assignment process, and it effectively ensures that maximum flexibility is available for reassignment.
Second, the available start time of each processor is the possible earliest start time of the processor for the reschedulable tasks. It is set to the expected finish time (i.e., the finish time in the current schedule) of the last task on the processor that is not in the set of reschedulable tasks and has already started when the algorithm is applied (i.e., it is still executing or has finished); that is, the task with the latest finish time on the processor among tasks not in the set of reschedulable tasks. It is worth noting that this is not the earliest start time of the reschedulable tasks on each processor; the earliest start times of the tasks on a processor differ due to the precedence relationships among other tasks. The available start time of a processor p_j, procSTime_j, is defined by

$$\mathit{procSTime}_j = \max_i \mathit{staticFTime}_i, \quad \text{where } \mathit{proc}_i = p_j \text{ and } \mathit{staticSTime}_i < \mathit{ftime}_l$$
Finally, the deadline of each processor is the possible latest finish time of the processor for the reschedulable tasks. It is set to the expected start time (i.e., the start time in the current schedule) of the first task on the processor that is not in the set of reschedulable tasks and has not yet started when the algorithm is applied; that is, the task with the earliest start time on the processor among tasks not in the set of reschedulable tasks. It is worth noting that this is not the latest finish time of the reschedulable tasks on each processor. Like the earliest start times, the latest finish times of the tasks on a processor differ due to the precedence relationships among other tasks. The deadline of a processor p_j, procDeadline_j, is defined by

$$\mathit{procDeadline}_j = \min_i \mathit{staticSTime}_i, \quad \text{where } \tau_i \in \Gamma_{\mathrm{allocation}}^{\,c} \text{ and } \mathit{proc}_i = p_j$$
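The following sketch computes both quantities for one processor; kept_tasks are the tasks not selected for rescheduling, and the field names and the fallback values (used when a processor has no such started or pending task) are our own assumptions.

    def processor_time_range(proc, kept_tasks, ftime_l, dag_deadline):
        started = [t.static_finish for t in kept_tasks
                   if t.proc == proc and t.static_start < ftime_l]
        pending = [t.static_start for t in kept_tasks
                   if t.proc == proc and t.static_start >= ftime_l]
        proc_stime = max(started, default=ftime_l)          # procSTime_j
        proc_deadline = min(pending, default=dag_deadline)  # procDeadline_j
        return proc_stime, proc_deadline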
6.1.3 Estimated Deadline and Energy
The goal of the assignment is to minimize the expected total energy consumption after
slack allocation while still satisfying deadline constraints. Consider a scenario where the
assignment of a subset of tasks has already been completed and a given next task in the
prioritization list has to be assigned. The choice of the processors that can be assigned to this task
should be limited to those for which the expected finish time from the overall assignment will lead to meeting the deadline constraints (otherwise, the result will be an infeasible assignment). Clearly, there is no guarantee that the derived schedule will be feasible (i.e., a schedule meeting the deadline) at the time the assignment for a given task is determined, because the feasibility of the schedule depends on the assignment of the remaining tasks, which is not yet determined.
The proposed algorithm calculates the estimated deadline for each task, that is, the deadline expected to enable a feasible schedule provided the task finishes by it. The estimated deadline of a task is set to the latest finish time in order to allow more flexibility for processor assignment, as the task can take a longer time to complete (although the probability of a feasible schedule for the DAG may be lower). The latest finish time of task τ_i, LFT_i, is defined by
$$\mathit{LFT}_i = \min\left(\mathit{deadline},\ \min_{\tau_j \in \mathit{Succ}(\tau_i)}\left(\mathit{LFT}_j - \mathit{staticCTime}_j - \mathit{commTime}_{ij}\right)\right)$$
Here the latest finish time of a task differs based on its potential assigned processor due to the assignment-based dependency relationships among tasks; consequently, the time limit within which a task must complete varies across processors.
Using this estimated deadline, the estimated energy of the reschedulable tasks is computed while selecting processors for reassignment. The estimated energy is the energy expected after slack allocation. To compute it, we apply the principle of unit slack allocation used in the PathDVS algorithm, a static slack allocation algorithm that provides near-optimal solutions. The unit slack allocation used in PathDVS (described in Chapter 3) finds the subset of tasks that maximally reduces the total energy consumption. This corresponds to the maximum weighted independent set (MWIS) problem [7, 53, 65], which is computationally intensive. Our approach requires the use of a DVS scheme during the assignment of each task in order to compute the expected DVS based energy and select the best processor in the processor selection step. This is an intermediate step where exact energy estimates are not as important. To reduce the time requirements of the optimal branch and bound strategy for unit slack allocation described in Chapter 3, a greedy algorithm for the MWIS problem [53] can be used. The greedy algorithm in our approach is as follows:
• Select the task with the maximum energy reduction (i.e., the energy reduced when unit slack is allocated) among all tasks (i.e., the already assigned tasks and the task considered for assignment).
• Select the task with the maximum energy reduction among the tasks independent of the previously selected tasks.
• Iteratively select tasks until no task independent of all the selected tasks remains.
The above greedy approach for unit slack allocation is performed iteratively until there is no slack left or no task remains eligible for slack allocation under the estimated deadline constraints. In the proposed greedy approach, the independent tasks can easily be identified using a compatible task matrix or compatible task lists, which record, for each task, the tasks that can share unit slack with it, as in PathDVS.
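A minimal sketch of this greedy unit slack allocation follows; gain(t) (the energy reduced by one unit of slack) and compatible[t] (the tasks that can share unit slack with t) are illustrative assumptions.

    def greedy_unit_slack(tasks, gain, compatible, slack_units):
        for _ in range(slack_units):
            chosen, candidates = [], set(tasks)
            while candidates:
                best = max(candidates, key=gain)  # maximum energy reduction
                if gain(best) <= 0:               # no remaining task benefits
                    return
                chosen.append(best)
                # keep only tasks independent of every task chosen so far
                candidates = {t for t in candidates
                              if t is not best and t in compatible[best]}
            for t in chosen:
                t.slack += 1                      # allocate one unit of slack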
6.1.4 Processor Selection
Figure 6-1 presents a high level description of the assignment procedure. The task is assigned to a processor such that the total energy consumption expected after applying the DVS scheme to the tasks that have already been assigned (including the new task being considered for assignment) is minimized while trying to meet the estimated deadline of the task. The candidate processors for the task are selected such that the task can execute within its estimated deadline. Note that the estimated deadline of a task may differ across processors. Once the candidate processors for the task have been selected, the next step depends on which of the following three conditions holds.
First, if no processor is available to satisfy the estimated deadline, the processor with the earliest finish time is selected (it is possible that a feasible schedule is still obtained later, as the assignment is based on estimated times for future tasks whose assignment is yet to be determined). When the task finishes within its latest finish time, we assume that the deadline of the DAG can be met with high probability; by selecting a processor on which the task finishes earlier, the chance of meeting the deadline increases. However, if the task's finish time exceeds the time range for reschedulable tasks or its specific deadline, the reassignment process stops, since the schedule clearly will not meet the deadline constraints.
Second, if there is only one processor that meets the above constraint, the task is assigned to that processor; this also increases the chance of meeting the deadline constraints.
Finally, if more than one candidate processor meets the above constraint, a processor is selected such that the total energy expected after slack allocation is minimized. The expected total energy is the sum of the expected energy of the already assigned tasks and the task considered for assignment. To compute the expected energy for a given processor assignment in this step, a faster heuristic based strategy (as compared to PathDVS, which is nearly optimal) is used, as described in the previous subsection.
The above selection process is performed iteratively until all tasks selected for
rescheduling are assigned. However, if the finish time of a task exceeds the deadline, the process
stops and the previous assignment is kept for all reschedulable tasks.
Procedure DynamicDVSbasedAssignment
1. Compute the estimated deadline for each task
2. For each task
3.   Find the processors on which the task τ_i can execute within its estimated deadline
   Condition 1: If there is no such processor
4.1.   If the finish time of the task τ_i > deadline
4.2.     Stop the procedure
4.3.   Else
4.4.     If there is any processor such that the task can execute within the processor's deadline for reschedulable tasks
4.5.       Select a processor such that the finish time of the task τ_i is minimized
4.6.     Else
4.7.       Stop the procedure
4.8.     End If
4.9.   End If
   Condition 2: If there is only one such processor
4.1.   Select that processor for the task τ_i
   Condition 3: If there is more than one such processor
4.1.   Apply a greedy algorithm for the weighted independent task set problem to the task τ_i and the already assigned tasks
4.2.   Select a processor such that the total energy is minimized
5. End For
End Procedure
Figure 6-1. The DynamicDVSbasedAssignment procedure
6.2 Experimental Results
In this section, we compare the performance of the combination of dynamic assignment and dynamic slack allocation proposed in this chapter (i.e., DynamicAssgn) with the following two methods, each of which outperforms other existing methods in its respective setting:
• Static scheduling (i.e., StaticDVS) presented in Chapter 3: This static scheduling provides near-optimal solutions for energy minimization given an assignment. However, it keeps the schedule generated at compile time unchanged during runtime.
• Dynamic slack allocation (i.e., DynamicDVS) presented in Chapter 4: This dynamic slack allocation readjusts the schedule, while keeping a given assignment, whenever a task finishes earlier than expected at runtime. In our experiments, the k-3 lookahead slack allocation approach, which gives good performance in terms of energy, is used.
The dynamic algorithms (i.e., DynamicDVS and DynamicAssgn) are applied to a static schedule produced by a known assignment algorithm, which assigns tasks based on the earliest finish time, together with a static slack allocation algorithm; we use the static scheduling algorithm presented in Chapter 3. Also, for a fair comparison with DynamicDVS, the k-3 lookahead approach is used for DynamicAssgn, and PathDVS is used as the slack allocation method applied at runtime.
6.2.1 Simulation Methodology
In this section, we describe DAG generation, dynamic environments generation, and
the performance measures used in our experiments.
6.2.1.1 The DAG generation
We randomly generated a large number of graphs with 50 and 100 tasks. The execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units and the
communication time between a task and its child task for a pair of processors is varied from 1 to
4 units. The energy consumed to execute each task on each processor is varied from 10 to 80.
The execution of graphs is performed on 4, 8, 16, and 32 processors.
6.2.1.2 Dynamic environments generation
There are two broad parameters for dynamic environments:
• The number of tasks that finish earlier than expected (i.e., tasks whose actual execution time is less than their estimated execution time) is given by the earlyFinishedTaskRate (i.e., number of early finished tasks = earlyFinishedTaskRate * total number of tasks).
• The amount of decrease for each task that finishes early is given by timeDecreaseRate (i.e., amount of decrease = timeDecreaseRate * estimated execution time).
We experimented with values of earlyFinishedTaskRate equal to 0.2, 0.4, 0.6, and 0.8 and values of timeDecreaseRate equal to 0.1, 0.2, 0.3, and 0.4.
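A minimal sketch of this perturbation model is given below; the function name and the uniform choice of which tasks finish early are our own assumptions.

    import random

    def perturb(estimated_times, early_rate, decrease_rate):
        n_early = int(early_rate * len(estimated_times))
        early = set(random.sample(range(len(estimated_times)), n_early))
        return [t * (1 - decrease_rate) if i in early else t
                for i, t in enumerate(estimated_times)]

    # Rates used in our experiments.
    for er in (0.2, 0.4, 0.6, 0.8):
        for dr in (0.1, 0.2, 0.3, 0.4):
            actual_times = perturb([20] * 100, er, dr)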
6.2.1.3 Performance measures
The deadline extension rate is the fraction of the total finish time that is added to the
deadline (i.e., deadline = (1 + deadline extension rate) * total finish time from assignments
without DVS scheme). We experimented with deadline extension rates equal to 0 (no extension),
0.01, 0.02, 0.05, 0.1, and 0.2, but only the results for no deadline extension are presented due to
space limitations since the results are similar. To compare algorithms, the normalized energy
consumption, that is, total energy normalized by the energy obtained from the static assignment
(before applying static slack allocation), is used. The computational time (i.e., runtime overhead)
is also reported as an important measure for algorithms in dynamic environments.
6.2.2 Comparison of Energy Requirements
Figures 6-2, 6-3, 6-4, and 6-5 show the comparison of our algorithm with static scheduling
and dynamic slack allocation in terms of energy consumption with respect to different time
decrease rates and different early finished task rates for 4, 8, 16, and 32 processors, respectively.
Based on the results, the combination of dynamic assignment and dynamic slack allocation (i.e.,
DynamicAssgn) significantly outperforms static scheduling and dynamic slack allocation in
terms of energy consumption. For instance, for 32 processors, DynamicAssgn improves energy
requirements by 15-26% and 8-12% compared to StaticDVS and DynamicDVS respectively.
These results show that adjusting the assignment at runtime as well as adjusting the slack at
runtime is necessary for minimizing the energy requirements. Furthermore, in general, the
improvement of DynamicAssgn over the other two algorithms increases as timeDecreaseRate
and earlyFinishedTaskRate increase.
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-2. Results for 4 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-3. Results for 8 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-4. Results for 16 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-5. Results for 32 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6.2.3 Comparison of Time Requirements
Figure 6-6 shows the average time requirement to readjust the schedule due to a single
task’s early finish (i.e., runtime overhead). The computational time of DynamicAssgn is an order of magnitude larger than that of DynamicDVS, since DynamicAssgn requires an assignment process as well as a slack allocation process. However, DynamicAssgn requires only 0.02-0.04 seconds on average to readjust the schedule at runtime; this small overhead should make it useful for a large number of computation intensive applications.
[Figure omitted: two bar charts on a logarithmic scale, panels (a) 50 tasks and (b) 100 tasks; x-axis: Time Decrease Rate, y-axis: Computational Time; series: DynamicDVS and DynamicAssgn]
Figure 6-6. Computational time to readjust the schedule from an early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
CHAPTER 7 CONCLUSION AND FUTURE WORK
Energy consumption is a critical issue in parallel and distributed embedded systems. The
scheduling for DVS based energy minimization broadly consists of two steps: assignment and
slack allocation.
• Assignment: This step determines the ordering to execute tasks and the mapping of tasks to processors based on the computation time at the maximum voltage level.
• Slack allocation: Once the assignment of each task is known, this step allocates a variable amount of slack to each task so that the total energy consumption is minimized while the DAG executes within a given deadline.
We have presented novel scheduling algorithms to minimize DVS based energy
consumption of DAG based applications under the deadline constraints for parallel systems. The
proposed scheduling algorithms are classified into four categories: static slack allocation,
dynamic slack allocation, static assignment, and dynamic assignment, presented in Chapters 3, 4,
5, and 6, respectively. In this chapter, we review our main contributions for scheduling
algorithms presented in this thesis.
7.1 Static Slack Allocation
In Chapter 3, we have presented a novel static slack allocation algorithm (i.e., static DVS
scheme) for DAG based applications in parallel and distributed systems. There are three main
contributions of our method:
• The performance in terms of reducing energy is comparable to that of an LP (Linear Programming) based algorithm which provides near-optimal solutions.
• It requires significantly less memory as compared to the LP based algorithm and can be scaled to larger size problems.
• The time requirements of our algorithm are one to two orders of magnitude smaller than those of the LP based algorithm when the amount of total available slack is small (i.e., a tight deadline).
Our experimental results also show that the energy reduction of our proposed algorithm is
considerably better than simplistic schemes. Furthermore, based on the efficient techniques for
search space reduction, such as compatible task lists, compression, and lower bounds, the branch and bound search method can be effectively used to provide near-optimal solutions for energy minimization while requiring low computational time.
7.2 Dynamic Slack Allocation
In Chapter 4, we have presented novel slack allocation algorithms to minimize energy
consumption and meet deadline constraints for DAG based applications in dynamic environments,
where the actual execution time of a task may be different from its estimated time. There are
three main contributions of our methods:
• They require significantly less computational time (i.e., runtime overhead) than applying the static algorithm at runtime for every instance when a task finishes early or late.
• The performance in terms of reducing energy and/or meeting a given deadline is comparable to applying the static algorithm at runtime.
• They are effective for cases when the estimated execution time is underestimated or overestimated.
The experimental results also show that our methods offer significant improvement over
simplistic greedy methods in terms of energy requirements and/or satisfying the deadline
constraints. Our methods have been shown to work for environments where the estimated time
for all tasks is greater than or equal to the execution time (i.e., overestimation) or where the estimated time for all tasks is less than or equal to the execution time (i.e., underestimation). However, they should be equally effective for hybrid environments where some tasks complete before their estimated time while others complete after it.
7.3 Static Assignment
In Chapter 5, we have presented novel static assignment algorithms to minimize DVS
based energy consumption of DAG based applications for parallel systems. The proposed
assignment algorithms effectively assign tasks to appropriate processors with the goal of energy
minimization by utilizing expected DVS based energy information during assignment and
considering multiple task prioritizations based on time and energy. There are three main
contributions of our methods:
• Through the assignment method that minimizes finish time, the deadline constraints are satisfied and the energy can also be reduced, owing to the generation of a larger amount of slack that can be allocated to tasks during the slack allocation step.
• The performance in terms of reducing energy requirements is significantly improved by incorporating energy minimization during the assignment process.
• They require two to three orders of magnitude less time as compared to the Genetic Algorithm based formulations which outperform other existing algorithms in terms of energy consumption.
Our experimental results show that our proposed algorithms significantly outperform existing algorithms in terms of energy consumption while requiring lower computational time.
7.4 Dynamic Assignment
In Chapter 6, we have presented a novel assignment algorithm to minimize energy
consumption for dynamic environments. The proposed algorithm adjusts the schedule by
reassigning tasks to processors and then reallocating slack to tasks, whenever a task finishes
earlier than expected at runtime. There are two main contributions of our method:
• The time requirements of our scheme are small enough that it should be useful for a large number of application workflows.
• It provides considerably better energy minimization compared to (a) static scheduling without any change of the schedule at runtime and (b) only reallocating the slack at runtime while keeping the assignment.
Our experimental results show that our proposed algorithm significantly outperforms these alternatives in terms of energy consumption while requiring low computational time. Our scheme can easily be modified to handle cases where the actual execution time is greater than the estimated time, as with dynamic slack allocation, although in these cases the deadline guarantees cannot be maintained.
7.5 Future Work
In this thesis, we have presented scheduling algorithms assuming that there is no resource
contention. However, in practice, resources such as buses, caches, and I/O devices may be shared
between multiple tasks. These types of resource conflict can have a significant impact on the
time and energy requirements and have to be effectively incorporated in scheduling. We will
develop algorithms that can model and encompass these issues for energy minimization.
LIST OF REFERENCES
1. AeA (formerly American Electronics Association) Report Cybernation, http://www.aeanet.org
2. R. K. Ahuja and J. B. Orlin, A Fast Scaling Algorithm for Minimizing Separable Convex Functions Subject to Chain Constraints, Operations Research, 49(5), Sept. 2001, pp. 784-789.
3. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics, Euromicro Conference on Real-Time Systems (ECRTS’01), Delft, Netherlands, June 2001, pp.225-232.
4. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, Dynamic and Aggressive Scheduling Techniques for Power-Aware Real-Time Systems, Real-Time Systems Symposium (RTSS’01), London, UK, Dec. 2001, pp.95-105.
5. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, Power-Aware Scheduling for Periodic Real-Time Tasks, IEEE Transactions on Computers, 53(5), May 2004, pp.584-600.
6. N. K. Bambha, S. S. Bhattacharyya, J. Teich, and E. Zitzler, Hybrid Global/Local Search Strategies for Dynamic Voltage Scaling in Embedded Multiprocessors, International Symposium on Hardware/Software Codesign (CODES’01), Copenhagen, Denmark, Apr. 2001, pp.243-248.
7. S. Basagni, Finding a Maximal Weighted Independent Set in Wireless Networks, Telecommunication Systems, 18(1-3), Sept. 2001, pp.155-168.
8. T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems, Journal of Parallel and Distributed Computing, 61(6), June 2001, pp.810-837.
9. T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, Dynamic Voltage Scaled Microprocessor System, IEEE Journal of Solid-State Circuits, 35(11), Nov. 2000, pp.1571-1580.
10. A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, Low-Power CMOS Digital Design, IEEE Journal of Solid-State Circuits, 27(4), Apr. 1992, pp.473-484.
11. J. Chen, H. Hsu, K. Chuang, C. Yang, A. Pang, and T. Kuo, Multiprocessor Energy-Efficient Scheduling with Task Migration Considerations, Euromicro Conference on Real-Time Systems (ECRTS’04), Sicily, Italy, July 2004, pp.101-108.
165
12. J. Chen and T. Kuo, Multiprocessor Energy-Efficient Scheduling for Real-Time Tasks with Different Power Characteristics, International Conference on Parallel Processing (ICPP’05), Oslo, Norway, June 2005, pp.13-20.
13. P. Chowdhury and C. Chakrabarti, Static Task-Scheduling Algorithms for Battery-Powered DVS Systems, IEEE Transactions on Very Large Scale Integration Systems, 13(2), Feb. 2005, pp.226-237.
14. CPLEX, http://www.ilog.com/products/cplex/
15. Dataquest, http://data1.cde.ca.gov/dataquest/
16. H. El-Rewini and T. G. Lewis, Scheduling Parallel Program Tasks onto Arbitrary Target Machines, Journal of Parallel Distributed Computing, 9(2), June 1990, pp.138-153.
17. W. Felter, K. Rajamani, T. Keller, and C. Rusu, A Performance-conserving Approach for Reducing Peak Power Consumption in Server Systems, International Conference on Supercomputing (ICS’05), Cambridge, MA, USA, June 2005, pp.293-302
18. F. Franchetti, Y. Voronenko, and M. Pueschel, FFT Program Generation for Shared Memory: SMP and Multicore, Supercomputing (SC’06), Tampa, FL, USA, Nov. 2006, pp.51.
19. D. Geer, Chip Makers Turn to Multicore Processors, IEEE Computer, 38(5), May 2005, pp.11-13.
20. K. Govil, E. Chan, and H. Wasserman, Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU. International Conference on Mobile Computing and Networking, Berkeley, CA, USA, Nov. 1995, pp.13-25.
21. F. Gruian, Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors, International Symposium on Low Power Electronics and Design, Huntington Beach, CA, USA, Aug. 2001, pp.46-51.
22. F. Gruian and K. Kuchcinski, LEneS: Task Scheduling for Low-Energy Systems Using Variable Supply Voltage Processors, Asian South Pacific Design Automation Conference (ASP-DAC’01), Yokohama, Japan, Jan. 2001, pp.449-455.
23. F. Gruian and K. Kuchcinski, Uncertainty-Based Scheduling: Energy-Efficient Ordering for Tasks with Variable Execution Time, International Symposium on Low Power Electronics and Design, Seoul, Korea, Aug. 2003, pp.465-468.
24. D. S. Hochbaum and J. G. Shanthikumar, Convex Separable Optimization Is Not Much Harder than Linear Optimization, Journal of the ACM, 37(4), Oct. 1990, pp.843-862.
25. I. Hong, G. Qu, M. Porkonjak, and M. B. Srivastava, Synthesis Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors, Real-Time Systems Symposium (RTSS’98), Madrid, Spain, Dec. 1998, pp.178-187.
166
26. I. Hong, D. Kirovski, G. Qu, M. Potkonjak, and M. B. Srivastava, Power Optimization of Variable-Voltage Core-Based Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(12), Dec. 1999, pp.1702-1714.
27. J. Hu and R. Marculescu, Energy-Aware Communication and Task Scheduling for Network-on-Chip Architectures under Real-Time Constraints, Design, Automation and Test in Europe Conference (DATE’04), Paris, France, Feb. 2004, pp.10234.
28. J. Hu and R. Marculescu, Communication and Task Scheduling of Application-Specific Networks-on-Chip, Computer and Digital Techniques, 152(5), Sept. 2005, pp.643-651
29. S. Hua and G. Qu, Power Minimization Techniques on Distributed Real-Time Systems by Global and Local Slack Management, Asia South Pacific Automation Conference (ASP-DAC’05), Shanghai, China, Jan. 2005, pp.830-835.
30. O. H. Ibarra and C. E. Kim, Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors, Journal of the ACM, 24(2), Apr. 1977, pp. 280-289.
31. T. Ishihara and H. Yasuura, Voltage Scheduling Problem for Dynamically Variable Voltage Processors, International Symposium on Low Power Electronics and Design (ISLPED’98), Monterey, CA, USA, Aug. 1998, pp.197-202.
32. M. Iverson, F. Ozuner, G. Follen, Parallelizing Existing Applications in a Distributed Heterogeneous Environment, Heterogeneous Computing Workshop (HCW’95), Santa Barbara, California, USA, Apr. 1995, pp.93-100.
33. R. Jejurikar and R. Gupta, Dynamic Slack Reclamation with Procrastination Scheduling in Real-Time Embedded Systems, Design Automation Conference (DAC’05), San Diego, California, USA, June 2005, pp.111-116.
34. R. Jejurikar and R. Gupta, Energy-Aware Task Scheduling With Task Synchronization for Embedded Real-Time Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6), June 2006, pp.1024-1037.
35. R. Jejurikar and R. Gupta, Optimized Slowdown in Real-Time Task Systems, IEEE Transactions on Computers, 55(12), Dec. 2006, pp.1588-1598.
36. A. Jerraya, H. Tenhunen, and W. Wolf, Multiprocessor Systems-on-Chips, IEEE Computer, 38(7), July 2005, pp.36-40.
37. P. V. Karzanov and S. T. McCormick, Polynomial Methods for Separable Convex Optimization in Unimodular Linear Spaces with Applications, SIAM Journal of Computing, 26(4), Aug. 1997, pp.1245-1275.
38. W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, Performance Comparison of Dynamic Voltage Scaling Algorithms for Real-Time Systems, Real-Time and Embedded Technology and Application Symposium (RTAS’02), San Jose, CA, USA, Sept. 2002, pp.219-228.
167
39. R. Kumarm K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen, Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, International Symposium on Microelectronics, Washington, DC, USA, Dec. 2003, pp. 81.
40. R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan, Heterogeneous Chip Multiprocessors, IEEE Computer, 38(11), Nov. 2005, pp. 32-38.
41. Y. Kwok and I. Ahmad, Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 7(5), May 1996, pp.506-521.
42. Y. Kwok and I. Ahmad, Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, ACM Computing Surveys, 31(4), December 1999, pp.406-471.
43. W. Kwon and T. Kim, Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors, ACM Transactions on Embedded Computing Systems, 4(1), Feb. 2005, pp.211-230.
44. G. Q. Liu, K. L. Poh, and M. Xie, Iterative List Scheduling for Heterogeneous Computing, Journal of Parallel and Distributed Computing, 65(5), May 2005, pp.654-665.
45. J. Luo and N. K. Jha, Power-conscious Joint Scheduling of Periodic Task Graphs and Aperiodic Tasks in Distributed Real-time Embedded Systems, International Conference on Computer-Aided Design (ICCAD’00), San Jose, California, USA, Nov. 2000, pp.357-364.
46. J. Luo and N. K. Jha, Battery-Aware Static Scheduling for Distributed Real-Time Embedded Systems, Design Automation Conference (DAC’01), Las Vegas, NV, USA, June 2001, pp.444-449.
47. J. Luo and N. K. Jha, Static and Dynamic Variable Voltage Scheduling Algorithms for Real-Time Heterogeneous Distributed Embedded Systems, Asia South Pacific Design Automation Conference (ASP-DAC’02), Bangalore, India, Jan. 2002, pp.712-719.
48. J. Luo and N. K. Jha, Power-profile Driven Variable Voltage Scaling for Heterogeneous Distributed Real-time Embedded Systems, International Conference on VLSI Design (VLSI’03), Las Vegas, Nevada, USA, Jan. 2003, pp.369-375.
49. A. Manzak and C. Chakrabarti, Variable Voltage Task Scheduling for Minimizing Energy or Minimizing Power, International Conference on Acoustic, Speech, and Signal Processing (ICASSP’00), Istanbul, Turkey, June 2000, pp.3239-3242.
50. A. Manzak and C. Chakrabarti, Variable Voltage Task Scheduling Algorithms for Minimizing Energy, International Symposium on Low Power Electronic Design (ISLPED’01), Huntington Beach, California, USA, Aug. 2001, pp.279-282.
168
51. R. Mishra, N. Rastogi, D. Zhu, D. Mossé, and R. Melhem, Energy Aware Scheduling for Distributed Real-Time Systems, International Parallel and Distributed Processing Symposium (IPDPS’03), Nice, France, Apr. 2003, pp.21b.
52. P. Pillai and K. G. Shin, Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems, ACM Symposium On Operating Systems Principles, Banff, Alberta, Canada, Oct. 2001, pp.89-102.
53. S. Sakai, M. Togasaki, and K. Yamazaki, A Note on Greedy Algorithms for the Maximum Weighted Independent Set Problem, Discrete Applied Mathematics, 126(2-3), Mar. 2003, pp.313-322.
54. V. Sarkar, Partitioning and Scheduling Parallel Programs for Multi-processors, Cambirdge, Mass, MIT Press, 1989.
55. M. T. Schmitz and B. M. Al-Hashimi, Considering Power Variations of DVS Processing Elements for Energy Minimisation in Distributed Systems, International Symposium on System Synthesis, Montréal, P.Q., Canada, Oct. 2001, pp.250-255.
56. M. T. Schmitz, B. M. Al-Hashimi, and P.Eles, Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Systems, Design, Automation, and Test in Europe Conference (DATE’02), Paris, France, Mar. 2002, pp.514-521.
57. M. T. Schmitz, B. M. Al-Hashimi, and P.Eles, Iterative Schedule Optimization for Voltage Scalable Distributed Embedded Systems, ACM Transactions on Embedded Computing Systems, 3(1), Feb. 2004, pp.182-217.
58. S. Shankland and M. Kanellos, Intel to Elaborate on New Multicore Processor, http://news.zdnet.co.uk/hardware/0,1000000091,39116043,00.htm?r=1
59. Y. Shin and K. Choi, Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems, Design Automation Conference (DAC’99), New Orleans, Louisiana, USA, June 1999, pp.134-139.
60. Y. Shin, K. Choi, and T. Sakurai, Power Optimization of Real-Time Embedded Systems on Variable Speed Processors, International Conference on Computer-Aided Design (ICCAD’00), San Jose, California, USA, Nov. 2000, pp.365-368.
61. S. Shivel, H. J. Siegel, A. A. Maciejewski, P. Sugavanam, T. Banka, R. Castain, K. Chindam, S. Dussinger, P. Pichumani, P. Satyqsekaran, W. Saylor, D. Sendek, J. Sousa, J. Sridharan, and J. Velazco, Static Allocation of Resources to Communicating Subtasks in a Heterogeneous Ad Hoc Grid Environment, Journal of Parallel and Distributed Computing, 66(4), Apr. 2006, pp.600-611.
62. G. C. Sih and E. A. Lee, A Compile-Time Scheduling Heuristic for Interconnection-Constrained Heterogeneous Processor Architectures, IEEE Transactions on Parallel and Distributed Systems, 4(2), Feb. 1993, pp.175-187.
169
63. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, Reducing Power in High-Performance Microprocessors, Design Automation Conference (DAC’98), San Francisco, California, USA, June 1998, pp.732-737.
64. H. Topcuoglu, S. Hariri, and M. Wu, Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, IEEE Transactions on Parallel and Distributed Systems, 13(3), Mar. 2002, pp.260-274.
65. D. Warrier, W. E. Wilhelm, J. S. Warren, I. V. Hicks, A Branch-and-Price Approach for the Maximum Weight Independent Set Problem, Networks, 46(4), Dec. 2005, pp. 198-209.
66. M. Weiser, B. Welch, A. Demers, and S. Shenker, Scheduling for Reduced CPU Energy, USENIX Conference on Operating Systems Design and Implementation, Monterey, CA, USA, Nov. 1994, pp.13-23.
67. S. Williams, L. Oliker, R. Vuduc, K. Yelick, J. Demmel, and J. Shalf, Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms, Supercomputing (SC’07), Reno, NV, USA, Nov. 2007, pp.38.
68. W. Wolf, The Future of Multiprocessor Systems-on-Chips, Design Automation Conference (DAC’04), San Diego, CA, USA, June 2004, pp.681-685.
69. M. Y. Wu and D. D. Gajski, Hypertool: A Programming Aid for Message-Passing Systems, IEEE Transactions on Parallel and Distributed Systems, 1(3), July 1990, pp.330-343.
70. C. Yang, J. Chen, T. Kuo, An Approximation Algorithm for Energy-Efficient Scheduling on A Chip Multiprocessor, Design, Automation, and Test in Europe Conference (DATE’05), Munich, Germany, Mar. 2005, pp.468-473.
71. T. Yang and A. Gerasoulis, DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors, IEEE Transactions on Parallel and Distributed Systems, 5(9), Sept. 1994, pp.951-967.
72. R. Yao, A. Demers, and S. Shenker, A Scheduling Model for Reduced CPU Energy, IEEE Symposium on Foundations of Computer Science (FOCS’95), Milwaukee, Wisconsin, USA, Oct. 1995, pp.374-382.
73. Y. Yu and V. K. Prasanna, Resource Allocation for Independent Real-Time Tasks in Heterogeneous Systems for Energy Minimization, Journal of Information Science and Engineering, 19(3), May 2003, pp.433-449.
74. Y. Yu and V. K. Prasanna, Energy-Balanced Task Allocation for Collaborative Processing in Wireless Sensor Networks, Mobile Networks and Applications, 10(1-2), Feb. 2005, pp.115-131.
170
75. Y. Zhang, X. (Sharon) Hu, and D. Z. Chen, Task Scheduling and Voltage Selection for Energy Minimization, Design Automation Conference (DAC’02), New Orleans, Louisiana, USA, June 2002, pp.183-188.
76. D. Zhu, R. Melhem, and B. R. Childers, Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multiprocessor Real-Time Systems, IEEE Transactions on Parallel and Distributed Systems, 14(7), July 2003, pp.686-700.
77. D. Zhu, D. Mossé, and R. Melhem, Power-Aware Scheduling for AND/OR Graphs in Real-Time Systems, IEEE Transactions on Parallel and Distributed Systems, 15(9), Sept. 2004, pp.849-864.
78. J. Zhuo and C. Chakrabarti, An Efficient Dynamic Task Scheduling Algorithm for Battery Powered DVS Systems, Asian South Pacific Design Automation Conference (ASP-DAC’05), Shanghai, China, Jan. 2005, pp.846-849.
79. J. Zhuo and C. Chakrabarti, System-Level Energy-Efficient Dynamic Task Scheduling, Design Automation Conference (DAC’05), San Diego, California, USA, June 2005, pp.628-631.
BIOGRAPHICAL SKETCH
Jaeyeon Kang obtained her Master of Science in computer science from the University of
Southern California in 2002. She obtained her Bachelor of Science and Master of Science
degrees in electrical and computer engineering from Sungkyunkwan University, Korea, in 1997
and 1999, respectively.