SCHEDULING ALGORITHMS FOR ENERGY MINIMIZATION
By
JAEYEON KANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2008
© 2008 Jaeyeon Kang
To my beloved Mom and Dad
ACKNOWLEDGMENTS
First and foremost, I would like to thank my advisor, Sanjay Ranka, for his constant
support and guidance. He taught me the passion, patience, and devotion necessary for a
true researcher. I would also like to thank my co-advisor, Sartaj Sahni, for his helpful advice and
guidance. He taught me thoroughness and the right attitude towards research, and helped me think more
broadly about my work. My grateful thanks also go to my committee members, Jih-Kwon Peir,
Jose Fortes, and Paul Avery, for their valuable insights and comments.
I am grateful to all my colleagues for being good friends and collaborators. They have
been very helpful and supportive, both academically and personally, and they have made my journey
memorable. I wish to give special thanks to all of my friends in Korea for listening to me and
helping me find peace.
Finally, none of this would have happened without the full support of my beloved family. I
would like to thank my mom, who is in heaven, for always believing in me and supporting me.
She helped me overcome many difficulties throughout my PhD program. I would like to thank
my dad for motivating me to start this journey and encouraging me to continue it. He has served
as an excellent role model in my life. I would also like to thank my brothers for their sincere
support and encouragement. My deepest gratitude goes to my beloved husband, Hyuckchul, for
being with me. Words are not enough to express my gratitude for everything he has done for me.
I love him and hope that I will be there for him when he needs me. And I thank my
eight-month-old daughter, Katherine (Hyunseung), for coming into my life. She has made me the
happiest person in the whole world. I love her and promise that I will always be on her side.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
  1.1 Introduction
  1.2 Preliminaries
    1.2.1 Energy Model
    1.2.2 Application Model
    1.2.3 Dynamic Environments
      1.2.3.1 Overestimation
      1.2.3.2 Underestimation
  1.3 Scheduling for Energy Minimization
    1.3.1 Static Assignment
      1.3.1.1 Assignment to minimize total finish time
      1.3.1.2 Assignment to minimize total energy consumption
    1.3.2 Static Slack Allocation
    1.3.3 Dynamic Assignment
    1.3.4 Dynamic Slack Allocation
  1.4 Contributions
    1.4.1 Static Assignment to Minimize Total Finish Time
    1.4.2 Static Assignment to Minimize Total Energy Consumption
    1.4.3 Static Slack Allocation to Minimize Total Energy Consumption
    1.4.4 Dynamic Slack Allocation to Minimize Total Energy Consumption
    1.4.5 Dynamic Assignment to Minimize Total Energy Consumption
  1.5 Document Layout
2 RELATED WORK
  2.1 Static Slack Allocation
    2.1.1 Non-optimal Slack Allocation
    2.1.2 Near-optimal Slack Allocation
  2.2 Dynamic Slack Allocation
  2.3 Static Assignment
  2.4 Dynamic Assignment
3 STATIC SLACK ALLOCATION
  3.1 Proposed Slack Allocation
  3.2 Unit Slack Allocation
    3.2.1 Maximum Available Slack for a Task
    3.2.2 Compatible Task Matrix
    3.2.3 Search Space Reduction
      3.2.3.1 Fully independent tasks
      3.2.3.2 Fully dependent tasks
      3.2.3.3 Compressible tasks
    3.2.4 Branch and Bound Search
    3.2.5 Estimating the Lower Bound to Reduce the Search Space
  3.3 Experimental Results
    3.3.1 Simulation Methodology
      3.3.1.1 The DAG generation
      3.3.1.2 Performance measures
    3.3.2 Memory Requirements
    3.3.3 Determining the Size of Unit Slack and the Number of Intervals
    3.3.4 Homogeneous Environments
      3.3.4.1 Comparison of energy requirements
      3.3.4.2 Comparison of time requirements
    3.3.5 Heterogeneous Environments
      3.3.5.1 Comparison of energy requirements
      3.3.5.2 Comparison of time requirements
    3.3.6 Effect of Search Space Reduction Techniques for PathDVS
4 DYNAMIC SLACK ALLOCATION
  4.1 Proposed Dynamic Slack Allocation
    4.1.1 Choosing a Subset of Tasks for Slack Reallocation
      4.1.1.1 Greedy approach
      4.1.1.2 The k time lookahead approach
      4.1.1.3 The k descendent lookahead approach
    4.1.2 Time Range for Selected Tasks
  4.2 Experimental Results
    4.2.1 Simulation Methodology
      4.2.1.1 The DAG generation
      4.2.1.2 Dynamic environments generation
      4.2.1.3 Performance measures
    4.2.2 Overestimation
      4.2.2.1 Comparison of energy requirements
      4.2.2.2 Comparison of time requirements
    4.2.3 Underestimation
      4.2.3.1 Comparison of deadline requirements
      4.2.3.2 Comparison of energy requirements
      4.2.3.3 Comparison of time requirements
5 STATIC ASSIGNMENT
  5.1 Overall Scheduling Process
  5.2 Proposed Static Assignment to Minimize Finish Time
    5.2.1 Task Selection
    5.2.2 Processor Selection
    5.2.3 Iterative Scheduling
  5.3 Proposed Static Assignment to Minimize Energy
    5.3.1 Task Prioritization
    5.3.2 Estimated Deadline for a Task
    5.3.3 Processor Selection
      5.3.3.1 Greedy approach for the computation of expected energy
      5.3.3.2 Example for assignment
  5.4 Experimental Results for Assignment Algorithms that Minimize Finish Time
    5.4.1 Simulation Methodology
      5.4.1.1 The DAG generation
      5.4.1.2 Performance measures
    5.4.2 Comparison of Assignment Algorithms Using Different DVS Algorithms
    5.4.3 Comparison between CPS (Used in Prior Scheduling for Energy Minimization) and ICP
  5.5 Experimental Results for Assignment Algorithms that Minimize Energy
    5.5.1 Simulation Methodology
      5.5.1.1 The DAG generation
      5.5.1.2 Performance measures
      5.5.1.3 Variations of our algorithms
      5.5.1.4 Variations of GA based algorithms
    5.5.2 DVS Schemes to Compute Expected Energy in Processor Selection Step
    5.5.3 Independence between Time and Energy Requirements
      5.5.3.1 Comparison of energy requirements of proposed algorithms
      5.5.3.2 Comparison of energy requirements with GA based algorithms
      5.5.3.3 Comparison of time requirements
    5.5.4 Dependence between Time and Energy Requirements
6 DYNAMIC ASSIGNMENT
  6.1 Proposed Dynamic Assignment
    6.1.1 Choosing a Subset of Tasks for Rescheduling
    6.1.2 Time Range for Selected Tasks
    6.1.3 Estimated Deadline and Energy
    6.1.4 Processor Selection
  6.2 Experimental Results
    6.2.1 System Methodology
      6.2.1.1 The DAG generation
      6.2.1.2 Dynamic environments generation
      6.2.1.3 Performance measures
    6.2.2 Comparison of Energy Requirements
    6.2.3 Comparison of Time Requirements
7 CONCLUSION AND FUTURE WORK
  7.1 Static Slack Allocation
  7.2 Dynamic Slack Allocation
  7.3 Static Assignment
  7.4 Dynamic Assignment
  7.5 Future Work
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
3-1 Results for 100 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-2 Results for 200 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-3 Results for 300 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-4 Results for 400 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-5 Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in homogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)
3-6 Runtime ratio of LPDVS to PathDVS for no deadline extension in homogeneous environments
3-7 Results for 100 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-8 Results for 200 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-9 Results for 300 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-10 Results for 400 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
3-11 Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in heterogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)
3-12 Runtime ratio of LPDVS to PathDVS for no deadline extension in heterogeneous environments
3-13 Number of tasks participating in search with respect to different number of tasks and processors
3-14 Depth of search tree with respect to different number of tasks and processors
3-15 Number of nodes explored in search with respect to different number of tasks and processors
4-1 Normalized energy consumption of k time lookahead and k descendent lookahead algorithms with different k values with respect to different early finished task rates and time decrease rates for no deadline extension
4-2 Deadline miss ratio of k time lookahead and k descendent lookahead algorithms with different k values with respect to different late finished task rates and time increase rates for 0.05 deadline extension rate
5-1 Results for 50 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-2 Results for 50 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-3 Results for 50 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-4 Results for 100 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-5 Results for 100 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
5-6 Results for 100 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
LIST OF FIGURES
1-1 Example of DAG and assignment DAG
1-2 Overall process of scheduling for energy minimization
3-1 Example of a DAG and assignment on two processors
3-2 Compatible task matrix and lists for an example in Figure 1-1
3-3 Compression of assignment DAG
3-4 Compression of compatible task lists
3-5 Reduced compatible task lists and search graph
3-6 Runtime of PathDVS with respect to different size of DAGs (unit: ms)
3-7 Normalized energy consumption of PathDVS with respect to different unit slack rates for different number of tasks
3-8 Normalized energy consumption of LPDVS with respect to different interval rates for different number of tasks
3-9 Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different number of tasks
3-10 Runtime to execute algorithms with respect to different deadline extension rates for different number of tasks in homogeneous environments (unit: ms)
3-11 Normalized energy consumption of slack allocation algorithms with respect to different deadline extension rates for different number of tasks in heterogeneous environments
3-12 Runtime to execute algorithms with respect to different deadline extension rates for different number of tasks in heterogeneous environments (unit: ms)
4-1 Tasks selected for slack reallocation in an assignment DAG depending on dynamic slack allocation algorithms
4-2 Overestimation: Time range for selected slack allocable tasks using k-time lookahead approach and k-descendent lookahead approach
4-3 Underestimation: Time range for selected slack allocable tasks using k-time lookahead approach and k-descendent lookahead approach
4-4 Normalized energy consumption of Greedy, dPathDVS, and kallDescendent with respect to different early finished task rates and time decrease rates for no deadline extension
4-5 Normalized energy consumption for no deadline extension
4-6 Normalized energy consumption for 0.01 deadline extension rate
4-7 Normalized energy consumption for 0.02 deadline extension rate
4-8 Normalized energy consumption for 0.05 deadline extension rate
4-9 Normalized energy consumption for 0.1 deadline extension rate
4-10 Normalized energy consumption for 0.2 deadline extension rate
4-11 Computational time to readjust the schedule from an early finished task with respect to different time decrease rates for no deadline extension (unit: ns, logarithmic scale)
4-12 Results for variable deadline extension rates: Computational time to readjust the schedule from one early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
4-13 Deadline miss ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
4-14 Deadline miss ratio for no deadline extension
4-15 Deadline miss ratio for 0.01 deadline extension rate
4-16 Deadline miss ratio for 0.02 deadline extension rate
4-17 Deadline miss ratio for 0.05 deadline extension rate
4-18 Deadline miss ratio for 0.1 deadline extension rate
4-19 Deadline miss ratio for 0.2 deadline extension rate
4-20 Energy increase ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
4-21 Energy increase ratio for no deadline extension
4-22 Energy increase ratio for 0.01 deadline extension rate
4-23 Energy increase ratio for 0.02 deadline extension rate
4-24 Energy increase ratio for 0.05 deadline extension rate
4-25 Energy increase ratio for 0.1 deadline extension rate
4-26 Energy increase ratio for 0.2 deadline extension rate
4-27 Computational time to readjust the schedule from a late finished task with respect to different time increase rates for no deadline extension (unit: ns, logarithmic scale)
4-28 Results for variable deadline extension rates: Computational time to readjust the schedule from one late finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
5-1 A high level description of proposed scheduling approach
5-2 The ICP procedure
5-3 The DVSbasedAssignment procedure
5-4 Example of assignment to minimize finish time and assignment to minimize DVS based energy
5-5 Normalized energy consumption of ICP and CPS using PathDVS with respect to different deadline extension rates for different number of tasks and processors
5-6 Comparison between optimal scheme and greedy scheme for processor selection of A0 for 50 tasks on 4 and 8 processors
5-7 Results for 50 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors
5-8 Results for 100 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors
5-9 Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) with respect to different number of processors for variable deadline extension rates (unit: percentage)
5-10 Normalized energy consumption of GARandNonOptimal and our algorithms for different number of tasks and processors
5-11 Normalized energy consumption of GARandOptimal and our algorithms for different number of tasks and processors
5-12 Normalized energy consumption of GASolNonOptimal and our algorithms with respect to different extension rates for different number of tasks and processors
5-13 Normalized energy consumption of GASolNonOptimal and our algorithms
5-14 Normalized energy consumption of GASolOptimal and our algorithms
5-15 Runtime to execute our algorithms with respect to variable deadline extension rates for different number of tasks (unit: ms)
5-16 Runtime to execute GA algorithms and our algorithm with respect to different number of tasks for 1.0 deadline extension rate (unit: ms, logarithmic scale)
5-17 Results for 4 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage)
5-18 Results for 8 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage)
6-1 The DynamicDVSbasedAssignment procedure
6-2 Results for 4 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-3 Results for 8 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-4 Results for 16 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-5 Results for 32 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6-6 Computational time to readjust the schedule from an early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
SCHEDULING ALGORITHMS FOR ENERGY MINIMIZATION
By
Jaeyeon Kang
August 2008
Chair: Sanjay Ranka
Cochair: Sartaj Sahni
Major: Computer Engineering
Energy consumption is a critical issue in parallel and distributed embedded systems. We
present novel algorithms for energy efficient scheduling of DAG (Directed Acyclic Graph) based
applications on DVS (Dynamic Voltage Scaling) enabled systems. The proposed scheduling
algorithms mainly consist of two components: assignment and slack allocation. All of the
proposed assignment and slack allocation schemes effectively minimize energy consumption
while meeting the deadline constraints in static or dynamic environments, and they are equally
applicable to homogeneous and heterogeneous parallel machines. Experimental results show
that the proposed algorithms provide very good performance for energy minimization while
requiring only a small amount of computational time.
CHAPTER 1
INTRODUCTION
1.1 Introduction
Computers account for a significant and growing share of overall energy consumption. Roughly 8%
of the electricity in the US is now being consumed by computers [1]. A study by Dataquest [15]
reported that the world-wide total power dissipation of processors in PCs was 160MW in 1992,
and by 2001 it had grown to 9000MW. It is now widely recognized that power-aware computing
is no longer an issue confined to mobile and real-time computing environments, but is also
important for desktop and conventional computing. In particular, high-performance
parallel and distributed systems, data centers, supercomputers, clusters, embedded systems,
servers, and networks consume considerable amounts of energy. In addition to the expenses related to
energy consumption of computers, significant additional costs have to be borne for cooling the
facility. Thus, reducing the energy requirements of executing an application is very important
both for large scale systems that consume considerable amounts of energy and for embedded
systems that rely on batteries for power.
More recently, industry and researchers are eyeing multi-core processors, which can attain
higher performance by running multiple threads in parallel [18, 19, 36, 39, 40, 58, 67, 68]. By
integrating multiple cores on a chip, designers hope to sustain performance growth while
depending less on raw circuit speed and decreasing the power requirements per unit of
performance. These workhorses of the next generation of supercomputers and wireless devices
are poised to alter the horizon of high-performance computing. However, proper scheduling and
allocation of applications on these architectures is required [17].
Most effective energy minimization techniques are based on Dynamic Voltage Scaling
(DVS). The DVS technique assigns differential voltages to individual tasks to minimize the energy
requirements of an application [20, 63, 66]. Assigning differential voltages is equivalent to
allocating additional time, or slack, to a task. This technique has been found to be a very effective
method for reducing energy on DVS enabled processors. Scheduling algorithms without the DVS
technique, such as Energy Aware Scheduling [27, 28] and several heuristics in [61], do not
perform as well in DVS enabled systems.
There is considerable research on DVS scheduling algorithms for independent tasks in a
single processor real time system [3, 4, 5, 11, 12, 21, 23, 25, 26, 33, 34, 35, 38, 43, 49, 50, 52,
59, 60, 70, 72, 73, 76, 78, 79]. Recently, several DVS based algorithms for slack allocation have
been proposed for tasks with precedence relationships in a multiprocessor real time system [6,
13, 22, 29, 31, 45, 46, 47, 48, 51, 55, 56, 57, 75, 77]. The precedence relationships are
represented as a Directed Acyclic Graph (DAG) consisting of nodes that represent computations
and edges that represent the dependency between the nodes. DAGs have been shown to be
representative of a large number of applications.
We explore novel scheduling algorithms for DVS based energy minimization of DAG
based applications on parallel and distributed machines. The proposed schemes are equally
applicable to homogeneous and heterogeneous parallel machines. The scheduling of DAG based
applications with the goal of DVS based energy minimization broadly consists of two steps:
assignment and slack allocation.
• Assignment: This step determines the order in which tasks execute and the mapping of tasks to processors based on the computation time at the maximum voltage level. Note that the finish time of the DAG at the maximum voltage has to be less than or equal to the deadline for any feasible schedule.
• Slack allocation: Once the assignment of each task is known, this step allocates a variable amount of slack to each task so that the total energy consumption is minimized while the DAG still completes within the given deadline.
A scheduling algorithm can be classified as either a static scheduling algorithm (i.e., an offline
algorithm) or a dynamic scheduling algorithm (i.e., an online algorithm). The static scheduling
algorithms for DAG execution use the estimated execution time of tasks. However, the estimated
execution time (ET) of tasks may be different from their actual execution time (AET) at runtime.
The dynamic environments can be divided into two broad categories based on whether the actual
execution time is less than or more than the estimated time: overestimation (AET < ET) and
underestimation (AET > ET). These dynamic environments may either provide an opportunity to
further reduce energy requirements or cause deadline constraints to be missed. The dynamic
scheduling algorithms address these problems at runtime with the goals of minimizing energy
consumption and satisfying deadline constraints.
In this thesis, we present novel scheduling algorithms for energy minimization in both
static and dynamic environments. The algorithms can be mainly divided into four categories:
static slack allocation, dynamic slack allocation, static assignment, and dynamic assignment.
Algorithms for each of the four categories will be presented in Chapters 3, 4, 5, and 6,
respectively.
1.2 Preliminaries
In this section, we briefly describe the energy model, the application model, and the
dynamic environments used in this thesis.
1.2.1 Energy Model
The Dynamic Voltage Scaling (DVS) technique reduces the dynamic power dissipation by
dynamically scaling the supply voltage and the clock frequency of processors. The power
dissipation is given by $P_d = C_{ef} \cdot V_{dd}^2 \cdot f$, where $C_{ef}$ is the switched
capacitance, $V_{dd}$ is the supply voltage, and $f$ is the operating frequency [9, 10]. The
relationship between the supply voltage and the frequency is
$f = k \cdot (V_{dd} - V_t)^2 / V_{dd}$, where $k$ is a circuit constant and $V_t$ is the
threshold voltage. The energy consumed to execute task $\tau_i$ is
$E_i = C_{ef} \cdot V_{dd}^2 \cdot c_i$, where $c_i$ is the number of cycles required to execute
the task. The supply voltage can be reduced by decreasing the processor speed, which also
reduces the energy consumption of the task. Here we use each task's execution time at the
maximum supply voltage during assignment to guarantee the deadline constraints, given as
$compTime_i = c_i / f_{max}$.
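To make the model concrete, the following is a minimal runnable sketch of these formulas; the constant values ($C_{ef}$, $k$, $V_t$, the voltages, and the cycle count) are illustrative assumptions, not values taken from this dissertation.

```python
# Minimal sketch of the DVS energy model of Section 1.2.1.
# All constants below are illustrative assumptions.

def frequency(v_dd, k=1.0, v_t=0.5):
    """Operating frequency: f = k * (V_dd - V_t)^2 / V_dd."""
    return k * (v_dd - v_t) ** 2 / v_dd

def dynamic_power(c_ef, v_dd, f):
    """Dynamic power dissipation: P_d = C_ef * V_dd^2 * f."""
    return c_ef * v_dd ** 2 * f

def task_energy(c_ef, v_dd, cycles):
    """Energy to execute a task of `cycles` cycles: E_i = C_ef * V_dd^2 * c_i."""
    return c_ef * v_dd ** 2 * cycles

# Lowering the supply voltage reduces per-task energy quadratically,
# at the price of a lower frequency and hence a longer execution time.
for v_dd in (1.8, 1.2):
    f = frequency(v_dd)
    print(f"V_dd={v_dd:.1f}V  f={f:.3f}  "
          f"E={task_energy(1e-9, v_dd, 1e6):.2e}J  time={1e6 / f:.2e}s")
```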
1.2.2 Application Model
The Directed Acyclic Graph (DAG) represents the workflow among tasks. In the DAG
shown in Figure 1-1 (a), a node represents a task and a directed edge between nodes represents
the precedence relationship between tasks. Given a DAG, the assignment of its tasks to
appropriate processors in a parallel architecture is done through an assignment algorithm.
Figure 1-1 (b) depicts the assignment for the DAG of Figure 1-1 (a). The assignment varies
depending on the mapping method, as long as it satisfies the given deadline of the DAG. Figure 1-1 (c)
represents the assignment DAG, which is the direct workflow among tasks generated after the
assignment. The direct precedence relationships among tasks may differ from those in the original
DAG, depending on the given assignment. For instance, task τ1 and task τ4 have a direct
dependency in the original DAG, but in the assignment DAG they have no direct dependency.
Furthermore, if task τ2 finishes at time 5, task τ5 no longer has a direct dependency on task τ2,
although the dependency is indirectly present in the assignment DAG. Also, there may be
additional dependencies in the assignment DAG due to scheduling constraints within a
processor. For example, task τ3 and task τ4 have a dependency relationship in the assignment
DAG.
Figure 1-1. Example of DAG and assignment DAG: (a) DAG, (b) Assignment on two processors, (c) Assignment DAG
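As an illustration of how an assignment induces new dependencies, the following Python sketch derives an assignment DAG from an original DAG and a per-processor execution order. The seven-task graph and the per-processor orders loosely mirror Figure 1-1 but are assumptions for illustration, and the transitive reduction that would drop newly redundant direct edges is omitted for brevity.

```python
# Sketch: deriving an assignment DAG from a DAG plus a processor assignment.
# Graph and per-processor orders loosely mirror Figure 1-1 (illustrative only).

dag = {1: [2, 3, 4], 2: [5], 3: [6], 4: [6], 5: [7], 6: [7], 7: []}

# For each processor, the tasks in their scheduled execution order.
assignment = {"P0": [1, 2, 5, 7], "P1": [3, 4, 6]}

def assignment_dag(dag, assignment):
    edges = {t: set(s) for t, s in dag.items()}
    # Back-to-back tasks on one processor gain a dependency even if none
    # existed in the original DAG (e.g., tasks 3 and 4 above).
    for order in assignment.values():
        for pred, succ in zip(order, order[1:]):
            edges[pred].add(succ)
    # A full construction would also remove direct edges made redundant by
    # transitivity (e.g., 1 -> 4 here); that reduction is omitted for brevity.
    return edges

print(assignment_dag(dag, assignment))
```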
1.2.3 Dynamic Environments
The actual execution time (AET) of tasks may be different from their estimated execution
time (ET) used in static scheduling. We divide the tasks into two broad categories based on
whether the actual execution time is less than or more than the estimated time: overestimation
(i.e., AET < ET) and underestimation (i.e., AET > ET).
1.2.3.1 Overestimation
For most real time applications, an upper worst case bound on the actual execution time of
each task is used to guarantee that the application completes in a given time bound. Many such
tasks may complete earlier than expected during the actual execution. Also when historical data
is used to estimate the time requirements, the actual execution time of each task may be less than
its estimated execution time. This allows for dependent tasks to potentially begin at an earlier
time than what was envisioned during the static scheduling. The extra available slack can then be
allocated to tasks that have not yet begun execution with the goal of reducing the total energy
requirements while still meeting the deadline constraints.
1.2.3.2 Underestimation
For many applications that do not use the worst case execution time for estimation, the
actual execution time of a task may be larger than its estimated execution time. In this case, it
cannot be guaranteed that the deadline constraints will always be satisfied. However, slack can
be removed from future tasks with the hope of satisfying the deadline constraints as closely as
possible while preserving as much of the energy reduction as possible.
1.3 Scheduling for Energy Minimization
Figure 1-2 shows the overall process of the scheduling algorithm for energy minimization. The
following four-step process for scheduling tasks in a DAG for energy minimization is broadly
required:
• Static assignment
• Static slack allocation
• Dynamic assignment
• Dynamic slack allocation

1.3.1 Static Assignment
The static assignment process determines the order in which tasks execute and the mapping of
tasks to processors based on the computation time at the maximum voltage level. The schedule
generated by this process is not yet complete, because there may be slack remaining before the
deadline. The assignment can be performed with two different objectives: assignment to minimize
total finish time and assignment to minimize total energy consumption.
1.3.1.1 Assignment to minimize total finish time
The assignment is performed in order to minimize total finish time of a DAG. The deadline
has to be greater than or equal to the total finish time for a feasible solution. An important side
effect of minimizing the total finish time is that for a given deadline, the total amount of
available slack is increased. In general, higher slack should lead to lower energy after the
application of slack allocation algorithms.
1.3.1.2 Assignment to minimize total energy consumption
The assignment is performed in order to minimize total energy consumption after slack
allocation (i.e., DVS based energy) while still meeting the deadline constraints. It can be done by
considering the energy consumption while determining the execution order of tasks, and the
expected energy after slack allocation while mapping tasks to processors. In general,
incorporating energy minimization during the assignment process should lead to better
performance in terms of reducing energy requirements.
1.3.2 Static Slack Allocation
The static slack allocation process allocates slack to tasks to minimize energy consumption
while meeting deadline constraints at compile time. The initial static schedule is generated after
static assignment and static slack allocation (i.e., static scheduling). The problem of slack
allocation can be posed as the following: Allocate a variable amount of slack to each task so that
the total energy consumption is minimized while the deadlines are met.
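Stated more formally, with notation assumed here for illustration ($t_i$ is the execution time of task $\tau_i$ at the maximum voltage, $s_i$ the slack allocated to it, $E_i(\cdot)$ its energy as a decreasing function of its stretched duration, and $D$ the deadline), the problem can be written as:

```latex
\begin{aligned}
\min_{s_1,\ldots,s_n}\quad & \sum_{i=1}^{n} E_i(t_i + s_i)\\
\text{subject to}\quad & \sum_{\tau_i \in \pi} (t_i + s_i) \le D
    \quad \text{for every path } \pi \text{ in the assignment DAG},\\
& s_i \ge 0 \quad \text{for every task } \tau_i .
\end{aligned}
```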
1.3.3 Dynamic Assignment
The dynamic assignment process reassigns tasks to processors whenever a task finishes
earlier or later than expected based on the current schedule (i.e., the initial static schedule or the
previous schedule updated at runtime) at runtime. The reassignment is performed to minimize
DVS based energy. However, if the deadline constraints are not satisfied, the reassignment is
ignored and the current assignment is kept. Once the reassignment is determined, slack is
reallocated to tasks (i.e., dynamic slack allocation) to minimize energy consumption while still
meeting the deadline constraints.
1.3.4 Dynamic Slack Allocation
The dynamic slack allocation process reallocates slack to tasks whenever a task finishes
earlier or later than expected based on the current schedule (i.e., the initial static schedule or the
previous schedule updated at runtime) at runtime. The current schedule is initialized to the static
schedule and updated whenever dynamic scheduling is triggered by the occurrence of early or
late finishing tasks at runtime. The assignment is not changed during slack reallocation. The main
goal of the dynamic slack allocation algorithm differs slightly depending on the dynamic
environments (i.e., whether the estimated execution time of a task is overestimated or
underestimated). For overestimation, the dynamic slack allocation algorithm minimizes energy
consumption while guaranteeing that the deadline constraints are always met. For
underestimation, it tries to reduce the possibility of the DAG not completing by the required
deadline while trying to preserve as much energy reduction as possible.
Figure 1-2. Overall process of scheduling for energy minimization
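The control flow of Figure 1-2 can be summarized by the following runnable Python skeleton. Every function body here is a hypothetical no-op stand-in for the actual algorithms of Chapters 3 through 6, so only the ordering of the four steps is meaningful.

```python
# Skeleton of the four-step process in Figure 1-2. All bodies are stubs.

def static_assignment(dag, processors):
    """Chapter 5: order tasks and map them to processors (stub)."""
    return {"dag": dag, "map": {t: processors[0] for t in dag}, "slack": {}}

def static_slack_allocation(plan, deadline):
    """Chapter 3: distribute slack to minimize energy (stub)."""
    plan["slack"] = {t: 0.0 for t in plan["dag"]}
    return plan

def dynamic_assignment(plan, event, deadline):
    """Chapter 6: remap not-yet-started tasks after an early/late finish (stub)."""
    return plan

def dynamic_slack_allocation(plan, event, deadline):
    """Chapter 4: redistribute slack after an early/late finish (stub)."""
    return plan

def schedule(dag, processors, deadline, runtime_events):
    plan = static_assignment(dag, processors)          # static scheduling
    plan = static_slack_allocation(plan, deadline)
    for event in runtime_events:                       # dynamic scheduling
        plan = dynamic_assignment(plan, event, deadline)
        plan = dynamic_slack_allocation(plan, event, deadline)
    return plan

print(schedule({1: [2], 2: []}, ["P0", "P1"], 10.0, [("early", 1)]))
```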
1.4 Contributions
In this section, we present the main contributions of the proposed scheduling algorithms
presented in this thesis.
1.4.1 Static Assignment to Minimize Total Finish Time
While most prior research on scheduling for energy minimization of DAGs has not
concentrated on the assignment process, we show that the assignment itself is as important for
minimizing energy requirements as the slack allocation process. In general, minimizing
the time (i.e., the scheduling length of a DAG) and minimizing the energy are regarded as
conflicting goals. However, when using DVS techniques under a specified deadline, we show
that minimizing total finish time can lead to lower energy requirements due to the increase in the
total amount of available slack. The main features of the proposed static assignment algorithm
for minimizing finish time are as follows:
• Assign multiple independent ready tasks simultaneously: The computation of priority of a task depends on estimating the execution path from this task to the last task of the DAG representing the workflow. Since the mapping of tasks yet to be scheduled is unknown and the cost of task execution depends on the processor that is assigned, the priority has to be approximated during scheduling. Hence, it is difficult to explicitly distinguish the execution order of tasks with similar priorities. Using this intuition, the proposed algorithm forms independent ready tasks whose priorities are similar into a group and finds an optimal solution (e.g., resource assignment) for this subset of tasks simultaneously. Here the set of ready tasks that can be assigned consists of tasks for which all the predecessors have already been assigned.
• Iteratively refine the scheduling: The scheduling is iteratively refined by using the cost of the critical path based on the assignment generated in the previous iteration. Here the critical path is defined by the length of the longest path from a task to an exit task, and it is used to determine the priority of the task (a sketch of this priority computation follows this list). Assuming that the mappings of the previous iteration are good, this provides a better estimate of the cost of the critical path than using the average or median computation and communication time as the estimate, as is done in the first iteration.
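A minimal sketch of the critical-path priority is given below; the DAG, the per-task average costs, and the function names are illustrative assumptions. In later iterations, the average costs would be replaced by the costs implied by the previous iteration's mapping.

```python
# Sketch: priority of a task = length of the longest path from it to an exit
# task. Graph and costs are illustrative.
from functools import lru_cache

dag = {1: [2, 3], 2: [4], 3: [4], 4: []}
avg_cost = {1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}   # avg computation time per task

@lru_cache(maxsize=None)
def priority(task):
    succs = dag[task]
    if not succs:                              # exit task
        return avg_cost[task]
    return avg_cost[task] + max(priority(s) for s in succs)

# Ready tasks with similar priorities would be grouped and assigned together.
print(sorted(dag, key=priority, reverse=True))  # [1, 2, 3, 4]
```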
1.4.2 Static Assignment to Minimize Total Energy Consumption
Most of the prior research on scheduling for energy minimization of DAGs is based on
a simple list based assignment algorithm. An assignment that minimizes total finish time may be
a reasonable approach, as minimizing time generally leaves more slack to be allocated and thus
reduces the energy requirements during the slack allocation step. However, this approach
cannot incorporate the differential energy and time requirements of each task of the workflow on
different processors. Our assignment algorithms mitigate this problem by considering the
expected effect of slack allocation during the assignment process. They significantly outperform
other existing algorithms in terms of energy consumption. Furthermore, they require little
computational time. The main features of the proposed static assignment algorithms for minimizing
energy consumption are as follows:
• Utilize expected DVS based energy information during assignment: Our algorithm assigns the appropriate processor for each task such that the total energy expected after slack allocation is minimized. The expected energy after slack allocation (i.e., the expected DVS based energy) for each task is computed by using an estimated deadline for each task so that the overall DAG can be executed within the deadline of the DAG (a sketch follows this list). This leads to good performance in terms of energy minimization.
• Consider multiple task prioritizations: We test multiple assignments using multiple task prioritizations based on tradeoffs between energy and time for each task. This leads to good performance in terms of energy minimization. Furthermore, the execution of these assignments can be potentially done in parallel to minimize the computational time (i.e., runtime overhead).
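As a rough illustration of the first feature, the sketch below evaluates the energy a task would consume if slowed just enough to meet its estimated deadline. The linear voltage-frequency scaling and all numeric values are simplifying assumptions for illustration, not the actual scheme of Chapter 5.

```python
# Sketch: expected DVS-based energy of one task on one processor, assuming
# (for simplicity) that V_dd scales linearly with frequency.

def expected_dvs_energy(cycles, f_max, v_max, c_ef, est_deadline):
    f = min(f_max, cycles / est_deadline)   # slowest frequency meeting deadline
    v = v_max * f / f_max                   # simplifying linear V-f assumption
    return c_ef * v ** 2 * cycles           # E_i = C_ef * V_dd^2 * c_i

# Processor selection would pick, per task, the processor minimizing this value.
procs = {"P0": (1.0e9, 1.8, 1.0e-9), "P1": (2.0e9, 2.5, 1.0e-9)}
for name, (f_max, v_max, c_ef) in procs.items():
    print(name, expected_dvs_energy(1.0e8, f_max, v_max, c_ef, est_deadline=0.2))
```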
1.4.3 Static Slack Allocation to Minimize Total Energy Consumption
The proposed scheduling algorithm, the Path based DVS (PathDVS) algorithm, finds the best task set that
can efficiently use unit slack for minimizing energy consumption. It incorporates assignment
based dependency relationships among tasks as well as the different energy profiles of tasks on
different processors. It provides near optimal solutions for energy minimization with
considerably less computational time and memory than an existing algorithm that also provides
near optimal solutions (i.e., a linear programming based approach). The main features of the
proposed static slack allocation algorithm are as follows, particularly from the perspective of
requiring little computation time:
• Utilize a compatible task matrix: The compatible task matrix represents, for each task, the list of tasks that can share unit slack (i.e., a minimum indivisible unit of slack) with it; a sketch of one such construction follows this list. The matrix is constructed based on the following two characteristics: First, each assignment-based path, which consists of tasks with precedence relationships in an assignment DAG, cannot have more than one unit slack. Second, this unit slack cannot be allocated to more than one task on each assignment-based path. Using the matrix, the branch and bound search method can be applied efficiently.
• Apply search space reduction techniques: In general, the branch and bound search method requires large computational time. Thus, to reduce the search space (which in turn reduces the computational time), we check whether each task is a fully independent task, a fully dependent task, or a compressible task. Only one representative of the compressible tasks participates in the search. This dramatically reduces the search space without reducing the quality of the energy results.
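Under the two characteristics just described, two tasks can share a unit of slack only if no assignment-based path contains both, i.e., neither is reachable from the other in the assignment DAG. The following sketch computes such compatibility lists for an illustrative graph; it is a reconstruction from the description above, not the Chapter 3 code.

```python
# Sketch: compatibility lists (tasks that can share a unit of slack) computed
# from reachability in an illustrative assignment DAG.

adag = {1: [2, 3], 2: [4], 3: [4], 4: []}

def reachable(graph, src):
    seen, stack = set(), [src]
    while stack:
        for succ in graph[stack.pop()]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

reach = {t: reachable(adag, t) for t in adag}
compatible = {t: [u for u in adag
                  if u != t and u not in reach[t] and t not in reach[u]]
              for t in adag}
print(compatible)   # tasks 2 and 3 are compatible: no path joins them
```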
1.4.4 Dynamic Slack Allocation to Minimize Total Energy Consumption
Prior dynamic slack allocation algorithms for DAGs are based on using a simple greedy
approach that allocates the slack to the next ready task on the same processor where the task that
completes earlier than expected was executed. This slack forwarding based approach, although
fast, is shown not to perform well in our experiments in terms of energy reduction. A simple
option for adjusting slack at runtime is to reapply the static slack allocation algorithms for the
unexecuted tasks when a task finishes early or late. It can be expected to be close to the best that
can be achieved for energy minimization, particularly when applying near optimal static slack
allocation algorithms. However, the time requirements of static algorithms are large and they
may not be practical for many runtime scenarios. The proposed dynamic slack allocation
algorithms effectively reallocate the slack to unexecuted tasks to reduce more energy and/or
meet a given deadline at runtime. They are comparable to static algorithms applied at runtime in
terms of reducing energy and/or meeting a given deadline, but require considerably less
computational time. They are also effective when the estimated execution time of tasks
is underestimated or overestimated. The main features of the proposed dynamic slack allocation
algorithms are as follows:
• Select the subset of tasks for slack reallocation: The potentially rescheduled tasks via the dynamic slack allocation algorithm are tasks which have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. The dynamic slack allocation (i.e., rescheduling) is applied to the subset of tasks that depends on the algorithm. The main reason to limit the potentially rescheduled tasks is to minimize the overhead of reallocating the slack during runtime. Clearly, this should be done so that the other goal of energy reduction is also met simultaneously.
• Determine the time range for the selected tasks: The time range of the selected tasks has to be changed, as some of the tasks have completed earlier or later than expected. Based on the computation times in the current schedule and the assignment-based dependency relationships among tasks, we recompute the time range (i.e., the earliest start time and latest finish time) within which the selected tasks should be executed; a sketch follows this list. Slack is allocated to the selected tasks within this time range in order to try to meet the deadline constraints.
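A minimal sketch of this time-range computation follows; the graph, the durations, and the variable names are illustrative assumptions. Earliest start times are propagated forward from predecessors, and latest finish times backward from the deadline.

```python
# Sketch: earliest start time (EST) and latest finish time (LFT) for each
# not-yet-started task in an illustrative assignment DAG.

adag = {1: [2, 3], 2: [4], 3: [4], 4: []}
preds = {t: [u for u in adag if t in adag[u]] for t in adag}
duration = {1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}   # times in the current schedule
deadline, now = 14.0, 0.0                     # `now`: moment rescheduling runs

est, lft = {}, {}
for t in sorted(adag):                        # a topological order here
    est[t] = max([now] + [est[p] + duration[p] for p in preds[t]])
for t in sorted(adag, reverse=True):          # reverse topological order
    lft[t] = min([deadline] + [lft[s] - duration[s] for s in adag[t]])

# Slack for each selected task is then allocated inside [est[t], lft[t]].
print({t: (est[t], lft[t]) for t in adag})
```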
1.4.5 Dynamic Assignment to Minimize Total Energy Consumption
There is very little research on the dynamic scheduling for DAGs with the goal of energy
minimization. We have shown that reallocating the slack at runtime (i.e., dynamic slack
allocation) leads to better energy minimization. However, it may not be enough to improve
energy requirements at runtime. We show that reassignment of tasks along with reallocation of
slack during runtime can lead to better performance in terms of energy minimization as
compared to only reallocating the slack at runtime. For an approach to be effective and useful at
runtime, its computational time (i.e., runtime overhead) must also be small. The main features of the
proposed dynamic assignment algorithm are as follows:
• Select the subset of tasks for reassignment: As in dynamic slack allocation, the tasks potentially rescheduled by the dynamic assignment algorithm are those which have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. The dynamic reassignment is applied to a subset of these tasks. The tasks considered for rescheduling are limited in order to minimize the overhead of reassigning processors during runtime.
• Determine the time range for the selected tasks: The time range of the selected tasks has to be determined in order to meet the deadline constraints. Based on the computation time in the current schedule and assignment-based dependency relationships among tasks, we recompute the time range where the selected tasks should be executed. While the time range is defined for the selected tasks given an assignment in the dynamic slack reallocation (i.e., earliest start time and latest finish time for the selected tasks on their assigned processors), for reassignment, it is defined over each processor for the selected tasks (i.e., available earliest start time and latest finish time for the selected tasks on each processor). The reassignment for the selected tasks is performed within this determined time range.
• Utilize expected DVS based energy information during reassignment: Our algorithm reassigns the appropriate processor for each selected task such that the total energy expected after slack allocation is minimized. The expected DVS based energy for each selected task is computed by using the estimated deadline for each task so that the selected tasks can be executed within the time range. This leads to good performance in terms of energy minimization while meeting deadline constraints.
1.5 Document Layout
The remainder of this document is organized as follows. Chapter 2 presents the related
work on scheduling for energy minimization. Chapter 3 presents the static slack allocation
algorithm to minimize total energy consumption under the deadline constraints. Chapter 4
presents the dynamic slack allocation to minimize total energy consumption under the deadline
constraints at runtime. Chapter 5 presents the static assignment algorithms. Chapter 6 presents
the dynamic assignment algorithm to minimize total energy consumption. In Chapter 7,
conclusion and future work are described.
CHAPTER 2 RELATED WORK
There has been significant interest in the development of energy aware scheduling
algorithms, as energy is an important concern in many systems. The energy
aware scheduling algorithms can be divided depending on their goal: scheduling to minimize
overall energy consumption, scheduling to balance energy consumption for each processor, and
so on. The scheduling with the goal of balancing energy is usually applicable in wireless sensor
networks [74]. For most other cases, the scheduling is done with the goal of energy minimization
and is the focus of this dissertation.
The scheduling algorithms for energy minimization can be broadly divided depending on:
• Whether Dynamic Voltage Scaling (DVS) technique is used or not?
• Whether it is for independent tasks or dependent tasks (i.e., tasks with precedence relationships)?
• Whether it is for single processor systems or multiprocessor systems?
• Whether it is for homogeneous systems or heterogeneous systems?
• Whether it is applied at compile time or runtime?
In the following, we briefly describe the current work that addresses the above issues.
Several algorithms have been developed to minimize energy consumption without the DVS
technique [27, 28, 61]. However, they do not perform well in DVS-enabled systems. Also, the
DVS technique has been found to be a very effective method for reducing energy in DVS
enabled processors. The proposed scheduling algorithms in this thesis focus on the DVS
technique.
The scheduling algorithms for energy minimization can be divided depending on the
characteristics of the tasks comprising the target applications: scheduling for independent tasks and
scheduling for dependent tasks (i.e., tasks with precedence relationships). The precedence
relationships are represented as a Directed Acyclic Graph (DAG) consisting of nodes that
represent computations and edges that represent the dependency between the nodes. There is
considerable research on DVS scheduling algorithms for independent tasks [3, 4, 5, 11, 12, 21,
23, 25, 26, 33, 34, 35, 38, 43, 49, 50, 52, 59, 60, 70, 72, 73, 76, 78, 79]. However, many
applications are represented by DAGs. The proposed scheduling algorithms in this thesis are
focused on DAG based applications.
The scheduling algorithms for energy minimization can also be categorized based on
whether the target system is a single processor system or a multiprocessor system. There is
considerable research on DVS scheduling algorithms in a single processor real time system [3, 4,
5, 25, 49, 50, 52, 72]. However, in practice, a multiprocessor real time system is used to execute
many applications. The proposed scheduling algorithms in this thesis focus on a multiprocessor
system. In addition, the multiprocessor system can be divided into a homogeneous
multiprocessor system and a heterogeneous multiprocessor system. While several prior
scheduling algorithms in a multiprocessor system can only be applied to a homogeneous system, the
proposed scheduling algorithms in this thesis are applicable for both homogeneous and
heterogeneous systems.
Finally, the scheduling algorithms for energy minimization can also be divided depending
on whether it is applied at compile time (i.e., static algorithms) or at runtime (i.e., dynamic
algorithms). Several runtime approaches have been studied in the literature [4, 5, 21, 23, 33, 35,
47, 51, 59, 60, 76, 77, 78, 79]. However, most of these approaches have been developed for
independent tasks [4, 5, 21, 23, 33, 35, 59, 60, 76, 79]. The proposed scheduling algorithms in
this thesis focus on the dynamic algorithms for DAG based applications in a multiprocessor
system as well as the static algorithms.
As described in Chapter 1, the scheduling algorithm for energy minimization broadly
consists of two steps: assignment and then slack allocation. Most of the prior research on the
scheduling for energy minimization of DAGs on parallel machines has not focused on the
assignment process, but more on the slack allocation process. However, the assignment process
is very important to minimize energy consumption in addition to the slack allocation process.
The proposed scheduling algorithms in this thesis focus on both the assignment algorithms and
the slack allocation algorithms for energy minimization.
In the following sections, we present related work for static slack allocation, dynamic slack
allocation, static assignment, and dynamic assignment, for the scheduling of DAG based
applications on homogeneous and heterogeneous parallel processors, in detail.
2.1 Static Slack Allocation
There is considerable research on DVS scheduling algorithms for independent tasks [3, 11,
12, 25, 26, 34, 38, 43, 49, 50, 52, 59, 70, 72, 73]. Recently, several DVS based algorithms for
slack allocation have been proposed for tasks with precedence relationships in a multiprocessor
real time system [6, 13, 22, 29, 45, 46, 48, 55, 57, 75]. The slack allocation algorithms (i.e., DVS
scheme) can be mainly divided into two categories: non-optimal slack allocation and near-
optimal slack allocation.
2.1.1 Non-optimal Slack Allocation
In these algorithms, the slack is greedily allocated to tasks in decreasing or increasing order of their
finish time [13], or allocated evenly to all possible tasks [45]. In [22], the scheduling algorithm
iteratively assigns slack based on dynamic recalculation of priorities. The algorithms in [13, 22,
45] ignore the various energy profiles of tasks on different processors during slack allocation and
lead to poor energy reduction. Using these energy profiles can improve the potential
energy saving [48, 55]. The static slack allocation algorithms described in [48, 55] work as
follows:
• Divide the total slack available into equal partitions called “unit slack”
• Iteratively execute the following till all the available slack is used: Allocate the unit slack to a task(s) that leads to maximum reduction in energy
However, because of the dependency relationships among tasks in an assignment, the sum of
energy reduction of several tasks (i.e., tasks executed in parallel) may be higher than the highest
energy reduction of a single task(s). In this case, the allocation of slack to a single task(s) with
the highest energy reduction one at a time as used in [48, 55] leads to suboptimal slack
allocation. Our scheme effectively exploits this fact to determine a set of multiple independent
tasks which cumulatively have the maximum energy reduction.
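To make the difference concrete, the following minimal Python sketch (our illustration, not the exact procedure of [48, 55]; the task names and energy reductions are hypothetical) contrasts the per-task greedy choice with the set-based choice:

def greedy_pick(reduction):
    # [48, 55]-style step: give the unit slack to the single best task.
    return max(reduction, key=reduction.get)

def set_pick(reduction, compatible_sets):
    # Set-based step: among sets of mutually independent tasks that can
    # share the same unit slack, pick the set with the largest cumulative
    # energy reduction.
    return max(compatible_sets, key=lambda s: sum(reduction[t] for t in s))

reduction = {"t7": 8.0, "t2": 6.0, "t3": 5.0}   # hypothetical reductions
compatible_sets = [{"t7"}, {"t2", "t3"}]        # t2 and t3 run in parallel
print(greedy_pick(reduction))                   # 't7' (reduction 8.0)
print(set_pick(reduction, compatible_sets))     # {'t2', 't3'} (reduction 11.0)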
2.1.2 Near-optimal Slack Allocation
As a near-optimal slack allocation algorithm, a Linear Programming (LP) based approach
has been developed [75]. The continuous voltage case in [75] is formulated as an LP problem
whose objective is the minimization of total energy consumption. The constraints include
deadline constraints for each task, the relationships among tasks from the original DAG, and the
relationships among tasks on the same processor after assignment. Since the formulation in [75]
does not consider the communication time among tasks, we extend it by incorporating the
communication time when representing precedence relationships among tasks. The LP
formulation for the continuous voltage
case is as follows:
\[
\begin{aligned}
\text{Minimize} \quad & \sum_{\tau_i \in \Gamma} f(x_i) \\
\text{subject to} \quad & startTime_j - startTime_i - commTime_{ij} - x_i \ge 0, \quad \forall \tau_j \in \Gamma,\ \tau_i \in pred_j \cup \{pPred_j\} \\
& startTime_{source} \le startTime_i, \qquad startTime_i + x_i \le startTime_{sink} \le deadline, \quad \forall \tau_i \in \Gamma \\
& 0 \le startTime_i, \qquad compTime_i \le x_i \le deadline_i - startTime_i
\end{aligned}
\]
where xi is the computation time of task τi that can be slowed, f(xi) is the energy model
depending on computation time, startTimei is the start time of task τi on its assigned processor,
predi is the set of direct predecessors of task τi in a DAG, pPredi is the task assigned prior to task
τi on the same assigned processor, compTimei is the computation time of task τi on its assigned
processor, and commTimeij is the communication time between task τi and task τj on their
assigned processors. The source and sink nodes are dummy nodes representing the start and end
of a DAG, respectively. Their computation times, and the communication times on edges
connected to them, are zero.
The function f(x) is, in general, a nonlinear function. As an effective approximation, the
convex objective function that minimizes energy can be formulated as a piecewise linear
function. The accuracy of this approximation increases with a larger number of intervals (or a
smaller interval length). This effectively leads to choices that are more energy efficient.
Convex optimization problems for the target application with linear constraints and an objective
function that is the sum of convex functions of independent variables can be solved in polynomial
time [2, 24, 37]. This is based on using a piecewise linear approximation of the energy functions
for each variable. In [24], the number of intervals for the piecewise linear function is
proportional to 8n (in our case, n is the number of tasks). This process has to be repeated
multiple times to achieve the required level of accuracy. In practice, we found that significantly
fewer intervals and a single iteration are sufficient to achieve an acceptable level of
accuracy (i.e., the level after which the reduction in energy plateaus).
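As an illustration of the approximation, the following Python sketch samples a convex function at interval endpoints to obtain its piecewise linear form; the energy function used here is an assumed illustrative model, not the thesis energy model:

def piecewise_linear(f, x_min, x_max, num_intervals):
    # Sample f at the interval endpoints; the breakpoints and slopes define
    # the piecewise linear approximation used inside the LP.
    xs = [x_min + i * (x_max - x_min) / num_intervals
          for i in range(num_intervals + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(num_intervals)]
    return xs, ys, slopes

# Assumed convex model for illustration: energy falls as the computation
# time x is stretched.
energy = lambda x: 100.0 / (x * x)
xs, ys, slopes = piecewise_linear(energy, 10.0, 40.0, 8)
print(slopes)   # nondecreasing for a convex f, so an LP that treats the
                # pieces as independent variables fills the cheapest first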
The LP based algorithm provides near optimal solutions but has large time and
memory requirements. Our scheme addresses these problems (i.e., time and memory) by
combining compatible task matrix, search space reduction techniques, and lower bound while
providing near optimal solutions.
2.2 Dynamic Slack Allocation
Several runtime approaches for slack allocation have been studied in the literature [4, 5,
21, 23, 33, 35, 47, 51, 59, 60, 76, 77, 78, 79]. Most of these approaches have been developed for
independent tasks [4, 5, 21, 23, 33, 35, 59, 60, 76, 79]. For tasks with precedence relationships in
a multiprocessor real time system, the algorithm in [51] uses a greedy technique (i.e., slack
forwarding) that allocates the generated slack to the next ready task on the same processor where
an early finished task was executed. Although the time requirement of the greedy approach is
small, the performance in terms of reducing energy is significantly lower than applying the static
methods at runtime. Our methods show that the use of more intelligent methods can lead to
improved reduction in energy requirements.
2.3 Static Assignment
The assignment algorithms used in the scheduling for energy minimization can be mainly
classified into the following two broad categories: assignment to minimize finish time and
assignment to minimize energy.
• Assignment to minimize finish time: The goal of this assignment is to minimize total finish time of a DAG. If the deadline constraints are met, appropriate slack is allocated in the second phase to tasks to minimize energy.
• Assignment to minimize energy: This method tries to make assignments that lead to lower energy (before slack allocation) but may not meet deadline constraints. Furthermore, even if they minimize total energy consumption before slack allocation, they may not minimize the energy consumption after slack allocation. This is because the energy after slack allocation depends on the execution time, available slack, and energy profiles of the tasks.
Most prior scheduling algorithms for energy minimization use simple list assignment
algorithms. The parallel computing literature contains a variety of algorithms that minimize the
finish time of DAG on a parallel machine. Prior research on task scheduling in DAGs to
minimize total finish time has mainly focused on algorithms for a homogeneous environment
[16, 41, 42, 54, 69, 71]. Scheduling algorithms such as Dynamic Critical Path (DCP) algorithm
[41] that give good performance in a homogeneous environment may not be efficient for a
heterogeneous environment as the computation time of a task may be dependent on the processor
to which the task is mapped. Several scheduling algorithms for a heterogeneous environment
have been recently proposed [8, 32, 44, 62, 64]. Most of them are based on static list scheduling
heuristics to minimize the finish time of DAGs, for example, Dynamic Level Scheduling (DLS)
[62], Heterogeneous Earliest Finish Time (HEFT) [64], and Iterative List Scheduling (ILS) [44].
The DLS algorithm selects a task to schedule and a processor where the task will be executed at
each step. It has two features that can have an adverse impact on its performance. First, it uses
the earliest start time to select a processor for a task to be scheduled. This may not be effective
for a heterogeneous environment as the completion of the task may depend on the processor
where the task is assigned. Second, it uses the average of computation time across all the
processors for a given task to determine a critical task. This can cause an inaccurate estimation of
a task's priority.
The HEFT algorithm reduces the cost of scheduling by using pre-calculated priorities of
tasks in scheduling and uses the earliest finish time for the selection of a processor. This can, in
general, provide better performance as compared to the DLS algorithm. However, since the
algorithm uses the average of computation time across all the processors for a given task to
determine tasks' priorities, it may lead to an inaccurate ordering for executing tasks. To address the
problem, the ILS algorithm generates an initial schedule by using HEFT and iteratively improves
it by updating priorities of tasks. While it has been shown to have good performance [44], we
show that the determination of a task's priority can be improved by using group based assignment.
This is because the calculated priorities of tasks have a degree of inaccuracy in a heterogeneous
environment as the assignment of future tasks is unknown.
Most existing algorithms for energy minimization are based on one execution of
assignment and slack allocation. To improve performance in terms of energy, an iterative
execution of assignment and slack allocation based on genetic algorithms or simulated annealing
has been proposed. These approaches are based on trying out several assignments (or iteratively refining the
assignment). Each assignment is followed by a slack allocation algorithm to determine the
energy requirements. The Genetic Algorithm (GA) based approach in [56, 57] consists of two
interleaved steps:
• Processor selection for tasks based on GA
• For each processor selection, derive the best scheduling which includes the execution ordering of tasks using another GA
Each GA evolves the solutions via two point crossover and mutation from randomly
generated initial solutions and explores the large search space to find better solutions. Given
each schedule from the processor selection and the task ordering, a DVS based slack allocation
scheme is applied. This approach was shown to outperform existing algorithms in terms of
energy consumption based on their experimental results. However, the assignment itself still
does not consider the energy consumption after slack allocation. Also, the testing of energy
requirements of multiple solutions each corresponding to a different assignment requires
considerable computational time.
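The following Python sketch shows only the general structure of such a GA-based search (processor-selection chromosomes evolved by two point crossover and mutation); the operators, parameters, and the placeholder fitness are our assumptions, not the exact design of [56, 57]:

import random

NUM_TASKS, NUM_PROCS = 10, 4

def evolve(population, fitness, generations=50, mut_rate=0.1):
    # Each chromosome maps task index -> processor; lower fitness (energy)
    # is better. An inner step would order the tasks per processor and run
    # DVS slack allocation to obtain the actual fitness.
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: len(population) // 2]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            i, j = sorted(random.sample(range(NUM_TASKS), 2))
            child = a[:i] + b[i:j] + a[j:]                    # two point crossover
            if random.random() < mut_rate:                    # mutation
                child[random.randrange(NUM_TASKS)] = random.randrange(NUM_PROCS)
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

population = [[random.randrange(NUM_PROCS) for _ in range(NUM_TASKS)]
              for _ in range(20)]
best = evolve(population, fitness=sum)   # 'sum' is only a placeholder fitness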
2.4 Dynamic Assignment
There is little research on the dynamic scheduling for DAGs with the goal of energy
minimization. Furthermore, the existing dynamic scheduling algorithms have concentrated only
on dynamic slack reallocation. However, as shown in this thesis, reassignment of tasks (i.e.,
dynamic assignment) along with reallocation of slack during runtime can be expected to lead to
better performance in terms of energy minimization.
CHAPTER 3 STATIC SLACK ALLOCATION
The slack allocation algorithms assume that an assignment of tasks to processors has
already been made. The problem of slack allocation can be posed as the following:
Allocate variable amount of slack to each task so that the total energy is minimized while the deadlines can still be met.
Most prior slack allocation algorithms provide non-optimal solutions for energy
minimization. They ignore the various energy profiles of tasks on different processors during
slack allocation. While some algorithms use the energy profiles for better energy
minimization, they still ignore the dependency relationships among tasks in an assignment.
Both limitations lead to poor energy reduction. To address these problems, our slack allocation
algorithm incorporates assignment based dependency relationships among tasks as well as
different energy profiles of tasks. Unlike most algorithms, a Linear Programming (LP) based
approach provides near optimal solutions for energy minimization. However, it requires large
computational time and memory. We introduce a slack allocation algorithm which provides close
to optimal solutions for energy minimization but requires less computational time and memory
compared to LP based approach.
3.1 Proposed Slack Allocation
The Path based algorithm, our novel approach for energy minimization, is an iterative
approach that allocates a small amount of slack (called unit slack) in each iteration and asks the
following question:
Find the subset of tasks that can be allocated this unit slack so that the total energy consumption is minimized while the deadline constraint is also met.
The above process is iteratively applied till all the slack is used. We show that each iteration of
the problem can be reduced to finding a maximum weighted independent set of tasks, where the
weight is given by the amount of energy reduction by allocating unit slack.
The dependency relationships in an assignment DAG constrain the total slack which can be
allocated to the different tasks. For instance, in Figure 1-1, consider an example in which one
unit of slack can be allocated (i.e., the deadline is 12 units). The total number of unit slacks that
can be allocated from this one unit of slack is one or two:
• If task τ7 (or τ1) is allocated the slack, no other task can use this slack in order to satisfy the deadline constraints.
• Tasks τ2 and τ3 (or τ2 & τ4, τ2 & τ6, τ4 & τ5, τ5 & τ6) can use this slack concurrently as they are not dependent on each other and both can be slowed down.
The appropriate option to choose between the two choices depends on the energy reduction in
task τ7 versus the sum of energy reduction for tasks τ2 and τ3.
Our slack allocation algorithm considers the overall assignment-based dependency
relationships among tasks, while most existing algorithms ignore them. We define two
phases:
• Phase 1: Slack allocation from start time to total finish time based on a given assignment - in this case the slack can be allocated to only a subset of tasks that are not on the critical path.
• Phase 2: Slack allocation from total finish time to deadline - in this case the slack can potentially be allocated to all the tasks.
For instance, while, in Figure 1-1, there is no slack from start time to total finish time, in
Figure 3-1, the slack of time 5 to 6 is considered for the slack allocation from start time to total
finish time. The slack can be allocated only to task τ2. However, the slack of time 8 to 9 at Phase
2 can be allocated to a subset of tasks (e.g., τ1, τ2 & τ3, or τ4).
The execution of Phase 1 precedes the execution of Phase 2 in order to achieve more energy saving
by reducing the possibility of redundant slack allocation to the same tasks. In the example of
Figure 3-1, assume that the energy of tasks τ1, τ2, τ3, and τ4 reduced by allocating one time unit of
slack is 1, 10, 1, and 10, respectively and the energy model follows a quadratic function. The
total energy saving is 20 by allocating slack to task τ2 at Phase 1 and then task τ4 at Phase 2.
Meanwhile, when allocating slack to tasks τ2 and τ3 at Phase 2 and then task τ2 at Phase 1, the
total energy saving is 16.6. It gives a difference of 17%.
For each of the two phases, our algorithm iteratively allocates one unit of slack (the size of
this unit, called unitSlack, is a parameter). For Phase 1, at each iteration over unitSlack, only tasks
with the maximum available slack are considered because of the limited number of slack
allocable tasks and the different amount of available slack for each task. Thus tasks considered at
each iteration may be changed. For instance, consider an example where only three tasks have
available slack of 5, 4, and 3 respectively. In the first iteration, only one task with a slack of 5
will be considered. In the next iteration, two tasks will be considered as both of them have a
slack of 4. This process is iteratively executed till there is no task which can use slack until total
finish time. Meanwhile, at Phase 2, all tasks are considered for slack allocation at each iteration.
The number of iterations at Phase 2 is equal to totalSlack divided by unitSlack, where totalSlack
is defined by the difference of actual deadline and total finish time. At each iteration, one
unitSlack is allocated to one or more tasks that lead to maximum sum of energy reduction over
the full use of the unitSlack. The characteristic that each task is allocated the entire unitSlack or
no slack during each iteration allows for the use of branch and bound techniques to find the
optimal slack allocation. The size of the unitSlack can be reduced to a level where further
reducing it does not significantly improve the energy requirements.
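The following Python sketch shows the shape of this iterative loop; it is our illustration only, with the exhaustive subset enumeration standing in for the branch and bound search of Section 3.2 and an assumed diminishing-returns update standing in for recomputing reductions from the energy model:

from itertools import combinations

def best_compatible_set(tasks, reduction, compatible):
    # Find the set of mutually compatible tasks whose cumulative energy
    # reduction for one unitSlack is maximum (Section 3.2 replaces this
    # enumeration with a branch and bound search).
    best, best_gain = (), 0.0
    for r in range(1, len(tasks) + 1):
        for subset in combinations(tasks, r):
            if all(b in compatible[a] for a in subset for b in subset if a != b):
                gain = sum(reduction[t] for t in subset)
                if gain > best_gain:
                    best, best_gain = subset, gain
    return best

tasks = ["t1", "t2", "t3"]
compatible = {"t1": set(), "t2": {"t3"}, "t3": {"t2"}}   # t1 lies on every path
reduction = {"t1": 8.0, "t2": 6.0, "t3": 5.0}            # hypothetical values
total_slack, unit_slack = 4.0, 1.0
for _ in range(int(total_slack / unit_slack)):           # one unitSlack per pass
    chosen = best_compatible_set(tasks, reduction, compatible)
    for t in chosen:
        reduction[t] *= 0.8   # assumed: reductions shrink as a task slows down
    print(chosen)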
Figure 3-1. Example of a DAG and assignment on two processors
3.2 Unit Slack Allocation
In this section, we present our slack allocation algorithm over a minimum indivisible unit
slack, called unitSlack, which finds the best task set that can efficiently use unitSlack for
minimizing energy consumption. A key requirement of the slack allocation algorithm is to
incorporate assignment-based dependency relationships among tasks as well as different energy
profiles of tasks on different processors.
The slack allocation algorithm is motivated by the characteristic that each assignment-
based path which consists of tasks with precedence relationships in an assignment DAG cannot
have more than one unitSlack. Furthermore, this slack cannot be allocated to more than one task
on each path. In Figure 1-1, there are three assignment-based paths: τ1-τ2-τ5-τ7 (Path1), τ1-τ3-τ5-τ7
(Path2), and τ1-τ3-τ4-τ6-τ7 (Path3). The maximum amount of unitSlack that can be allocated to
tasks is the number of paths and only one task along each of these three paths can be allocated
the unitSlack. An implication of the above is that two tasks on the same path of an assignment
DAG cannot both be allocated unitSlack. Using a matrix which represents tasks that can share
slack for given tasks, the branch and bound search method is efficiently applied.
3.2.1 Maximum Available Slack for a Task
Each task has a different amount of maximum available slack. This is due to the fact that
the assignment algorithm has to maintain the precedence relationships among tasks in an original
DAG. This slack is divided by unitSlack for normalization, i.e., the maximum number of
unitSlack’s that can be allocated to a task is equal to maximum available slack divided by
unitSlack. The maximum available slack of task τi, slacki, is defined by the difference of the
latest start time of τi, LSTi, and the earliest start time of τi, ESTi. The latest start time of task τi,
the earliest start time of task τi, and the slack of task τi are respectively defined by
\[
LST_i = \min\!\Big(deadline_i,\ \min_{\tau_j \in succ_i}\big(LST_j - commTime_{ij}\big),\ LST_{pSucc_i}\Big) - compTime_i
\]
\[
EST_i = \max\!\Big(start_i,\ \max_{\tau_j \in pred_i}\big(EST_j + compTime_j + commTime_{ji}\big),\ EST_{pPred_i} + compTime_{pPred_i}\Big)
\]
\[
slack_i = LST_i - EST_i
\]
where deadlinei is the deadline of task τi, starti is the start time of τi, succi is the set of direct
successors of τi in a DAG, pSucci is the task assigned next to τi on the same assigned processor,
predi is the set of direct predecessors of τi in a DAG, and pPredi is the task assigned prior to τi on
the same assigned processor. Note that at Phase 1 the deadline is assumed to be equal to total
finish time unless the specified deadline of a task is earlier than the total finish time.
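A minimal Python sketch of these three definitions follows; the Task structure and the assumption that the input list is in a topological order respecting both DAG edges and processor-order edges are our illustrative choices:

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    comp: float                      # compTime_i on the assigned processor
    start: float = 0.0               # start_i
    deadline: float = float("inf")   # deadline_i
    pred: list = field(default_factory=list)   # (predecessor, commTime) pairs
    succ: list = field(default_factory=list)   # (successor, commTime) pairs
    p_pred: "Task" = None            # pPred_i on the same processor
    p_succ: "Task" = None            # pSucc_i on the same processor

def compute_slacks(topo_order):
    est, lst = {}, {}
    for t in topo_order:                          # forward pass: EST_i
        e = t.start
        for p, comm in t.pred:
            e = max(e, est[p.name] + p.comp + comm)
        if t.p_pred is not None:
            e = max(e, est[t.p_pred.name] + t.p_pred.comp)
        est[t.name] = e
    for t in reversed(topo_order):                # backward pass: LST_i
        l = t.deadline
        for s, comm in t.succ:
            l = min(l, lst[s.name] - comm)
        if t.p_succ is not None:
            l = min(l, lst[t.p_succ.name])
        lst[t.name] = l - t.comp
    return {t.name: lst[t.name] - est[t.name] for t in topo_order}  # slack_i

a = Task("a", comp=10.0)
b = Task("b", comp=10.0, deadline=30.0)
a.succ = [(b, 2.0)]; b.pred = [(a, 2.0)]
print(compute_slacks([a, b]))                     # {'a': 8.0, 'b': 8.0}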
3.2.2 Compatible Task Matrix
The matrix represents, for each task, the list of tasks which can share unitSlack with it.
If task τi and task τj are in the same assignment-based path, elements mij and mji in
compatible task matrix M are set to zero. Otherwise, the elements are set to one. The elements
related to the same task (i.e., mij where i = j) are set to zero. If the value of element indicating the
relationship of two tasks is equal to one, the two tasks can share unitSlack together because they
are independently (or in parallel) executed. However, if the value is equal to zero, the two tasks
cannot share unitSlack because only one task in each assignment-based path can have unitSlack.
The assignment-based dependency relationships among tasks may be changed after slack
allocation over unitSlack. When the assignment-based dependency relationships change, the
compatible task matrix must be modified accordingly. The compatible task matrix M is defined by
\[
M = \begin{bmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & & \vdots \\
m_{n1} & m_{n2} & \cdots & m_{nn}
\end{bmatrix},
\qquad
m_{ij} = \begin{cases} 1, & \text{if } \Pi_i \cap \Pi_j = \varnothing \\ 0, & \text{otherwise} \end{cases}
\]
where mij indicates whether task τi and task τj can share slack and Πi is the set of
assignment-based paths including task τi. While n (matrix size: n by n) is the total number of
tasks at Phase 2, it is the number of tasks whose maximum available slack is the greatest size at
Phase 1. This matrix can be easily generated by performing a transitive closure on the
assignment DAG and then taking the complement of that matrix. The DAG structure can also be
used to derive a list of ancestors for each task. This list can be updated by performing a level
wise search of the DAG.
In most cases, it generates a sparse matrix. This can be effectively represented by an array
of lists (one for each task). The compatible task list of task τi consists of tasks not on the same
paths as task τi. Thus the tasks included in compatibleTaski are ones which can share unitSlack
together with task τi. The compatible task list of task τi, compatibleTaski, is defined by
\[
compatibleTask_i = \{\, \tau_k \mid \Pi_i \cap \Pi_k = \varnothing,\ \tau_i, \tau_k \in \Gamma \,\}
\]
where Γ is the set of all tasks in a DAG.
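For illustration, the sketch below derives the compatible task lists by transitive closure, using the three assignment-based paths of Figure 1-1 quoted above (task τi is encoded as index i-1):

def compatible_lists(n, edges):
    # Two tasks can share unitSlack iff neither reaches the other in the
    # assignment DAG, i.e., they lie on no common assignment-based path.
    reach = [[False] * n for _ in range(n)]
    for i, j in edges:
        reach[i][j] = True
    for k in range(n):                           # transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {i: [j for j in range(n)
                if j != i and not reach[i][j] and not reach[j][i]]
            for i in range(n)}

# Edges of the assignment DAG in Figure 1-1, read off its three paths
# t1-t2-t5-t7, t1-t3-t5-t7, and t1-t3-t4-t6-t7.
edges = [(0, 1), (0, 2), (1, 4), (2, 4), (2, 3), (3, 5), (4, 6), (5, 6)]
print(compatible_lists(7, edges))
# e.g., compatibleTask for t2 (index 1) is [2, 3, 5], i.e., {t3, t4, t6}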
Figure 3-2 shows the compatible task matrix and lists for the example in Figure 1-1. Using the
compatible task matrix/lists, the set of tasks which can share unitSlack together is found such that
the sum of energy reduction of tasks is maximized. It corresponds to the maximum weighted
independent set (MWIS) problem which is known to be NP-hard [7, 53, 65]. Our approach on
task scheduling for energy minimization addresses this problem using a branch and bound search
and demonstrates its efficiency.
Figure 3-2. Compatible task matrix and lists for an example in Figure 1-1
3.2.3 Search Space Reduction
We reduce the search space by performing the following checks for each task: fully
independent tasks, fully dependent tasks, and compressible tasks. The rule to distinguish task τi
using the compatible task matrix is as follows:
If m_ij = 1 for all j, j ≠ i, then allocate unitSlack to τi;
Else if m_ij = 0 for all j, then consider τi as an allocable candidate;
Else, consider τi in the search.
The rule to distinguish task τi using compatible task lists is as follows:
If N(compatibleTaski) = N(Γ) − 1, then allocate unitSlack to τi;
Else if N(compatibleTaski) = 0, then consider τi as an allocable candidate;
Else, consider τi in the search.
where N(compatibleTaski) is the number of tasks in compatibleTaski and N(Γ) is the number of
tasks.
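The list-based rule can be applied directly, as in the short Python sketch below (the compatible lists are taken from the Figure 1-1 example; the function name and result strings are ours):

def classify(task, compatible, num_tasks):
    # Apply the compatible-list rule: fully independent tasks get unitSlack
    # outright, fully dependent tasks become standalone candidates, and the
    # rest participate in the branch and bound search.
    if len(compatible[task]) == num_tasks - 1:
        return "fully independent: allocate unitSlack"
    if len(compatible[task]) == 0:
        return "fully dependent: allocable candidate"
    return "consider in search"

compatible = {"t1": [], "t2": ["t3", "t4", "t6"], "t3": ["t2"],
              "t4": ["t2", "t5"], "t5": ["t4", "t6"], "t6": ["t2", "t5"],
              "t7": []}
for t in sorted(compatible):
    print(t, "->", classify(t, compatible, num_tasks=7))
# t1 and t7 are fully dependent; t2..t6 participate in the search.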
3.2.3.1 Fully independent tasks
If a task is included only in an assignment-based path consisting of the task alone (i.e.,
it is independent of all other tasks), unitSlack is allocated to the task without
search, regardless of the slack allocation of other tasks.
3.2.3.2 Fully dependent tasks
If a task is in all assignment-based paths (i.e., it is dependent on all other tasks), the task is
itself a candidate task set to which unitSlack can be allocated. Thus, the energy reduction of the
task is compared with those of other candidates without including it in the search. In
Figure 3-2, tasks τ1 and τ7 are the examples of fully dependent tasks.
3.2.3.3 Compressible tasks
Tasks on exactly the same assignment-based paths can be represented by a single task for the
purpose of slack allocation. The representative of a set of compressible tasks is the task with the
maximum energy reduction among them. This can lead to a substantial reduction in runtime
without decreasing energy performance. In the assignment DAG of Figure 3-3 (a), tasks τ3, τ5,
and τ11 can be compressed and represented by a single representative task (e.g., τ3) since the
paths where they are included are all the same. Using compatible task lists, we
can check whether tasks can be compressed instead of inspecting the assignment-based paths
including them.
The rule of compression from a compatible task list is as follows:
\[
\tau_k^c \leftarrow \tau_i, \quad \text{if } compatibleTask_i = compatibleTask_j \text{ and } er_i > er_j
\]
where τkc is the k-th compressed task, represented by the task τi retained from the compressed
set, and eri is the energy reduction of task τi.
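A minimal Python sketch of this compression rule follows (the grouping by identical compatible lists and the illustrative reduction values are ours):

def compress(compatible, reduction):
    # Group tasks with identical compatible task lists (they lie on exactly
    # the same assignment-based paths) and keep, as representative, the
    # member with the maximum energy reduction.
    groups = {}
    for task, lst in compatible.items():
        groups.setdefault(frozenset(lst), []).append(task)
    return {max(members, key=lambda t: reduction[t]): members
            for members in groups.values()}

# Illustrative fragment: t3, t5, t11 share identical lists, as in Figure 3-3.
compatible = {"t3": ["t2"], "t5": ["t2"], "t11": ["t2"],
              "t2": ["t3", "t5", "t11"]}
reduction = {"t3": 4.0, "t5": 2.0, "t11": 1.0, "t2": 9.0}
print(compress(compatible, reduction))
# {'t3': ['t3', 't5', 't11'], 't2': ['t2']} -- t3 represents (t3, t5, t11)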
Figure 3-3 illustrates an initial assignment DAG and its compressed DAG for a given
application with 12 tasks. In Figure 3-4, the compression process of compatible task lists for the
example of Figure 3-3 is illustrated. In other words, (a) and (b) in Figure 3-4 represent the
assignment DAG of (a) and the compressed assignment DAG of (b) in Figure 3-3, respectively,
using compatible task lists. From the initial compatible task lists, the following tasks are
compressed: (τ1, τ12), (τ2, τ9, τ10), (τ3, τ5, τ11), and (τ4, τ8). Each compressible task list is
represented by one task with the maximum energy reduction (e.g., τ1, τ2, τ3, τ8). The second
column in Figure 3-4 (b) shows compatible task lists after the compression. Any fully
independent task (e.g., τ2) is automatically allocated unitSlack and excluded from the search by
removing the task from the compatible task lists of other tasks and itself. Once the fully
independent task is removed from the compatible task lists, task τ1 is identified as a fully
dependent one and is also excluded from the search. It is considered as a feasible solution without
any search. The remaining tasks except for the fully independent tasks and the fully dependent
tasks participate in search.
Tasks that can be effectively merged with other tasks are removed (i.e., tasks with greater
index are removed) from compatible task lists to avoid redundant traversal in search. For
instance, task τ7 has task τ6 in the compressed compatible task list of task τ7, but the task τ6 is
removed since the compatible task list of task τ6 includes task τ7. The third column in Figure 3-4
(b) shows the reduced compatible task lists after compression. The search is finally performed
with tasks τ3, τ6, τ7, τ8 based on the reduced compatible task lists. Through the search, two
solutions, {τ3, τ8} and {τ6, τ7, τ8}, are considered in addition to fully dependent tasks (e.g., τ1) as
feasible solutions.
Figure 3-3. Compression of assignment DAG: (a) Assignment DAG, (b) Compressed assignment DAG
Figure 3-4. Compression of compatible task lists: (a) Compatible task lists in a given assignment, (b) Compressed and reduced compatible task lists
3.2.4 Branch and Bound Search
The energy reduction of a task is defined by the difference of its original energy and its
energy expected after allocating a unitSlack to the task. A branch and bound algorithm is used to
search all the possible compatible solutions to determine the one that has the maximum energy
reduction. The feasible states in the state space consist of all the compatible subsets of tasks. We
use a Depth First Search (DFS) to effectively search through all possible subsets of compatible
tasks. The advantage of using a DFS is that, during the search, it stores only one path, which
represents a candidate task set to which unitSlack can be allocated. By maintaining a running
lower bound from the energy reduction of traversed search paths so far, we apply bounding
heuristics that eliminate search spaces where a better solution cannot be found.
At any given node of the state space tree, the set of possible search options is limited to the
list of available tasks corresponding to the intersection of all the lists of tasks from the root to that
particular node. Each node in a search graph has its own explorable task list indicating tasks
which can be explored as child nodes of the node. The explorable task list of node νi including
task τk, explorableTaski, is defined by
\[
explorableTask_i =
\begin{cases}
compatibleTask_k, & \text{if } parent_i = \varnothing \\
explorableTask_{parent_i} \cap compatibleTask_k, & \text{if } parent_i \neq \varnothing
\end{cases}
\]
The cost of node νx, c(x), is defined by c(x) = f(x) + g(x), where f(x) is the sum of energy
reduction of tasks from the root to node νx and g(x) is the estimate on the sum of energy
reduction of tasks of child nodes from node νx. g(x) is obtained as the sum of energy reduction of
tasks in the explorable task list of the node and represents an upper bound to the amount of
energy reduction of tasks of child nodes. Thus, when exploring nodes in search, if c(x) is lower
than the lower bound, the node νx is pruned, otherwise, it is expanded. The cost value c(x) on leaf
node νx indicates the actual sum of energy reduction of tasks in the search path. If c(x) on leaf
node νx is greater than the lower bound, the lower bound is updated to c(x) and the search path
becomes a candidate solution. The optimal task set over unitSlack is finally found. Figure 3-5
illustrates the reduction of compatible task list in Figure 3-2 and its application to explore a
search graph. Through the search, five solutions, {τ2, τ3}, {τ2, τ4}, {τ2, τ6}, {τ4, τ5}, and {τ5, τ6},
are considered in addition to fully dependent tasks (e.g., τ1, τ7) as feasible solutions which
unitSlack can be allocated to.
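The following Python sketch is our simplified rendering of this search, with the compatible lists of the Figure 1-1 example and hypothetical energy reductions; it implements the DFS with explorable-list intersection and c(x)-based pruning:

def branch_and_bound(tasks, compatible, reduction, lower_bound=0.0):
    best_set, best_gain = [], lower_bound

    def dfs(path, gain, explorable):
        nonlocal best_set, best_gain
        if gain > best_gain:                       # improved feasible solution
            best_set, best_gain = list(path), gain
        for idx, t in enumerate(explorable):
            g = sum(reduction[u] for u in explorable[idx:])  # g(x) upper bound
            if gain + g <= best_gain:              # c(x) = f(x) + g(x) pruned
                break
            child = [u for u in explorable[idx + 1:] if u in compatible[t]]
            dfs(path + [t], gain + reduction[t], child)

    dfs([], 0.0, tasks)
    return best_set, best_gain

compatible = {"t2": {"t3", "t4", "t6"}, "t3": {"t2"}, "t4": {"t2", "t5"},
              "t5": {"t4", "t6"}, "t6": {"t2", "t5"}}
reduction = {"t2": 3.0, "t3": 1.0, "t4": 2.0, "t5": 4.0, "t6": 2.5}
print(branch_and_bound(["t2", "t3", "t4", "t5", "t6"], compatible, reduction))
# (['t5', 't6'], 6.5): the compatible set with maximum cumulative reduction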
Figure 3-5. Reduced compatible task lists and search graph
3.2.5 Estimating the Lower Bound to Reduce the Search Space
Finding the set of tasks which maximizes the sum of energy reduction by allocating
unitSlack corresponds to the maximum weighted independent set (MWIS) problem.
The authors in [53] showed that simple greedy algorithms for the MWIS guarantee to find a task
set whose weight is at least \(\sum_{v \in V(G)} W(v)/(d(v)+1)\), where W(v) is the weight of vertex v in a
graph G and d(v) is the degree of vertex v. We modify the guaranteed minimum weight for the
MWIS problem to apply it to our problem as an initial lower bound. The lower bound,
lowerbound, is initialized as follows:
\[
lowerbound = \sum_{\tau_i \in \Gamma_s} \frac{er_i}{N(\Gamma_s) - N(compatibleTask_i) + 1}
\]
where Γs is the set of tasks participating in the search, N(Γs) is the number of tasks
participating in the search, N(compatibleTaski) is the number of tasks included in
compatibleTaski, and eri is the energy reduced by allocating unitSlack to task τi.
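A sketch of this initialization in Python, mirroring the formula above (the reduction values are hypothetical):

def initial_lower_bound(search_tasks, compatible, reduction):
    # lowerbound = sum over tasks in the search of
    #   er_i / (N(Gamma_s) - N(compatibleTask_i) + 1)
    n = len(search_tasks)
    return sum(reduction[t] / (n - len(compatible[t]) + 1)
               for t in search_tasks)

compatible = {"t2": ["t3", "t4", "t6"], "t3": ["t2"], "t4": ["t2", "t5"],
              "t5": ["t4", "t6"], "t6": ["t2", "t5"]}
reduction = {"t2": 3.0, "t3": 1.0, "t4": 2.0, "t5": 4.0, "t6": 2.5}
print(initial_lower_bound(list(compatible), compatible, reduction))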
If the set of fully dependent tasks is nonempty, the lower bound is compared with the
energy reduction of each fully dependent task. In the example of Figure 3-2, before the search,
the lower bound is updated by the maximum energy reduction among fully dependent tasks τ1
and τ7 if the initial lower bound is lower. Then the fully dependent task (τ1 or τ7) with the
maximum energy reduction becomes a feasible solution for slack allocation. Furthermore, at
each iteration, unless the assignment-based dependency relationships among tasks are changed
from the previous step, the energy reduction of the solution of the previous step (i.e., the sum of
energy reduction of tasks which unitSlack is allocated to at the previous step) can be used as the
lower bound for the next unit slack allocation.
3.3 Experimental Results
We compare the performance of our DVS algorithm (i.e., PathDVS), the DVS algorithm that
allocates slack to the task(s) with the highest energy reduction [48, 55] (i.e., EProfileDVS), and
the greedy slack allocation based DVS algorithm in [13] (i.e., GreedyDVS). All the DVS algorithms
assume that the assignment of tasks to processors is already completed. The following two
different assignment strategies are used: ICP which assigns based on the earliest finish time
(presented in Chapter 5) and CPS which assigns based on the earliest possible start time [48]. We
also compare the performance of PathDVS and LPDVS, an extension to the formulation in [75]
to incorporate communication costs. PathDVS and LPDVS algorithms provide close to the
optimal solution and are controlled by the size of the unitSlack and the number of intervals,
respectively. The size of unitSlack and the number of intervals are set to the best values obtained
empirically in these experiments. For LPDVS, CPLEX v.10.0 [14] was used to solve the LP
problem, using a piecewise linear function for the convex objective function.
3.3.1 Simulation Methodology
In this section, we describe the DAG generation and performance measures used in our
experiments.
3.3.1.1 The DAG generation
In order to show the performance of the proposed static slack allocation algorithm in both
heterogeneous and homogeneous environments, we randomly generated a large number of
synthetic graphs with 100, 200, 300, and 400 tasks. For heterogeneous systems, the execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units. The
communication time between a task and its child task for a pair of processors is varied from 1 to
4 units. For homogeneous systems, within a similar range, the execution time of each task on
all processors at the maximum voltage is varied from 10 to 40 units and all of the communication
time among tasks on different processors is set to 2 units. The energy consumed to execute each
task is varied from 10 to 80. The execution of graphs is performed on 4, 8, and 16 processors.
For each combination of values of number of tasks and processors, 20 different synthetic graphs
are generated.
3.3.1.2 Performance measures
The performance is measured in terms of normalized total energy consumption, that is,
total energy normalized by the energy obtained from an assignment algorithm without a DVS
scheme. The deadline is determined by: deadline = (1 + deadline extension rate) * maximum
total finish time from assignments without DVS scheme. We provide experimental results for
deadline extension rate equal to 0.0 (no deadline extension), 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, and
0.4.
3.3.2 Memory Requirements
The size of the compatible task matrix is O(n^2). Generally this matrix is sparse and can be
reduced to O(kn) using lists, where n is the number of tasks and k is a constant representing
the number of compatible tasks. At every level, a list of explorable tasks of size bounded by
O(n) is stored; the list shrinks at each level and becomes empty at a leaf node. Our
branch and bound method uses DFS and only stores one path whose length is the number of
tasks that can be allocated slack together and should be O(min(n,p)), where p is the number of
processors. Thus the number of variables stored during search is O(n) and the overall memory
requirement of our algorithm is O(kn+n) – it can be reduced by using search space reduction
techniques. The number of variables required for LPDVS is proportional to O(n * number of
intervals) and its memory requirement depends on the actual implementation of linear
programming. Using CPLEX on a machine with 2 Gigabyte memory, the maximum number of
tasks that LPDVS can reliably execute was around 200 for 0.4 deadline extension rate (i.e. 400
piecewise linear intervals per task). Meanwhile, we were able to execute DAGs of size 1000
using PathDVS as shown in Figure 3-6.
Figure 3-6. Runtime of PathDVS with respect to different sizes of DAGs, on 8 processors with 0.4 deadline extension rate (unit: ms; x-axis: number of tasks)
3.3.3 Determining the Size of Unit Slack and the Number of Intervals
Figure 3-7 shows the results of comparison of energy consumption of PathDVS with
respect to different sizes of unitSlack. The size of unitSlack is determined by the rate of total
finish time (i.e., unitSlack = totalFinishTime * unitSlackRate). The performance of our slack
allocation algorithm in terms of energy depends on the size of unitSlack. In general, a smaller
unitSlack leads to more energy saving while increasing the runtime. However, the size
of the unitSlack can be limited to a level where further reducing it does not significantly improve
energy requirements. Based on the results, the size of unitSlack corresponding to 0.0005 unit
slack rate does not give significant improvement on energy. While there is 7-10% improvement
of energy with 0.001 unitSlackRate over 0.01 unitSlackRate, there is less than 0.3% difference of
energy between 0.001 and 0.0005 unitSlackRates. Thus the size of unitSlack corresponding to
0.001 unitSlackRate is a reasonable choice.
Figure 3-7. Normalized energy consumption of PathDVS with respect to different unit slack rates for different numbers of tasks: (a) 100 tasks and (b) 200 tasks (one curve per deadline extension rate from 0 to 0.4)
The authors in [24] suggest that the LP problem with a convex objective function and
linear constraints can be optimally solved using 8n intervals for the piecewise linear function that
approximates the convex function, where n is the number of tasks. However, we found that,
in practice, a smaller number of intervals is sufficient for our target applications. The total time
to be divided into intervals for the piecewise linear function of each task is equal
to the total maximum available slack (i.e., deadline extension rate * total finish time +
slack available until total finish time, or the available slack based on minimum voltage);
dividing this time further is unnecessary and requires more computational time. The total slack
available to each task can be approximately bounded by the total available slack (i.e., deadline - total
finish time before slack allocation). The number of intervals is proportional to the deadline
extension rate divided by the interval rate (i.e., the number of intervals ∝ deadline extension rate
/ intervalRate). Figure 3-8 shows the result of comparison of energy consumption of LPDVS
with respect to different interval rates by which the objective function is divided. Based on the
results, the length of interval corresponding to 0.0005 intervalRate does not give significant
improvement on energy compared to 0.001 intervalRate. However, there is 2-8% improvement
of energy with 0.001 intervalRate over 0.01 intervalRate while there is 0.05% difference of
energy between 0.001 and 0.0005 intervalRates. Thus the length of interval corresponding to
0.001 intervalRate is a reasonable choice.
Figure 3-8. Normalized energy consumption of LPDVS with respect to different interval rates for different numbers of tasks: (a) 100 tasks and (b) 200 tasks (one curve per deadline extension rate from 0 to 0.4)
3.3.4 Homogeneous Environments
In this section, we show the performance of the proposed static slack allocation algorithm
in homogeneous environments where the computation time of each task and the communication
time among tasks on all processors are the same.
3.3.4.1 Comparison of energy requirements
Tables 3-1, 3-2, 3-3, and 3-4 show the improvement of PathDVS over EProfileDVS and
GreedyDVS with respect to different assignments, different numbers of processors, and
different numbers of tasks in homogeneous environments.
PathDVS considerably outperforms the other existing DVS algorithms regardless of the
assignment algorithm used. For instance, given ICP assignment, PathDVS improves by 12-29% over
EProfileDVS and 60-70% over GreedyDVS with 0.4 deadline extension rate. The results show
that the performance improvement of PathDVS is higher for larger number of processors.
Table 3-1. Results for 100 tasks in homogeneous environments: Improvement of PathDVS over
EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.26   1.16   1.82   3.55   5.86   8.10   9.03   9.81
                ICP   Greedy            2.18   5.96   9.34   17.69  28.19  42.41  51.90  59.08
                CPS   EProfile          0.08   1.02   1.88   3.67   5.53   7.60   9.02   9.94
                CPS   Greedy            2.19   6.08   9.53   18.04  28.49  42.60  52.24  59.41
8 Processors    ICP   EProfile          1.00   2.59   3.97   7.24   11.24  16.06  18.97  20.64
                ICP   Greedy            4.97   9.40   13.35  22.80  34.35  49.20  58.70  65.34
                CPS   EProfile          0.20   1.98   3.47   7.37   11.31  15.77  18.25  19.86
                CPS   Greedy            3.32   8.11   12.26  22.06  33.59  48.34  57.75  64.44
16 Processors   ICP   EProfile          1.67   4.99   7.54   13.55  19.59  24.12  26.24  27.37
                ICP   Greedy            7.15   13.43  18.78  30.45  42.80  56.23  64.35  70.02
                CPS   EProfile          0.54   3.75   6.40   12.67  18.37  23.89  26.09  27.27
                CPS   Greedy            6.41   13.27  18.81  30.61  42.62  56.23  64.34  70.06
Table 3-2. Results for 200 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.21   1.32   2.17   4.08   6.24   8.59   10.03  11.14
                ICP   Greedy            1.30   5.84   9.59   18.36  28.95  43.07  52.65  59.84
                CPS   EProfile          0.08   1.23   2.14   4.32   6.86   10.43  12.41  14.07
                CPS   Greedy            1.23   5.78   9.53   18.46  29.23  44.00  53.77  61.06
8 Processors    ICP   EProfile          0.37   2.66   4.23   8.04   12.72  18.39  21.51  23.54
                ICP   Greedy            2.71   8.29   12.74  23.01  34.98  50.05  59.49  66.18
                CPS   EProfile          0.20   2.13   4.08   8.26   13.11  18.31  20.79  22.74
                CPS   Greedy            2.18   7.77   12.29  22.72  34.82  49.73  58.88  65.55
16 Processors   ICP   EProfile          1.25   3.42   5.37   9.93   15.58  22.37  26.04  28.15
                ICP   Greedy            4.88   10.50  15.21  26.04  38.52  53.85  63.17  69.44
                CPS   EProfile          0.25   2.76   4.95   10.06  15.80  22.61  26.41  28.53
                CPS   Greedy            3.81   9.80   14.60  25.52  38.06  53.52  62.88  69.16
Table 3-3. Results for 300 tasks in homogeneous environments: Improvement of PathDVS over
EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.10   0.80   1.22   2.30   3.86   6.49   8.91   10.50
                ICP   Greedy            1.36   5.58   8.98   17.14  27.36  41.88  52.17  59.63
                CPS   EProfile          0.02   0.81   1.27   2.36   3.64   5.41   6.34   7.19
                CPS   Greedy            0.94   5.30   8.77   17.02  27.10  41.23  50.91  58.24
8 Processors    ICP   EProfile          0.47   3.34   5.69   11.45  17.89  25.14  30.68  34.62
                ICP   Greedy            1.89   8.80   14.05  25.91  39.25  54.74  64.68  71.46
                CPS   EProfile          0.06   3.15   5.44   10.72  16.87  23.58  27.77  30.60
                CPS   Greedy            1.57   8.65   13.94  25.52  38.45  53.65  63.04  69.52
16 Processors   ICP   EProfile          0.68   4.31   7.01   13.23  19.72  26.67  29.95  31.73
                ICP   Greedy            3.76   11.13  16.74  28.82  41.93  56.90  65.51  71.28
                CPS   EProfile          0.62   4.53   7.02   13.15  19.64  26.58  29.73  31.54
                CPS   Greedy            3.21   10.90  16.60  28.69  41.70  56.59  65.20  71.00
Table 3-4. Results for 400 tasks in homogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)
Deadline Extension Rate                 0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors    ICP   EProfile          0.09   1.20   2.05   4.40   7.25   11.30  14.59  16.66
                ICP   Greedy            0.83   5.91   9.84   18.93  29.90  44.91  55.24  62.49
                CPS   EProfile          0.02   1.22   2.02   3.82   6.03   8.54   9.62   10.18
                CPS   Greedy            0.62   5.70   9.59   18.40  29.00  43.14  52.59  59.56
8 Processors    ICP   EProfile          0.22   2.20   3.69   7.06   11.32  17.02  21.29  24.70
                ICP   Greedy            1.48   7.27   11.70  21.78  33.85  49.29  59.53  66.85
                CPS   EProfile          0.04   1.90   3.23   6.90   11.33  18.70  23.77  27.56
                CPS   Greedy            1.51   7.21   11.59  21.86  33.88  50.36  60.90  68.19
16 Processors   ICP   EProfile          0.56   4.91   8.09   14.42  21.52  28.41  31.38  33.31
                ICP   Greedy            2.47   10.37  16.17  28.53  41.76  56.85  65.52  71.28
                CPS   EProfile          0.29   4.75   7.58   13.62  20.33  27.92  31.63  33.46
                CPS   Greedy            1.76   10.39  16.26  28.75  42.03  57.09  65.76  71.52
Figure 3-9 shows the energy comparison of DVS algorithms (i.e., PathDVS, EProfileDVS,
GreedyDVS) using ICP for different number of tasks. The results show that the performance
improvement of PathDVS over the other DVS algorithms generally increases as the deadline
extension rate increases.
Table 3-5 shows the energy comparison between PathDVS and LPDVS in homogeneous
environments. Note that the comparison is limited to 200 tasks as this was the largest problem
that we were able to solve using LPDVS on our workstation. The unitSlackRate for PathDVS
and the intervalRate for LPDVS are set to 0.001. These results show that the two algorithms are
comparable in energy minimization, with PathDVS slightly better in most cases.
Figure 3-9. Normalized energy consumption of slack allocation algorithms (GreedyDVS, EProfileDVS, and PathDVS) with respect to different deadline extension rates for different numbers of tasks: (a) 100 tasks, (b) 200 tasks, (c) 300 tasks, and (d) 400 tasks
Table 3-5. Normalized energy consumption of PathDVS and LPDVS with respect to different
deadline extension rates in homogeneous environments (Positive difference indicates that PathDVS performs better than LPDVS)
                      100 Tasks                             200 Tasks
Deadline Ext. Rate    LPDVS     PathDVS   Difference        LPDVS     PathDVS   Difference
0                     0.962454  0.962451  0.000003          0.978646  0.978653  -0.000007
0.01                  0.921541  0.921532  0.000009          0.924348  0.924422  -0.000074
0.02                  0.888233  0.888223  0.000010          0.883956  0.884003  -0.000047
0.05                  0.809833  0.809764  0.000069          0.793064  0.793112  -0.000048
0.1                   0.713714  0.713611  0.000103          0.686825  0.685985  0.000840
0.2                   0.579983  0.579758  0.000225          0.547527  0.543398  0.004129
0.3                   0.487378  0.487140  0.000238          0.455571  0.447699  0.007872
0.4                   0.417437  0.417173  0.000264          0.388111  0.377237  0.010874
3.3.4.2 Comparison of time requirements
Table 3-6 and Figure 3-10 show the comparison of computational time for PathDVS and
LPDVS in homogeneous environments. PathDVS requires less runtime because it
substantially reduces the search space by using compatible task lists, their compression, and the
lower bound. In particular, the time requirements of PathDVS are substantially smaller as the
deadline extension rate decreases (i.e., a tight deadline), while they increase linearly as the deadline
extension rate increases due to the iterative search over unitSlack. Many practical real time
systems have tight deadlines. Based on the results shown in Table 3-6, for no deadline
extension (i.e., deadline extension rate equal to 0), the runtime of PathDVS is one to two orders
of magnitude less than that of LPDVS.
Table 3-6. Runtime ratio of LPDVS to PathDVS for no deadline extension in homogeneous
environments
                    100 Tasks   200 Tasks
4 Processors        61.97       210.32
8 Processors        19.46       52.74
Figure 3-10. Runtime to execute LPDVS and PathDVS with respect to different deadline extension rates for different numbers of tasks in homogeneous environments (unit: ms): (a) 100 tasks and (b) 200 tasks
3.3.5 Heterogeneous Environments
In this section, we show the performance of the proposed static slack allocation algorithm
in heterogeneous environments where the computation time of each task and the communication
time among tasks are different on different processors.
3.3.5.1 Comparison of energy requirements
Tables 3-7, 3-8, 3-9, and 3-10 show the improvement of PathDVS over EProfileDVS and
GreedyDVS with respect to different assignments (i.e., ICP and CPS assignments) and different
numbers of processors (i.e., 4, 8, and 16 processors) for different numbers of tasks (i.e., 100,
200, 300, and 400 tasks) in heterogeneous environments. As in homogeneous environments,
PathDVS considerably outperforms the other existing DVS algorithms regardless of the
assignment algorithm used. For instance, given the ICP assignment, PathDVS improves by
7-36% over EProfileDVS and 80-93% over GreedyDVS with a 0.4 deadline extension rate. The
results also show that the performance improvement of PathDVS is higher for larger numbers of
processors and tasks. Figure 3-11 shows the energy comparison of the DVS algorithms (i.e.,
PathDVS, EProfileDVS, GreedyDVS) using ICP for different numbers of tasks (i.e., 100, 200,
300, and 400 tasks). In general, the performance improvement of PathDVS over the other DVS
algorithms increases as the deadline extension rate increases.
Table 3-7. Results for 100 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  6.70   6.74   6.78   6.88   6.91   7.05   7.12   7.22
4 Processors  ICP Greedy    58.63  59.44  60.27  62.63  66.09  71.68  75.97  79.35
4 Processors  CPS EProfile  0.67   1.19   1.76   2.66   3.75   5.37   6.32   6.71
4 Processors  CPS Greedy    3.45   6.86   10.02  17.62  27.67  41.63  51.21  58.35
8 Processors  ICP EProfile  19.43  19.47  19.50  19.48  19.49  19.67  19.60  19.61
8 Processors  ICP Greedy    76.32  76.78  77.26  78.61  80.60  83.82  86.29  88.24
8 Processors  CPS EProfile  0.53   2.63   4.20   8.08   12.28  16.47  18.08  18.90
8 Processors  CPS Greedy    8.28   12.66  16.56  25.88  36.92  50.65  59.23  65.38
16 Processors ICP EProfile  23.56  23.53  23.62  23.54  23.62  23.57  23.69  23.69
16 Processors ICP Greedy    84.19  84.48  84.80  85.69  87.02  89.18  90.83  92.14
16 Processors CPS EProfile  1.81   4.37   6.42   10.97  15.10  20.03  22.14  23.24
16 Processors CPS Greedy    13.04  17.37  21.38  30.40  40.71  53.99  62.40  68.33
Table 3-8. Results for 200 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  10.22  10.28  10.37  10.47  10.54  10.79  10.91  10.88
4 Processors  ICP Greedy    60.40  61.21  62.01  64.27  67.58  72.93  77.03  80.27
4 Processors  CPS EProfile  0.16   1.51   2.44   4.41   6.62   8.64   9.57   10.00
4 Processors  CPS Greedy    2.73   7.23   10.91  19.53  29.90  43.70  52.96  59.85
8 Processors  ICP EProfile  15.75  15.75  15.91  15.86  16.10  16.18  16.30  16.28
8 Processors  ICP Greedy    73.25  73.78  74.31  75.83  78.07  81.69  84.47  86.65
8 Processors  CPS EProfile  0.42   1.39   2.28   4.84   7.78   11.43  13.59  14.97
8 Processors  CPS Greedy    5.46   9.33   12.88  21.54  32.15  46.42  55.87  62.69
16 Processors ICP EProfile  26.89  26.87  26.96  26.85  26.80  26.83  26.95  26.89
16 Processors ICP Greedy    83.45  83.77  84.09  85.02  86.40  88.65  90.37  91.73
16 Processors CPS EProfile  1.21   4.43   7.15   12.29  18.45  23.50  25.46  26.45
16 Processors CPS Greedy    9.94   15.37  19.70  29.47  41.40  54.65  62.60  68.26
Table 3-9. Results for 300 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  7.75   7.79   7.86   7.93   8.07   8.22   8.31   8.34
4 Processors  ICP Greedy    58.83  59.67  60.49  62.81  66.23  71.77  76.02  79.37
4 Processors  CPS EProfile  0.00   0.83   1.35   2.82   4.42   6.40   7.33   7.85
4 Processors  CPS Greedy    1.47   5.98   9.60   18.02  28.30  42.28  51.78  58.85
8 Processors  ICP EProfile  18.96  18.91  18.92  19.06  19.12  19.18  19.20  19.26
8 Processors  ICP Greedy    74.63  75.14  75.65  77.09  79.21  82.63  85.27  87.34
8 Processors  CPS EProfile  0.08   1.84   3.03   5.99   9.54   14.04  16.38  17.78
8 Processors  CPS Greedy    4.00   8.90   12.85  22.20  33.43  48.05  57.45  64.09
16 Processors ICP EProfile  35.29  35.37  35.41  35.26  35.36  35.44  35.47  35.39
16 Processors ICP Greedy    85.29  85.59  85.89  86.72  87.96  89.96  91.50  92.71
16 Processors CPS EProfile  0.50   4.50   7.73   14.79  22.20  29.25  32.34  33.64
16 Processors CPS Greedy    10.19  16.75  21.91  33.27  45.46  59.14  66.96  72.21
Table 3-10. Results for 400 tasks in heterogeneous environments: Improvement of PathDVS over EProfileDVS and GreedyDVS in terms of energy consumption with respect to different assignments and different deadline extension rates (unit: percentage)

                                       Deadline Extension Rate
                            0      0.01   0.02   0.05   0.1    0.2    0.3    0.4
4 Processors  ICP EProfile  9.30   9.37   9.45   9.53   9.68   9.90   9.97   10.01
4 Processors  ICP Greedy    59.27  60.11  60.94  63.26  66.67  72.16  76.38  79.70
4 Processors  CPS EProfile  0.10   1.23   1.97   3.84   5.86   8.10   9.16   9.72
4 Processors  CPS Greedy    1.69   6.53   10.30  19.00  29.44  43.36  52.72  59.64
8 Processors  ICP EProfile  22.48  22.49  22.51  22.53  22.66  22.73  22.71  22.75
8 Processors  ICP Greedy    75.45  75.96  76.45  77.84  79.90  83.22  85.78  87.79
8 Processors  CPS EProfile  0.89   3.31   4.99   9.02   13.36  18.08  20.40  21.65
8 Processors  CPS Greedy    5.36   10.98  15.35  25.34  36.79  50.98  59.84  66.08
16 Processors ICP EProfile  36.18  36.09  36.09  36.05  36.07  36.05  36.14  36.05
16 Processors ICP Greedy    85.34  85.64  85.93  86.77  88.00  89.99  91.52  92.73
16 Processors CPS EProfile  1.28   4.83   8.34   16.16  23.10  29.95  32.73  34.07
16 Processors CPS Greedy    7.83   14.56  20.18  32.27  44.82  58.80  66.82  72.20
[Figure 3-11 plots: Normalized Energy vs. Deadline Extension Rate; series: GreedyDVS, EProfileDVS, PathDVS; panels: 100, 200, 300, and 400 tasks]
Figure 3-11. Normalized energy consumption of slack allocation algorithms with respect to
different deadline extension rates for different number of tasks in heterogeneous environments: (a) 100 tasks, (b) 200 tasks, (c) 300 tasks, and (d) 400 tasks
Table 3-11. Normalized energy consumption of PathDVS and LPDVS with respect to different deadline extension rates in heterogeneous environments (positive difference indicates that PathDVS performs better than LPDVS)

Deadline              100 Tasks                            200 Tasks
Extension    PathDVS   LPDVS     Difference     PathDVS   LPDVS     Difference
0            0.922500  0.922383  -0.000116      0.947985  0.947781  -0.000203
0.01         0.885561  0.885098  -0.000462      0.906328  0.906165  -0.000162
0.02         0.851380  0.850993  -0.000386      0.870523  0.870373  -0.000149
0.05         0.770131  0.769853  -0.000277      0.785193  0.785067  -0.000126
0.1          0.671220  0.671066  -0.000154      0.681989  0.682629   0.000639
0.2          0.537683  0.537611  -0.000072      0.543238  0.545835   0.002597
0.3          0.447022  0.447104   0.000081      0.449777  0.453937   0.004160
0.4          0.379900  0.380161   0.000261      0.382132  0.385930   0.003798
Table 3-11 shows the energy comparison between PathDVS and LPDVS in heterogeneous
environments. Note that the comparison is limited to 200 tasks as this was the largest problem
that we were able to solve using LPDVS on our workstation. The unitSlackRate for PathDVS
and the intervalRate for LPDVS are set to 0.001. Like in homogeneous environments, these
results show that the two algorithms are comparable in energy minimization.
3.3.5.2 Comparison of time requirements
Table 3-12 and Figure 3-12 show the runtime comparison between PathDVS and LPDVS.
Like in homogeneous environments, PathDVS requires less runtime because it substantially
reduces the search space by using compatible task lists, their compression, and the lower bound.
In particular, the time requirements of PathDVS are substantially smaller as the deadline
extension rate decreases (i.e., tight deadline). For instance, the runtime ratio of LPDVS to
PathDVS is 56.49 for 200 tasks on 4 processors for no deadline extension.
Table 3-12. Runtime ratio of LPDVS to PathDVS for no deadline extension in heterogeneous environments

                100 Tasks   200 Tasks
4 Processors      37.38       56.49
8 Processors      13.27       12.22
[Figure 3-12 plots: Runtime vs. Deadline Extension Rate; series: LPDVS, PathDVS; panels: 100 and 200 tasks]
Figure 3-12. Runtime to execute algorithms with respect to different deadline extension rates for
different number of tasks in heterogeneous environments (unit: ms): (a) 100 tasks and (b) 200 tasks
3.3.6 Effect of Search Space Reduction Techniques for PathDVS
The main factor that determines the cost of the PathDVS algorithm is the size of the search
space. In this section, we present the effect of the search space reduction techniques introduced
in this paper (i.e., compression, the compatible task matrix/lists, and the lower bound). The
experiments are performed on 50 different synthetic graphs for each combination of the number
of tasks and processors, with a 0.01 deadline extension rate. We present the average values of
the different metrics for Phase 2, as it is considerably more computation-intensive than Phase 1;
the cost of Phase 1 is small because the number of slack allocable tasks considered is smaller.
The size of the search space depends on the depth of the search tree and the number of tasks
participating in the search. The size of the search space is O(n^d), where n is the total number of
tasks and d is the depth of the search tree. By using compression, the size can be reduced to
O(t^d), where t is the number of tasks participating in the search.
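As a concrete illustration of this reduction, the following sketch (ours, not part of the dissertation's implementation) plugs sample values from Tables 3-13 and 3-14 into the two worst-case bounds:

def search_space_size(branching, depth):
    # Worst-case node count of a search tree that branches over
    # `branching` candidate tasks at each of `depth` levels.
    return branching ** depth

# 100 tasks on 8 processors: depth is about 8 (Table 3-14) and about
# 24.5 tasks survive compression (Table 3-13).
print(search_space_size(100, 8))    # O(n^d): 10^16 nodes
print(search_space_size(24.5, 8))   # O(t^d): roughly 1.3 * 10^11 nodes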
Table 3-13 shows the average number of tasks after compression. Note that the
compression technique classifies tasks into three categories: fully independent tasks, fully
dependent tasks, and compressible tasks, and then has only a representative of each group of
compressible tasks participate in the search. These results show that the compression methods
reduce the number of tasks significantly (58-94%), leading to a much smaller search space. The
amount of compression decreases as the number of processors increases. This is because the
compression achieved is based on the assignment-based dependency relationships among tasks
in the assignment DAG (not the actual DAG), and these relationships generally become more
complex as the number of processors increases.
Table 3-14 shows the depth of the search tree. Based on the results, the depth is proportional to
the number of processors (i.e., depth ≈ number of processors), so the size can be written as
O(t^p), where p is the number of processors. Thus, the maximum number of independent tasks
to which unitSlack can be allocated together is approximately equal to the number of processors.
Although the worst case size of the search space is O(t^p), the use of the compatible task
matrix/lists can lead to a substantially smaller number of tasks being expanded (i.e., explorable
tasks) at each level. Furthermore, the maximum level that is searched is generally much smaller
than the depth. This makes the search space significantly smaller, and it is reduced further by the
use of branch-and-bound techniques. Table 3-15 shows the number of nodes explored in the
search; it is considerably smaller than the total search space.
Table 3-13. Number of tasks participating in search with respect to different numbers of tasks and processors

Number of Tasks   Number of Processors   Tasks Participating in Search
100 Tasks         4                      11.8
100 Tasks         8                      24.5
100 Tasks         16                     42.4
200 Tasks         4                      12.1
200 Tasks         8                      24.8
200 Tasks         16                     53.6
Table 3-14. Depth of search tree with respect to different numbers of tasks and processors

Number of Tasks   Number of Processors   Depth of Search Tree
100 Tasks         4                      4
100 Tasks         8                      8.2
100 Tasks         16                     17.3
200 Tasks         4                      4
200 Tasks         8                      7.9
200 Tasks         16                     17.4
Table 3-15. Number of nodes explored in search with respect to different numbers of tasks and processors

Number of Tasks   Number of Processors   Nodes Explored in Search
100 Tasks         4                      22
100 Tasks         8                      1114
100 Tasks         16                     141342
200 Tasks         4                      27
200 Tasks         8                      1000
200 Tasks         16                     415924
CHAPTER 4 DYNAMIC SLACK ALLOCATION
Static scheduling algorithms for DAG execution use the estimated execution time. The
estimated execution time (ET) of tasks may be different from their actual execution time (AET)
at runtime. We divide the dynamic environments into two broad categories based on whether the
actual execution time is less than or more than the estimated time: overestimation (AET < ET)
and underestimation (AET > ET).
For most real time applications, an upper worst case bound on the actual execution time
(i.e., the worst case execution time) of each task is used to guarantee that the application
completes within a given time bound. This corresponds to overestimation of the actual execution
time. Therefore, many tasks may complete earlier than expected during the actual execution.
This allows assignment-based dependent tasks to potentially start earlier than was envisioned
during static scheduling. The extra available slack can then be allocated to tasks that have not
yet begun execution, with the goal of reducing the total energy requirements while still meeting
the deadline constraints.
For many applications that do not use the worst case time for estimation, historical data is
used to estimate the time requirements of each task, and the estimated execution time may be
less than the actual execution time. This corresponds to underestimation of the actual execution
time. In this case, many tasks may complete later than expected during the actual execution, so it
cannot be guaranteed that the deadline constraints will always be satisfied. However, slack can
be removed from future tasks with the hope of satisfying the deadline constraints as closely as
possible.
A simple option for adjusting slack at runtime is to reapply the static slack allocation
algorithms to the unexecuted tasks whenever a task finishes early or late. However, the time
requirements of static algorithms applied at runtime are generally large, and they may not be
practical for many runtime scenarios. We explore novel dynamic (or runtime) algorithms for
achieving these goals.
In this chapter, we present novel dynamic algorithms that lead to good performance in
terms of both computational time (i.e., runtime overhead) and energy requirements. The main
intuition behind our methods is that the slack allocation can be restricted to a small subset of
tasks so that the static slack allocation algorithms can be applied to a small subset rather than all
the tasks. There are three main contributions of our methods:
• They require significantly less computational time (i.e., runtime overhead) than applying the static algorithm at runtime for every instance when a task finishes early or late.
• The performance in terms of reducing energy and/or meeting a given deadline is comparable to applying the static algorithm at runtime.
• They are effective for cases when the estimated execution time of tasks is underestimated or overestimated.
4.1 Proposed Dynamic Slack Allocation
We assume that a static algorithm has already been applied before executing tasks and that
the schedule needs to be adjusted whenever a task finishes early or late. The dynamic slack
allocation algorithm reallocates the slack whenever a task finishes earlier or later than expected
based on the current schedule. The current schedule is initialized to the static schedule and
updated whenever dynamic slack allocation is applied in response to early or late finished tasks
at runtime. Our algorithms do not change the assignment of tasks to processors.
The requirements of the dynamic slack allocation algorithm depend on whether the execution
time is overestimated (AET < ET) or underestimated (AET > ET).
• Overestimation: The extra slack can be potentially allocated to tasks that are not yet executed. Here the goal of dynamic slack allocation algorithms is to reduce energy while still meeting deadline constraints.
• Underestimation: In this case, the primary goal of dynamic slack allocation algorithms is to reduce the slack of future tasks to try to complete the DAG within the deadline constraints or as closely as possible to the deadline. A secondary goal is to minimize the energy requirements.
Although our approach can be used in a mixed environment (i.e., an environment where
some tasks are underestimated and some tasks are overestimated), the main motivation is to
support an environment where the estimated execution time of tasks is mostly overestimated or
mostly underestimated. The main focus for underestimated tasks is to meet the deadline, while
for overestimated tasks it is to minimize energy.
The proposed dynamic slack allocation algorithms are based on choosing a subset of tasks
for which the schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not
selected for slack reallocation) is not affected. There are two steps that need to be addressed.
First, select the subset of tasks for slack reallocation. The tasks that can potentially be
rescheduled by the dynamic slack allocation algorithm are those that have not yet started when
the algorithm is applied. We assume that the voltage can be selected before a task starts
executing. Which subset of these tasks the dynamic slack allocation (i.e., rescheduling) is
applied to depends on the algorithm. The main reason to limit the potentially rescheduled tasks
is to minimize the overhead of reallocating the slack during runtime. Clearly, this should be
done so that the other goal of energy reduction is also met simultaneously. Second, determine
the time range for the selected tasks. The time range of the selected tasks has to be changed as
some of the tasks have completed earlier or later than expected. Based on the computation times
in the current schedule and the assignment-based dependency relationships among tasks, we
recompute the time range (i.e., earliest start time and latest finish time) in which the selected
tasks should be executed. Slack has to be allocated to the selected tasks within this time range in
order to try to meet the deadline constraints.
At this stage, a static slack allocation approach is applied to the subset of tasks within the
time range described above. It is worth noting that the dynamic slack allocation algorithms
presented in this section are independent of the static scheduling algorithms. Once the tasks and
their constraints are determined, any static scheduling algorithm can potentially be used at
runtime. We have used the methods providing near-optimal solutions (i.e., the LP based
approach and the Path based approach described in Chapter 3) for this purpose. The
computational overhead is kept small due to the limited number of tasks selected for slack
reallocation.
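A minimal sketch of this two-step procedure (our own illustration, not the dissertation's code; the selection, time-range, and DVS routines are passed in as callables and their names are hypothetical):

def dynamic_slack_allocation(finished_task, schedule,
                             select_subset, compute_time_range, static_dvs):
    # Step 1: choose the subset of not-yet-started tasks to reschedule.
    subset = select_subset(finished_task, schedule)
    # Step 2: recompute the time range (earliest start, latest finish)
    # for each selected task based on the current schedule.
    ranges = {task: compute_time_range(task, schedule) for task in subset}
    # Apply a static slack allocation (e.g., PathDVS) to the subset only;
    # tasks outside the subset keep their current schedule.
    updates = static_dvs(subset, ranges)
    schedule.update(updates)
    return schedule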
Before applying the dynamic slack allocation, the computation time of each selected task is
set to the estimated execution time used in the assignment algorithm (before any static slack
allocation) for calculating the slack during dynamic slack allocation. The slack is recalculated for
the selected tasks (ignoring the slack that was allocated during the static scheme). This will, in
general, lead to lower energy requirements because it accounts for the change in the
assignment-based dependency relationships among tasks caused by the early finished task. It is
based on the fact that slack allocation that considers the assignment-based dependency
relationships among tasks leads to better performance in terms of reducing energy.
4.1.1 Choosing a Subset of Tasks for Slack Reallocation
The proposed dynamic slack allocation algorithms are based on choosing a subset of tasks
for which the schedule will be readjusted. The schedule for the remaining tasks (i.e., tasks not
selected for slack reallocation) is not affected. Figure 4-1 shows the subset of tasks selected for
slack reallocation in an assignment DAG when task τ2 finishes early or late for two dynamic
slack allocation algorithms that reallocate slack: the k time lookahead approach and the k
descendent lookahead approach. These approaches are described in detail in the next
subsections. Note that the assignment DAG may change due to changes in the assignment-based
direct dependency relationships caused by the slack reallocation and the early or late finished
tasks.
4.1.1.1 Greedy approach
In the greedy approach, only the assignment-based direct successors of the early or late
finished task are considered for readjusting the schedule. In the example shown in Figure 4-1,
only the direct successors of task τ2, e.g., tasks τ4 and τ5, are considered for slack allocation. The
greedy approach uses slack forwarding [51], which allocates slack to a direct successor of the
early or late finished task on the same processor. We extend the greedy approach in [51] by
considering all assignment-based direct successors for slack allocation on any processor. This
extension is expected to reduce more energy than allocating slack to a single task.
4.1.1.2 The k time lookahead approach
In the k time lookahead approach, all tasks within a limited range of time are considered
for readjusting the schedule. The range of time is limited based on the value of k (i.e., k *
maximum computation time of tasks), where the maximum computation time is the computation
time of the task that takes the longest. In the example shown in Figure 4-1, assume, for ease of
presentation of the key concepts, that the computation time of each task is one unit, the
communication time among tasks is zero, and tasks at the same depth finish at the same time. In
this case, if k is equal to 2, the time range is 2 units (2 * one unit), and the tasks within this time
range from the finish of task τ2, e.g., τ4, τ5, τ6, τ7, τ8, τ9, and τ10, are
considered. The set of tasks selected for the slack reallocation when task τl finishes early is
defined by
defined by
73
s.t. where
},max
lll
jΓτliliiallocation
estaticFTimftimeτ
compTimek*fimeime, staticFTftimeme|staticSTi{τΓj
≠
+≤≥=∈
where staticSTimei is the start time of task τi in the static or previous schedule, staticFTimei is the
finish time of task τi in the static or previous schedule, ftimel is the actual finish time of task τl at
runtime, and compTimej is the computation time of task τj on its assigned processor, a.k.a., the
estimated execution time at the maximum voltage.
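A sketch of this selection rule (ours; attribute names are illustrative, and `tasks` holds the not-yet-started tasks):

def k_time_lookahead(tasks, ftime, k):
    # The window extends k * (maximum computation time) past the actual
    # finish time of the early or late finished task.
    horizon = ftime + k * max(t.comp_time for t in tasks)
    return [t for t in tasks
            if t.static_start >= ftime and t.static_finish <= horizon]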
The approach with the 'all' option for k (i.e., the k-all time lookahead approach) corresponds to
the static slack allocation approach without the limitation on the time range for the tasks
considered for rescheduling. Thus the k-all time lookahead approach is the same as applying the
static slack allocation to all the remaining tasks at runtime. One would expect this to be close to
the best that can be achieved, particularly when applying the near-optimal static slack allocation
algorithms (i.e., the LP based approach and the Path based approach) described in Chapter 3.
The set of tasks selected for the slack reallocation when task τl finishes early is defined by

$\Gamma_{allocation} = \{\, \tau_i \mid staticSTime_i \ge ftime_l \,\}, \quad \text{s.t. } ftime_l \ne staticFTime_l$
4.1.1.3 The k descendent lookahead approach
Unlike the k time lookahead approach, the k descendent lookahead approach considers
only tasks whose schedules are directly influenced by the early or late finished task. The main
intuition is that limiting the tasks to direct descendants reduces the scheduling time requirements
and also leads to good performance in terms of energy, since the schedule for uninfluenced or
only indirectly influenced tasks is kept. Specifically, the assignment-based direct successors of
the early or late finished task are considered; the number of tasks considered for readjusting the
schedule is limited by the value of k, and only descendants at a distance of at most k are
included. In the example of Figure 4-1, using the descendent lookahead approach with k equal
to 2, the considered tasks are the direct assignment-based successors of task τ2, e.g., tasks τ4
and τ5, and their direct successors, e.g., tasks τ7, τ8, and τ9. However, task τ9 will not be
allocated slack because no slack is available for it due to the direct dependency on task τ6.
The approach with the 'all' option for k (i.e., the k-all descendent lookahead approach)
corresponds to setting k equal to the remaining depth. The set of tasks selected for slack
reallocation is defined by

$\Gamma_{allocation} = \{\, \tau_i \mid \tau_i \in assgnSucc_l \,\}, \quad \text{s.t. } ftime_l \ne staticFTime_l,$

where, at the first step, τl is the early or late finished task and, after the first step, τl ranges over
the tasks generated at the previous step (i.e., $\tau_l \in \Gamma_{allocation}$); assgnSuccl is the
set of assignment-based direct successors of task τl.
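A sketch of this selection as a breadth-first expansion over assignment-based direct successors up to distance k (ours; pass k=None for the 'all' option):

from collections import deque

def k_descendent_lookahead(finished_task, assign_succ, k=None):
    selected = set()
    frontier = deque([(finished_task, 0)])
    while frontier:
        task, dist = frontier.popleft()
        if k is not None and dist >= k:
            continue          # stop expanding beyond distance k
        for succ in assign_succ.get(task, ()):
            if succ not in selected:
                selected.add(succ)
                frontier.append((succ, dist + 1))
    return selected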
Figure 4-1. Tasks selected for slack reallocation in an assignment DAG depending on dynamic slack allocation algorithms
[Figure 4-1 diagram: assignment DAG with tasks τ1-τ11; labeled subsets: Greedy; k-2 Time Lookahead; k-2 Descendent Lookahead; k-all Descendent Lookahead; k-all Time Lookahead (static DVS applied at runtime)]
4.1.2 Time Range for Selected Tasks
The static schedule (or the previous schedule updated at runtime) for tasks not in the set of
slack reallocable tasks (i.e., the set of tasks selected for slack reallocation) is kept the same. For
the set of slack reallocable tasks, the following quantities are changed before applying the
algorithms for slack reallocation: the computation time, start time, and finish time (or deadline)
of the tasks.
First, the minimum computation time of a task is set to its estimated time at the maximum
voltage (i.e., staticCTimei = compTimei, where τi ∈ Γallocation; here staticCTimei is the computation
time of task τi in the static or previous schedule generated by the last slack reallocation). This is
the same time that was used during the static assignment process. This effectively ensures that
maximum flexibility is available for slack reallocation. For instance, for tasks τ5 and τ8 in Figure
4-2 (c), the computation time is reset to their estimated computation time before applying the
runtime algorithm. In Figure 4-2 (d), however, their computation time is not changed, since the
reset depends on whether or not they are slack reallocable tasks. Tasks in light grey boxes
indicate slack reallocable tasks.
Next, the start time of the tasks is changed as flexibly as possible to meet the deadline
constraints as well as the finish times of the assignment-based predecessors of each task. Note
that the finish time of predecessors that have already completed or are not part of the selected
tasks is fixed. In the case of overestimation, the selected tasks may start earlier than their
currently scheduled time. For instance, in Figure 4-2 (c), due to the early finish of task τ1, tasks
τ3 and τ4 can start early, but task τ5 cannot start early because of its assignment-based direct
dependency on task τ2. Meanwhile, in the case of underestimation, the selected tasks may have
to start later than their currently scheduled time. For instance, in Figure 4-3 (c), due to the late
finish of task τ1, tasks τ3 and τ4 must start late, but task τ5 can still start early because it is not
directly influenced by the late finished task τ1.
Finally, the finish times (or deadlines) of the tasks are changed so that they can complete
as late as possible while ensuring that the deadline constraints are met (or met as closely as
possible). The constraints from successors that are not part of the selected tasks are based on the
current schedule (i.e., task τ7 in Figure 4-2 (d) and Figure 4-3 (d)). In the case of overestimation,
the deadlines of the selected tasks keep their scheduled finish times. For instance, it is acceptable
if the slack reallocable tasks τ6, τ7, and τ8 finish no later than their finish times in the static
schedule depicted in Figure 4-2 (a). Meanwhile, in the case of underestimation, the deadlines of
the selected tasks may be pushed back to ensure that each task can complete at the maximum
voltage. For instance, the deadline of task τ7 has to be increased, as there is no slack in τ4. The
deadlines of other tasks that can complete before their scheduled finish times (i.e., task τ6) are
not changed, since extending their deadlines to the maximum finish time (i.e., the finish time of
task τ7) may negatively impact the remaining tasks.
Figure 4-2 and Figure 4-3 illustrate the application of the above constraints for both the k
time lookahead approach and the k descendent lookahead approach, for the cases of
overestimation and underestimation, respectively. The dotted box shows the range of time,
consisting of the start time and the finish time (or deadline), for the slack reallocable tasks
considered for slack reallocation at runtime. For edges among tasks, a solid line represents an
assignment-based direct dependency relationship while a dotted line represents an assignment-
based indirect dependency relationship.
Using the above constraints, each slack reallocable task has a different amount of
maximum available slack for reallocation. The actual slack is computed to be within the time
range for the slack reallocable tasks. The maximum available slack of a slack reallocable task τi,
slacki, is defined as the difference between the latest start time of task τi, LSTi, and the earliest
start time of task τi, ESTi. The latest start time, the earliest start time, and the maximum available
slack of task τi are computed as follows:
$LST_i = \min\!\Big( deadline_i,\; LST_{pSucc_i},\; \min_{\tau_j \in succ_i} \big( LST_j - commTime_{ij} \big) \Big) - staticCTime_i$

$EST_i = \max\!\Big( start_i,\; EST_{pPred_i} + staticCTime_{pPred_i},\; \max_{\tau_j \in pred_i} \big( EST_j + staticCTime_j + commTime_{ij} \big) \Big)$

$slack_i = LST_i - EST_i$
where deadlinei is the deadline of task τi, starti is the start time of task τi, succi is the set of direct
successors of task τi in the DAG, pSucci is the task assigned immediately after task τi on the
same processor, predi is the set of direct predecessors of task τi in the DAG, pPredi is the task
assigned immediately before task τi on the same processor, commTimeij is the communication
time between task τi and task τj on their assigned processors, and staticCTimei is the
computation time of task τi in the static or previous schedule generated by the last slack
reallocation. The earliest start time and the latest start time of a task not included in the set of
slack reallocable tasks are equal to its start time in that schedule (i.e., $EST_j = LST_j =
staticSTime_j$, where $\tau_j \notin \Gamma_{allocation}$; here staticSTimej is the start time of
task τj in the static or previous schedule generated by the last slack reallocation).
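A sketch mirroring these formulas (ours; tasks are assumed to be visited in topological order for EST and in reverse topological order for LST, and attribute names are illustrative):

def earliest_start(task, comm):
    # EST_i = max(start_i, EST_pPred + staticCTime_pPred,
    #             max over DAG predecessors (EST_j + staticCTime_j + commTime))
    dag = max((p.est + p.static_ctime + comm[(p, task)] for p in task.pred),
              default=0)
    proc = (task.pproc_pred.est + task.pproc_pred.static_ctime
            if task.pproc_pred else 0)
    return max(task.start, proc, dag)

def latest_start(task, comm):
    # LST_i = min(deadline_i, LST_pSucc,
    #             min over DAG successors (LST_j - commTime)) - staticCTime_i
    dag = min((s.lst - comm[(task, s)] for s in task.succ),
              default=task.deadline)
    proc = task.pproc_succ.lst if task.pproc_succ else task.deadline
    return min(task.deadline, proc, dag) - task.static_ctime

def max_slack(task):
    return task.lst - task.est   # slack_i = LST_i - EST_i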
Once the time range is determined for the slack reallocable tasks, the slack is reallocated to
appropriate tasks using a slack allocation approach in order to minimize the total energy
requirements, and the schedule is then updated.
Figure 4-2. Overestimation: Time range for selected slack allocable tasks using k-time lookahead
approach and k-descendent lookahead approach: (a) Initial static schedule, (b) Schedule from the early finished task, (c) State before applying k time lookahead approach, (d) State before applying k descendent lookahead approach
Figure 4-3. Underestimation: Time range for selected slack allocable tasks using k-time
lookahead approach and k-descendent lookahead approach: (a) Initial static schedule, (b) Schedule from the late finished task, (c) State before applying k time lookahead approach, (d) State before applying k descendent lookahead approach
[Figure 4-2 and Figure 4-3 diagrams: task DAGs τ1-τ9 with start and deadline markers; legend: slack reallocable task, early/late finished task]
4.2 Experimental Results
In this section, we compare the performance of the various dynamic slack allocation
algorithms (i.e., k-Descendent, k-Time, and Greedy) with that of applying static slack allocation
in dynamic environments.
Each dynamic algorithm is applied to a static schedule generated by a known assignment
algorithm, which assigns tasks based on the earliest finish time, together with a static slack
allocation algorithm (i.e., LPDVS or PathDVS). Our previous experiments in Chapter 3 show
that the energy minimization of LPDVS is comparable to PathDVS while its time requirement is
higher. To distinguish it from the PathDVS used to generate the static schedule, we refer to
PathDVS applied at runtime as dPathDVS. The size of the unit slack for PathDVS and
dPathDVS is set to (0.001 * finish time of the DAG) based on the empirical results for static
slack described in Chapter 3.
4.2.1 Simulation Methodology
In this section, we describe the DAG generation, the dynamic environment generation, and the
performance measures used in our experiments.
4.2.1.1 The DAG generation
We randomly generated a large number of graphs with 100 and 200 tasks. Since the results
for heterogeneous environments are similar to those for homogeneous environments, we present
only the results for the latter. The execution time of each task is varied from 10 to 40 units and
the communication time among tasks is set to 2 units. The execution of graphs is performed on 4,
8, and 16 processors.
4.2.1.2 Dynamic environments generation
We simulated a number of dynamic cases to study the effectiveness of our algorithms.
Here are some of the important parameters that can be varied to create dynamic cases for
overestimation and underestimation respectively:
Overestimation
• The fraction of tasks that finish earlier than expected (i.e., tasks with AET < ET) is given by the earlyFinishedTaskRate (i.e., number of early finished tasks = earlyFinishedTaskRate * total number of tasks).
• The fractional difference between actual execution time and estimated time for each task that finishes early is given by timeDecreaseRate (i.e., amount of decrease = timeDecreaseRate * estimated execution time).
Underestimation
• The fraction of tasks that finish later than expected (i.e., tasks with AET > ET) is given by the lateFinishedTaskRate (i.e., number of late finished tasks = lateFinishedTaskRate * total number of tasks).
• The fractional difference between actual execution time and estimated time for each task that finishes late is given by timeIncreaseRate (i.e., amount of increase = timeIncreaseRate * estimated execution time).
To generate cases with overestimation, we experimented with earlyFinishedTaskRate values of
0.2, 0.4, 0.6, and 0.8 and timeDecreaseRate values of 0.1, 0.2, 0.3, and 0.4. To generate cases
with underestimation, we experimented with lateFinishedTaskRate values of 0.2, 0.4, 0.6, and
0.8 and timeIncreaseRate values of 0.05, 0.1, 0.15, and 0.2.
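A sketch of how such a case might be generated from these parameters (our illustration; the dissertation's generator is not shown, and `estimated_time` is an assumed task field):

import random

def make_overestimation_case(tasks, early_finished_task_rate,
                             time_decrease_rate, seed=0):
    # Pick earlyFinishedTaskRate * |tasks| tasks at random and shrink
    # their actual execution time by timeDecreaseRate (so AET < ET).
    rng = random.Random(seed)
    actual_time = {t: t.estimated_time for t in tasks}
    for t in rng.sample(tasks, int(early_finished_task_rate * len(tasks))):
        actual_time[t] = t.estimated_time * (1 - time_decrease_rate)
    return actual_time

The underestimation cases are analogous, scaling by (1 + timeIncreaseRate) instead.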
The deadline is determined by: deadline = (1 + deadline extension rate) * total finish time
from the assignment without the DVS scheme. This base finish time corresponds to the time
requirements of an execution schedule that minimizes execution time for the given set of
processors; the extension represents the overall slack that is available for allocation. We
experimented with deadline extension rates equal to 0.0 (no extension), 0.01, 0.02, 0.05, 0.1,
and 0.2.
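As a worked instance of this formula (our own numbers), a DVS-free finish time of 1000 time units with a 0.05 deadline extension rate gives a deadline of 1050 units:

def deadline(finish_time_without_dvs, extension_rate):
    # deadline = (1 + deadline extension rate) * DVS-free total finish time
    return (1 + extension_rate) * finish_time_without_dvs

print(deadline(1000, 0.05))   # 1050.0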
4.2.1.3 Performance measures
An important measure is the amount of computational time (i.e., runtime overhead)
required to readjust the schedule when the execution time is less than or greater than the
estimated time. The following are other important measures for the cases of overestimation and
underestimation.
For the case of overestimation, the normalized energy consumption is measured. This is
computed as the total energy required for completing the DAG divided by the total energy for
completing the DAG assuming static slack allocation (i.e., all tasks completing in exactly their
estimated time). A lower value of the normalized energy consumption is desirable.
For the case of underestimation, the deadline miss ratio and the energy increase ratio are
measured. When tasks take more time than estimated, the overall execution time may exceed the
deadline. The deadline miss ratio measures the difference between the actual execution time and
the deadline, normalized by the deadline. A lower value of the deadline miss ratio is desirable; a
value of zero implies that the deadline was not missed. The energy increase ratio is computed as
the increase in the total energy required for completing the DAG divided by the total energy for
completing the DAG assuming static slack allocation (i.e., all tasks completing in exactly their
estimated time). A lower value of the energy increase ratio is desirable.
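A sketch of these three measures (ours; inputs are total energies and times in consistent units):

def normalized_energy(actual_energy, static_energy):
    # Total energy divided by the energy under static slack allocation;
    # lower is better.
    return actual_energy / static_energy

def deadline_miss_ratio(actual_finish_time, deadline):
    # (actual execution time - deadline) / deadline; zero means the
    # deadline was not missed.
    return max(0.0, actual_finish_time - deadline) / deadline

def energy_increase_ratio(actual_energy, static_energy):
    # Increase in total energy divided by the static-allocation energy;
    # lower is better.
    return (actual_energy - static_energy) / static_energy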
4.2.2 Overestimation
In this section, we show the performance of our algorithms in the case that the execution
time of tasks is overestimated (i.e., the actual execution time of a task is less than its estimated
time).
4.2.2.1 Comparison of energy requirements
We first compared the k-all descendent algorithm with the Greedy and dPathDVS
algorithms. Figure 4-4 shows the normalized energy requirements of the kallDescendent,
Greedy, and dPathDVS algorithms with respect to different time decrease rates and early
finished task rates for no deadline extension (i.e., deadline extension rate equal to zero). The
results show that the energy requirements of kallDescendent are significantly better than those
of the greedy approach. For instance, for timeDecreaseRate equal to 0.4, kallDescendent reduces
energy by 17% and 29% compared to the Greedy algorithm with earlyFinishedTaskRates of 0.2
and 0.8, respectively. Most importantly, the energy requirements vis-a-vis dPathDVS are within
1% in almost all cases. The time requirement of kallDescendent is one to two orders of
magnitude smaller than dPathDVS, as shown in Figure 4-11. These results demonstrate that a
subset of tasks comprising only the descendants can be used for slack allocation to reduce time
requirements while keeping the energy requirements comparable to using static scheduling
algorithms at runtime.
[Figure 4-4 plots: Normalized Energy vs. Time Decrease Rate; series: Greedy, dPathDVS, kallDescendent; panels: 0.2, 0.4, 0.6, 0.8 early finished task rates]
Figure 4-4. Normalized energy consumption of Greedy, dPathDVS, and kallDescendent with
respect to different early finished task rates and time decrease rates for no deadline extension
Table 4-1 shows the energy comparison of our proposed algorithms, the k time lookahead
(i.e., kTime) and k descendent lookahead (i.e., kDescendent) algorithms, with variable k values
for each algorithm (i.e., k equal to 2 and 3 for kTime, and 4, 6, and the 'all' option for
kDescendent). These results show that the energy requirements of k3Time and k6Descendent
are comparable with those of kallDescendent: the difference between k6Descendent (or
k3Time) and kallDescendent is within 1-5%. While kallDescendent is better than k3Time and
k6Descendent when the fraction of early finished tasks is small, k6Descendent and k3Time are
better when the fraction of early finished tasks is large.
Table 4-1. Normalized energy consumption of k time lookahead and k descendent lookahead algorithms with different k values with respect to different early finished task rates and time decrease rates for no deadline extension

Early      Time      k2      k3      k4          k6          kall
Finished   Decrease  Time    Time    Descendent  Descendent  Descendent
Task Rate  Rate
0.2        0.1       0.9425  0.9367  0.9379      0.9334      0.9207
0.2        0.2       0.9108  0.9008  0.9024      0.8952      0.8753
0.2        0.3       0.8866  0.8721  0.8738      0.8639      0.8372
0.2        0.4       0.8701  0.8506  0.8515      0.8393      0.8069
0.4        0.1       0.8899  0.8826  0.8845      0.8800      0.8780
0.4        0.2       0.8307  0.8194  0.8223      0.8164      0.8153
0.4        0.3       0.7857  0.7696  0.7730      0.7657      0.7660
0.4        0.4       0.7527  0.7312  0.7348      0.7266      0.7276
0.6        0.1       0.8481  0.8426  0.8420      0.8404      0.8424
0.6        0.2       0.7699  0.7621  0.7610      0.7604      0.7657
0.6        0.3       0.7092  0.6984  0.6969      0.6974      0.7061
0.6        0.4       0.6647  0.6497  0.6480      0.6492      0.6606
0.8        0.1       0.8070  0.8023  0.8007      0.8002      0.8071
0.8        0.2       0.7111  0.7057  0.7029      0.7041      0.7186
0.8        0.3       0.6430  0.6355  0.6319      0.6343      0.6548
0.8        0.4       0.5890  0.5782  0.5744      0.5781      0.6042
Figures 4-5, 4-6, 4-7, 4-8, 4-9, and 4-10 show the energy requirements of our proposed
dynamic slack allocation algorithms, the k time lookahead (i.e., kTime) and k descendent
lookahead (i.e., kDescendent) algorithms with variable k values (i.e., k equal to 2 and 3 for
kTime, and 4, 6, and the 'all' option for kDescendent), the greedy algorithm (i.e., Greedy), and
static slack allocation applied at runtime (i.e., dPathDVS), for no deadline extension and 0.01,
0.02, 0.05, 0.1, and 0.2 deadline extension rates, respectively. The results are very similar to
those for no deadline extension described above.
[Figure 4-5 plots: Normalized Energy vs. Time Decrease Rate; series: Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time; panels: 0.2, 0.4, 0.6, 0.8 early finished task rates]
Figure 4-5. Normalized energy consumption for no deadline extension
[Figure 4-6 plots: same axes, series, and panels as Figure 4-5]
Figure 4-6. Normalized energy consumption for 0.01 deadline extension rate
[Figure 4-7 plots: same axes, series, and panels as Figure 4-5]
Figure 4-7. Normalized energy consumption for 0.02 deadline extension rate
[Figure 4-8 plots: same axes, series, and panels as Figure 4-5]
Figure 4-8. Normalized energy consumption for 0.05 deadline extension rate
[Figure 4-9 plots: same axes, series, and panels as Figure 4-5]
Figure 4-9. Normalized energy consumption for 0.1 deadline extension rate
[Figure 4-10 plots: same axes, series, and panels as Figure 4-5]
Figure 4-10. Normalized energy consumption for 0.2 deadline extension rate
4.2.2.2 Comparison of time requirements
Figure 4-11 shows the average time required to readjust the schedule due to a single
task's early finish. The computational time of k6Descendent is roughly an order of magnitude
lower than that of kallDescendent and 3-4 times lower than that of k3Time. Based on the time
and energy comparisons described above, k6Descendent provides reasonable energy
performance at substantially lower overheads.
[Figure 4-11 plot: Computational Time vs. Time Decrease Rate; series: Greedy, dPathDVS, k2Time, k3Time, k4Descendent, k6Descendent, kallDescendent]
Figure 4-11. Computational time to readjust the schedule from an early finished task with respect to different time decrease rates for no deadline extension (unit: ns - via logarithmic scale)
Figure 4-12 shows the time required to readjust the schedule due to a single task's early
finish with respect to different time decrease rates for different deadline extension rates (i.e., no
deadline extension and 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates). The results are
very similar to those for no deadline extension described above.
[Figure 4-12 plots: Computational Time vs. Time Decrease Rate; series as in Figure 4-11; panels: 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates]
Figure 4-12. Results for variable deadline extension rates: Computational time to readjust the schedule from one early finished task with respect to different time decrease rates (unit: ns – via logarithmic scale): (a) for 0.01 deadline extension rate, (b) for 0.02 deadline extension rate, (c) for 0.05 deadline extension rate, (d) for 0.1 deadline extension rate, and (e) for 0.2 deadline extension rate
4.2.3 Underestimation
In this section, we show the performance of our algorithms in the case that the execution
time of tasks is underestimated (i.e., the actual execution time of a task is greater than its
estimated time).
4.2.3.1 Comparison of deadline requirements
We first compared the k-all descendent algorithm with the Greedy and dPathDVS algorithms.
The results in Figure 4-13 show that kallDescendent is significantly better than the greedy
approach at maintaining the deadline requirements. Most importantly, the deadline miss ratio
vis-a-vis dPathDVS was within 0.1% in most cases. The time requirement of kallDescendent is
one to two orders of magnitude smaller than dPathDVS, as shown in Figure 4-27. These results
demonstrate that a subset of tasks comprising only the descendants can be used for slack
allocation to reduce time requirements while meeting the deadline as closely as the static
algorithms (executed at runtime).
[Figure 4-13 plots: Deadline Miss Ratio vs. Time Increase Rate; series: No Scheme, Greedy, dPathDVS, kallDescendent; panels: 0.2, 0.4, 0.6, 0.8 late finished task rates]
Figure 4-13. Deadline miss ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
Table 4-2 shows the deadline miss ratio of our proposed algorithms, the k time lookahead
(i.e., kTime) and k descendent lookahead (i.e., kDescendent) algorithms, with variable k values
for each algorithm (i.e., k equal to 2 and 3 for kTime, and 4, 6, and the 'all' option for
kDescendent). These results show that the deadline miss ratios of k3Time and k6Descendent are
comparable with that of kallDescendent.
Table 4-2. Deadline miss ratio of k time lookahead and k descendent lookahead algorithms with different k values with respect to different late finished task rates and time increase rates for 0.05 deadline extension rate

Late       Time      k2     k3     k4          k6          kall
Finished   Increase  Time   Time   Descendent  Descendent  Descendent
Task Rate  Rate
0.2        0.05      0.001  0.000  0.001       0.000       0.000
0.2        0.1       0.004  0.002  0.004       0.002       0.000
0.2        0.15      0.010  0.007  0.010       0.006       0.001
0.2        0.2       0.018  0.013  0.018       0.013       0.003
0.4        0.05      0.000  0.000  0.000       0.000       0.000
0.4        0.1       0.003  0.001  0.003       0.002       0.001
0.4        0.15      0.010  0.008  0.010       0.009       0.006
0.4        0.2       0.022  0.020  0.022       0.020       0.016
0.6        0.05      0.003  0.003  0.003       0.003       0.003
0.6        0.1       0.012  0.012  0.012       0.012       0.013
0.6        0.15      0.027  0.027  0.027       0.028       0.029
0.6        0.2       0.050  0.051  0.051       0.051       0.052
0.8        0.05      0.010  0.009  0.010       0.010       0.010
0.8        0.1       0.033  0.033  0.033       0.034       0.035
0.8        0.15      0.061  0.062  0.062       0.062       0.064
0.8        0.2       0.100  0.100  0.100       0.100       0.101
Figures 4-14, 4-15, 4-16, 4-17, 4-18, and 4-19 show the deadline miss ratio of our
proposed dynamic slack allocation algorithms, the k time lookahead (i.e., kTime) and k
descendent lookahead (i.e., kDescendent) algorithms with variable k values (i.e., k equal to 2
and 3 for kTime, and 4, 6, and the 'all' option for kDescendent), static scheduling without any
change at runtime (i.e., NoScheme), the greedy algorithm (i.e., Greedy), and static slack
allocation applied at runtime (i.e., dPathDVS), with respect to different time increase rates and
late finished task rates, for no deadline extension and 0.01, 0.02, 0.05, 0.1, and 0.2 deadline
extension rates, respectively. The results are very similar to those for the 0.05 deadline
extension rate described above.
[Figure 4-14 plots: Deadline Miss Ratio vs. Time Increase Rate; series: NoScheme, Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, k3Time; panels: 0.2, 0.4, 0.6, 0.8 late finished task rates]
Figure 4-14. Deadline miss ratio for no deadline extension
[Figure 4-15 plots: same axes, series, and panels as Figure 4-14]
Figure 4-15. Deadline miss ratio for 0.01 deadline extension rate
[Figure 4-16 plots: same axes, series, and panels as Figure 4-14]
Figure 4-16. Deadline miss ratio for 0.02 deadline extension rate
[Figure 4-17 plots: same axes, series, and panels as Figure 4-14]
Figure 4-17. Deadline miss ratio for 0.05 deadline extension rate
[Figure 4-18 plots: same axes, series, and panels as Figure 4-14]
Figure 4-18. Deadline miss ratio for 0.1 deadline extension rate
[Figure 4-19 plots: same axes, series, and panels as Figure 4-14]
Figure 4-19. Deadline miss ratio for 0.2 deadline extension rate
4.2.3.2 Comparison of energy requirements
Figure 4-20 shows the energy increase ratio for the three algorithms: dPathDVS, kallDescendent, and k6Descendent. The deadline extension rate is set to 0.05 (this corresponds to the case when the amount of slack is small and has the potential for a large number of deadline misses). The three algorithms were found to be comparable in the amount of energy increase. In general, the k-6 descendent lookahead algorithm is better in terms of energy when a larger number of tasks finishes late, while the static algorithm applied at runtime is better when a smaller number of tasks finishes late.
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, and k6Descendent]
Figure 4-20. Energy increase ratio with respect to different time increase rates and late finished task rates for 0.05 deadline extension rate
Figures 4-21, 4-22, 4-23, 4-24, 4-25, and 4-26 show the energy increase ratio of our proposed dynamic slack allocation algorithms, the k time lookahead (i.e., kTime) and k descendent (i.e., kDescendent) lookahead algorithms with variable values of k for each algorithm (i.e., k equal to 2 and 3 for kTime, and 4, 6, and the all option for kDescendent), the greedy algorithm (i.e., Greedy), and static slack allocation applied at runtime (i.e., dPathDVS), with respect to different time increase rates and different late finished task rates, for no deadline extension and for 0.01, 0.02, 0.05, 0.1, and 0.2 deadline extension rates, respectively. The results are very similar to those for the 0.05 deadline extension rate described above.
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-21. Energy increase ratio for no deadline extension
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-22. Energy increase ratio for 0.01 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-23. Energy increase ratio for 0.02 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-24. Energy increase ratio for 0.05 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-25. Energy increase ratio for 0.1 deadline extension rate
[Figure: panels for 0.2, 0.4, 0.6, and 0.8 late finished task rates plotting energy increase ratio against time increase rate (0.05-0.2) for dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-26. Energy increase ratio for 0.2 deadline extension rate
4.2.3.3 Comparison of time requirements
Figure 4-27 shows the average time required to readjust the schedule per task that is underestimated. The computational time of k6Descendent is roughly an order of magnitude lower than that of kallDescendent and 3-4 times lower than that of k3Time. Based on the time, deadline miss ratio, and energy increase ratio comparisons described above, k6Descendent provides reasonable performance in deadline satisfaction and energy requirements at substantially lower overheads.
[Figure: computational time (ns, logarithmic scale) plotted against time increase rate (0.05-0.2) for Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-27. Computational time to readjust the schedule from a late finished task with respect to different time increase rates for no deadline extension (unit: ns, logarithmic scale)
Figure 4-28 shows the time requirements to readjust the schedule due to a single task's late finish with respect to different time increase rates for different deadline extension rates (i.e., no deadline extension and 0.01, 0.02, 0.1, and 0.2 deadline extension rates). The results are very similar to those for the 0.05 deadline extension rate described above.
[Figure: panels (a)-(e) plotting computational time (ns, logarithmic scale) against time increase rate (0.05-0.2) for Greedy, dPathDVS, kallDescendent, k4Descendent, k6Descendent, k2Time, and k3Time]
Figure 4-28. Results for variable deadline extension rates: Computational time to readjust the schedule from one late finished task with respect to different time increase rates (unit: ns, logarithmic scale): (a) no deadline extension, (b) 0.01 deadline extension rate, (c) 0.02 deadline extension rate, (d) 0.1 deadline extension rate, and (e) 0.2 deadline extension rate
CHAPTER 5 STATIC ASSIGNMENT
As presented in Chapter 1, the following two-step process is generally used for scheduling tasks with the goal of energy minimization while still meeting deadline constraints: assignment and then slack allocation. In this chapter, we explore the assignment process at compile time (i.e., static assignment), which determines the order in which tasks execute and the mapping of tasks to processors based on the computation time at the maximum voltage level. Note that the finish time of the DAG at the maximum voltage has to be less than or equal to the deadline for any feasible schedule.
Most of the prior research on scheduling for energy minimization of DAGs on parallel machines is based on deriving an assignment schedule that minimizes total finish time in the assignment step. Simple list based scheduling algorithms are generally used for this purpose. This is a reasonable approach, as minimizing finish time generally leads to more slack being available for allocation, ultimately reducing the energy requirements during the slack allocation step. However, this approach is not enough to minimize total energy consumption, because it cannot incorporate the differential energy and time requirements of each task of the workflow on different processors.
For the first step, we present a novel algorithm that has a lower finish time than existing algorithms in a heterogeneous environment. We show that the extra slack that this algorithm generates can lead to an overall reduction in energy after slack allocation as compared to existing algorithms.
The main thrust of this chapter is to show that incorporating energy minimization during the assignment process can lead to even better results. Genetic Algorithm (GA) based scheduling algorithms [56, 57] have tried to partially address this issue by searching through a large number of assignments. This approach was shown to outperform existing algorithms in terms of energy consumption based on their experimental results. However, the assignment itself does not consider the energy consumption after slack allocation. Furthermore, testing the energy requirements of multiple solutions, each corresponding to a different assignment, requires considerable computational time. We present novel algorithms which can achieve assignments with better energy requirements at lower computational times as compared to the Genetic Algorithm based methods.
5.1 Overall Scheduling Process
In this section, we present the overall process of our proposed scheduling approach; a high level description is illustrated in Figure 5-1.
In the first step, tasks are assigned to processors with the goal of minimizing total finish
time of a DAG to derive a Baseline Assignment. This is done for two reasons:
• Check whether the deadline constraints can be met. Note that if the deadline is shorter than the finish time of the DAG, the DAG cannot be feasibly executed in the required time.
• Generate an initial time based task prioritization to determine the scheduling order of tasks. This “minimizing time” based prioritization is called Baseline Prioritization for the rest of this chapter.
If a feasible assignment is derived, then a DVS based slack allocation scheme is applied
with the goal of energy minimization. This energy serves as Baseline Energy. In general, having
a lower finish time can lead to a larger amount of slack that can be allocated to the appropriate
tasks during the slack allocation step. This can lead to a large reduction in energy requirements
as compared to algorithms that have a larger finish time. However, incorporating DVS based
energy minimization during the assignment process can provide better solutions for energy
minimization. Thus our goal is to derive assignments that have better energy requirements than
the baseline energy.
The baseline prioritization, along with the energy requirements of each task, is used to generate multiple prioritizations. Each prioritization is based on a parameter α that weighs the importance of time versus energy for the assignment. For each such prioritization, a time minimization assignment algorithm is applied to minimize total finish time. Note that if the finish time for a given prioritization is larger than the deadline constraint, the DAG cannot be feasibly executed within the required time constraint and the prioritization is abandoned.

For all the feasible prioritizations, the following steps are applied:

• Step 1: An estimated deadline is assigned to each task. This estimated deadline is based on the criticality of the task in the schedule in order to meet the deadline constraints.

• Step 2: An assignment for the estimated DVS based energy minimization is now applied such that the estimated deadline constraints defined in Step 1 are generally met.

If the above provides a feasible assignment (i.e., one whose finish time is less than or equal to the deadline), a DVS based slack allocation scheme is applied to minimize energy. The estimated deadline assigned to each task in Step 1 is parameterized by a parameter β. A higher value of β allows for potentially lower energy requirements by providing greater flexibility in processor selection, but also a higher probability of deriving an assignment that does not meet the deadline constraints. The above steps are executed for each value of the parameter β, each potentially resulting in a different assignment. The feasible assignment with the least energy over the different values of α and β is chosen.
Figure 5-1. A high level description of the proposed scheduling approach
[Flowchart: (a) a time based task prioritization and an energy based task prioritization are combined, using the weighting factor on time, α, into a task prioritization; an assignment to minimize finish time is applied and checked for feasibility, yielding a feasible task prioritization. (b) For each feasible task prioritization, task deadlines are derived from the assignment that minimizes time, using the weighting factor on latest finish time, β; an assignment to minimize energy is applied, followed by DVS, and the feasible solutions are compared to get a schedule with the minimum energy.]
It is worth noting that the above methodology is independent of both the time minimization assignment algorithm and the DVS scheme for slack allocation. As the time minimization assignment algorithm, we use the ICP based assignment (presented in the next section), as it is shown to have superior performance over prior algorithms. As the DVS scheme, we use PathDVS (presented in Chapter 3), which provides near optimal solutions for slack allocation with smaller computational time requirements.
5.2 Proposed Static Assignment to Minimize Finish Time
Several scheduling algorithms for generating assignments that minimize the finish time of DAGs in a heterogeneous environment have been proposed recently [44, 62, 64]. Most of them are based on static list scheduling heuristics to minimize the finish time of DAGs, for example, Dynamic Level Scheduling (DLS) [62], Heterogeneous Earliest Finish Time (HEFT) [64], and Iterative List Scheduling (ILS) [44]. The DLS algorithm selects, at each step, a task to schedule and a processor on which to execute it, using an earliest-task-first policy. The HEFT algorithm reduces the cost of scheduling by using pre-calculated task priorities and uses the earliest finish time for the selection of a processor. This can in general provide better performance than the DLS algorithm. However, since HEFT uses the average computation time across all processors for a given task to determine task priorities, it may lead to an inaccurate ordering for executing tasks. To address this problem, the ILS algorithm generates an initial schedule using HEFT and iteratively improves it by updating task priorities.
Our approach is based on the fact that task prioritization can be improved by using a group based approach. There are two main features of the proposed assignment algorithm, called Iterative Critical Path (ICP). First, it assigns multiple independent ready tasks simultaneously. The computation of the priority of a task depends on estimating the execution path from this task to the last task of the DAG representing the workflow. Since the mapping of tasks yet to be scheduled is unknown and the cost of task execution depends on the processor that is assigned, the priority has to be approximated during scheduling. Hence, it is difficult to explicitly distinguish the execution order of tasks with similar priorities. Using this intuition, the proposed algorithm forms independent ready tasks whose priorities are similar into a group and finds an optimal solution (e.g., resource assignment) for this subset of tasks simultaneously. Here the set of ready tasks that can be assigned consists of tasks for which all the predecessors have already been assigned. Second, it iteratively refines the scheduling, using the cost of the critical path based on the assignment generated in the previous iteration. Assuming that the mappings of the previous iteration are good, this provides a better estimate of the cost of the critical path than using the average or median computation and communication time, as is done in the first iteration.
5.2.1 Task Selection
To determine the scheduling order of tasks in our algorithm, the priority of each task is computed using its critical path, which is the length of the longest path from the task to an exit task. The critical path of each task is computed by traversing the graph from an exit task. The critical path of task $\tau_i$, $cp_i$, is defined by

$$cp_i = \mathit{avgCompTime}_i + \max_{\tau_j \in \mathit{succ}_i}\left(\mathit{avgCommTime}_{ij} + cp_j\right)$$

where $\mathit{avgCompTime}_i$ is the average computation time of task $\tau_i$, $\mathit{avgCommTime}_{ij}$ is the average communication time between task $\tau_i$ and task $\tau_j$, and $\mathit{succ}_i$ is the set of direct successors of task $\tau_i$ in the DAG.
Using the critical path of each task, the tasks are sorted into non-increasing order of their critical path values. The task ordering list generated this way preserves the original precedence constraints among the tasks of the given DAG. During the assignment process, at each step a list of ready tasks is used as the next set of tasks that can be assigned. The list of ready tasks consists of tasks for which all the predecessors have already been assigned. ICP finds a subset of these ready tasks whose critical path values are similar but which have no precedence relationships with each other. In other words, the subset is composed of independent tasks whose predecessors are all assigned to processors. The size of the selected subset is bounded by a pre-specified threshold value.

The average values of computation time and communication time are used at the initial step. After the first assignment, the actual computation time and communication time based on the previous assignment are used for the computation of the critical path.
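As a concrete illustration, the following is a minimal, self-contained sketch of the critical path computation on a toy DAG; the dictionary-based encoding and the example numbers are ours, not taken from the experiments:

from functools import lru_cache

# Toy DAG: successors of each task, average computation times, and average
# communication times between dependent task pairs (illustrative values).
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
avg_comp = {1: 2, 2: 2, 3: 2, 4: 2}
avg_comm = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 1}

@lru_cache(maxsize=None)
def cp(i):
    """Critical path of task i: avgCompTime_i plus the maximum over
    successors j of (avgCommTime_ij + cp_j); exit tasks contribute only
    their own computation time."""
    if not succ[i]:
        return avg_comp[i]
    return avg_comp[i] + max(avg_comm[(i, j)] + cp(j) for j in succ[i])

# Tasks sorted into non-increasing order of critical path value.
order = sorted(succ, key=cp, reverse=True)
print([(i, cp(i)) for i in order])   # [(1, 8), (2, 5), (3, 5), (4, 2)]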
5.2.2 Processor Selection
ICP assigns the multiple independent ready tasks selected in the previous step simultaneously and optimally on the available processors. For a list of independent ready tasks, ICP finds the best processor for each task in the list such that either the total finish time of the selected subset of tasks is minimized (Option 1) or the sum of the finish times on the processors is minimized (Option 2). With respect to the goal of reducing the finish time of the DAG, Option 1 applies the goal directly, while Option 2 increases the possibility of minimizing the finish times of subsequent tasks by leaving more room for them. Either method, or a combination of the two, can be applied for processor selection.

The optimal solution for the processor selection of the selected subset of tasks is generated using an ILP (Integer Linear Programming) formulation. The formulation for Option 1 is as follows:
$$\text{Minimize} \quad \max_{\tau_i \in \Gamma_s,\ p_j \in P} \mathit{ftime}_{ij}$$

$$\text{subject to} \quad \mathit{ftime}_{ij} = \mathit{compTime}_{ij} + \max\left(\mathit{availableTime}_{ij},\ \max_{\tau_k \in \mathit{pred}_i}\left(\mathit{ftime}_{k,p(k)} + \mathit{commTime}_{i,j,k,p(k)}\right)\right), \quad \forall\, \tau_i \in \Gamma_s,\ p_j \in P$$

Here, $\mathit{ftime}_{ij}$ is the finish time of task $\tau_i$ on processor $p_j$, $\Gamma_s$ is the subset of ready tasks, $P$ is the set of processors, $\mathit{compTime}_{ij}$ is the computation time of task $\tau_i$ on processor $p_j$, $p(k)$ is the processor where task $\tau_k$ is assigned, $\mathit{commTime}_{i,j,k,p(k)}$ is the communication time between task $\tau_i$ on processor $p_j$ and task $\tau_k$ on processor $p(k)$, and $\mathit{pred}_i$ is the set of direct predecessors of task $\tau_i$ in the DAG. The available start time of task $\tau_i$ in the free slot of processor $p_j$ is represented by $\mathit{availableTime}_{ij}$.
In the case of Option 2, only the objective function changes; the constraints are the same as in Option 1. The formulation for Option 2 is as follows:

$$\text{Minimize} \quad \sum_{\tau_i \in \Gamma_s,\ p_j \in P} \mathit{ftime}_{ij}$$

$$\text{subject to} \quad \mathit{ftime}_{ij} = \mathit{compTime}_{ij} + \max\left(\mathit{availableTime}_{ij},\ \max_{\tau_k \in \mathit{pred}_i}\left(\mathit{ftime}_{k,p(k)} + \mathit{commTime}_{i,j,k,p(k)}\right)\right), \quad \forall\, \tau_i \in \Gamma_s,\ p_j \in P$$
We found that the schedules generated with either of these two options were comparable.
Thus, in the following, we limit ourselves to Option 1.
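Since the subset of ready tasks is small (bounded by the threshold), its optimal assignment can also be found by exhaustive enumeration, which is a convenient way to illustrate the Option 1 objective. A minimal sketch, with illustrative data of our own; the per-(task, processor) ready time folds together availableTime and predecessor finish plus communication times:

from itertools import product

ready = ["t5", "t6"]                          # independent ready tasks (illustrative)
procs = ["p1", "p2"]
comp = {("t5", "p1"): 3, ("t5", "p2"): 5,     # compTime_ij at maximum voltage
        ("t6", "p1"): 4, ("t6", "p2"): 2}
# Earliest possible start of each task on each processor (illustrative).
ready_time = {("t5", "p1"): 2, ("t5", "p2"): 0,
              ("t6", "p1"): 2, ("t6", "p2"): 1}

best = None
for choice in product(procs, repeat=len(ready)):   # all task -> processor maps
    end = {p: 0 for p in procs}                    # running end time per processor
    for task, p in zip(ready, choice):
        start = max(ready_time[(task, p)], end[p]) # wait for data and processor
        end[p] = start + comp[(task, p)]
    makespan = max(end.values())                   # Option 1: minimize the maximum
    if best is None or makespan < best[0]:
        best = (makespan, dict(zip(ready, choice)))
print(best)   # (5, {'t5': 'p1', 't6': 'p2'})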
5.2.3 Iterative Scheduling
The ICP assignment method is based on iterative scheduling in order to provide a better estimate of the cost of the critical path. Figure 5-2 presents a high level description of the ICP assignment procedure.
Figure 5-2. The ICP procedure

Initialize
1. minFinishTime = maxValue
2. Compute the average computation time and communication time for each task
3. Compute the critical path value for each task based on the average values

Procedure ICP
4. While there is a continuous improvement of performance do
5.   Generate the list of tasks, Γ, sorted by non-increasing order of the critical path values
6.   While the list of tasks is not empty do
7.     Find tasks τi ∈ Γ whose priorities are close, where τi ∉ succ_k for every τk ∈ Γs
8.     Insert them into the list of ready tasks, Γs
9.     Assign the ready tasks based on the ILP formulation
10.    If the finish time of any assigned task >= minFinishTime then
11.      Break
12.    End If
13.    Delete the tasks in Γs from Γ and empty Γs, Γs = {}
14.  End While
15.  If the total finish time is less than minFinishTime then
16.    Update minFinishTime
17.    Assign each task τi ∈ Γ to its selected processor
18.  End If
19.  Compute the critical path based on the current assignment
20.  If the total finish time has not improved at the current threshold for more than k iterations, or the critical path is the same as in a previous assignment, then
21.    Change the number of ready tasks, threshold
22.  End If
23. End While
End Procedure

In the first iteration, the estimation of the critical path is based on the average computation time across all processors for the tasks yet to be scheduled. This can result in inaccuracies in estimating the critical path. To reduce or eliminate the possibility of an inappropriate assignment due to an inaccurate critical path estimate, ICP iteratively reschedules tasks using a critical path
which is determined from the assignment produced by the previous iteration of the scheduling algorithm. In other words, the critical path of each task depends on the previous assignment: the computation time of each task used in the critical path computation is its computation time on its assigned processor, rather than the average across all processors, and the communication time between tasks is likewise the specific value determined by their assigned processors. This iterative refinement continues until the total finish time no longer decreases or a prespecified number of iterations is completed. The value of the threshold for the subset of tasks starts at a fixed value and is decremented by one if no reduction in finish time (i.e., schedule length) is seen after a few iterations. Changing the threshold value increases the possibility of further improving the finish time.
5.3 Proposed Static Assignment to Minimize Energy
As described earlier, the prior research on scheduling for energy minimization has
concentrated on the slack allocation step to minimize the energy requirements during a given
phase while using simple list based scheduling approaches to minimize total finish time for the
assignment step. Unlike these methods, our proposed assignment algorithm considers the energy
requirements based on potential slack during the assignment step.
The main features of our assignment algorithm are as follows. First, it utilizes expected DVS based energy information during assignment. Our algorithm assigns the appropriate processor to each task such that the total energy expected after slack allocation is minimized. The expected energy after slack allocation (i.e., the expected DVS based energy) for each task is computed using the estimated deadline for each task, so that the overall DAG can be executed within the deadline of the DAG. Second, it considers multiple task prioritizations. We test multiple assignments using multiple task prioritizations based on tradeoffs between energy and time for each task. These assignments can potentially be executed in parallel to minimize the computational time (i.e., the runtime of the algorithm).

The details of task prioritization, the estimated deadline for each task, and processor selection for our assignment algorithm to minimize DVS based energy are described in the subsequent subsections.
5.3.1 Task Prioritization
In the time minimization assignment methods, the priorities of tasks, which determine the scheduling order, are based only on time information, without paying any attention to energy requirements. The task prioritization in our algorithm is based on a weighted sum of the time and energy requirements. After applying an assignment algorithm to minimize finish time (i.e., the baseline assignment), the baseline prioritization (i.e., the time based prioritization) is generated and used to determine the task prioritization for reapplying an assignment algorithm. An appropriate choice of weight provides tradeoffs between energy and deadline constraints.

To compute the time based priority of each task (i.e., the baseline prioritization), we use its critical path, which is the length of the longest path from the task to an exit task. The critical path of each task is computed in the same way as for the ICP assignment presented in the previous section. The task ordering list generated from the critical path values preserves the original precedence constraints among the tasks of the given DAG. The critical path of task $\tau_i$, $cp_i$, is defined by

$$cp_i = \mathit{compTime}_i + \max_{\tau_j \in \mathit{succ}_i}\left(\mathit{commTime}_{ij} + cp_j\right)$$

where $\mathit{compTime}_i$ is the computation time of task $\tau_i$, $\mathit{commTime}_{ij}$ is the communication time between task $\tau_i$ and task $\tau_j$, and $\mathit{succ}_i$ is the set of direct successors of task $\tau_i$ in the DAG.
Given the baseline prioritization, the priority of each task used in our algorithm is recomputed by incorporating the energy information. The priority of task $\tau_i$, $\mathit{priority}_i$, is defined by

$$\mathit{priority}_i = \alpha \times \frac{cp_i}{CP} + (1-\alpha) \times \frac{\mathit{energy}_i}{\sum_{\tau_k \in \Gamma} \mathit{energy}_k}, \quad 0 \le \alpha \le 1$$

where $CP$ is the critical path of the DAG (i.e., the total finish time of the DAG), $cp_i$ is the critical path of task $\tau_i$, $\mathit{energy}_i$ is the energy consumed to execute task $\tau_i$, $\alpha$ is the weight of time, and $\Gamma$ is the set of all tasks in the DAG.
If the weighting factor α is close to zero, a task that requires higher energy to execute is assigned to an appropriate processor with higher priority than tasks with lower energy consumption. This can be expected to lead to better performance in terms of energy. However, because time information is largely ignored, the finish time of the DAG may be larger, and the deadline constraints may not even be satisfied. If the weighting factor α is close to one, the probability of a feasible assignment of the DAG is higher, but the lack of consideration of energy information may lead to worse energy performance.
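A minimal sketch of this weighted prioritization, using illustrative critical path and energy values of our own:

def priorities(cp, energy, alpha):
    """priority_i = alpha * cp_i / CP + (1 - alpha) * energy_i / total energy."""
    CP = max(cp.values())              # critical path of the whole DAG
    total = sum(energy.values())
    return {i: alpha * cp[i] / CP + (1 - alpha) * energy[i] / total
            for i in cp}

cp = {1: 8, 2: 5, 3: 5, 4: 2}          # illustrative critical path values
energy = {1: 1, 2: 5, 3: 20, 4: 2}     # illustrative per-task energies

for alpha in (0.0, 0.5, 1.0):
    pr = priorities(cp, energy, alpha)
    print(alpha, sorted(cp, key=pr.get, reverse=True))
# alpha = 0.0 ranks tasks by energy share; alpha = 1.0 ranks by critical path.
# The assignment step then reorders this list to respect precedence constraints.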
The above prioritization is modified to accommodate the precedence relationships among tasks during assignment; that is, a successor task is always assigned after its predecessor tasks. For the example of Figure 1-1, assume that the ordering of tasks based on the priority values is τ5−τ3−τ1−τ4−τ2−τ6−τ7. Due to the precedence relationships among tasks, the actual execution ordering for assignment is changed to τ1−τ3−τ2−τ5−τ4−τ6−τ7. Tasks τ1, τ3, and τ2 precede task τ5 even though their priorities are lower, and task τ2 precedes task τ4 so that task τ5 can execute ahead of task τ4, in keeping with their priorities.
5.3.2 Estimated Deadline for a Task
The goal of the assignment is to minimize the expected total energy consumption after slack allocation while still satisfying the deadline constraints. Consider a scenario where the assignment of a subset of tasks has already been completed and the next task in the prioritization list has to be assigned. The choice of processors that can be assigned to this task should be limited to the ones for which the expected finish time of the overall assignment will lead to meeting the deadline constraints (otherwise this will result in an infeasible assignment). Clearly, at the time when the assignment for a given task is being determined, there is no guarantee that the derived schedule will be feasible (i.e., will meet the deadline), because the feasibility of the schedule depends on the assignment of the remaining tasks, which is not yet determined.

The proposed algorithm calculates an estimated deadline for each task, that is, the deadline that is expected to enable a feasible schedule provided the task's finish time satisfies it. The estimated deadline of a task is a value interpolated between the earliest finish time and the latest finish time using a weighting factor β. The latest finish time of task $\tau_i$, $LFT_i$, its earliest finish time, $EFT_i$, and its estimated deadline, $d_i$, are respectively defined by
$$LFT_i = \min\left(\mathit{deadline}_i,\ LFT_{\mathit{pSucc}_i} - \mathit{compTime}_{\mathit{pSucc}_i},\ \min_{\tau_j \in \mathit{succ}_i}\left(LFT_j - \mathit{compTime}_j - \mathit{commTime}_{ij}\right)\right)$$

$$EFT_i = \max\left(\mathit{start}_i,\ EFT_{\mathit{pPred}_i},\ \max_{\tau_j \in \mathit{pred}_i}\left(EFT_j + \mathit{commTime}_{ij}\right)\right) + \mathit{compTime}_i$$

$$d_i = \beta \times LFT_i + (1-\beta) \times EFT_i, \quad 0 \le \beta \le 1$$

where $\mathit{deadline}_i$ is the deadline of task $\tau_i$, $\mathit{start}_i$ is the start time of task $\tau_i$, $\mathit{compTime}_i$ is the computation time of task $\tau_i$ on its assigned processor, $\mathit{commTime}_{ij}$ is the communication time between task $\tau_i$ and task $\tau_j$ on their assigned processors, $\mathit{succ}_i$ is the set of direct successors of task $\tau_i$ in the DAG, $\mathit{pSucc}_i$ is the task placed immediately after task $\tau_i$ on the same assigned processor, $\mathit{pred}_i$ is the set of direct predecessors of task $\tau_i$ in the DAG, $\mathit{pPred}_i$ is the task placed immediately before task $\tau_i$ on the same assigned processor, and $\beta$ is the weight of the latest finish time.
If the weighting factor β is close to one, the task is allowed more flexibility in processor assignment, as it can take longer to complete. However, the probability of a feasible assignment of the DAG may be lower. If the weighting factor β is close to zero, there is less flexibility in assigning the task to a processor, but the probability of a feasible assignment of the DAG is higher. Also, as this potentially leaves more slack after assignment, that slack can be allocated by the DVS algorithm for energy minimization.
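A minimal sketch of the interpolation step, assuming LFT and EFT have already been computed by the recurrences above; the LFT values are chosen to match the example of Section 5.3.3.2, and the EFT values are illustrative:

def estimated_deadlines(lft, eft, beta):
    """d_i = beta * LFT_i + (1 - beta) * EFT_i, with 0 <= beta <= 1."""
    assert 0.0 <= beta <= 1.0
    return {i: beta * lft[i] + (1 - beta) * eft[i] for i in lft}

lft = {1: 4, 2: 6, 3: 7, 4: 9}   # latest finish times (as in Section 5.3.3.2)
eft = {1: 2, 2: 4, 3: 6, 4: 7}   # illustrative earliest finish times
print(estimated_deadlines(lft, eft, 1.0))   # beta = 1: d_i = LFT_i (4, 6, 7, 9)
print(estimated_deadlines(lft, eft, 0.5))   # tighter estimated deadlines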
5.3.3 Processor Selection
Figure 5-3 presents a high level description of the assignment procedure for a given task prioritization. Each task is assigned to a processor such that the total energy consumption expected after applying the DVS scheme to the tasks assigned so far (including the new task being considered for assignment) is minimized, while trying to meet the estimated deadline of the task. The candidate processors for the task are those on which the task can execute within its estimated deadline. Once the candidate processors for the task have been selected, the procedure continues depending on the following conditions:

First, if no processor can satisfy the estimated deadline for the task, the processor with the earliest finish time is selected. It is possible that the schedule later becomes feasible, since the assignment is based on estimated times for future tasks whose assignment is yet to be determined. When the task finishes within the range between its earliest finish time and its latest finish time, we assume that the deadline of the DAG can be met with high probability. By selecting a processor on which the task finishes earlier, the chance of meeting the deadline increases.

Second, if there is only one candidate processor that meets the above constraint, the task is assigned to that processor. This, too, increases the chance of meeting the deadline constraints.

Finally, if there is more than one candidate processor that meets the above constraint, a processor is selected such that the total energy expected after slack allocation is minimized. The expected total energy is the sum of the expected energy of the already assigned tasks and of the task being considered for assignment. To compute the expected energy for a given processor assignment in this step, a faster heuristic based strategy (as compared to PathDVS, which provides nearly optimal solutions) is used. This procedure is described in the next subsection.

The above selection process is performed iteratively until all tasks are assigned. However, if the finish time of a task exceeds the deadline, the process stops.
5.3.3.1 Greedy approach for the computation of expected energy
The unit slack allocation used in the PathDVS algorithm (described in Chapter 3) finds the subset of tasks that maximally reduces the total energy consumption. This corresponds to the maximum weighted independent set (MWIS) problem [7, 53, 65], which is computationally intensive. Our approach requires the use of a DVS scheme during the assignment of each task in order to compute the expected DVS based energy for selecting the best processor in the processor selection step. This is an intermediate step where exact energy estimates are not as important as in the slack allocation step. To reduce the time requirements of the optimal branch and bound strategy for unit slack allocation described in Chapter 3, a greedy algorithm for the MWIS problem [53] can be used while still providing good estimates of energy. The greedy algorithm in our approach is as follows:

• Select the task with the maximum energy reduction (i.e., the energy reduced when unit slack is allocated) among all tasks (i.e., the already assigned tasks and the task being considered for assignment)

• Select the task with the maximum energy reduction among the tasks independent of the previously selected task

• Iteratively select tasks until there is no task independent of all the selected tasks

The above greedy approach for unit slack allocation is performed iteratively until there is no slack left or no task to which slack can be allocated under the estimated deadline constraints. In the proposed greedy approach, the independent tasks can easily be identified using a compatible task matrix or lists that record, for each task, the tasks with which it can share unit slack, as in PathDVS.
Figure 5-3. The DVSbasedAssignment procedure
Procedure DVSbasedAssignment
1. Compute the estimated deadline for each task
2. For each task do
3.   Find the processors on which task τi can execute within its estimated deadline di
     Condition 1: If there is no such processor
4.1.   If the finish time of the task τi > deadline
4.2.     Stop the procedure
4.3.   Else
4.4.     Select a processor such that the finish time of the task τi is minimized
4.5.   End If
     Condition 2: If there is only one such processor
4.1.   Select that processor for the task τi
     Condition 3: If there is more than one such processor
4.1.   Apply a greedy algorithm for the weighted independent task set problem to the task τi and the already assigned tasks
4.2.   Select a processor such that the total energy is minimized
5. End For
End Procedure
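The same control flow, as a runnable Python sketch; finish_time and expected_energy are caller-supplied estimate functions (hypothetical stand-ins for the estimates described above), and est_deadline maps each task to its estimated deadline d_i:

def dvs_based_assignment(tasks, procs, est_deadline, deadline,
                         finish_time, expected_energy):
    """Sketch of the DVSbasedAssignment control flow."""
    assignment = {}
    for t in tasks:
        candidates = [p for p in procs
                      if finish_time(t, p, assignment) <= est_deadline[t]]
        if not candidates:                                    # Condition 1
            p = min(procs, key=lambda q: finish_time(t, q, assignment))
            if finish_time(t, p, assignment) > deadline:
                return None                                   # infeasible: stop
        elif len(candidates) == 1:                            # Condition 2
            p = candidates[0]
        else:                                                 # Condition 3
            p = min(candidates,
                    key=lambda q: expected_energy(t, q, assignment))
        assignment[t] = p
    return assignment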
5.3.3.2 Example for assignment
In the following, we briefly illustrate, with a simple example, the benefit of considering the DVS based expected energy of tasks during the assignment process. Figures 5-4 (a) and (b) show a DAG with 4 tasks along with the execution time and the energy consumption of each task on each processor at the maximum voltage level. There is a large variation in the energy requirements of the tasks (this was done mainly to keep the example simple in terms of the number of tasks). An assignment that minimizes total finish time is presented in Figure 5-4 (c). The total finish time is 7, and the corresponding energy consumption before slack allocation is 27.
The time based task prioritization corresponding to this assignment is τ1−τ2−τ3−τ4. Suppose that the deadline to complete the execution of the DAG is 9. The prioritization is obviously feasible, since the finish time is less than this deadline. The estimated deadline for each task is determined using the assignment shown in Figure 5-4 (c). The estimated deadlines for tasks τ1, τ2, τ3, and τ4 are 4, 6, 7, and 9, respectively, based on a weighting factor of latest finish time β equal to one.

The proposed assignment method to minimize energy is now applied (note that the energy model follows a quadratic function and the unit slack is one unit). In the following, we show the assignment process based on the above prioritization order.
First, consider task τ1. If task τ1 is assigned to processor p1, there is an estimated slack of two units, since its finish time is 2 and its estimated deadline is 4. After the slack is allocated to the task, its energy consumption is 0.25. If the task is assigned to processor p2, the expected energy is 2.5 after allocating the estimated slack of two units to the task. Thus, task τ1 is assigned to processor p1.
Second, consider task τ2. If this task is assigned to processor p1, the estimated slack is two units. The entire slack can be allocated to task τ1 or τ2, or a slack of one unit can be allocated to each of tasks τ1 and τ2. The best of these is to allocate the whole slack to task τ2; the total energy for tasks τ1 and τ2 based on this assignment is then 2.25. If this task is assigned to processor p2, there is no estimated slack (since the estimated deadline for task τ2 is 6). However, the total expected energy based on this assignment is 2, so task τ2 is assigned to processor p2.

Next, consider task τ3. Processor p2 is not considered for task τ3 because the finish time of task τ3 on processor p2 exceeds its estimated deadline. Therefore, task τ3 is assigned to processor p1.

Finally, consider task τ4. If this task is assigned to processor p1, the estimated slack is three units. In this case, the entire slack is allocated to task τ3, and the total expected energy is 7.2. If this task is assigned to processor p2, then a slack of two units is allocated to task τ3 and a slack of one unit is allocated to task τ2, and the total expected energy is 7.6. Thus, task τ4 should be assigned to processor p1, even though its energy requirement on processor p2 (i.e., its energy before slack allocation) is less than that on processor p1.
Figure 5-4 (d) shows the assignment that minimizes the DVS based energy. Here the total finish time is 9 and the total energy consumption before slack allocation is 24.

Once the assignment is completed, a slack allocation algorithm is applied to minimize the total energy requirements. Let us now compare the two assignments of Figures 5-4 (c) and (d) after slack allocation. For the assignment in Figure 5-4 (c) (i.e., the assignment that minimizes finish time), a slack of two units is allocated to tasks τ2 and τ3, resulting in a total energy of 8.25. For the assignment in Figure 5-4 (d) (i.e., the assignment that minimizes energy), the total energy after slack allocation is 7.2, corresponding to a slack of three units being allocated to task τ3. This
represents a 12.7% improvement in overall energy requirements. The algorithm was able to achieve this improvement by focusing the potential slack on task τ3, which has the higher energy requirements.
Figure 5-4. Example of assignment to minimize finish time and assignment to minimize DVS based energy: (a) DAG, (b) Execution time and energy information for each task on two processors, (c) Assignment to minimize finish time, (d) Assignment to minimize DVS based energy (i.e., our assignment)
5.4 Experimental Results for Assignment Algorithms that Minimize Finish Time
In this section, we present comparisons of our algorithm with algorithms that minimize total finish time followed by slack allocation. We compare performance against ILS [44] and HEFT [64]; these two algorithms have been shown to be superior to existing algorithms for minimizing time in heterogeneous environments. We combined these algorithms with three DVS algorithms, PathDVS (presented in Chapter 3), EProfileDVS [48, 55], and GreedyDVS [13], in order to see whether the DVS algorithm makes a difference in the relative comparison of the three assignment algorithms. The size of the unit slack for PathDVS (i.e., unitSlack) is set to the best size obtained empirically in the experiments of Chapter 3: unitSlack equal to 0.001 * total finish time.
[Figure 5-4 content, reconstructed from the flattened figure: (a) a DAG in which task 1 precedes tasks 2 and 3, which precede task 4; (b) execution time and energy of each task on the two processors at the maximum voltage level:

Task            1     2     3     4
Time on P1      2     2     2     2
Time on P2      2     3     2     2
Energy on P1    1     5     20    2
Energy on P2    10    1     20    1

(c) Gantt chart of the assignment that minimizes finish time (finish time 7); (d) Gantt chart of the assignment that minimizes DVS based energy (finish time 9).]
5.4.1 Simulation Methodology
In this section, we describe the DAG generation and the performance measures used in our experiments.
5.4.1.1 The DAG generation
We randomly generated a large number of graphs with 50 and 100 tasks. The execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units (given
that we are targeting a heterogeneous environment) and the communication time between a task
and its child task for a pair of processors is varied from 1 to 4 units. The energy consumed to
execute each task on each processor is varied from 10 to 80. The execution of graphs is
performed on 4, 8, and 16 processors. For each combination of values of number of tasks and
processors, 20 different synthetic graphs are generated.
5.4.1.2 Performance measures
We used total finish time and the improvement in total energy consumption for comparing the different algorithms. The deadline extension rate is the fraction of the total finish time that is added to the deadline (i.e., deadline = (1 + deadline extension rate) * maximum total finish time over the assignments before applying DVS). We provide experimental results for deadline extension rates equal to 0 (no deadline extension), 0.2, 0.4, 0.6, 0.8, and 1.0. The total number of iterations for ICP and the number of iterations allowed without improvement at the same threshold are set to 10 and 3, respectively, and the threshold varies from 1 to 4.
5.4.2 Comparison of Assignment Algorithms Using Different DVS Algorithms
We compared our algorithm, ICP, with ILS [44] and HEFT [64], which outperform other existing algorithms in terms of total finish time. The algorithms are compared in terms of total finish time and total energy consumption after applying slack allocation, in order to show the relationship between minimizing finish time and minimizing energy consumption.

A comparison of the three algorithms shows that ICP was slightly better than ILS and considerably better than HEFT in terms of total finish time. The average total finish time of ICP is reduced by 3.95% and 9.31% compared to ILS and HEFT, respectively.
Tables 5-1, 5-2, 5-3, 5-4, 5-5, and 5-6 show the improvement of ICP-PathDVS over the combinations of the three assignment algorithms (i.e., ICP, ILS, and HEFT) with the three DVS algorithms (i.e., EProfileDVS, GreedyDVS, and PathDVS) in terms of energy consumption, with respect to different deadline extension rates, for each combination of 50 and 100 tasks on 4, 8, and 16 processors, respectively. Based on these results, our assignment algorithm, ICP, leads to lower energy requirements than the other assignment algorithms regardless of the DVS algorithm. For instance, using the PathDVS algorithm, the energy of the ICP assignment is reduced by 11-14% over ILS and 13-17% over HEFT. We believe the main reason is that having a lower finish time leads to a larger amount of slack that can be allocated optimally to the appropriate tasks during the slack allocation step. This leads to a large reduction in energy requirements as compared to an algorithm that has a larger finish time.

The results also show that PathDVS (presented in Chapter 3) outperforms the other DVS algorithms regardless of the assignment algorithm used, in terms of minimizing energy. For instance, given the ICP assignment, PathDVS improves by 4-18% over EProfileDVS and 19-84% over GreedyDVS, depending on the deadline extension rate.

Finally, the combination of ICP and PathDVS outperforms all other combinations. For instance, the combined effect of ICP along with PathDVS provides an improvement of 13-26% over the combination of ILS and EProfileDVS.
Table 5-1. Results for 50 tasks and 4 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   2.83%    5.97%    6.75%    7.08%    7.31%    7.36%
ICP         GreedyDVS     19.82%   47.24%   61.90%   71.08%   77.29%   81.70%
ILS         PathDVS       12.15%   11.68%   11.93%   12.10%   12.28%   12.33%
ILS         EProfileDVS   13.86%   16.05%   16.81%   17.14%   17.34%   17.38%
ILS         GreedyDVS     24.98%   50.41%   64.19%   72.83%   78.66%   82.80%
HEFT        PathDVS       21.80%   17.94%   17.88%   17.98%   18.14%   18.19%
HEFT        EProfileDVS   21.88%   21.83%   22.25%   22.42%   22.62%   22.65%
HEFT        GreedyDVS     26.00%   50.08%   63.93%   72.63%   78.51%   82.68%
Table 5-2. Results for 50 tasks and 8 processors: Improvement of ICP-PathDVS in terms of
energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   3.72%    10.08%   12.08%   13.14%   13.94%   14.20%
ICP         GreedyDVS     20.40%   49.71%   64.46%   73.37%   79.29%   83.40%
ILS         PathDVS       12.20%   11.97%   12.80%   13.52%   14.17%   14.49%
ILS         EProfileDVS   14.82%   20.55%   22.29%   23.36%   23.99%   24.31%
ILS         GreedyDVS     26.64%   53.44%   67.10%   75.36%   80.85%   84.65%
HEFT        PathDVS       20.64%   17.64%   17.75%   18.26%   18.84%   19.11%
HEFT        EProfileDVS   21.06%   24.97%   26.37%   27.35%   27.95%   28.20%
HEFT        GreedyDVS     27.09%   52.62%   66.47%   74.87%   80.46%   84.34%
Table 5-3. Results for 50 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   5.04%    11.73%   13.00%   13.93%   14.35%   14.60%
ICP         GreedyDVS     20.99%   49.48%   63.85%   72.81%   78.80%   83.07%
ILS         PathDVS       13.96%   12.44%   12.40%   12.91%   13.20%   13.43%
ILS         EProfileDVS   16.26%   22.29%   23.60%   24.45%   24.88%   25.18%
ILS         GreedyDVS     24.92%   51.16%   64.97%   73.66%   79.46%   83.60%
HEFT        PathDVS       17.44%   14.93%   14.59%   14.89%   15.08%   15.24%
HEFT        EProfileDVS   18.01%   24.05%   24.96%   25.91%   26.28%   26.53%
HEFT        GreedyDVS     25.46%   50.97%   64.74%   73.45%   79.29%   83.45%
Table 5-4. Results for 100 tasks and 4 processors: Improvement of ICP-PathDVS in terms of
energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   2.92%    7.31%    9.18%    10.81%   11.48%   11.97%
ICP         GreedyDVS     16.33%   47.45%   62.65%   72.04%   78.15%   82.46%
ILS         PathDVS       9.16%    8.40%    9.16%    10.29%   10.71%   11.13%
ILS         EProfileDVS   10.63%   14.09%   15.70%   17.15%   17.82%   18.30%
ILS         GreedyDVS     19.35%   49.22%   63.90%   72.99%   78.89%   83.06%
HEFT        PathDVS       17.11%   13.48%   13.15%   14.15%   14.28%   14.57%
HEFT        EProfileDVS   17.14%   18.65%   19.91%   21.28%   21.88%   22.35%
HEFT        GreedyDVS     19.61%   48.82%   63.62%   72.78%   78.73%   82.93%
Table 5-5. Results for 100 tasks and 8 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)
                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   4.36%    12.88%   16.61%   18.29%   18.69%   19.43%
ICP         GreedyDVS     17.30%   50.16%   65.40%   74.29%   79.91%   83.99%
ILS         PathDVS       8.86%    8.58%    10.39%   11.76%   12.02%   12.83%
ILS         EProfileDVS   11.38%   19.16%   22.67%   24.27%   24.62%   25.29%
ILS         GreedyDVS     20.15%   51.73%   66.53%   75.15%   80.59%   84.53%
HEFT        PathDVS       14.07%   11.12%   12.52%   13.82%   14.08%   14.87%
HEFT        EProfileDVS   14.31%   20.95%   24.33%   25.85%   26.19%   26.86%
HEFT        GreedyDVS     19.57%   50.94%   65.98%   74.75%   80.28%   84.30%
Table 5-6. Results for 100 tasks and 16 processors: Improvement of ICP-PathDVS in terms of energy consumption with respect to different deadline extension rates (unit: percentage)

                                  Deadline Extension Rate
Assignment  DVS           0        0.2      0.4      0.6      0.8      1.0
ICP         EProfileDVS   5.06%    16.17%   18.78%   19.73%   20.22%   20.40%
ICP         GreedyDVS     19.28%   52.75%   67.13%   75.46%   80.93%   84.77%
ILS         PathDVS       9.65%    9.41%    9.88%    10.28%   10.57%   10.82%
ILS         EProfileDVS   12.85%   23.59%   26.09%   26.97%   27.44%   27.71%
ILS         GreedyDVS     23.23%   54.82%   68.55%   76.52%   81.75%   85.43%
HEFT        PathDVS       13.39%   11.41%   11.49%   11.74%   13.01%   13.91%
HEFT        EProfileDVS   14.25%   24.50%   26.75%   27.43%   28.71%   29.49%
HEFT        GreedyDVS     21.74%   53.52%   67.62%   75.82%   81.20%   84.99%
5.4.3 Comparison between CPS (Used in Prior Scheduling for Energy Minimization) and ICP
We also compared our algorithm to the CPS assignment algorithm that is typically used in the energy minimization literature [48]. Here we show the performance for a large number of graphs with 100 and 200 tasks on 4 and 8 processors. The other experimental settings (e.g., execution time, communication time, etc.) are the same as above. The performance is again measured in terms of total finish time and total energy consumption after applying slack allocation, in order to show the relationship between minimizing finish time and minimizing energy consumption.

The average ratio of the total finish time of ICP to that of CPS is 0.71 and 0.59 on 4 and 8 processors, respectively. Figure 5-5 shows the comparison of ICP and CPS, each followed by slack allocation (i.e., PathDVS), in terms of total energy consumption. The results show that the ICP assignment algorithm gives more energy savings than the CPS assignment algorithm. This is because ICP's earlier total finish time leaves more slack that can be used to save energy. For instance, the results for 100 tasks on 8 processors showed that ICP required 40% less time and 67-75% less energy as compared to CPS, and the results for 100 tasks on 4 processors showed that ICP required 29% less time and 48-56% less energy as compared to CPS. From these results, we can see that the assignment is one of the critical factors in minimizing energy consumption, because a smaller finish time yields more slack that can potentially be used for energy minimization.
[Figure: four panels plotting normalized energy against deadline extension rate (0-1.0) for ICP-PathDVS and CPS-PathDVS]
Figure 5-5. Normalized energy consumption of ICP and CPS using PathDVS with respect to different deadline extension rates for different numbers of tasks and processors: (a) 100 tasks on 4 processors, (b) 100 tasks on 8 processors, (c) 200 tasks on 4 processors, and (d) 200 tasks on 8 processors
5.5 Experimental Results for Assignment Algorithms that Minimize Energy
We have conducted a number of simulations to evaluate the benefits of our algorithm over other algorithms that do not consider energy profiles in the assignment. We also compared our proposed scheduling algorithm with GA based algorithms that consider multiple assignments [56, 57]. The performance of our energy based assignment algorithm is relatively independent of the slack allocation and time minimization assignment algorithms. Given that PathDVS and ICP perform better than other related algorithms (as presented in Chapter 3 and Section 5.4, respectively), we use these algorithms for slack allocation and time minimization assignment, respectively.

The experimental results are presented in two broad subsections. In the first subsection, we assume that the energy requirements of a task on a processor are relatively independent of the execution time requirements. In the second subsection, we assume that there is a strong correlation between the time and energy requirements of executing the task on a processor.
5.5.1 Simulation Methodology
In this section, we describe the DAG generation and the performance measures used in our experiments.
5.5.1.1 The DAG generation
We randomly generated a large number of graphs with 50 and 100 tasks. The execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units and the
communication time between a task and its child task for a pair of processors is varied from 1 to
4 units. The energy consumed to execute each task on each processor is varied from 10 to 80.
The execution of graphs is performed on 4, 8, 16, and 32 processors. For each combination of
values of number of tasks and processors, 20 different synthetic graphs are generated.
5.5.1.2 Performance measures
The performance is measured in terms of normalized total energy consumption and computational requirements (i.e., the runtime of the algorithms). The former is defined as the total energy consumption normalized by the energy consumption obtained from the assignment algorithm without a DVS scheme. We assume that the deadline is always larger than or equal to the finish time of the DAG. Here the finish time of the DAG is based on the baseline assignment (i.e., the time minimization assignment using time based prioritization). The deadline extension rate is the fraction of the total finish time that is added to the deadline (i.e., deadline = (1 + deadline extension rate) * total finish time from the assignment before applying DVS). We provide experimental results for deadline extension rates equal to 0 (no deadline extension), 0.2, 0.4, 0.6, 0.8, and 1.0.
5.5.1.3 Variations of our algorithms
We tested three variations of our algorithms to understand the impact of multiple
prioritizations (based on parameter α) and variable estimates on deadline for each task (based on
parameter β). The algorithms used in our experiments are classified into three categories: A0,
A1, and A2. First, A0 is an assignment with time based task prioritization (α = 1.0) and a deadline estimate equal to the latest finish time (β = 1.0). This is followed by a slack allocation, and it corresponds to an assignment that uses the base prioritization and allows the maximum allowable deadline for each task. Second, A1 is an assignment with the weight of time equal to one and various weights of LFT (i.e., α = 1.0; β = 1.0, 0.75, and 0.5). For each feasible assignment, a final slack allocation step is performed. This corresponds to assignments that use the base prioritization; for this prioritization, variable estimates of the deadline, given by β, are attempted. The basic idea here is that choosing the maximum allowable deadline for each task (i.e., a higher value of β) may lead to infeasible assignments, but may also lead to the best energy requirements by providing more flexibility for processor selection. Finally, A2 is an assignment with various weights of time and LFT (i.e., α = 0, 0.2, 0.4, 0.6, 0.8, and 1.0; β = 1.0, 0.75, and 0.5). For all feasible assignments, a final slack allocation step is performed. These correspond to assignments that use multiple prioritizations; for each prioritization, variable estimates of the deadline, given by β, are attempted.
The optimal values of α and β for the A1 and A2 formulations are instance dependent. For each
instance all the values are attempted and the one that results in the minimal energy is chosen. We
chose the range of values of α and β as discussed above based on initial experimentation.
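A minimal sketch of this sweep is shown below; assign() and path_dvs() are hypothetical stand-ins for our assignment step and the PathDVS slack allocation of Chapter 3, and the sweep simply keeps the feasible result with the lowest energy, as described above.

    # (alpha, beta) grids for A1 and A2 as given in the text.
    A1_PARAMS = [(1.0, b) for b in (1.0, 0.75, 0.5)]
    A2_PARAMS = [(a, b) for a in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
                 for b in (1.0, 0.75, 0.5)]

    def best_schedule(dag, deadline, params, assign, path_dvs):
        """Try every (alpha, beta) pair; keep the feasible schedule with least energy."""
        best = None
        for alpha, beta in params:
            schedule = assign(dag, deadline, alpha, beta)    # may be infeasible
            if schedule is None:
                continue
            energy, schedule = path_dvs(schedule, deadline)  # final slack allocation
            if best is None or energy < best[0]:
                best = (energy, schedule, alpha, beta)
        return best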
5.5.1.4 Variations of GA based algorithms
Genetic algorithms consist of a population of individuals that go through several
generations. The algorithms in [56, 57] use a nested set of individuals. The first set corresponds
to multiple mappings of tasks to processors. For each mapping, there is a population consisting of multiple individuals corresponding to orderings or prioritizations of tasks. Each generation is used to generate the next generation using crossover and mutation. The former combines two individuals to generate a new set of two individuals; the latter modifies a single individual. The fitness of
an individual is measured by the total energy requirements after applying a slack allocation
scheme and the satisfaction of deadline constraints. There are several parameters including the
number of individuals in the population, the crossover rate, and the mutation rate. The parameter values used in the GA are set as suggested in [56, 57]. We terminate the GA if the
improvement is less than 1% after 10 generations as suggested in [56, 57]. The performance of
GA based algorithms depends on the slack allocation method and the initial seeding of the
population. To show the comparison between our algorithms and GA based approaches, we
conducted experiments with four variations of GA based algorithms: GARandNonOptimal,
GARandOptimal, GASolNonOptimal, and GASolOptimal.
• GA using the DVS scheme in [56, 57] with randomly generated solutions for the initial population (i.e., GARandNonOptimal). This is the scheme presented in [56, 57].
• GA using PathDVS with randomly generated solutions for the initial population (i.e., GARandOptimal).
• GA using the DVS scheme in [56, 57] with a randomly generated population containing A0 as one of the solutions (i.e., GASolNonOptimal).
• GA using PathDVS with a randomly generated population containing A0 as one of the solutions (i.e., GASolOptimal).
We chose different DVS schemes because the GA requires fitness calculations (in our case, the energy required) for each solution that is generated. We wanted to find out whether a less computationally intensive DVS scheme during the GA process can lead to solutions similar to those obtained with a more computationally intensive DVS scheme.
5.5.2 DVS Schemes to Compute Expected Energy in Processor Selection Step
As discussed in the algorithm section, our approach requires the use of a DVS scheme
during the assignment of each task in order to compute expected DVS based energy to select the
best processor in the processor selection step. This is an intermediate step where exact energy
requirements are not needed. To reduce the time requirements of the optimal branch and bound
strategy for unit slack allocation as described in Chapter 3, we used a greedy strategy. To test
whether this strategy leads to inferior assignments, we compared the energy requirements using
these two methods for slack allocation during this intermediate step. Figure 5-6 shows this
comparison for different deadline extension rates. Since the performance difference in terms of
energy was not significant and the greedy scheme is one to two orders of magnitude faster, we
chose a greedy based scheme for this step.
[Figure omitted: two plots; x-axis: Deadline Extension Rate; (a) y-axis: Normalized Energy, (b) y-axis: Runtime; series: A0-Optimal and A0-Greedy]
Figure 5-6. Comparison between optimal scheme and greedy scheme for processor selection of A0 for 50 tasks on 4 and 8 processors: (a) with respect to normalized energy consumption and (b) with respect to runtime (unit: ms)
5.5.3 Independence between Time and Energy Requirements
In this section, we present the experimental results for the cases that the energy
requirement of a task on a processor is relatively independent of the execution time requirement.
5.5.3.1 Comparison of energy requirements of proposed algorithms
Figures 5-7 and 5-8 show the comparison of energy consumption for our algorithms (i.e., A0, A1, and A2) and the baseline algorithm (i.e., Base: the combination of ICP and PathDVS) with respect to different deadline extension rates for different numbers of processors (i.e., 4, 8, 16, and 32) and tasks (i.e., 50 and 100). Based on the results, all of our algorithms lead to significant energy reduction compared to the baseline algorithm. Furthermore, A2 is better than A1, while A1 is better than A0. For instance, using a 1.0 deadline extension rate for 32 processors, A0, A1, and A2 improve by 30.9%, 32.8%, and 36.8% over the baseline algorithm, respectively.
[Figure omitted: four line plots, panels (a)-(d); x-axis: Deadline Extension Rate, y-axis: Normalized Energy; series: Base, A0, A1, and A2]
Figure 5-7. Results for 50 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors: (a) 4 processors, (b) 8 processors, (c) 16 processors, and (d) 32 processors
[Figure omitted: four line plots, panels (a)-(d); x-axis: Deadline Extension Rate, y-axis: Normalized Energy; series: Base, A0, A1, and A2]
Figure 5-8. Results for 100 tasks: Normalized energy consumption of our algorithms with respect to variable deadline extension rates for different number of processors: (a) 4 processors, (b) 8 processors, (c) 16 processors, and (d) 32 processors
Figure 5-9 shows the improvement of our algorithms over the baseline algorithm (i.e., Base: ICP-PathDVS) with respect to different numbers of processors. Based on the results, as the number of processors increases, the improvement of our algorithms over the baseline algorithm grows. For instance, with a 0.4 deadline extension rate, A0 improves by 8.4%, 11.3%, 21.4%, and 31%, A1 improves by 10.6%, 12.7%, 23.1%, and 33%, and A2 improves by 16.8%, 18.9%, 27.2%, and 35.8%, for 4, 8, 16, and 32 processors, respectively, as compared to the baseline algorithm.
[Figure omitted: six bar charts, panels (a)-(f); x-axis: Number of Processors (4, 8, 16, 32), y-axis: Improvement (%); series: A0, A1, and A2]
Figure 5-9. Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) with respect to different number of processors for variable deadline extension rates (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate
5.5.3.2 Comparison of energy requirements with GA based algorithms
We found that the GA (Genetic Algorithm) based algorithms have relatively poor performance and do not always generate a feasible schedule (i.e., a schedule that completes by a given deadline), especially when the deadline is tight (i.e., small values of the deadline extension rate). Based on the results, in general, A0 was considerably better than the GA: the improvements ranged from 50% to 70% of the energy requirements of the GA. In the
following, we present the results of the comparison of our algorithms with four variations of GA
based algorithms (i.e., GARandNonOptimal, GARandOptimal, GASolNonOptimal,
GASolOptimal).
Comparison with GARandNonOptimal
Figure 5-10 shows the comparison between our algorithms and GARandNonOptimal in terms of energy consumption with respect to different numbers of tasks and processors. The GA based algorithm, which uses randomly generated initial solutions for task ordering and mapping, does not provide good performance in terms of energy consumption. Furthermore, it cannot even generate a feasible schedule (i.e., a schedule meeting the deadline), especially under tight deadlines, when using the limited initial solution pool (i.e., 25 individuals for ordering and 50 individuals for mapping) and the GA termination criterion (i.e., repeat until no improvement of at least 1% is made for 10 generations) suggested in [56, 57]. We therefore present results for a deadline extension rate of 1.0 so that energy can be compared fairly across the feasible solutions generated. Based on the results,
GARandNonOptimal gives even worse performance than the baseline algorithm (i.e., Base: ICP-PathDVS); for example, Base improves by 65% over GARandNonOptimal. Our algorithms, A0, A1, and A2, respectively improve by 68.7%, 70.0%, and 73.1% in terms of energy consumption compared to GARandNonOptimal for 8 processors. As the number of processors increases, our algorithms provide much more benefit: while A0 improves by 48.5% for 4 processors, it improves by 68.7% for 8 processors. Our algorithms also provide better performance as the number of tasks increases. For instance, A0 improves by 58.3% for 50 tasks and 61.5% for 100 tasks.
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GARandNonOptimal, A0, A1, and A2]
Figure 5-10. Normalized energy consumption of GARandNonOptimal and our algorithms for different number of tasks and processors: (a) with respect to different number of processors and (b) with respect to different number of tasks
Comparison with GARandOptimal
The performance did not significantly improve even when using a better slack allocation scheme such as PathDVS, which provides near-optimal solutions for energy minimization. Like GARandNonOptimal, due to the use of randomly generated initial solutions, the limited number of individuals, and the termination criterion, GARandOptimal does not give good performance. Figure 5-11 shows the comparison between our algorithms and GARandOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate. Based on the results, our algorithms, A0, A1, and A2, respectively improve by 69.1%, 70.5%, and 73.5% in terms of energy consumption compared to GARandOptimal for 8 processors. Also, the advantage of our algorithms over GARandOptimal grows as the number of processors and tasks increases. For instance, A0 improves by 46.5% and 69.1% for 4 and 8 processors, and 58.8% and 60.5% for 50 and 100 tasks, respectively.
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GARandOptimal, A0, A1, and A2]
Figure 5-11. Normalized energy consumption of GARandOptimal and our algorithms for different number of tasks and processors: (a) with respect to different number of processors and (b) with respect to different number of tasks
Comparison with GASolNonOptimal
Since using only randomly generated initial solutions leads to poor performance, we next tried the other approach, which seeds the population with a good solution (from A0). Figure 5-12 shows the comparison between our algorithms and GASolNonOptimal in terms of energy consumption with respect to different deadline extension rates for different numbers of tasks and processors. Although GASolNonOptimal uses one good solution from A0, no significant improvement was achieved as compared to A0. This is because the DVS scheme used in GASolNonOptimal does not provide good performance in terms of energy consumption, while our algorithms use PathDVS, a near-optimal DVS scheme. Based on the results, our algorithms, A0, A1, and A2, respectively improve by 11.1%, 15.0%, and 20.5% compared to GASolNonOptimal, for 100 tasks on 8 processors with a 1.0 deadline extension rate. Figure 5-13 shows the comparison between our algorithms and GASolNonOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate.
[Figure omitted: four line plots; x-axis: Deadline Extension Rate, y-axis: Normalized Energy; series: Base, GASolNonOptimal, A0, A1, and A2]
Figure 5-12. Normalized energy consumption of GASolNonOptimal and our algorithms with respect to different extension rates for different number of tasks and processors: (a) 50 tasks and 4 processors, (b) 50 tasks and 8 processors, (c) 100 tasks and 4 processors, and (d) 100 tasks and 8 processors
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GASolNonOptimal, A0, A1, and A2]
Figure 5-13. Normalized energy consumption of GASolNonOptimal and our algorithms: (a) with respect to different number of processors and (b) with respect to different number of tasks
Comparison with GASolOptimal
Figure 5-14 shows the comparison between our algorithms and GASolOptimal in terms of energy consumption with respect to different numbers of tasks and processors for a 1.0 deadline extension rate. Although GASolOptimal uses one good solution from A0 and a near-optimal DVS scheme, no significant improvement was achieved as compared to A0. The performance of A0 and GASolOptimal is very similar: the fractional difference between the energy requirements of A0 and GASolOptimal was between 0.00009 and 0.002. Furthermore, our algorithms with iteration (i.e., A1 and A2) provide improved performance. Based on the results, A1 and A2 respectively improve by 4.6% and 14.3% for 8 processors, and by 5.3% and 16.3% for 100 tasks.
[Figure omitted: two bar charts; y-axis: Normalized Energy; (a) x-axis: Number of Processors (4, 8), (b) x-axis: Number of Tasks (50, 100); series: Base, GASolOptimal, A0, A1, and A2]
Figure 5-14. Normalized energy consumption of GASolOptimal and our algorithms: (a) with respect to different number of processors and (b) with respect to different number of tasks
5.5.3.3 Comparison of time requirements
Figure 5-15 shows the runtime of our algorithms with respect to different deadline extension rates. The total runtime for A1 and A2 is proportional to the number of different values of α and β times the runtime of A0. It is worth noting that, since the individual runs of A1 and A2 can effectively execute in parallel, their runtime can be reduced significantly in a parallel environment.
[Figure omitted: two line plots, panels (a) 50 tasks and (b) 100 tasks; x-axis: Deadline Extension Rate, y-axis: Runtime; series: A0, A1, and A2]
Figure 5-15. Runtime to execute our algorithms with respect to variable deadline extension rates for different number of tasks (unit: ms): (a) 50 tasks and (b) 100 tasks
Figure 5-16 shows the comparison of A0 and the GA based algorithms in terms of computational time (i.e., the runtime taken to execute the algorithms) for a 1.0 deadline extension rate with respect to different numbers of tasks. Based on the results, A0 is two orders of magnitude faster than the GA based algorithms that use a suboptimal DVS scheme (i.e., GARandNonOptimal and GASolNonOptimal). Furthermore, A0 is 2237 and 2406 times faster than GARandOptimal and GASolOptimal, respectively, which use a near-optimal DVS scheme.
[Figure omitted: bar chart on a logarithmic scale; x-axis: Number of Tasks (50, 100), y-axis: Runtime; series: A0, GARandNonOptimal, GASolNonOptimal, GARandOptimal, and GASolOptimal]
Figure 5-16. Runtime to execute GA algorithms and our algorithm with respect to different number of tasks for 1.0 deadline extension rate (unit: ms, logarithmic scale)
5.5.4 Dependence between Time and Energy Requirements
In the experimental results presented in the previous section, we assumed that the time and
energy requirements of a task were independent of each other. We also conducted experiments to
see the performance with various degrees of correlation between time and energy consumption
for tasks on a given processor. We define a parameter γ that controls this correlation (i.e.,
correlation rate). The energy of a task is proportional to its execution time multiplied by a value drawn from the interval [1 – γ, 1 + γ] (i.e., energy of each task = execution time of each task * [1 – γ, 1 + γ]). We experimented with a number of values for γ and present results for γ equal to 0, 0.4, and 0.8. We also compare the results with the independent case as defined in the previous section; this case is labeled 'rand'.
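A minimal sketch of this correlation model, with illustrative names, is:

    import random

    def correlated_energy(exec_time, gamma):
        # Energy proportional to execution time, scaled by a factor drawn
        # uniformly from [1 - gamma, 1 + gamma].
        return exec_time * random.uniform(1.0 - gamma, 1.0 + gamma)

    # gamma = 0 gives perfectly correlated energy; larger gamma weakens the link.
    energies = {g: [correlated_energy(t, g) for t in (10, 20, 30, 40)]
                for g in (0.0, 0.4, 0.8)}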
Figures 5-17 and 5-18 show the energy improvement of our algorithms over ICP-PathDVS (i.e., the baseline algorithm) for variable values of γ and different deadline extension rates, for 4 and 8 processors respectively. Based on the results, as the parameter γ increases, the relative improvement of our algorithms increases. For instance, with a 1.0 deadline extension rate, A0 improves by 4%, 5.7%, and 17.9%, A1 improves by 7.6%, 9.5%, and 20.5%, and A2 improves by 15.5%, 19.2%, and 30.3%, for γ equal to 0, 0.4, and 0.8 respectively. For the case of time-independent energy consumption (based on our experimental setting), the improvement lies between that for γ = 0.4 and that for γ = 0.8. For instance, using the 'rand' option (i.e., time-independent energy consumption), A0 improves by 12.5%, A1 improves by 16.9%, and A2 improves by 23.2%, for a 1.0 deadline extension rate.
[Figure omitted: six bar charts, panels (a)-(f); x-axis: Correlation Rate γ (0, 0.4, 0.8, rand), y-axis: Improvement (%); series: A0, A1, and A2]
Figure 5-17. Results for 4 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate
[Figure omitted: six bar charts, panels (a)-(f); x-axis: Correlation Rate γ (0, 0.4, 0.8, rand), y-axis: Improvement (%); series: A0, A1, and A2]
Figure 5-18. Results for 8 processors: Improvement of our algorithms over ICP-PathDVS (i.e., baseline algorithm) in terms of energy consumption with respect to different correlation rates for variable deadline extension rates for 50 and 100 tasks (unit: percentage): (a) no deadline extension, (b) 0.2 deadline extension rate, (c) 0.4 deadline extension rate, (d) 0.6 deadline extension rate, (e) 0.8 deadline extension rate, and (f) 1.0 deadline extension rate
CHAPTER 6 DYNAMIC ASSIGNMENT
We assume that a static scheduling algorithm has already been applied before executing
tasks and the schedule needs to be adjusted whenever a task finishes before its scheduled time.
Thus this schedule is updated whenever a dynamic scheduling is applied. When a task finishes
before its estimated time, two changes may occur for all the remaining tasks (i.e., tasks that have
not yet executed) in the schedule. A task's processor mapping may change (along with its start and end times). Also, the amount of slack (the time beyond the minimum execution time for that processor
based on executing the task at maximum voltage) may change.
Most prior research on scheduling for energy minimization does not focus on the assignment process, particularly in dynamic environments. In Chapter 4, we showed that reallocating the slack at runtime (i.e., dynamic slack allocation) leads to better energy minimization, and that applying our dynamic slack allocation method at runtime not only outperforms the existing greedy method but is also comparable, in terms of energy requirements, to static near-optimal methods applied at runtime.
In this chapter, we explore whether reassignment of tasks along with reallocation of slack
during runtime can lead to even better performance in terms of energy minimization. For an
approach to be useful at runtime, its overhead must be small. The
proposed dynamic scheduling algorithm utilizes several threads to generate a schedule:
• One set for reallocating slack while keeping the assignment in the current schedule.
• Another set for changing the assignment and then reallocating slack.
The schedule providing the minimum energy is then selected.
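A minimal sketch of this structure, collapsing each set of threads into a single worker for brevity, is given below; both worker functions are hypothetical callables that return an (energy, schedule) pair.

    from concurrent.futures import ThreadPoolExecutor

    def reschedule(schedule, reallocate_slack, reassign_then_reallocate):
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(reallocate_slack, schedule),
                       pool.submit(reassign_then_reallocate, schedule)]
            candidates = [f.result() for f in futures]
        return min(candidates, key=lambda c: c[0])  # schedule with minimum energy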
As described in Chapter 4, for the dynamic scheduling (i.e., rescheduling), there are two
steps that need to be addressed. First, select the subset of tasks for rescheduling. The tasks that may be rescheduled by the dynamic scheduling algorithm are those that have not yet started when the algorithm is applied. We assume that the voltage can be selected before a task starts executing. Dynamic scheduling is applied to a subset of these tasks; the tasks considered for rescheduling are limited in order to minimize the overhead of reassigning processors and reallocating the slack during runtime. Clearly, this should be done so that the other goal of energy reduction is also met simultaneously. Second, determine the time range for
other goal of energy reduction is also met simultaneously. Second, determine the time range for
the selected tasks. The time range of the selected tasks has to be changed as some of the tasks
have completed earlier than expected. Based on the computation time in the schedule and
assignment-based dependency relationships among tasks, we recompute the time range (i.e.,
earliest start time and latest finish time) within which the selected tasks should be executed. The time range is defined differently for reassignment and slack reallocation: a time range over processors for reassignment, and a time range for the selected tasks given an assignment for slack reallocation. However, the main concept is the same: the selected tasks have to be reassigned and have slack reallocated within this time range in order to meet the deadline constraints.
At this stage our proposed reassignment algorithm and slack reallocation approach are
applied to the subset of tasks within the time range as described above. The computational time
(i.e., runtime overhead) is kept small due to the limited number of tasks selected for
rescheduling. While several assignment methods can be applied using threads, we propose a
reassignment method based on our method described in Chapter 5. This incorporates the
expected DVS based energy information during the reassignment process. The dynamic
assignment algorithm is described in detail in the next section.
6.1 Proposed Dynamic Assignment
This section presents a novel dynamic assignment algorithm which reassigns processors
for the reschedulable tasks at runtime. The main feature of our proposed reassignment algorithm is that it considers the energy requirements based on potential slack during the assignment step. In
other words, the algorithm assigns an appropriate processor for each reschedulable task such that
the total energy expected after slack allocation is minimized. The expected energy after slack
allocation for each reschedulable task is computed by using the estimated deadline for the task so
that the overall DAG can be executed by the deadline.
6.1.1 Choosing a Subset of Tasks for Rescheduling
The proposed dynamic scheduling algorithm, the k lookahead approach, is based on choosing
a subset of tasks for which the schedule will be readjusted. The schedule for the remaining tasks
(i.e., tasks not selected for the rescheduling) is not affected. Figure 5-1 shows the subset of tasks
for rescheduling in an assignment DAG when task τ2 finishes early.
Using the k lookahead approach, all tasks within a limited range of time are considered for the readjustment of the schedule. The range of time is determined by the value of k (i.e., k * maximum
computation time of tasks). In the example of Figure 5-1, assume that the computation time of
each task is one unit, the communication time among tasks is zero, and the tasks in the same
depth finish at the same time for ease of presentation of the key concepts. In this case, if k is
equal to 2, the time range would be 2 units (2 * one unit) and then tasks within the time range
from the finish of task τ2, e.g., τ4, τ5, τ6, τ7, τ8, τ9, and τ10, are considered. The set of tasks
selected for the rescheduling is defined by
$$\Gamma_{\mathrm{allocation}} = \{\tau_i \mid \mathit{staticSTime}_i \ge \mathit{ftime}_l,\ \mathit{staticFTime}_i \le \mathit{ftime}_l + k \cdot \max_{\tau_j \in \Gamma} \mathit{compTime}_j\}, \quad \text{s.t. } \mathit{ftime}_l \ne \mathit{staticFTime}_l$$
where staticSTime_i is the start time of task τ_i in the static or previous schedule, staticFTime_i is the finish time of task τ_i in the static or previous schedule, ftime_l is the actual finish time of task τ_l at runtime, and compTime_j is the computation time of task τ_j on its assigned processor, a.k.a. the estimated execution time at the maximum voltage.
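A direct transcription of this selection rule, with illustrative field names (static_start, static_finish, comp_time), is the following Python sketch:

    def select_reschedulable(tasks, ftime_l, k):
        horizon = ftime_l + k * max(t.comp_time for t in tasks)
        return [t for t in tasks
                if t.static_start >= ftime_l      # not yet started
                and t.static_finish <= horizon]   # inside the k lookahead window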
The approach with the 'all' option for k (i.e., the k-all lookahead approach) corresponds to the static scheduling approach without the limitation on the time range for tasks considered for rescheduling. Thus, the k-all lookahead approach is the same as applying the static scheduling
rescheduling. Thus, the k-all lookahead approach is same as applying the static scheduling
algorithm to all the remaining tasks at runtime. One would expect this to be close to the best that
can be achieved. The set of tasks selected for the rescheduling is defined by
$$\Gamma_{\mathrm{allocation}} = \{\tau_i \mid \mathit{staticSTime}_i \ge \mathit{ftime}_l\}, \quad \text{s.t. } \mathit{ftime}_l \ne \mathit{staticFTime}_l$$
6.1.2 Time Range for Selected Tasks
The schedule for tasks not in the set of reschedulable tasks is kept the same (it is based on the static schedule or the schedule generated by the last rescheduling). For the set of reschedulable tasks, the range of time in which they may execute is defined so that the dynamic scheduling algorithm produces feasible solutions. The time range is defined differently for reassignment and slack reallocation: a time range over each processor for reassignment, and a time range for the set of reschedulable tasks given an assignment for slack reallocation. This is because reassignment can map a task to any processor. Even when there is a processor to which no reschedulable task is assigned, the time range over that processor for reassignment may be limited by the assignment of tasks not in the set of reschedulable tasks. Meanwhile, for slack reallocation, there is no need to define the time range for all processors, but only for the set of selected tasks, because the slack is reallocated based on a given assignment. For reassignment, the time range of each processor is defined as follows.
First, the minimum computation time of a task is set to its estimated time at the maximum voltage (i.e., staticCTime_i = compTime_i for τ_i ∈ Γ_allocation, where staticCTime_i is the computation time of task τ_i in the static schedule or the previous schedule generated by the last rescheduling). This is the same time that was used during the static assignment process, and it effectively ensures that maximum flexibility is available for reassignment.
Second, the available start time of each processor is the possible earliest start time of the processor for the reschedulable tasks. It is set to the expected finish time (i.e., the finish time in the current schedule) of the last task on the processor that is not in the set of reschedulable tasks and has already started when the algorithm is applied (i.e., it is still executing or has finished); that is, the task with the latest finish time on the processor among tasks not in the set of reschedulable tasks. It is worth noting that this is not the earliest start time of the reschedulable tasks on each processor; the earliest start times of the tasks on a processor differ due to the precedence relationships among other tasks. The available start time of a processor p_j, procSTime_j, is defined by

$$\mathit{procSTime}_j = \max_i \mathit{staticFTime}_i, \quad \text{where } \mathit{proc}_i = p_j \text{ and } \mathit{staticSTime}_i < \mathit{ftime}_l$$
Finally, the deadline of each processor is the possible latest finish time of the processor for the reschedulable tasks. It is set to the expected start time (i.e., the start time in the current schedule) of the first task on the processor that is not in the set of reschedulable tasks and has not yet started when the algorithm is applied; that is, the task with the earliest start time on the processor among tasks not in the set of reschedulable tasks. It is worth noting that this is not the latest finish time of the reschedulable tasks on each processor. Like the earliest start times, the latest finish times of the tasks on a processor differ due to the precedence relationships among other tasks. The deadline of a processor p_j, procDeadline_j, is defined by

$$\mathit{procDeadline}_j = \min_i \mathit{staticSTime}_i, \quad \text{where } \tau_i \in \Gamma_{\mathrm{allocation}}^{\,c} \text{ and } \mathit{proc}_i = p_j$$
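The following sketch computes both quantities for one processor; kept_tasks are the tasks not selected for rescheduling, and the field names and the fallback values (used when a processor has no such started or pending task) are our own assumptions.

    def processor_time_range(proc, kept_tasks, ftime_l, dag_deadline):
        started = [t.static_finish for t in kept_tasks
                   if t.proc == proc and t.static_start < ftime_l]
        pending = [t.static_start for t in kept_tasks
                   if t.proc == proc and t.static_start >= ftime_l]
        proc_stime = max(started, default=ftime_l)          # procSTime_j
        proc_deadline = min(pending, default=dag_deadline)  # procDeadline_j
        return proc_stime, proc_deadline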
6.1.3 Estimated Deadline and Energy
The goal of the assignment is to minimize the expected total energy consumption after
slack allocation while still satisfying deadline constraints. Consider a scenario where the
assignment of a subset of tasks has already been completed and a given next task in the
prioritization list has to be assigned. The choice of the processors that can be assigned to this task
should be limited to those for which the expected finish time from the overall assignment will lead to meeting the deadline constraints (otherwise, the result will be an infeasible assignment). Clearly, there is no guarantee that the derived schedule will be feasible (i.e., a schedule meeting the deadline) at the time the assignment for a given task is determined, because the feasibility of the schedule depends on the assignment of the remaining tasks, which is not yet determined.
The proposed algorithm calculates the estimated deadline for each task, that is, the deadline expected to enable a feasible schedule provided the task finishes by it. The estimated deadline of a task is set to the latest finish time in order to allow more flexibility for processor assignment, as the task can take a longer time to complete (although the probability of a feasible schedule for the DAG may be lower). The latest finish time of task τ_i, LFT_i, is defined by
$$\mathit{LFT}_i = \min\left(\mathit{deadline},\ \min_{\tau_j \in \mathit{Succ}(\tau_i)}\left(\mathit{LFT}_j - \mathit{staticCTime}_j - \mathit{commTime}_{ij}\right)\right)$$
Here the latest finish time of a task differs based on its potential assigned processor due to the assignment-based dependency relationships among tasks; consequently, the time limit within which a task must complete varies across processors.
Using this estimated deadline, the estimated energy of the reschedulable tasks is computed while selecting processors for reassignment. The estimated energy is the energy expected after slack allocation. To compute it, we apply the principle of unit slack allocation used in the PathDVS algorithm, a static slack allocation algorithm that provides near-optimal solutions. The unit slack allocation used in PathDVS (described in Chapter 3) finds the subset of tasks that maximally reduces the total energy consumption. This corresponds to the maximum weighted independent set (MWIS) problem [7, 53, 65], which is computationally intensive. Our approach requires the use of a DVS scheme during the assignment of each task in order to compute the expected DVS based energy and select the best processor in the processor selection step. This is an intermediate step where exact energy estimates are not as important. To reduce the time requirements of the optimal branch and bound strategy for unit slack allocation described in Chapter 3, a greedy algorithm for the MWIS problem [53] can be used. The greedy algorithm in our approach is as follows:
• Select the task with the maximum energy reduction (i.e., the energy reduced when unit slack is allocated) among all tasks (i.e., the already assigned tasks and the task considered for assignment).
• Select the task with the maximum energy reduction among the tasks independent of the previously selected tasks.
• Iteratively select tasks until no task independent of all the selected tasks remains.
The above greedy approach for unit slack allocation is performed iteratively until there is no slack left or no task remains eligible for slack allocation under the estimated deadline constraints. In the proposed greedy approach, the independent tasks can easily be identified using a compatible task matrix or compatible task lists, which record, for each task, the tasks that can share unit slack with it, as in PathDVS.
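A minimal sketch of this greedy unit slack allocation follows; gain(t) (the energy reduced by one unit of slack) and compatible[t] (the tasks that can share unit slack with t) are illustrative assumptions.

    def greedy_unit_slack(tasks, gain, compatible, slack_units):
        for _ in range(slack_units):
            chosen, candidates = [], set(tasks)
            while candidates:
                best = max(candidates, key=gain)  # maximum energy reduction
                if gain(best) <= 0:               # no remaining task benefits
                    return
                chosen.append(best)
                # keep only tasks independent of every task chosen so far
                candidates = {t for t in candidates
                              if t is not best and t in compatible[best]}
            for t in chosen:
                t.slack += 1                      # allocate one unit of slack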
6.1.4 Processor Selection
Figure 6-1 presents a high level description of the assignment procedure. The task is assigned to a processor such that the total energy consumption expected after applying the DVS scheme to the tasks that have already been assigned (including the new task being considered for assignment) is minimized while trying to meet the estimated deadline of the task. The candidate processors for the task are selected such that the task can execute within its estimated deadline. Note that the estimated deadline of a task may differ across processors. Once the candidate processors for the task have been selected, the next step depends on which of the following three conditions holds.
First, if no processor is available to satisfy the estimated deadline, the processor with the earliest finish time is selected (it is possible that a feasible schedule is still obtained later, as the assignment is based on estimated times for future tasks whose assignment is yet to be determined). When the task finishes within its latest finish time, we assume that the deadline of the DAG can be met with high probability; by selecting a processor on which the task finishes earlier, the chance of meeting the deadline increases. However, if the task's finish time exceeds the time range for reschedulable tasks or its specific deadline, the reassignment process stops, since the schedule clearly will not meet the deadline constraints.
Second, if there is only one processor that meets the above constraint, the task is assigned to that processor; this also increases the chance of meeting the deadline constraints.
Finally, if more than one candidate processor meets the above constraint, a processor is selected such that the total energy expected after slack allocation is minimized. The expected total energy is the sum of the expected energy of the already assigned tasks and the task considered for assignment. To compute the expected energy for a given processor assignment in this step, a faster heuristic based strategy (as compared to PathDVS, which is nearly optimal) is used, as described in the previous subsection.
The above selection process is performed iteratively until all tasks selected for
rescheduling are assigned. However, if the finish time of a task exceeds the deadline, the process
stops and the previous assignment is kept for all reschedulable tasks.
Procedure DynamicDVSbasedAssignment
1. Compute the estimated deadline for each task
2. For each task
3.   Find the processors on which the task τ_i can execute within its estimated deadline
   Condition 1: If there is no such processor
4.1.   If the finish time of the task τ_i > deadline
4.2.     Stop the procedure
4.3.   Else
4.4.     If there is any processor such that the task can execute within the processor's deadline for reschedulable tasks
4.5.       Select a processor such that the finish time of the task τ_i is minimized
4.6.     Else
4.7.       Stop the procedure
4.8.     End If
4.9.   End If
   Condition 2: If there is only one such processor
4.1.   Select that processor for the task τ_i
   Condition 3: If there is more than one such processor
4.1.   Apply a greedy algorithm for the weighted independent task set problem to the task τ_i and the already assigned tasks
4.2.   Select a processor such that the total energy is minimized
5. End For
End Procedure
Figure 6-1. The DynamicDVSbasedAssignment procedure
6.2 Experimental Results
In this section, we compare the performance of the combination of dynamic assignment and dynamic slack allocation proposed in this chapter (i.e., DynamicAssgn) with the following two methods, each of which outperforms other existing methods in its respective setting:
• Static scheduling (i.e., StaticDVS) presented in Chapter 3: This static scheduling provides near-optimal solutions for energy minimization given an assignment. However, it keeps the schedule generated at compile time unchanged during runtime.
• Dynamic slack allocation (i.e., DynamicDVS) presented in Chapter 4: This dynamic slack allocation readjusts the schedule, while keeping a given assignment, whenever a task finishes earlier than expected at runtime. In our experiments, the k-3 lookahead slack allocation approach, which gives good performance in terms of energy, is used.
The dynamic algorithms (i.e., DynamicDVS and DynamicAssgn) are applied to a static schedule produced by a known assignment algorithm, which assigns tasks based on the earliest finish time, together with a static slack allocation algorithm; we use the static scheduling algorithm presented in Chapter 3. Also, for a fair comparison with DynamicDVS, the k-3 lookahead approach is used for DynamicAssgn, and PathDVS is used as the slack allocation method applied at runtime.
6.2.1 Simulation Methodology
In this section, we describe DAG generation, dynamic environments generation, and
the performance measures used in our experiments.
6.2.1.1 The DAG generation
We randomly generated a large number of graphs with 50 and 100 tasks. The execution
time of each task on each processor at the maximum voltage is varied from 10 to 40 units and the
communication time between a task and its child task for a pair of processors is varied from 1 to
4 units. The energy consumed to execute each task on each processor is varied from 10 to 80.
The execution of graphs is performed on 4, 8, 16, and 32 processors.
6.2.1.2 Dynamic environments generation
There are two broad parameters for dynamic environments:
• The number of tasks that finish earlier than expected (i.e., tasks whose actual execution time is less than their estimated execution time) is given by the earlyFinishedTaskRate (i.e., number of early finished tasks = earlyFinishedTaskRate * total number of tasks).
• The amount of decrease for each task that finishes early is given by timeDecreaseRate (i.e., amount of decrease = timeDecreaseRate * estimated execution time).
We experimented with values of earlyFinishedTaskRate equal to 0.2, 0.4, 0.6, and 0.8 and values of timeDecreaseRate equal to 0.1, 0.2, 0.3, and 0.4.
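A minimal sketch of this perturbation model is given below; the function name and the uniform choice of which tasks finish early are our own assumptions.

    import random

    def perturb(estimated_times, early_rate, decrease_rate):
        n_early = int(early_rate * len(estimated_times))
        early = set(random.sample(range(len(estimated_times)), n_early))
        return [t * (1 - decrease_rate) if i in early else t
                for i, t in enumerate(estimated_times)]

    # Rates used in our experiments.
    for er in (0.2, 0.4, 0.6, 0.8):
        for dr in (0.1, 0.2, 0.3, 0.4):
            actual_times = perturb([20] * 100, er, dr)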
6.2.1.3 Performance measures
The deadline extension rate is the fraction of the total finish time that is added to the
deadline (i.e., deadline = (1 + deadline extension rate) * total finish time from assignments
without DVS scheme). We experimented with deadline extension rates equal to 0 (no extension),
0.01, 0.02, 0.05, 0.1, and 0.2, but only the results for no deadline extension are presented due to
space limitations since the results are similar. To compare algorithms, the normalized energy
consumption, that is, total energy normalized by the energy obtained from the static assignment
(before applying static slack allocation), is used. The computational time (i.e., runtime overhead)
is also reported as an important measure for algorithms in dynamic environments.
6.2.2 Comparison of Energy Requirements
Figures 6-2, 6-3, 6-4, and 6-5 show the comparison of our algorithm with static scheduling
and dynamic slack allocation in terms of energy consumption with respect to different time
decrease rates and different early finished task rates for 4, 8, 16, and 32 processors, respectively.
Based on the results, the combination of dynamic assignment and dynamic slack allocation (i.e.,
DynamicAssgn) significantly outperforms static scheduling and dynamic slack allocation in
terms of energy consumption. For instance, for 32 processors, DynamicAssgn improves energy
requirements by 15-26% and 8-12% compared to StaticDVS and DynamicDVS respectively.
These results show that adjusting the assignment at runtime as well as adjusting the slack at
runtime is necessary for minimizing the energy requirements. Furthermore, in general, the
improvement of DynamicAssgn over the other two algorithms increases as timeDecreaseRate
and earlyFinishedTaskRate increase.
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-2. Results for 4 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-3. Results for 8 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-4. Results for 16 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
[Figure omitted: four line plots, one per early finished task rate (0.2, 0.4, 0.6, 0.8); x-axis: Time Decrease Rate, y-axis: Normalized Energy; series: StaticDVS, DynamicDVS, and DynamicAssgn]
Figure 6-5. Results for 32 processors: Normalized energy consumption of StaticDVS, DynamicDVS, and DynamicAssgn with respect to different time decrease rates and early finished task rates for 50 and 100 tasks
6.2.3 Comparison of Time Requirements
Figure 6-6 shows the average time requirement to readjust the schedule due to a single
task’s early finish (i.e., runtime overhead). The computational time of DynamicAssgn is an order of magnitude larger than that of DynamicDVS, since DynamicAssgn requires an assignment process as well as a slack allocation process. However, DynamicAssgn requires only 0.02-0.04 seconds on average to readjust the schedule at runtime; this small overhead should make it useful for a large number of computation intensive applications.
[Figure omitted: two bar charts on a logarithmic scale, panels (a) 50 tasks and (b) 100 tasks; x-axis: Time Decrease Rate, y-axis: Computational Time; series: DynamicDVS and DynamicAssgn]
Figure 6-6. Computational time to readjust the schedule from an early finished task with respect to different time decrease rates (unit: ns, logarithmic scale)
CHAPTER 7 CONCLUSION AND FUTURE WORK
Energy consumption is a critical issue in parallel and distributed embedded systems. The
scheduling for DVS based energy minimization broadly consists of two steps: assignment and
slack allocation.
• Assignment: This step determines the ordering to execute tasks and the mapping of tasks to processors based on the computation time at the maximum voltage level.
• Slack allocation: Once the assignment of each task is known, this step allocates a variable amount of slack to each task so that the total energy consumption is minimized while the DAG executes within a given deadline.
We have presented novel scheduling algorithms to minimize DVS based energy
consumption of DAG based applications under the deadline constraints for parallel systems. The
proposed scheduling algorithms are classified into four categories: static slack allocation,
dynamic slack allocation, static assignment, and dynamic assignment, presented in Chapters 3, 4,
5, and 6, respectively. In this chapter, we review our main contributions for scheduling
algorithms presented in this thesis.
7.1 Static Slack Allocation
In Chapter 3, we have presented a novel static slack allocation algorithm (i.e., static DVS
scheme) for DAG based applications in parallel and distributed systems. There are three main
contributions of our method:
• The performance in terms of reducing energy is comparable to that of an LP (Linear Programming) based algorithm which provides near-optimal solutions.
• It requires significantly less memory as compared to the LP based algorithm and can be scaled to larger size problems.
• The time requirements of our algorithm are one to two orders of magnitude smaller than those of the LP based algorithm when the amount of total available slack is small (i.e., a tight deadline).
Our experimental results also show that the energy reduction of our proposed algorithm is
considerably better than simplistic schemes. Furthermore, based on the efficient techniques for
search space reduction, such as compatible task lists, compression, and lower bounds, the branch and bound search method can be effectively used to provide near-optimal solutions for energy minimization while requiring low computational time.
7.2 Dynamic Slack Allocation
In Chapter 4, we have presented novel slack allocation algorithms to minimize energy
consumption and meet deadline constraints for DAG based applications in dynamic environments,
where the actual execution time of a task may be different from its estimated time. There are
three main contributions of our methods:
• They require significantly less computational time (i.e., runtime overhead) than applying the static algorithm at runtime for every instance when a task finishes early or late.
• The performance in terms of reducing energy and/or meeting a given deadline is comparable to applying the static algorithm at runtime.
• They are effective for cases when the estimated execution time is underestimated or overestimated.
The experimental results also show that our methods offer significant improvement over
simplistic greedy methods in terms of energy requirements and/or satisfying the deadline
constraints. Our methods have been shown to work for environments where the estimated time
for all tasks is greater than or equal to the execution time (i.e., overestimation) or where the estimated time for all tasks is less than or equal to the execution time (i.e., underestimation). However, they should be equally effective for hybrid environments where some tasks complete before their estimated time while others complete after it.
7.3 Static Assignment
In Chapter 5, we have presented novel static assignment algorithms to minimize DVS
based energy consumption of DAG based applications for parallel systems. The proposed
assignment algorithms effectively assign tasks to appropriate processors with the goal of energy
minimization by utilizing expected DVS based energy information during assignment and
considering multiple task prioritizations based on time and energy. There are three main
contributions of our methods:
• Through the assignment method that minimizes finish time, the deadline constraints are satisfied and the energy can also be reduced, owing to the generation of a larger amount of slack that can be allocated to tasks during the slack allocation step.
• The performance in terms of reducing energy requirements is significantly improved by incorporating energy minimization during the assignment process.
• They require two to three orders of magnitude less time as compared to the Genetic Algorithm based formulations which outperform other existing algorithms in terms of energy consumption.
Our experimental results show that our proposed algorithms significantly outperform existing algorithms in terms of energy consumption while requiring lower computational time.
7.4 Dynamic Assignment
In Chapter 6, we have presented a novel assignment algorithm to minimize energy
consumption for dynamic environments. The proposed algorithm adjusts the schedule by
reassigning tasks to processors and then reallocating slack to tasks, whenever a task finishes
earlier than expected at runtime. There are two main contributions of our method:
• The time requirements of our scheme are small enough that it should be useful for a large number of application workflows.
• It provides considerably better energy minimization compared to (a) static scheduling without any change of the schedule at runtime and (b) only reallocating the slack at runtime while keeping the assignment.
Our experimental results show that our proposed algorithm significantly outperforms these alternatives in terms of energy consumption while requiring low computational time. Our scheme can easily be modified to handle cases where the actual execution time is greater than the estimated time, as with dynamic slack allocation, although in these cases the deadline guarantees cannot be maintained.
7.5 Future Work
In this thesis, we have presented scheduling algorithms assuming that there is no resource
contention. However, in practice, resources such as buses, caches, and I/O devices may be shared
between multiple tasks. These types of resource conflict can have a significant impact on the
time and energy requirements and have to be effectively incorporated in scheduling. We will
develop algorithms that can model and encompass these issues for energy minimization.
LIST OF REFERENCES
1. AeA (formerly American Electronics Association) Report Cybernation, http://www.aeanet.org
2. R. K. Ahuja and J. B. Orlin, A Fast Scaling Algorithm for Minimizing Separable Convex Functions Subject to Chain Constraints, Operations Research, 49(5), Sept. 2001, pp. 784-789.
3. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics, Euromicro Conference on Real-Time Systems (ECRTS’01), Delft, Netherlands, June 2001, pp.225-232.
4. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, Dynamic and Aggressive Scheduling Techniques for Power-Aware Real-Time Systems, Real-Time Systems Symposium (RTSS’01), London, UK, Dec. 2001, pp.95-105.
5. H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, Power-Aware Scheduling for Periodic Real-Time Tasks, IEEE Transactions on Computers, 53(5), May 2004, pp.584-600.
6. N. K. Bambha, S. S. Bhattacharyya, J. Teich, and E. Zitzler, Hybrid Global/Local Search Strategies for Dynamic Voltage Scaling in Embedded Multiprocessors, International Symposium on Hardware/Software Codesign (CODES’01), Copenhagen, Denmark, Apr. 2001, pp.243-248.
7. S. Basagni, Finding a Maximal Weighted Independent Set in Wireless Networks, Telecommunication Systems, 18(1-3), Sept. 2001, pp.155-168.
8. T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems, Journal of Parallel and Distributed Computing, 61(6), June 2001, pp.810-837.
9. T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, Dynamic Voltage Scaled Microprocessor System, IEEE Journal of Solid-State Circuits, 35(11), Nov. 2000, pp.1571-1580.
10. A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, Low-Power CMOS Digital Design, IEEE Journal of Solid-State Circuits, 27(4), Apr. 1992, pp.473-484.
11. J. Chen, H. Hsu, K. Chuang, C. Yang, A. Pang, and T. Kuo, Multiprocessor Energy-Efficient Scheduling with Task Migration Considerations, Euromicro Conference on Real-Time Systems (ECRTS’04), Sicily, Italy, July 2004, pp.101-108.
165
12. J. Chen and T. Kuo, Multiprocessor Energy-Efficient Scheduling for Real-Time Tasks with Different Power Characteristics, International Conference on Parallel Processing (ICPP’05), Oslo, Norway, June 2005, pp.13-20.
13. P. Chowdhury and C. Chakrabarti, Static Task-Scheduling Algorithms for Battery-Powered DVS Systems, IEEE Transactions on Very Large Scale Integration Systems, 13(2), Feb. 2005, pp.226-237.
14. CPLEX, http://www.ilog.com/products/cplex/
15. Dataquest, http://data1.cde.ca.gov/dataquest/
16. H. El-Rewini and T. G. Lewis, Scheduling Parallel Program Tasks onto Arbitrary Target Machines, Journal of Parallel Distributed Computing, 9(2), June 1990, pp.138-153.
17. W. Felter, K. Rajamani, T. Keller, and C. Rusu, A Performance-conserving Approach for Reducing Peak Power Consumption in Server Systems, International Conference on Supercomputing (ICS’05), Cambridge, MA, USA, June 2005, pp.293-302
18. F. Franchetti, Y. Voronenko, and M. Pueschel, FFT Program Generation for Shared Memory: SMP and Multicore, Supercomputing (SC’06), Tampa, FL, USA, Nov. 2006, pp.51.
19. D. Geer, Chip Makers Turn to Multicore Processors, IEEE Computer, 38(5), May 2005, pp.11-13.
20. K. Govil, E. Chan, and H. Wasserman, Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU. International Conference on Mobile Computing and Networking, Berkeley, CA, USA, Nov. 1995, pp.13-25.
21. F. Gruian, Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors, International Symposium on Low Power Electronics and Design, Huntington Beach, CA, USA, Aug. 2001, pp.46-51.
22. F. Gruian and K. Kuchcinski, LEneS: Task Scheduling for Low-Energy Systems Using Variable Supply Voltage Processors, Asian South Pacific Design Automation Conference (ASP-DAC’01), Yokohama, Japan, Jan. 2001, pp.449-455.
23. F. Gruian and K. Kuchcinski, Uncertainty-Based Scheduling: Energy-Efficient Ordering for Tasks with Variable Execution Time, International Symposium on Low Power Electronics and Design, Seoul, Korea, Aug. 2003, pp.465-468.
24. D. S. Hochbaum and J. G. Shanthikumar, Convex Separable Optimization Is Not Much Harder than Linear Optimization, Journal of the ACM, 37(4), Oct. 1990, pp.843-862.
25. I. Hong, G. Qu, M. Porkonjak, and M. B. Srivastava, Synthesis Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors, Real-Time Systems Symposium (RTSS’98), Madrid, Spain, Dec. 1998, pp.178-187.
166
26. I. Hong, D. Kirovski, G. Qu, M. Potkonjak, and M. B. Srivastava, Power Optimization of Variable-Voltage Core-Based Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(12), Dec. 1999, pp.1702-1714.
27. J. Hu and R. Marculescu, Energy-Aware Communication and Task Scheduling for Network-on-Chip Architectures under Real-Time Constraints, Design, Automation and Test in Europe Conference (DATE’04), Paris, France, Feb. 2004, pp.10234.
28. J. Hu and R. Marculescu, Communication and Task Scheduling of Application-Specific Networks-on-Chip, Computer and Digital Techniques, 152(5), Sept. 2005, pp.643-651
29. S. Hua and G. Qu, Power Minimization Techniques on Distributed Real-Time Systems by Global and Local Slack Management, Asia South Pacific Automation Conference (ASP-DAC’05), Shanghai, China, Jan. 2005, pp.830-835.
30. O. H. Ibarra and C. E. Kim, Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors, Journal of the ACM, 24(2), Apr. 1977, pp. 280-289.
31. T. Ishihara and H. Yasuura, Voltage Scheduling Problem for Dynamically Variable Voltage Processors, International Symposium on Low Power Electronics and Design (ISLPED’98), Monterey, CA, USA, Aug. 1998, pp.197-202.
32. M. Iverson, F. Ozuner, G. Follen, Parallelizing Existing Applications in a Distributed Heterogeneous Environment, Heterogeneous Computing Workshop (HCW’95), Santa Barbara, California, USA, Apr. 1995, pp.93-100.
33. R. Jejurikar and R. Gupta, Dynamic Slack Reclamation with Procrastination Scheduling in Real-Time Embedded Systems, Design Automation Conference (DAC’05), San Diego, California, USA, June 2005, pp.111-116.
34. R. Jejurikar and R. Gupta, Energy-Aware Task Scheduling With Task Synchronization for Embedded Real-Time Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(6), June 2006, pp.1024-1037.
35. R. Jejurikar and R. Gupta, Optimized Slowdown in Real-Time Task Systems, IEEE Transactions on Computers, 55(12), Dec. 2006, pp.1588-1598.
36. A. Jerraya, H. Tenhunen, and W. Wolf, Multiprocessor Systems-on-Chips, IEEE Computer, 38(7), July 2005, pp.36-40.
37. P. V. Karzanov and S. T. McCormick, Polynomial Methods for Separable Convex Optimization in Unimodular Linear Spaces with Applications, SIAM Journal of Computing, 26(4), Aug. 1997, pp.1245-1275.
38. W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, Performance Comparison of Dynamic Voltage Scaling Algorithms for Real-Time Systems, Real-Time and Embedded Technology and Application Symposium (RTAS’02), San Jose, CA, USA, Sept. 2002, pp.219-228.
167
39. R. Kumarm K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen, Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, International Symposium on Microelectronics, Washington, DC, USA, Dec. 2003, pp. 81.
40. R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan, Heterogeneous Chip Multiprocessors, IEEE Computer, 38(11), Nov. 2005, pp. 32-38.
41. Y. Kwok and I. Ahmad, Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 7(5), May 1996, pp.506-521.
42. Y. Kwok and I. Ahmad, Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, ACM Computing Surveys, 31(4), December 1999, pp.406-471.
43. W. Kwon and T. Kim, Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors, ACM Transactions on Embedded Computing Systems, 4(1), Feb. 2005, pp.211-230.
44. G. Q. Liu, K. L. Poh, and M. Xie, Iterative List Scheduling for Heterogeneous Computing, Journal of Parallel and Distributed Computing, 65(5), May 2005, pp.654-665.
45. J. Luo and N. K. Jha, Power-conscious Joint Scheduling of Periodic Task Graphs and Aperiodic Tasks in Distributed Real-time Embedded Systems, International Conference on Computer-Aided Design (ICCAD’00), San Jose, California, USA, Nov. 2000, pp.357-364.
46. J. Luo and N. K. Jha, Battery-Aware Static Scheduling for Distributed Real-Time Embedded Systems, Design Automation Conference (DAC’01), Las Vegas, NV, USA, June 2001, pp.444-449.
47. J. Luo and N. K. Jha, Static and Dynamic Variable Voltage Scheduling Algorithms for Real-Time Heterogeneous Distributed Embedded Systems, Asia South Pacific Design Automation Conference (ASP-DAC’02), Bangalore, India, Jan. 2002, pp.712-719.
48. J. Luo and N. K. Jha, Power-profile Driven Variable Voltage Scaling for Heterogeneous Distributed Real-time Embedded Systems, International Conference on VLSI Design (VLSI’03), Las Vegas, Nevada, USA, Jan. 2003, pp.369-375.
49. A. Manzak and C. Chakrabarti, Variable Voltage Task Scheduling for Minimizing Energy or Minimizing Power, International Conference on Acoustic, Speech, and Signal Processing (ICASSP’00), Istanbul, Turkey, June 2000, pp.3239-3242.
50. A. Manzak and C. Chakrabarti, Variable Voltage Task Scheduling Algorithms for Minimizing Energy, International Symposium on Low Power Electronic Design (ISLPED’01), Huntington Beach, California, USA, Aug. 2001, pp.279-282.
168
51. R. Mishra, N. Rastogi, D. Zhu, D. Mossé, and R. Melhem, Energy Aware Scheduling for Distributed Real-Time Systems, International Parallel and Distributed Processing Symposium (IPDPS’03), Nice, France, Apr. 2003, pp.21b.
52. P. Pillai and K. G. Shin, Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems, ACM Symposium On Operating Systems Principles, Banff, Alberta, Canada, Oct. 2001, pp.89-102.
53. S. Sakai, M. Togasaki, and K. Yamazaki, A Note on Greedy Algorithms for the Maximum Weighted Independent Set Problem, Discrete Applied Mathematics, 126(2-3), Mar. 2003, pp.313-322.
54. V. Sarkar, Partitioning and Scheduling Parallel Programs for Multi-processors, Cambirdge, Mass, MIT Press, 1989.
55. M. T. Schmitz and B. M. Al-Hashimi, Considering Power Variations of DVS Processing Elements for Energy Minimisation in Distributed Systems, International Symposium on System Synthesis, Montréal, P.Q., Canada, Oct. 2001, pp.250-255.
56. M. T. Schmitz, B. M. Al-Hashimi, and P.Eles, Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Systems, Design, Automation, and Test in Europe Conference (DATE’02), Paris, France, Mar. 2002, pp.514-521.
57. M. T. Schmitz, B. M. Al-Hashimi, and P.Eles, Iterative Schedule Optimization for Voltage Scalable Distributed Embedded Systems, ACM Transactions on Embedded Computing Systems, 3(1), Feb. 2004, pp.182-217.
58. S. Shankland and M. Kanellos, Intel to Elaborate on New Multicore Processor, http://news.zdnet.co.uk/hardware/0,1000000091,39116043,00.htm?r=1
59. Y. Shin and K. Choi, Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems, Design Automation Conference (DAC’99), New Orleans, Louisiana, USA, June 1999, pp.134-139.
60. Y. Shin, K. Choi, and T. Sakurai, Power Optimization of Real-Time Embedded Systems on Variable Speed Processors, International Conference on Computer-Aided Design (ICCAD’00), San Jose, California, USA, Nov. 2000, pp.365-368.
61. S. Shivel, H. J. Siegel, A. A. Maciejewski, P. Sugavanam, T. Banka, R. Castain, K. Chindam, S. Dussinger, P. Pichumani, P. Satyqsekaran, W. Saylor, D. Sendek, J. Sousa, J. Sridharan, and J. Velazco, Static Allocation of Resources to Communicating Subtasks in a Heterogeneous Ad Hoc Grid Environment, Journal of Parallel and Distributed Computing, 66(4), Apr. 2006, pp.600-611.
62. G. C. Sih and E. A. Lee, A Compile-Time Scheduling Heuristic for Interconnection-Constrained Heterogeneous Processor Architectures, IEEE Transactions on Parallel and Distributed Systems, 4(2), Feb. 1993, pp.175-187.
169
63. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, Reducing Power in High-Performance Microprocessors, Design Automation Conference (DAC’98), San Francisco, California, USA, June 1998, pp.732-737.
64. H. Topcuoglu, S. Hariri, and M. Wu, Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, IEEE Transactions on Parallel and Distributed Systems, 13(3), Mar. 2002, pp.260-274.
65. D. Warrier, W. E. Wilhelm, J. S. Warren, I. V. Hicks, A Branch-and-Price Approach for the Maximum Weight Independent Set Problem, Networks, 46(4), Dec. 2005, pp. 198-209.
66. M. Weiser, B. Welch, A. Demers, and S. Shenker, Scheduling for Reduced CPU Energy, USENIX Conference on Operating Systems Design and Implementation, Monterey, CA, USA, Nov. 1994, pp.13-23.
67. S. Williams, L. Oliker, R. Vuduc, K. Yelick, J. Demmel, and J. Shalf, Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms, Supercomputing (SC’07), Reno, NV, USA, Nov. 2007, pp.38.
68. W. Wolf, The Future of Multiprocessor Systems-on-Chips, Design Automation Conference (DAC’04), San Diego, CA, USA, June 2004, pp.681-685.
69. M. Y. Wu and D. D. Gajski, Hypertool: A Programming Aid for Message-Passing Systems, IEEE Transactions on Parallel and Distributed Systems, 1(3), July 1990, pp.330-343.
70. C. Yang, J. Chen, T. Kuo, An Approximation Algorithm for Energy-Efficient Scheduling on A Chip Multiprocessor, Design, Automation, and Test in Europe Conference (DATE’05), Munich, Germany, Mar. 2005, pp.468-473.
71. T. Yang and A. Gerasoulis, DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors, IEEE Transactions on Parallel and Distributed Systems, 5(9), Sept. 1994, pp.951-967.
72. R. Yao, A. Demers, and S. Shenker, A Scheduling Model for Reduced CPU Energy, IEEE Symposium on Foundations of Computer Science (FOCS’95), Milwaukee, Wisconsin, USA, Oct. 1995, pp.374-382.
73. Y. Yu and V. K. Prasanna, Resource Allocation for Independent Real-Time Tasks in Heterogeneous Systems for Energy Minimization, Journal of Information Science and Engineering, 19(3), May 2003, pp.433-449.
74. Y. Yu and V. K. Prasanna, Energy-Balanced Task Allocation for Collaborative Processing in Wireless Sensor Networks, Mobile Networks and Applications, 10(1-2), Feb. 2005, pp.115-131.
170
75. Y. Zhang, X. (Sharon) Hu, and D. Z. Chen, Task Scheduling and Voltage Selection for Energy Minimization, Design Automation Conference (DAC’02), New Orleans, Louisiana, USA, June 2002, pp.183-188.
76. D. Zhu, R. Melhem, and B. R. Childers, Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multiprocessor Real-Time Systems, IEEE Transactions on Parallel and Distributed Systems, 14(7), July 2003, pp.686-700.
77. D. Zhu, D. Mossé, and R. Melhem, Power-Aware Scheduling for AND/OR Graphs in Real-Time Systems, IEEE Transactions on Parallel and Distributed Systems, 15(9), Sept. 2004, pp.849-864.
78. J. Zhuo and C. Chakrabarti, An Efficient Dynamic Task Scheduling Algorithm for Battery Powered DVS Systems, Asian South Pacific Design Automation Conference (ASP-DAC’05), Shanghai, China, Jan. 2005, pp.846-849.
79. J. Zhuo and C. Chakrabarti, System-Level Energy-Efficient Dynamic Task Scheduling, Design Automation Conference (DAC’05), San Diego, California, USA, June 2005, pp.628-631.
BIOGRAPHICAL SKETCH
Jaeyeon Kang obtained her Master of Science in computer science from the University of
Southern California in 2002. She obtained her Bachelor of Science and Master of Science
degrees in electrical and computer engineering from Sungkyunkwan University, Korea, in 1997
and 1999, respectively.