
Optimizing Performance and Reliability on Heterogeneous Parallel

Systems: Approximation Algorithms and Heuristics

Emmanuel Jeannot^a, Erik Saule^b, Denis Trystram^c

^a INRIA Bordeaux Sud-Ouest, Talence, France
^b BMI - Ohio State University - Columbus 43210, OH / USA
^c Grenoble Institute of Technology, Grenoble, France

Abstract

We study the problem of scheduling tasks (with and without precedence constraints) on a set of related processors which have a probability of failure governed by an exponential law. The goal is to design approximation algorithms or heuristics that optimize both makespan and reliability. First, we show that both objectives are contradictory and that the number of points of the Pareto-front can be exponential. This means that this problem cannot be approximated by a single schedule. Second, for independent unitary tasks, we provide an optimal scheduling algorithm where the objective is to maximize the reliability subject to makespan minimization. For the bi-objective optimization, we provide a (1 + ε, 1)-approximation algorithm of the Pareto-front. Next, for independent arbitrary tasks, we propose a ⟨2, 1⟩-approximation algorithm (i.e., for any fixed value of the makespan, the obtained solution is optimal on the reliability and no more than twice the given makespan) that has a much lower complexity than the other existing algorithms. This solution is used to derive a (2 + ε, 1)-approximation of the Pareto-front of the problem.

All the proposed solutions are discriminated by the value of the product {failure rate}×{unitary instruction execution time} of each processor, which appears to be a crucial parameter in the context of bi-objective optimization. Based on this observation, we provide a general method for converting scheduling heuristics on heterogeneous clusters into heuristics that take reliability into account when there are precedence constraints. The average behaviour is studied by extensive simulations. Finally, we discuss the specific case of scheduling a chain of tasks, which leads to optimal results.

Preprint submitted to J. of Parallel and Dist. Computing November 22, 2011


Keywords: Scheduling, Pareto-front approximation, Reliability, Makespan, Precedence Task Graphs.

1. Introduction

With the recent development of large parallel and distributed systems (computational grids, clusters of clusters, peer-to-peer networks, etc.), it is difficult to ensure that the resources are always available for a long period of time. Indeed, hardware failures, software faults, power breakdowns or resource removals often occur when using a very large number of machines. Hence, in this context, taking into account new objectives dealing with fault-tolerance is a major issue. Several approaches have been proposed to tackle the problem of faults. One possible approach is based on duplication: if one resource fails, other resources can continue to correctly execute the redundant parts of the application. However, the main drawback of this approach is a possible waste of resources. An alternative solution consists in check-pointing the computations from time to time and, in case of failure, restarting them from the last check-point [1, 2]. However, check-pointing an application is costly and may require modifying it. Furthermore, restarting an application slows it down. Therefore, in order to minimize the cost of the check-point/restart mechanism, it is necessary to provide a reliable execution that minimizes the probability of failure of the application. Scheduling an application corresponds to determining which resources will execute the tasks and when they will start. Thus, the scheduling algorithm is responsible for minimizing the probability of failure of the application by choosing an adequate set of resources that enables a fast and reliable execution.

Unfortunately, as we will show in this paper, increasing the reliability implies, most of the time, an increase of the execution time (a fast schedule is not necessarily a reliable one). This motivates the design of algorithms that look for a set of trade-off solutions between these two objectives.

In this paper, we study the problem of scheduling an application, represented by a precedence task graph or by a set of independent tasks, on heterogeneous computing resources. The objectives are to minimize the makespan and to maximize the reliability of the schedule.


In the literature, this problem has mainly been studied from a practical point of view [3, 4, 5]. We lack analyses based on well-founded theoretical studies of this problem. Some unanswered questions are the following:

• Is maximizing the reliability a difficult (NP-hard) problem?

• Is it possible to find polynomial solutions of the bi-objective problem for special kinds of precedence task graphs?

• Is it possible to approximate the general problem for any precedence relations?

• Can we build approximation schemes?

• How can we help the user find a good trade-off between reliability and makespan?

All these questions will be addressed in this article. More precisely, we show why both objectives are contradictory and how to provide approximations of the Pareto-front¹ in the case of independent tasks and of task graphs (with the special case of chains of tasks).

The main goal of this paper is to provide a deep understanding of the bi-criteria problem we study (makespan vs. reliability), as well as different ways to tackle it depending on the specificity of the input.

The content and the organization of this paper are as follows. In Section 2.1, we introduce the definitions of reliability and makespan and some related notations. In Section 2.2, we present and discuss the most significant related work. In Section 3, we study some basic characteristics of the bi-objective problem. In particular, we show that maximizing the reliability is a polynomial problem (Proposition 1): the optimum is simply obtained by executing the application sequentially on the processor that has the smallest product of {failure rate} and {unitary instruction execution time}. This means that minimizing the makespan is contradictory with the objective of maximizing the reliability. Furthermore, we show that, for the general case, approximating both objectives simultaneously is not possible (Proposition 2). We show

¹ Intuitively, the Pareto-front is the set of best compromise solutions, any absolutely better solution being infeasible.


that the number of points of the Pareto-front in the case of independent tasks can be exponential (Theorem 2), and hence that it is necessary to be able to approximate it. In Section 4.2, we study the problem of scheduling a set of independent unitary tasks (i.e., tasks of the same length). For this case, we propose an optimal algorithm (Algorithm 3) for maximizing the reliability subject to makespan minimization. We also propose a (1 + ε, 1)-approximation of the Pareto-front (Section 4.2.2). This means that we can provide a set of solutions of polynomial size that approximates, within a constant ratio, all the optimal makespan/reliability trade-offs. In Section 4.3, we study the case of independent tasks of arbitrary length. We provide a ⟨2, 1⟩-approximation algorithm (Algorithm 4), i.e., for any fixed value of the makespan, the obtained solution is optimal on the reliability and no more than twice the given makespan, and we derive a Pareto-front approximation from this algorithm (Section 4.3.2). An experimental evaluation of this algorithm is provided in Section 4.4. All the above solutions emphasize the importance of the {failure rate}×{unitary instruction execution time} product. Based on this observation, we show, in Section 5.1, how to easily transform a heuristic that targets makespan minimization into a bi-objective heuristic for the case of arbitrary precedence relations (Algorithm 5). In this case also, we demonstrate how to help users choose a suitable makespan/reliability trade-off. We implement this methodology using two heuristics and we compare our approach against other heuristics of the literature. Moreover, in Section 5.2, we study a special sub-case of precedence task graphs where all the tasks are serialized by a chain (Lemma 4). Finally, we conclude the paper and discuss some challenging perspectives.

2. Preliminaries

2.1. Problem Definition

As in most related studies, a parallel application is represented by a precedence task graph: let G = (T, E) be a Directed Acyclic Graph (DAG) where T is the set of n vertices (that represent the tasks) and E is the set of edges that represent precedence constraints among the tasks (if there are any). Let Q be a set of m uniform processors as described


in [6]. A uniform processor is defined as follows: processor j computes 1/τj operations per time unit, and pij = pi·τj denotes the running time of task i on processor j (τj is also called the unitary instruction execution time, i.e., the time to perform one operation). In the remainder of the paper, i denotes the task index while j refers to the processors; pi denotes the processing requirement of task i. Moreover, processor j has a constant failure rate λj. When a processor is affected by a failure, it stops working until the end of the schedule (this model is usually called crash fault). If a processor fails before completing the execution of all its tasks, the execution has failed.

A schedule s = (π, σ) is composed of two functions: a function π : T → Q that maps a task to the processor that executes it, and a function σ : T → ℝ+ that associates to each task the time when it starts its execution. We denote by π−1 the function which maps a processor to the set of tasks allocated to it, which we improperly call the inverse of function π. To be valid, a schedule must satisfy the precedence constraints, and no processor may execute more than one task at once. The completion time of processor j is the first time when all its tasks are completed: Cj(s) = max_{i∈π−1(j)} (σ(i) + pij). The makespan of a schedule is defined as the maximum completion time: Cmax(π) = max_j Cj(π). The probability that processor j executes all its tasks successfully is given by an exponential law: Prjsucc(π) = e^{−λj·Cj(π)}. We assume that faults are independent; therefore, the probability that schedule π finishes correctly is Prsucc(π) = ∏_j Prjsucc(π) = e^{−∑_j Cj(π)λj}. The reliability index is defined by rel(π) = ∑_j Cj(π)λj. When no confusion is possible, π will be omitted.

We are interested in minimizing both Cmax and rel simultaneously (i.e., minimizing the makespan and maximizing the probability of success of the whole schedule).
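These definitions translate directly into code. A minimal sketch (with hypothetical task and processor data; the function and argument names are ours, not the paper's):

```python
import math

def metrics(p, tau, lam, assign):
    """Makespan, reliability index, and success probability of a compact
    schedule of independent tasks.

    p[i]      : processing requirement of task i
    tau[j]    : unitary instruction execution time of processor j
    lam[j]    : failure rate lambda_j of processor j
    assign[i] : processor executing task i
    For a compact schedule, C_j = tau_j * (total work mapped on j).
    """
    C = [0.0] * len(tau)
    for i, j in enumerate(assign):
        C[j] += p[i] * tau[j]
    cmax = max(C)                                       # makespan
    rel_index = sum(cj * lj for cj, lj in zip(C, lam))  # rel = sum C_j lambda_j
    pr_succ = math.exp(-rel_index)                      # Pr_succ = e^{-rel}
    return cmax, rel_index, pr_succ
```

For instance, two tasks (p = [2, 3]) on two processors with τ = [1.0, 0.5] and λ = [0.1, 0.4], assigned one per processor, give Cmax = 2 and rel ≈ 0.8.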

2.2. Related Works

Optimizing single objectives. First, we briefly discuss how each single-objective problem has been studied in the literature. The minimization of the makespan is a classical problem. It is well-known that scheduling independent tasks on uniform processors within a fixed amount of time is an NP-complete problem, because it contains the NP-complete problem PARTITION as a sub-problem [7]. A low-cost (2 − 1/(m+1))-approximation algorithm has been proposed


in [8]. It consists of classical list scheduling where the longest task of the list is iteratively mapped on the processor that will complete it the soonest. Hochbaum and Shmoys proposed a PTAS (Polynomial Time Approximation Scheme) based on the bin packing problem with variable bin sizes [9]. However, this result is only of theoretical interest because its runtime complexity is far too high. The problem with precedence constraints is much less understood from the approximation theory point of view. Without communication delays, the best known approximation algorithm for arbitrary dependency graphs and uniform processors is an O(log m)-approximation proposed by [10]. The problem with communication delays is known to be difficult even on identical processors, and often requires making the distinction between small and large communication delays, or hypotheses such as Unitary Execution Tasks [11]. It is beyond the scope of this paper to give a full review of scheduling theory; the reader is referred to [12] for more details.

Since there exist many reliability models, there exist multiple methods to optimize the reliability, depending on the chosen model. Some models lead to harder problems for determining the maximum reliability; the main difficulty is to avoid dependent probabilistic events, which prevent the existence of a useful closed formula and which often arise in schedules with replication. For instance, [13] needs to add constraints on the structure of the schedule to be able to compute the reliability in polynomial time; without this restriction, determining the reliability of a schedule would be a difficult problem [14]. We consider in this work a realistic model where the schedule with the best reliability can be computed in polynomial time (of course, the corresponding makespan may be very large). The assumption of crash faults is realistic in the sense that it corresponds to the most common kind of failure: a machine goes offline. The assumption that the probability of success follows an exponential law is a direct consequence of the assumption that the failure rate is constant during the execution of the application. This assumption is reasonable since the execution time of the application is small compared to the lifetime of the cluster. Moreover, this assumption is the basis of the Shatz-Wang reliability model [15], which has been used in numerous works on reliability such as [13, 3, 4, 5]. Finally, some authors have studied new non-conventional objectives


like maximizing the number of tasks performed before failure [16].

Related bi-objective problems: Shmoys and Tardos studied the bi-objective problem of minimizing both the makespan and the sum of costs of a schedule of independent tasks on unrelated machines in [17]. This problem is mathematically the same as the problem of optimizing the makespan and reliability of independent tasks. In their model, the cost is induced by scheduling a task on a processor, and the cost function is given by a cost matrix. They proposed an algorithm based on two parameters, namely a target value M for the makespan and C for the cost, which returns a schedule whose makespan is lower than 2M with a cost better than C. This method can be adapted to solve our problem. However, it is difficult to implement since it relies on Linear Programming, and its complexity is in O(mn² log n), which is costly. Section 4.3.1 will present an algorithm tailored to our case of uniform machines that is asymptotically faster by a ratio of O(nm). It is also possible to use integrated approaches where one of the objectives implicitly contains the other, like the minimization of the mean makespan with check-points [18]. Here, the trade-off between doing a check-point or not is included in the expression of the mean makespan.

Optimizing both makespan and reliability: several heuristics have been proposed to solve this bi-objective problem. Dogan and Ozguner proposed in [3] a bi-objective heuristic called RDLS. In [4], the same authors improved their solution using an approach based on genetic algorithms. In [5], Hakem and Butelle proposed a bi-objective heuristic called BSA that outperforms RDLS. In [19], the authors proposed MRCD, an algorithm that computes a makespan/reliability compromise. They show that this compromise can be better than the ones found by other heuristics but, contrary to this work, they do not focus on the whole Pareto-front. All these results focus on the general case where the precedence task graph is arbitrary. Moreover, none of the proposed heuristics has a constant approximation ratio. This manuscript is an extended version of two works on this topic: [20] and [21]. On the theoretical side, we prove here that the Pareto-front can be exponential, and we study the case of chains of tasks, for which we propose an optimal algorithm. On the experimental side, we have added a large set of experiments validating our algorithm for independent arbitrary


tasks and for the different heuristics studied in the case of arbitrary task graphs.

2.3. Preliminary Analysis

The goal of our work is to solve a bi-objective problem, namely minimizing the makespan

and maximizing the reliability (which corresponds to minimize the probability of failure).

Unfortunately, these objectives are conflicting. More precisely, as shown in the following

Proposition 1, the optimal reliability is obtained while mapping all the tasks on processor

j such that, j = argmin(τjλj), i.e., on the processor for which the product of {failure

rate}×{unitary instruction execution time} is minimal. However, from the view point of the

makespan, such a schedule can be arbitrarily far from the optimal one.

Proposition 1. Let S be a schedule where all the tasks have been assigned, in topological order, to a processor j0 such that τj0λj0 is minimal. Let rel be the reliability index of schedule S. Then any schedule S′ ≠ S, with reliability index rel′, is such that rel ≤ rel′.

Proof. Suppose without loss of generality that j0 = 0 (i.e., ∀j : τ0λ0 ≤ τjλj). Then rel = C0λ0 (all the tasks are mapped to processor 0). Call C′j the completion date of the last task on processor j in schedule S′. Therefore, rel′ ≥ ∑_{j=0}^{m} C′jλj (the inequality comes from the idle times that may appear, which can be omitted here since doing so decreases the bound on rel′, and a lower bound is enough for our calculations). Let T̄ be the set of tasks that are not executed on processor 0 by schedule S′. Then C′0 ≥ C0 − τ0·∑_{i∈T̄} pi (there are still the tasks of T \ T̄ to execute on processor 0). Let T̄ = T̄1 ∪ T̄2 ∪ … ∪ T̄m, where T̄j is the subset of the tasks of T̄ executed on processor j by schedule S′ (these sets are disjoint: ∀j1 ≠ j2, T̄j1 ∩ T̄j2 = ∅). Then, ∀j, 1 ≤ j ≤ m, C′j ≥ τj·∑_{i∈T̄j} pi. Let us compute the


difference rel′ − rel:

rel′ − rel ≥ ∑_{j=0}^{m} C′jλj − C0λ0
          ≥ (C0λ0 − τ0λ0·∑_{i∈T̄} pi) + ∑_{j=1}^{m} (τjλj·∑_{i∈T̄j} pi) − C0λ0
          = ∑_{j=1}^{m} (τjλj·∑_{i∈T̄j} pi) − τ0λ0·∑_{i∈T̄} pi
          = ∑_{j=1}^{m} (τjλj·∑_{i∈T̄j} pi) − τ0λ0·∑_{j=1}^{m} ∑_{i∈T̄j} pi   (because the T̄j's are disjoint)
          = ∑_{j=1}^{m} ((τjλj − τ0λ0)·∑_{i∈T̄j} pi) ≥ 0   (because ∀j : τ0λ0 ≤ τjλj)

This proposition shows that the problem of minimizing the makespan subject to the condition that the reliability is maximized corresponds to the problem of minimizing the makespan using only the processors having a minimal τjλj. If there is only one such processor, the problem is straightforward: the reliability is maximized only if all the tasks are sequentially executed on this processor. However, when several processors have the same minimal τjλj value, the problem is NP-hard, since it requires minimizing the makespan on all of these processors.
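Proposition 1 can be sanity-checked by exhaustive enumeration on a toy instance (a sketch; the instance values and function names below are hypothetical, not the paper's):

```python
from itertools import product

def rel_index(p, tau, lam, assign):
    # Reliability index sum_j C_j * lambda_j of a compact schedule,
    # with C_j = tau_j * (total work mapped on processor j).
    C = [0.0] * len(tau)
    for i, j in enumerate(assign):
        C[j] += p[i] * tau[j]
    return sum(cj * lj for cj, lj in zip(C, lam))

def best_rel(p, tau, lam):
    # Brute force over all m^n assignments of tasks to processors.
    m = len(tau)
    return min(rel_index(p, tau, lam, a)
               for a in product(range(m), repeat=len(p)))

# Hypothetical instance: processor 1 minimizes the product tau_j * lambda_j
# (products are 0.05, 0.04, 0.1), so putting every task there is optimal.
p, tau, lam = [1, 2, 3], [1.0, 0.5, 0.25], [0.05, 0.08, 0.4]
j0 = min(range(len(tau)), key=lambda j: tau[j] * lam[j])
```

On this instance, the brute-force minimum matches the reliability index of the schedule that maps all tasks to j0, as the proposition predicts.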

The following proposition proves that, for the problem we are interested in, there is no solution of the bi-objective problem that is simultaneously close to the optimum on both objectives.

Proposition 2. The bi-objective problem of minimizing Cmax and rel cannot be approximated within a constant factor by a single solution.

Proof. Consider the class of instances Ik of the problem with two machines such that τ1 = 1, τ2 = 1/k and λ1 = 1, λ2 = k² (k ∈ R+∗), and a single task t1 with p1 = 1. There exist only two feasible schedules: π1, in which t1 is scheduled on processor 1, and π2, in which it is scheduled on processor 2. Remark that π2 is optimal for Cmax and that π1 is optimal for rel.


Cmax(π1) = 1 and Cmax(π2) = 1/k, which leads to Cmax(π1)/Cmax(π2) = k. This ratio goes to infinity when k goes to infinity. Similarly, rel(π1) = 1 and rel(π2) = k²/k = k, which leads to rel(π2)/rel(π1) = k. Again, this ratio goes to infinity with k. Hence, none of these feasible schedules can approximate both objectives within a constant factor.
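The instance Ik can be replayed numerically; a sketch (the function name is ours) showing that both ratios equal k and therefore diverge:

```python
def ratios(k):
    # Instance I_k from the proof of Proposition 2: two machines with
    # tau1 = lam1 = 1, tau2 = 1/k, lam2 = k^2, and one task with p1 = 1.
    cmax_pi1, cmax_pi2 = 1.0, 1.0 / k       # makespans of pi1 and pi2
    rel_pi1 = 1.0 * 1.0                     # C_1 * lambda_1
    rel_pi2 = (1.0 / k) * k ** 2            # C_2 * lambda_2
    # Both ratios equal k, so no single schedule is within a constant
    # factor of the optimum on both objectives as k grows.
    return cmax_pi1 / cmax_pi2, rel_pi2 / rel_pi1
```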

Proposition 2 shows that the problem of optimizing both objectives simultaneously cannot be approximated: in general, there exists no solution that is close to the optimal value on both objectives at the same time. Therefore, we tackle the problem as optimizing one objective subject to the condition that the second one is kept at a reasonable value ([22], Chap. 3, p. 12). For our problem, this corresponds to maximizing the reliability subject to the condition that the makespan stays under a threshold value. This approach may be seen as giving priority to the makespan (the most difficult objective to optimize) and optimizing the reliability as a secondary goal. However, since finding the optimal makespan is usually NP-hard, we first aim at designing an approximation algorithm and then at determining an approximation of the Pareto-front.

As the number of Pareto-optimal solutions can be exponential, it is important to be able to generate an approximation of the Pareto-front that has polynomial size. To achieve this goal, we use the methodology proposed by Papadimitriou and Yannakakis in [23]. It is briefly recalled in the next section, and will be used in Section 4 for the case of independent tasks.

3. Bi-objective Approximation

In bi-objective optimization, there is no concept of an absolute best solution. In general, no solution is the best on both objectives. However, a given solution may be better than another one on both objectives; it is then said that the former Pareto-dominates the latter.

The interesting solutions in bi-objective optimization, called Pareto-optimal solutions, are those that are not dominated by any other solution. The Pareto-front (also called Pareto-set) of an instance is the set of all Pareto-optimal solutions.

Figure 1: Bold crosses are a (ρ1, ρ2)-approximation of the Pareto-front.

Intuitively, the Pareto-front divides the solution space between feasible and unfeasible solutions. It is the set of interesting compromise solutions, and determining this set is the main target of multi-objective optimization. Unfortunately, this set is most of the time difficult to compute, either because one of the underlying optimization problems is NP-hard or because its cardinality is exponential. In our case, both reasons hold². Thus, we look for an approximation of the Pareto-front with polynomial cardinality.

A generic method to obtain an approximated Pareto-front was introduced by Papadimitriou and Yannakakis in [23]. Pc is a (ρ1, ρ2)-approximation of the Pareto-front Pc∗ if each solution s∗ ∈ Pc∗ is (ρ1, ρ2)-approximated by a solution s ∈ Pc: ∀s∗ ∈ Pc∗, ∃s ∈ Pc, Cmax(s) ≤ ρ1·Cmax(s∗) and rel(s) ≤ ρ2·rel(s∗). Fig. 1 illustrates this concept. Crosses are solutions of the scheduling problem represented in the (Cmax, rel) space. The bold crosses form an approximated Pareto-front. Each solution (x, y) in this set (ρ1, ρ2)-dominates a quadrant delimited in bold in the figure, whose origin is at (x/ρ1, y/ρ2). All solutions are dominated by a solution of the approximated Pareto-front, as they are included in a (ρ1, ρ2)-dominated quadrant.
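This definition is straightforward to check programmatically. A sketch, assuming each solution is encoded as a (Cmax, rel) pair and both objectives are minimized:

```python
def approximates(apx, front, rho1, rho2):
    """Check that apx is a (rho1, rho2)-approximation of front.

    Every solution (c_opt, r_opt) of the front must be covered by some
    (c, r) in apx with c <= rho1 * c_opt and r <= rho2 * r_opt.
    """
    return all(
        any(c <= rho1 * c_opt and r <= rho2 * r_opt for c, r in apx)
        for c_opt, r_opt in front
    )
```

For example, with ρ1 = 2 and ρ2 = 1, the set {(2, 4), (4, 2), (8, 1)} is a (2, 1)-approximation of the front {(1, 4), (2, 2), (4, 1)}, while the single point (2, 4) is not.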

One possible way for building such an approximation uses an algorithm that constructs

² We will show in the next section that the size of the Pareto-front can be exponential.


a ρ2-approximation of the second objective constrained by a threshold on the first one. The threshold cannot be exceeded by more than a constant factor ρ1. Such an algorithm is said to be a ⟨ρ1, ρ2⟩-approximation algorithm. More formally,

Definition 1. Given a threshold value ω of the makespan, a ⟨ρ1, ρ2⟩-approximation algorithm delivers a solution whose Cmax ≤ ρ1·ω and rel ≤ ρ2·rel∗(ω), where rel∗(ω) is the best possible value of the reliability index over schedules whose makespan is less than ω.

Let APPROX be a ⟨ρ1, ρ2⟩-approximation algorithm (for instance, Algorithms 3 and 4, explained later). Algorithm 1 constructs a (ρ1 + ε, ρ2)-approximation of the Pareto-front of the problem by applying APPROX on a geometric sequence of makespan thresholds. The geometric sequence is only considered between a lower bound C_max^min and an upper bound C_max^max on the makespan of Pareto-optimal solutions.

Algorithm 1: Pareto-front approximation (according to the method of Papadimitriou and Yannakakis)

Data: ε a positive real number
Result: S a set of solutions
begin
    k ← 0
    S ← ∅
    while k ≤ ⌈log_{1+ε/ρ1}(C_max^max / C_max^min)⌉ do
        ω_k ← (1 + ε/ρ1)^k · C_max^min
        s_k ← APPROX(ω_k)
        S ← S ∪ {s_k}
        k ← k + 1
    return S
end
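Algorithm 1 reduces to a short loop over a geometric sequence of thresholds. A sketch, where the hypothetical callback `approx(omega)` stands for a ⟨ρ1, ρ2⟩-approximation algorithm and returns None when the threshold ω is infeasible:

```python
import math

def pareto_front_approx(approx, cmax_min, cmax_max, rho1, eps):
    """Call approx(omega) on thresholds omega_k = (1 + eps/rho1)^k * cmax_min
    for k = 0 .. ceil(log_{1+eps/rho1}(cmax_max / cmax_min))."""
    base = 1 + eps / rho1
    solutions = []
    for k in range(math.ceil(math.log(cmax_max / cmax_min, base)) + 1):
        omega = base ** k * cmax_min
        s = approx(omega)
        if s is not None:  # omega may be below every feasible makespan
            solutions.append(s)
    return solutions
```

For example, with cmax_min = 1, cmax_max = 10, ρ1 = 1 and ε = 1, the thresholds are 1, 2, 4, 8, 16, so APPROX is called five times.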

Theorem 1. The method of Papadimitriou and Yannakakis described in Algorithm 1 builds a (ρ1 + ε, ρ2)-approximation of the Pareto-front from a ⟨ρ1, ρ2⟩-approximation algorithm.


Figure 2: APPROX(ω_{k+1}) is a (ρ1 + ε, ρ2)-approximation of the Pareto-optimal solutions whose makespan is between ω_k and ω_{k+1}. There is at most a factor of ρ2 on the reliability between APPROX(ω_{k+1}) and rel∗(ω_{k+1}). The ratio on the makespan between APPROX(ω_{k+1}) and ω_{k+1} is less than ρ1, and ω_{k+1} = (1 + ε/ρ1)·ω_k. Thus, APPROX(ω_{k+1}) is a (ρ1 + ε, ρ2)-approximation of (ω_k, rel∗(ω_{k+1})).

Proof. Let s∗ be a Pareto-optimal schedule. Then there exists k ∈ ℕ such that (1 + ε/ρ1)^k · C_max^min ≤ Cmax(s∗) ≤ (1 + ε/ρ1)^{k+1} · C_max^min. We show that s_{k+1} is a (ρ1 + ε, ρ2)-approximation of s∗. The construction from step k to step k + 1 is illustrated in Figure 2.

• Reliability. rel(s_{k+1}) ≤ ρ2·rel∗((1 + ε/ρ1)^{k+1}·C_max^min) (by definition). s∗ is Pareto-optimal, hence rel(s∗) = rel∗(Cmax(s∗)). But Cmax(s∗) ≤ (1 + ε/ρ1)^{k+1}·C_max^min. Since rel∗ is a decreasing function, we have rel(s_{k+1}) ≤ ρ2·rel(s∗).

• Makespan. Cmax(s_{k+1}) ≤ ρ1·(1 + ε/ρ1)^{k+1}·C_max^min = (ρ1 + ε)·(1 + ε/ρ1)^k·C_max^min (by definition), and Cmax(s∗) ≥ (1 + ε/ρ1)^k·C_max^min. Thus, Cmax(s_{k+1}) ≤ (ρ1 + ε)·Cmax(s∗).

Remark that APPROX(ω_k) may not return a solution (in this case, s_k is set to ∅ and we increment k). However, this is not a problem, because it means that no solution has a makespan lower than ω_k; APPROX(ω_k) approximates the Pareto-optimal solutions whose makespan is lower than ω_k. Hence, no solution is forgotten.

The algorithm generates ⌈log_{1+ε/ρ1}(C_max^max / C_max^min)⌉ solutions and calls the APPROX algorithm the same number of times.


4. Independent tasks

4.1. Size of the Pareto-front

Before proposing algorithmic solutions for the bi-objective problem, we show that it is not possible to compute the whole Pareto-front in polynomial time. More precisely, we show that the number of points of the Pareto-front can be exponential in the size of the input.

Theorem 2. There exists a class of instances whose set of Pareto-optimal solutions is exponential in the number of tasks.

Proof. The proof is obtained by exhibiting a class of instances with an exponential number of solutions. Let us consider instance In composed of n tasks such that pi = 2^(i−1), ∀i, 1 ≤ i ≤ n, and 2 processors where the first one is very fast and unreliable (τ1 = 2^(−n), λ1 = 1) whereas the second one is very slow but highly reliable (τ2 = 1, λ2 = 2^(−n)). The processor parameters and task sizes induce that:

• The makespan is only determined by the tasks scheduled on processor 2: Cmax = ∑_{i∈π^(−1)(2)} pi (or is equal to ∑_{i=1}^{n} 2^(i−1) × τ1 = (2^n − 1)/2^n ≈ 1 if all the tasks are scheduled on processor 1).

• The reliability is mainly determined by the tasks scheduled on processor 1: rel = ∑_{i∈π^(−1)(1)} pi (the contribution of the tasks on processor 2 is less than (2^n − 1)/2^n and thus can be omitted for the sake of clarity).

• There are exactly 2^n solutions since each task may be scheduled either on processor 1 or 2. Each solution is uniquely described by the sum of the processing times of the tasks scheduled on processor 1, which can take all the values between 0 and 2^n − 1.

From above, let solution πi be the schedule with a makespan of Cmax = i. Its reliability is rel = 2^n − 1 − i. All the solutions have different objective values. Moreover, the makespan strictly increases with i whereas the reliability strictly decreases. This proves that each solution is Pareto-optimal.
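The blow-up can be checked numerically for small n. The snippet below is an illustrative sketch of the instance In from the proof, using the simplified reliability index of the proof (the sum of processing times on processor 1) rather than the exact failure probability.

```python
from itertools import product

def pareto_points(n):
    """Enumerate the 2^n schedules of instance I_n: task i has p_i = 2^(i-1);
    makespan = work placed on processor 2, simplified reliability index =
    work placed on processor 1 (as in the proof of Theorem 2)."""
    points = set()
    for assign in product((1, 2), repeat=n):
        work1 = sum(2 ** i for i, proc in enumerate(assign) if proc == 1)
        cmax = sum(2 ** i for i, proc in enumerate(assign) if proc == 2)
        points.add((cmax, work1))
    return points
```

For n = 4, the 16 schedules give 16 distinct objective pairs (i, 2^n − 1 − i), all Pareto-optimal.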


4.2. Independent unitary tasks

Notice that when we consider only independent tasks, all the solutions are compact (i.e.,

they do not contain idle time) and the order of the tasks does not matter. Therefore, a

solution for independent unitary tasks is entirely defined by the number of tasks allocated

to each processor.

4.2.1. A ⟨1, 1⟩-approximation algorithm

Given a makespan objective ω, we show how to find a task allocation that is the most

reliable for a set of n independent unitary tasks (∀i ∈ T , pi = 1).

To build a ⟨ρ1, ρ2⟩-approximation algorithm, we consider the problem of minimizing the probability of failure subject to the condition that the makespan is constrained. Since the tasks are unitary and independent, the problem is then to find for each processor j ∈ Q the number of tasks aj to allocate on processor j such that the following constraints are fulfilled: (1) ∑_{j∈Q} aj = n. (2) The makespan is constrained: ∀j ∈ Q, aj τj ≤ ω. This threshold ω on the makespan is assumed to be larger than the optimal makespan C∗max. (3) Subject to the previous constraints, rel is minimized, i.e., ∑_{j∈Q} aj λj τj is minimized. Once the allocation is known, it is easy to express a solution π such that aj = |π^(−1)(j)|.

First, it is important to notice that a schedule whose makespan is smaller than a given objective ω can be found in polynomial time. Indeed, Algorithm 2 determines the minimal-makespan allocation for any given set of independent unitary tasks, as shown in [24], p. 161.

Second, we propose Algorithm 3 to solve the problem. It determines an optimal allocation, as proven by Theorem 3. It is a greedy algorithm that allocates the tasks to the processors in increasing order of their λjτj products. Each processor receives the largest possible number of tasks while keeping the makespan less than ω.

Theorem 3. Algorithm 3 is a ⟨1, 1⟩-approximation.

Proof. Let X be the number of tasks already assigned. Since when X < n we allocate at most n − X tasks to a processor, at the end of the algorithm we have X ≤ n (since ω ≥ C∗max,


Algorithm 2: Optimal allocation for independent unitary tasks

begin
    for j from 1 to m do
        aj ← ⌊n × (1/τj) / (∑_i 1/τi)⌋
    while ∑_j aj < n do
        k ← argmin_l (τl (al + 1))
        ak ← ak + 1
end

Algorithm 3: Optimal reliable allocation for independent unitary tasks

Input: ω ≥ C∗max
begin
    Sort the processors by increasing λjτj
    X ← 0
    for j from 1 to m do
        if X < n then
            aj ← min(n − X, ⌊ω/τj⌋)
        else
            aj ← 0
        X ← X + aj
end


at the end of the algorithm X = n, i.e., all the tasks are assigned). For each processor j, we allocate at most ⌊ω/τj⌋ tasks, hence the makespan constraint is respected: aj τj ≤ ω. Since in Algorithm 2 the order of the tasks and the order of the processors are not taken into account, Algorithm 3 is valid (i.e., all tasks are assigned using at most the m processors). Hence, the makespan of the schedule is lower than ω.

We need to show that ∑_{j∈Q} aj λj τj is minimum. First, let us remark that Algorithm 3 allocates the tasks to the processors in increasing order of the λjτj values. Hence, any other valid allocation a′ can only move tasks from processors with small λjτj products to processors with larger ones. Without loss of generality, let us assume that a′1 = a1 − k, a′i = ai + k and a′j = aj for k ∈ N, 1 ≤ k ≤ a1, j ≠ 1 and j ≠ i. Then, the difference between the two objective values is

D = ∑_{x∈Q} a′x λx τx − ∑_{x∈Q} ax λx τx
  = λ1τ1(a′1 − a1) + λiτi(a′i − ai)
  = −kλ1τ1 + kλiτi
  = k(λiτi − λ1τ1)
  ≥ 0 because λiτi ≥ λ1τ1.

Hence, the first allocation has a smaller (or equal) objective value.
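For concreteness, both greedy procedures can be sketched in Python as follows. The function names and list-based interfaces are illustrative, not from the paper; `tau[j]` is the unitary execution time and `lam[j]` the failure rate of processor j.

```python
def optimal_unitary_makespan_alloc(n, tau):
    """Sketch of Algorithm 2: minimal-makespan allocation of n unitary
    tasks on processors with unitary execution times tau[j]."""
    total_speed = sum(1.0 / t for t in tau)
    # proportional share, rounded down
    a = [int((1.0 / t) / total_speed * n) for t in tau]
    while sum(a) < n:
        # give the next task to the processor that would finish it earliest
        k = min(range(len(tau)), key=lambda l: tau[l] * (a[l] + 1))
        a[k] += 1
    return a

def reliable_unitary_alloc(n, tau, lam, omega):
    """Sketch of Algorithm 3: most reliable allocation under the makespan
    bound omega (assumes omega >= the optimal makespan C*_max)."""
    order = sorted(range(len(tau)), key=lambda j: lam[j] * tau[j])
    a = [0] * len(tau)
    assigned = 0
    for j in order:                     # fill cheapest lambda*tau first
        if assigned < n:
            a[j] = min(n - assigned, int(omega // tau[j]))
            assigned += a[j]
    return a
```

With τ = (1, 2) and n = 4, Algorithm 2 yields the allocation (3, 1) of makespan 3; with λ = (1, 0.1) and ω = 4, Algorithm 3 shifts work to the reliable slow processor, yielding (2, 2).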

4.2.2. Approximating the Pareto-front

We propose below two methodologies for computing the Pareto-front based on Algo-

rithm 3.

The first technique consists in using the method of Papadimitriou and Yannakakis presented in Algorithm 1. Since Algorithm 3 is a ⟨1, 1⟩-approximation algorithm, we obtain a (1 + ε, 1) Pareto-front approximation thanks to Theorem 1. In this case, the lower bound Cmax^min = C∗max is computed by Algorithm 2, and the upper bound Cmax^max = nτ1 is the makespan where all the tasks are executed on the processor that leads to the most reliable schedule (hence, the longer schedules are Pareto-dominated by this one). The time-complexity of this method is in O(m log_{1+ε}(nτ1)), which is polynomial.


The second method consists in calling Algorithm 3 only on relevant values of ω. It leads to the question: “What is the smallest value of ω′ > ω that produces a different schedule?”. ω′ must be large enough to allow one task scheduled on processor j to be scheduled on processor j′ < j instead, improving the reliability. Therefore, only the values ω = xτj are interesting; they correspond to the execution time of x (1 ≤ x ≤ n) tasks on processor j (1 ≤ j ≤ m). There are less than nm interesting times and thus less than nm Pareto-optimal solutions. Using Algorithm 3, the exact Pareto-front can be found in O(nm²); this time-complexity is exponential in the size of the instance. Indeed, the size of the instance is not n but O(log n): we only need to encode the value of n, not the n tasks, as they are all identical.
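The second method can be sketched as below, reusing the greedy of Algorithm 3 on the candidate thresholds ω = xτj. The inlined helper and the dominance filtering are illustrative assumptions; `rel_idx` is the reliability index ∑ aj λj τj being minimized.

```python
def reliable_alloc(n, tau, lam, omega):
    # greedy of Algorithm 3: fill processors in increasing lambda*tau order
    a = [0] * len(tau)
    left = n
    for j in sorted(range(len(tau)), key=lambda j: lam[j] * tau[j]):
        a[j] = min(left, int(omega // tau[j]))
        left -= a[j]
    return a if left == 0 else None     # None: omega below the optimal makespan

def exact_unitary_front(n, tau, lam):
    """Sketch: evaluate Algorithm 3 on every threshold omega = x * tau[j]
    and keep the non-dominated (makespan, reliability index) points."""
    m = len(tau)
    candidates = sorted({x * tau[j] for j in range(m) for x in range(1, n + 1)})
    front = []
    for omega in candidates:
        a = reliable_alloc(n, tau, lam, omega)
        if a is None:
            continue
        point = (max(a[j] * tau[j] for j in range(m)),
                 sum(a[j] * lam[j] * tau[j] for j in range(m)))
        if not any(c <= point[0] and r <= point[1] for (c, r) in front):
            # drop points the new one dominates, then keep it
            front = [p for p in front if not (point[0] <= p[0] and point[1] <= p[1])]
            front.append(point)
    return front
```

There are at most nm candidate thresholds, matching the bound on the number of Pareto-optimal solutions stated above.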

4.3. Independent arbitrary tasks

In this section, we extend the analysis to the case where the tasks are not unitary (the values pi are integers). As before, the makespan objective is fixed and we aim at determining the best possible reliability. However, since deciding whether there exists a schedule whose makespan is smaller than a target value, given a set of processors and arbitrary independent tasks, is NP-complete, it is not possible to find an optimal schedule in polynomial time unless P=NP.

4.3.1. A ⟨2, 1⟩-approximation algorithm

We present below a ⟨2, 1⟩-approximation algorithm called CMLT (for Constrained MinLambdaTau) which has a better complexity and is easier to implement than the general algorithm presented in [17].

Let ω be the guess value of the optimum makespan. Let M(i) = {j | pi τj ≤ ω} be the set of processors able to execute task i in less than ω units of time. It is obvious that if i is executed on j ∉ M(i), then the makespan will be greater than ω.

The following proposition states that if task i has fewer operations than task i′, then all the machines able to schedule i′ in less than ω time units can also schedule i in the same time. The proof is directly derived from the definition of M and thus is omitted.

Proposition 3. ∀i, i′ ∈ T such that pi ≤ pi′, M(i′) ⊆ M(i).


CMLT proceeds as follows: for each task i, considered in non-increasing number of operations, schedule i on the processor j of M(i) that minimizes λjτj among those with Cj ≤ ω (or return no schedule if there is no such processor). Sorting the tasks by non-increasing number of operations implies that more and more processors become usable over time.

The principle of the algorithm is rather simple. However, several properties must be verified to ensure that it is always possible to schedule all the tasks this way.

Lemma 1. CMLT returns a schedule whose makespan is lower than 2ω or ensures that there

is no schedule whose makespan is lower than ω.

Proof. First, remark that if the algorithm returns a schedule, then its makespan is lower than 2ω (task i is executed on processor j ∈ M(i) only when Cj ≤ ω, and its execution time on j is at most ω). It remains to prove that if the algorithm does not return a schedule, then there is no schedule with a makespan lower than ω.

Suppose that task i cannot be scheduled on any processor of M(i). Then all processors of M(i) execute tasks during more than ω units of time: ∀j ∈ M(i), Cj > ω.

Moreover, due to Proposition 3, each task i′ ≤ i (such that pi′ ≥ pi) could not have been scheduled on a processor not belonging to M(i). Thus, in a schedule with a makespan lower than ω, all the tasks i′ ≤ i must be scheduled on M(i).

Hence, there are more operations in the set of tasks {i′ | i′ ≤ i} than the processors of M(i) can execute in ω units of time, so no schedule has a makespan lower than ω.

Lemma 2. CMLT generates a schedule such that rel ≤ rel∗(ω).

Proof. We first construct a non-feasible schedule π∗ whose reliability is a lower bound of rel∗(ω). Then, we show that rel(CMLT) ≤ rel(π∗).

We know from Theorem 3 that the optimal reliability under the makespan constraint for unitary tasks is obtained by adding tasks to processors (sorted in increasing order of λτ) up to reaching the threshold ω. For arbitrary lengths, we can construct a schedule π∗ using a similar method. Task i is allocated to the processor of M(i) that minimizes the λτ product; but if i finishes after ω, the exceeding quantity is scheduled on the next processor belonging to M(i) in λτ order. Note that such a schedule exists because CMLT returns a solution. Of course, this schedule is not always feasible, as the same task can be required to be executed on more than one processor at the same time. However, it is easy to adapt the proof of Theorem 3 and to show that rel(π∗) ≤ rel∗(ω).

The schedule generated by CMLT is similar to π∗. The only difference is that some operations are scheduled after ω. In π∗, these operations are scheduled on less reliable processors. Thus, the schedule generated by CMLT has a better reliability than π∗.

Finally, we have rel(CMLT) ≤ rel(π∗) ≤ rel∗(ω), which concludes the proof.

Remark that if ω is very large, M(i) = Q for all tasks i, and hence all the tasks will be scheduled on the processor which minimizes the λτ product, leading to the most reliable schedule.

Lemma 3. The time complexity of CMLT is in O(n log n + m log m).

Proof. The algorithm should be implemented using a heap, as presented in Algorithm 4. The cost of sorting the tasks is in O(n log n) and the cost of sorting the processors is in O(m log m). Adding (and removing) a processor to (from) the heap costs O(log m), and such operations are done m times, so heap operations cost O(m log m). Scheduling a task and all complementary tests are done in constant time, and there are n tasks to schedule, so scheduling operations cost O(n).

All the results of this section are summarized in the following theorem:

Theorem 4. CMLT is a ⟨2, 1⟩-approximation algorithm with a complexity in O(n log n + m log m).

4.3.2. Approximating the Pareto-front

Here again, we can approximate the Pareto-front using the method of Papadimitriou and Yannakakis. Thanks to Theorem 1, Algorithm 1 applied on CMLT leads to a (2 + ε, 1)-approximation of the Pareto-front.


Algorithm 4: CMLT

Input: ω the makespan threshold
begin
    Sort the tasks in non-increasing pi order (now, ∀i ∈ [1, n − 1], pi ≥ pi+1)
    Sort the processors in non-decreasing τj order (now, ∀j ∈ [1, m − 1], τj ≤ τj+1)
    Let H be an empty heap
    j ← 1
    for i from 1 to n do
        while j ∈ M(i) do
            Add j to H with key λjτj
            j ← j + 1
        if H.empty() then
            Return no solution
        j′ ← H.min()
        Schedule i on j′
        Cj′ ← Cj′ + pi τj′
        if Cj′ > ω then
            Remove j′ from H
end
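A possible heap-based implementation of Algorithm 4 is sketched below. The lazy deletion of saturated processors (popping stale heap entries before each pick instead of removing them eagerly) is an implementation choice, not part of the algorithm's specification; the returned task-to-processor dictionary is an assumed interface.

```python
import heapq

def cmlt(p, tau, lam, omega):
    """Sketch of CMLT: tasks with operation counts p, processors with
    unitary execution times tau and failure rates lam, makespan guess omega.
    Returns a task -> processor map of makespan at most 2*omega, or None."""
    n, m = len(p), len(tau)
    tasks = sorted(range(n), key=lambda i: -p[i])    # non-increasing p_i
    procs = sorted(range(m), key=lambda j: tau[j])   # non-decreasing tau_j
    heap, nxt = [], 0                                # heap keyed by lambda*tau
    C = [0.0] * m
    sched = {}
    for i in tasks:
        # open every processor j of M(i), i.e. with p_i * tau_j <= omega;
        # M(i) is a growing prefix of the tau-sorted processors
        while nxt < m and p[i] * tau[procs[nxt]] <= omega:
            j = procs[nxt]
            heapq.heappush(heap, (lam[j] * tau[j], j))
            nxt += 1
        # lazily drop processors already loaded beyond omega
        while heap and C[heap[0][1]] > omega:
            heapq.heappop(heap)
        if not heap:
            return None                              # no schedule with makespan <= omega
        j = heap[0][1]                               # min lambda*tau among open processors
        sched[i] = j
        C[j] += p[i] * tau[j]
    return sched
```

Each processor is pushed and popped at most once, so the loop matches the O(n log n + m log m) bound of Lemma 3 after the two sorts.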


The lower bound Cmax^min = (∑_i pi) / (∑_j 1/τj) is obtained by considering that a single virtual processor gathers the whole computational power of all the processors.

The upper bound Cmax^max = (∑_i pi) × max_j τj is the makespan obtained by scheduling all tasks on the slowest processor. No solution can have a worse makespan without introducing idle times, which are harmful for both objective functions. Notice that Cmax^max can be achieved by a Pareto-optimal solution if the slowest processor is also the most reliable one.

The last points to address are the cardinality of the generated set and the complexity of the algorithm.

• Cardinality: The algorithm generates less than ⌈log_{1+ε/ρ1}(Cmax^max / Cmax^min)⌉ ≤ ⌈log_{1+ε/ρ1}(max_j τj × ∑_j 1/τj)⌉ ≤ ⌈log_{1+ε/2}(m × max_j τj / min_j τj)⌉ solutions, which is polynomial in 1/ε and in the size of the instance.

• Complexity: Remark that CMLT sorts the tasks in an order which is independent of ω. This sorting can be done once for all. Thus, the complexity of the Pareto-front approximation algorithm is O(n log n + ⌈log_{1+ε/2}(Cmax^max / Cmax^min)⌉ (n + m log m)).

In Section 2.2 we briefly recalled the work of Shmoys and Tardos, done for a different bi-objective problem [17], which may also be used in our context. Using this method, we can derive a ⟨2, 1⟩-approximation algorithm whose time-complexity is in O(mn² log n). This is larger than the time-complexity of CMLT, which is in O(n log n + m log m). Moreover, in the perspective of approximating the Pareto-front of the problem with the method previously presented, the algorithm derived from [17] would have a time-complexity of O(⌈log_{1+ε/2}(Cmax^max / Cmax^min)⌉ mn² log n). Unlike CMLT, this algorithm cannot be easily tuned to avoid a significant part of the computations when it is called several times. Thus, CMLT is significantly better than the algorithm presented in [17], which was established in a more general setting on unrelated processors.

4.4. Experimental analysis of CMLT

The goal of this section is to compare the front obtained by the approximation Algorithm 1 applied with CMLT to an idealized virtual front (called F). We intend to show that this


(Plot: Pareto-front points π1–π5 and a reference point; axes: Makespan vs. failure probability.)

Figure 3: The hypervolume is the set of the points that are dominated by a point of the front and that dominate the reference point (the blue zone in this example). When the two objectives have to be minimized, the hypervolume should be maximized.

algorithm has not only a very good worst-case guarantee, as shown in Theorem 4, but also a good behavior on average.

More precisely, we use Algorithm 1 with ε = 10^(−3) applied on CMLT. The obtained result is compared to a front F composed of three points, namely, the HEFT [25] schedule (oriented to optimize the makespan), the most reliable schedule (obtained by scheduling all the tasks on the processor with the smallest λτ product), and a fictitious schedule with the same makespan as HEFT and the best reliability. Although one can find a better makespan-centric schedule than the one found by HEFT, the front F is a very good front that dominates all the fronts found by CMLT.

To compare the fronts we use the hypervolume unary indicator [26] (see Fig. 3), which considers the volume of the objective space dominated by the considered front up to a reference point. This choice is motivated by the fact that this indicator is the only unary indicator that is sensitive to any type of improvement. Hence, if a front maximizes this indicator, then it contains all the Pareto-optimal solutions. Since we target a problem of minimizing two objectives, the greater the hypervolume, the better the front [26]. In our case, the hypervolume of F is always a rectangle.
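For two objectives to minimize, the hypervolume reduces to a sum of rectangle areas swept from the reference point. A minimal sketch, assuming the front is given as (makespan, failure probability) pairs:

```python
def hypervolume_2d(front, ref):
    """2D hypervolume for a minimization problem: area dominated by the
    front and dominating the reference point ref = (ref_x, ref_y)."""
    area, prev_y = 0.0, ref[1]
    for x, y in sorted(front):          # sweep by increasing makespan
        if x >= ref[0] or y >= prev_y:  # outside the box, or dominated
            continue
        # horizontal slab between the previous y level and this point
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area
```

For a single-point front, this reduces to the rectangle (ref_x − x)(ref_y − y), which is why the hypervolume of the three-point front F (anchored by its fictitious corner point) is a rectangle.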


(Plot: “2−approx vs. Inf. Bound”; histogram and ECDF of the hypervolume ratio (x-axis: 0.6–1.0) against frequency.)

Figure 4: ECDF and histogram of the hypervolume ratio between the approximation algorithm front and F.

The input cases are the following. We consider three sets of machines with respectively 10, 20 and 50 processors. Speeds and the inverses of the failure rates are randomly generated according to a uniform distribution. We generate sets of tasks with cardinality between 10 and 100 (by increments of 1). For each set of tasks, we draw the processing requirement uniformly between 1 and 100 (resp. 10^4, 10^6 and 10^9) for sets of class A (resp. B, C and D). For each set and class of tasks, 4 different seeds were used.
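A sketch of such an instance generator is given below. The exact ranges for the speeds and the failure-rate inverses are assumptions for illustration, as the text does not specify them.

```python
import random

def make_instance(n, m, p_max, seed=0):
    """Sketch of the experimental setup: uniformly drawn speeds and
    failure-rate inverses (ranges assumed), task operation counts
    uniform in [1, p_max] (p_max = 100, 1e4, 1e6 or 1e9 per class)."""
    rng = random.Random(seed)
    tau = [1.0 / rng.uniform(1.0, 10.0) for _ in range(m)]   # assumed speed range
    lam = [1.0 / rng.uniform(1e4, 1e6) for _ in range(m)]    # assumed MTBF range
    p = [rng.randint(1, p_max) for _ in range(n)]
    return p, tau, lam
```

Seeding the generator per (set, class) pair reproduces the "4 different seeds" protocol described above.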

In Fig. 4, we plot the empirical cumulative distribution function (ECDF) and the histogram of the ratio between the hypervolumes of the two fronts for all the input cases (the higher the ratio, the closer the approximation algorithm front is to F). From this figure, we see that the ratio is never lower than 0.6, the median is 0.94, and 2/3 of the cases have a ratio greater than 0.9. This means that the (2 + ε, 1)-approximation algorithm gives very good fronts on average: in most of the cases, the obtained fronts are very close to the optimal ones.


5. Precedence Task Graphs

5.1. Arbitrary Graphs

In this section, we study the general case where there is no restriction on the precedence task graph. We present three ways of designing bi-objective heuristics from makespan-centric ones. The first one is based on the characterization of the role of the product λτ = {failure rate} × {unitary instruction execution time}. The second one uses aggregation to change the allocation decision in the list-based makespan-optimizing heuristic. The third one, called geometric, selects the solution that best follows a given direction in the objective space.

5.1.1. The case of communication

When dealing with regular task graphs, edges model communications. In this case, failures of the network can also have an impact on the reliability. We could tackle this problem by considering the network as a new resource, as in [27, 5]. However, a simpler way to handle this problem is to incorporate the network and the CPU into one entity called a node³. As we only consider fail-stop errors, a node has to be up from the start of the application to its end. We assume that each node has a unique dedicated link to a fail-free network backbone. If, for a schedule π, node j is used during Cj(π), this means that both the CPU and the network must work. Let λ^c_j, λ^n_j and λ^l_j be the failure rates of the CPU, the network card and the network link of node j; the probability that the three are up is therefore e^(−λ^c_j Cj(π)) × e^(−λ^n_j Cj(π)) × e^(−λ^l_j Cj(π)) = e^(−(λ^c_j + λ^n_j + λ^l_j) Cj(π)). This means the node has a failure rate which is the sum of the failure rates of its CPU, its network card and its network link to the fail-free backbone. Therefore, in the following, we will call λj the failure rate of the whole node in order to take into account both the CPU and the network failures.

5.1.2. Approximating the Pareto-front Using a Makespan-Centric Heuristic

Both for the unitary and non-unitary independent tasks we have shown that scheduling

tasks on the nodes with the smallest λτ helps in improving the reliability. Therefore, in order

³In the remainder, the term node is used to encompass both the CPU and the network card.


to approximate the Pareto-front we propose a heuristic, called GPFA (General Pareto-Front

Approximation), which is detailed in Algorithm 5 below.

Algorithm 5: GPFA, a general heuristic for approximating the Pareto-front

Input: H a makespan-centric heuristic
Data: G the input DAG
Result: S an approximation of the Pareto-front
begin
    Sort the nodes in non-decreasing λjτj order
    S ← ∅
    for j from 1 to m do
        Let πj be the schedule of G obtained by H using the first j nodes
        if πj is not dominated by any solution of S then
            S ← S ∪ {πj}
    return S
end

The idea is to build a set of makespan/reliability trade-offs by scheduling the tasks on a subset of the nodes (sorted by non-decreasing λτ product) using a makespan-centric heuristic. The smaller the number of used nodes, the larger the makespan and the better the reliability (and vice versa). We can use any makespan-centric heuristic to implement this strategy, such as HEFT [25], BIL [28], PCT [29], GDL [30], HSA [5] or CPOP [25].
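The loop of Algorithm 5 can be sketched as follows. The `heuristic(graph, nodes) -> (makespan, fail_prob)` interface is a hypothetical stand-in for a makespan-centric heuristic such as HEFT; only the dominance bookkeeping is from the algorithm itself.

```python
def gpfa(heuristic, graph, tau, lam):
    """Sketch of GPFA (Algorithm 5): run a makespan-centric heuristic on
    growing prefixes of the nodes sorted by lambda*tau, and keep the
    non-dominated (makespan, fail_prob) outcomes."""
    order = sorted(range(len(tau)), key=lambda j: lam[j] * tau[j])
    front = []
    for j in range(1, len(order) + 1):
        point = heuristic(graph, order[:j])        # schedule on the j most reliable nodes
        if not any(c <= point[0] and r <= point[1] for (c, r) in front):
            # remove points the new schedule dominates, then keep it
            front = [p for p in front if not (point[0] <= p[0] and point[1] <= p[1])]
            front.append(point)
    return front
```

Fewer nodes typically means a larger makespan but a smaller failure probability, so successive prefixes trace the trade-off curve described above.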

5.1.3. Bi-objective Aggregation-based Heuristic

The class of heuristics based on aggregation uses an additive function to combine objec-

tives. As in [5], we use the following function. Given a ranking of the tasks, the heuristic

schedules task i to the node j such that:√α

(end(i, j)

maxj′ end(i, j′)

)2

+ (1− α)

(piτjλj

maxj′ piτj′λj′

)2

is minimized, where, end(i, j) is the completion time of task i if it is scheduled as soon as

possible on node j and α is parameter given by users that determines the tradeoff between


each objective (α = 1 leads to a makespan-centric heuristic). Each term represents one of the objectives and is normalized, since the objectives are expressed in different units and can have different orders of magnitude. The normalization is done relatively to an approximation of the worst allocation of the tasks.
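A sketch of this aggregated criterion is given below; the `end(i, j)` earliest-completion-time function is an assumed interface, and `procs` is the list of candidate node indices.

```python
import math

def aggregated_cost(i, j, end, p, tau, lam, alpha, procs):
    """Sketch of the aggregation criterion: normalized makespan and
    reliability terms combined with weight alpha (alpha = 1 is
    makespan-centric). end(i, j) is an assumed interface giving the
    earliest completion time of task i on node j."""
    max_end = max(end(i, jp) for jp in procs)            # worst completion time
    max_rel = max(p[i] * tau[jp] * lam[jp] for jp in procs)  # worst reliability term
    return math.sqrt(alpha * (end(i, j) / max_end) ** 2
                     + (1 - alpha) * (p[i] * tau[j] * lam[j] / max_rel) ** 2)
```

At each step, task i is placed on the node j minimizing this cost over `procs`.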

5.1.4. Bi-objective Geometric-based Heuristic

Concerning the geometric class of heuristics, the idea has been introduced in [31] and is

described below. The user provides an angle θ between 0◦ an 90◦ and a greedy scheduling

algorithm. Intuitively, θ is the direction in the objective space, the user wants to follow. A

value close to 0◦ means that the user favors the Makespan while a value close to 90◦ means

the opposite. At each step, a partial schedule S is constructed and a new task is considered.

The algorithm simulates its execution on all the m nodes and hence, it generates m partial

schedules, each one having its own reliability and makespan. Among these schedules, we

discard the Pareto dominated ones. Then, these partial schedules and S – the one generated

at the previous step – are plotted into a square of size 1, S being at the origin (see Fig 5).

Then, a line determined by the origin and an angle θ with the x-axis is drawn. The closest

partial schedule to this line is retained (s2 in the figure) and we proceed to the next step.
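The selection step can be sketched as follows (hypothetical helper names; candidate partial schedules are represented by their normalized (makespan, failure probability) points, with S at the origin):

```python
import math

def pareto_filter(points):
    """Discard Pareto-dominated points (both coordinates are minimized)."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

def pick_partial_schedule(points, theta_deg):
    """Return the index of the non-dominated candidate closest to the line
    drawn from the origin at angle theta with the x-axis."""
    theta = math.radians(theta_deg)

    def dist_to_line(p):
        # distance from (x, y) to the line through the origin with
        # direction (cos(theta), sin(theta))
        x, y = p
        return abs(x * math.sin(theta) - y * math.cos(theta))

    best = min(pareto_filter(points), key=dist_to_line)
    return points.index(best)
```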

5.1.5. Experimental Settings

We compare experimentally the three ways of designing bi-objective heuristics from makespan-centric ones by implementing them on HEFT and HSA. Therefore, GPFA is

used to derive P-HEFT and P-HSA, the aggregation scheme is used to derive B-HEFT and

B-HSA4, and the geometric construction is used to derive G-HEFT and G-HSA.

We have used 3 types of graphs: the Strassen DAG [32] and 2 random graphs namely

samepred (each created node can be connected to any other existing nodes) and layrpred

(where the nodes are arranged by layers). We have used the following parameters to build

the graphs:

• Number of tasks: 10, 100, 1000 for random graphs or 23, 163, 1143 for Strassen DAGs.

4 Notice that this heuristic was first proposed by Hakem and Butelle and is called BSA in [5].


Figure 5: The geometric heuristic with 10 nodes (x-axis: makespan; y-axis: failure probability): stars are Pareto-optimal solutions; crosses are dominated solutions and are discarded. Hence, partial schedule s2 is selected and the task is mapped on node 2.

• Average task cost of the p_i (in FLOP), for random graphs: 10^6, 10^7 or 10^9 (fixed by

structure for Strassen).

• Variation of the task costs: 0.5, 0.0001, 0.1, 0.3, 1 or 2 for random graphs (fixed by

structure for Strassen). These numbers, combined with the average costs are used to

compute the standard deviation of the Gamma distribution used to draw the task cost

(we use a Gamma distribution because it has positive support and is commonly used

to model timings). In this case, the standard deviation is computed by multiplying the

average cost by the variation.

• Average communication cost (in Byte): 10^3, 10^4 or 10^6 for random graphs (fixed by structure for Strassen).

• Variation of communication costs: 0.5, 0.0001, 0.1, 0.3, 1 or 2 for random graphs (fixed

by structure for Strassen). Here again, the variation is used combined with the average

cost to compute the standard deviation of the distribution.


• Average number of edges per node: 1, 3 or 5 for random graphs (fixed by structure for

Strassen).

• Number of available machines: 10, 20 or 50

• The speeds of the machines are randomly generated according to a uniform distribution:

τ = (10^7 (U(1000) + 1))^−1

• The inverses of the failure rates are randomly generated according to a uniform distri-

bution: λ = BFR/(U(1000)+1), where BFR is used to scale the probability of failure and is equal to 1, 10^−3 or 10^−6.

• The network is homogeneous and the topology is supposed to be complete. There is no

latency.

• 10 seeds from 0 to 9 for random graphs (no seed for the Strassen).
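For concreteness, the platform and task-cost draws above can be sketched as follows (assuming U(1000) denotes a uniform integer in {0, …, 999}, which the text does not state explicitly; the Gamma shape and scale are derived from the stated mean and standard deviation):

```python
import random

def gen_platform(m, bfr, rng=None):
    """Draw m machine speeds tau = (10^7 (U(1000)+1))^-1 and failure rates
    lambda = BFR / (U(1000)+1), as in the experimental settings."""
    rng = rng or random.Random(0)
    taus = [1.0 / (1e7 * (rng.randrange(1000) + 1)) for _ in range(m)]
    lams = [bfr / (rng.randrange(1000) + 1) for _ in range(m)]
    return taus, lams

def gen_task_costs(n, mean, variation, rng=None):
    """Draw n task costs from a Gamma distribution whose mean is `mean` and
    whose standard deviation is mean * variation."""
    rng = rng or random.Random(0)
    sd = mean * variation
    shape = (mean / sd) ** 2   # shape = mean^2 / variance
    scale = sd ** 2 / mean     # scale = variance / mean
    return [rng.gammavariate(shape, scale) for _ in range(n)]
```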

All the combinations lead to 525 123 different settings. For each setting, we have com-

puted an approximation of the Pareto-front as follows. For P-HSA and P-HEFT we have

generated as many schedules as available nodes (one with the node with the smallest value

of λτ , one with the two nodes with the smallest λτ etc.). For the 4 other heuristics, we have

used 1000 different values for the compromise parameters to generate the front. Finally,

more than 2.1 billion schedules have been computed.
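The front generation for P-HEFT and P-HSA can be sketched as follows (`schedule_fn` stands for any makespan-centric heuristic such as HEFT; the node representation is illustrative):

```python
def gpfa_front(nodes, schedule_fn):
    """One schedule per node subset: for k = 1..m, run the makespan-centric
    heuristic on the k nodes with the smallest lambda*tau product."""
    ordered = sorted(nodes, key=lambda nd: nd["lam"] * nd["tau"])
    return [schedule_fn(ordered[:k]) for k in range(1, len(ordered) + 1)]
```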

Among the 525 123 generated fronts, about 34 100 have only one point and are

not considered in the evaluation. This is the case where the probability of success of the

schedule is 0 even on the nodes which have the smallest λτ product (e.g. when the generated

platform is highly unreliable or the precedence task graph is very large).

5.1.6. Results

As in section 4.4, we use the Hypervolume [26] unary indicator to compare the fronts of

each heuristic.
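For two objectives that are both minimized, the hypervolume of a front with respect to a reference point reduces to a sum of rectangle areas; a minimal sketch (not the implementation used in the paper):

```python
def hypervolume_2d(front, ref):
    """Area dominated by the front and bounded by the reference point, for a
    bi-objective minimization front of (makespan, failure probability) points."""
    # keep points inside the reference box, sweep them by increasing first objective
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                     # dominated points add no area
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area
```

A larger hypervolume means a better front, which is why the comparisons below are expressed as hypervolume ratios.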

We have computed the hypervolume using the same reference point for all the fronts of the same setting that have at least 2 points. We then compare the heuristics pairwise by computing the ratio of the hypervolumes of their fronts for each setting.

Figure 6: Scatter plot of the ratio of the hypervolume indicator for the 6 heuristics

Fig. 6 shows the obtained results. The six heuristics are displayed on the diagonal of the figure. On the

lower part, the histogram and the ECDF (empirical cumulative distribution function) of the

hypervolume ratio are displayed for the two heuristics on the corresponding row and column.

On the upper part, we summarized some numerical values that indicate: the percentage of

ratios that are strictly above 1; the percentage of ratios that are equal to 1, the median ratio

and in brackets, the first and the third quartiles5. For example, we see that 84.2% of the hypervolume ratios between P-HEFT and G-HEFT are greater than 1 (the hypervolume of P-

HEFT is greater than the one of G-HEFT in 84.2% of the cases), in 2.2% the hypervolumes

are equal, the median hypervolume ratio is 1.046. Moreover half of the ratios are between

1.02 and 1.11, a quarter of them being under 1.02 and the other quarter above 1.11.

The results show that GPFA-based heuristics (P-HEFT and P-HSA) perform the best

according to the hypervolume indicator. They are much better than geometric heuristics and

outperform aggregation heuristics in more than two thirds of the cases. P-HEFT is slightly

better than P-HSA (it outperforms P-HSA in 43.5% of the cases while P-HSA outperforms

P-HEFT in 24.33% of the cases). Next, geometric heuristics (G-HEFT and G-HSA) are better

than aggregation ones (B-HEFT and B-HSA). This is explained by the fact that some fronts

computed by B-HEFT or B-HSA are really bad (as shown by the histograms in the lower part: some of the hypervolumes are more than twice as large as their respective P-HEFT or

P-HSA counterpart). We also see that the HEFT ordering provides better results than the

HSA ordering (G-HEFT is slightly better than G-HSA and B-HEFT is slightly better than

B-HSA).

We have also compared the resources required to compute the fronts. We record, for each schedule, the number of nodes it uses. In Fig. 7, we present a scatter plot similar to that of Fig. 6. The difference is that, since we want to minimize the number of nodes, the upper part now displays the fraction of ratios that are lower than 1 instead of greater than 1.

5 The first (resp. third) quartile is the value such that 25% of the results are under (resp. above) this value.

Figure 7: Scatter plot of the ratio of the average number of nodes required to compute a given front for the 6 heuristics

The results show that GPFA-based heuristics (P-HEFT and P-HSA) use far fewer nodes

than the other heuristics (P-HEFT being slightly better than P-HSA, for this metric). Be-

tween 79.3% and 88.97% of the cases are favorable to this type of heuristics. Here again,

geometric heuristics are better than aggregation ones. Last, the difference between the HSA

and HEFT based heuristics is very low with the heuristics based on HEFT being marginally

better.

We have also performed some projections of these results onto the different possible parameters (task cost, number of tasks, failure rate, etc.). Most of the time we notice that a

variation of a parameter has no or very little influence on the obtained results and hence re-

sults are not displayed here. The only interesting case is related to the number of machines.

When this number is low (10 machines to schedule the whole graph) we see that, for the

hypervolume metric, the B-HEFT and B-HSA heuristics perform the best. For instance the

hypervolume ratio is favorable to B-HEFT in 71.07% of the cases (resp. 74.27%, 93.36%,

92.96%) compared to P-HEFT (resp. P-HSA, G-HEFT, G-HSA) when using 10 nodes.

In conclusion to this section, we see that thanks to our understanding of the problem and

since we have identified the crucial role of the product {failure rate}× {unitary instruction

execution time} we have been able to design a heuristic that provides a good approximation

of the Pareto-front. Such an understanding helps us to outperform general solutions (aggregation- or geometric-based heuristics) that do not use this problem-specific feature. Moreover, the

geometric heuristics are much better than aggregation ones.

5.2. A single chain on m nodes

Computing an approximation algorithm for the general case is a difficult problem. Indeed,

there are no known approximation algorithms for optimizing the makespan of an arbitrary

precedence task graph on related nodes. In fact, even the case of multiple chains has no

known constant-factor approximation algorithm. In this section, we address the following special

case.


5.2.1. Characterizing the Pareto-Front

In this section we are interested in an elementary subcase of the general graph case. The

precedence task graph is a single chain; therefore, task i − 1 must be completed before task

i can start its execution.

Precedence constraints may induce idle times in the schedule, and the formulation of Cj, the completion time of node j, must be modified to take the precedences into account. The important point is that the reliability computation includes idle times.

The precedence graph being a single chain, only one of the nodes is working at a time.

Therefore, one can assume that no node is both faster and more reliable than another (a node that is both slower and less reliable than another node can simply be ignored). Without loss

of generality, the nodes are ordered from the fastest one to the most reliable one. That is to

say, τj < τj+1 and λjτj > λj+1τj+1; a direct consequence is that λj > λj+1.

Lemma 4. A solution that executes a task on node j after executing a task on node j′, j′ > j

is not Pareto-optimal.

Proof. Let π be such a solution. The proof is done by constructing a solution π′ that Pareto-

dominates π.

Let i be the last task executed on node j′ for which a successor is executed on node j.

π′ is constructed by keeping the same allocation as π except that task i is moved onto node

j. The completion times of all nodes in π′ are no larger than in π. Indeed, the completion time of each node j′′ which executes a task i′ ≥ i diminishes by (τj′ − τj)pi, and the other nodes complete at the same time. This improves both the makespan and the reliability.

The solutions complying with Lemma 4 have the following structure: at most one interval

of tasks is scheduled on each node, and if task i is scheduled on node j then all the tasks i′ > i

are scheduled on nodes j′ ≥ j. Those solutions are in bijection with the set of partitions of

the chain of n tasks in m intervals, allowing empty intervals.

Those partitions can be enumerated using a recursive function that takes a schedule of

the first x tasks and returns all the solutions that comply with this partial schedule. The

number of such partitions is in O(n^{m−1}).
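Rather than the recursive formulation mentioned above, the same partitions can be enumerated by choosing the m − 1 cut positions directly; a sketch (the exact count is C(n+m−1, m−1), which is in O(n^{m−1}); one can then evaluate the makespan and reliability of each partition and keep the non-dominated ones):

```python
from itertools import combinations_with_replacement
from math import comb

def interval_partitions(n, m):
    """Yield the interval lengths of every partition of a chain of n tasks
    into m ordered, possibly empty, intervals (interval j goes to node j)."""
    # a partition is determined by m-1 non-decreasing cut positions in {0..n}
    for cuts in combinations_with_replacement(range(n + 1), m - 1):
        bounds = (0,) + cuts + (n,)
        yield [bounds[k + 1] - bounds[k] for k in range(m)]

parts = list(interval_partitions(4, 3))
assert len(parts) == comb(4 + 3 - 1, 3 - 1)   # C(n+m-1, m-1) partitions
```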


Theorem 5. The number of Pareto-optimal solutions for the problem of scheduling a chain of tasks on related nodes to optimize the makespan and the reliability is in O(n^{m−1}), and they can be enumerated using an algorithm of time complexity O(n^{m−1}).

Since each task may be scheduled on m different nodes, there are m^n valid schedules for this problem. However, Lemma 4 allows us to restrict the number of Pareto-optimal solutions to O(n^{m−1}), which is significantly better since there are usually more tasks than nodes in

such problems.

5.2.2. Experimental evaluation

Here we compare the optimal front found by the method described above with the one

found by P-HEFT (GPFA implementation with the HEFT heuristic), G-HEFT (geometric

heuristic) and B-HEFT (aggregation heuristic). We did not use the HSA heuristic because

it differs from HEFT only on the ranking of the tasks and as we are dealing with chains, this

ranking is imposed by the chain and is the same for HSA and HEFT.

The experimental setting is the same as in section 4.4, the only difference being that the tasks are strictly ordered. Moreover, as the optimal algorithm is exponential, we limit its usage to the cases where the number of explored solutions is lower than 100 000 000. In addition, we discarded the cases where the Pareto-front is reduced to one point (e.g. low number of

tasks).

Finally, we have compared more than 30 000 fronts for each of the 3 heuristics. Each

front requiring up to 100 schedules.

Results are displayed in Fig. 8. We see that P-HEFT and G-HEFT find the optimal

result in one percent of the cases. The median ratio is respectively 1.149 and 1.451. P-

HEFT outperforms G-HEFT in more than 44% of the cases and is outperformed in only

0.62% of the cases. P-HEFT (resp. G-HEFT) outperforms B-HEFT in more than 88%

(resp. 56%) of the cases. Hence, we see that our GPFA-based heuristic is better than the

other heuristics (the aggregation-based one performing particularly poorly). Last, we see

that our GPFA-based heuristic is not too far from the optimal since 75% of the cases have

a ratio lower than 1.47.


Figure 8: Scatter plot of the ratio of the hypervolume indicator of the optimal front and the one found by the 3 heuristics for the Chain case


6. Conclusions

As larger and larger infrastructures are available to execute distributed applications,

reliability becomes a crucial issue. However, optimizing both the reliability and the length

of the schedule is not always possible as they are often conflicting objectives.

Here, we have studied the problem of scheduling tasks on heterogeneous platforms. We

have tackled two metrics: reliability and makespan. As these two objectives are unrelated

and sometimes contradictory, we need to investigate bi-objective approximation algorithms.

In previous works of the literature, some heuristics have been proposed to solve similar

problems [3, 4, 5]. However, none of them discuss the fundamental properties of a good bi-

objective scheduling algorithm. In this paper, we have tackled important subproblems in

order to determine how to efficiently solve this problem.

We have shown that maximizing the reliability alone is a polynomial problem but optimizing

both the makespan and the reliability cannot be approximated. For the case of scheduling

independent unitary tasks, we have proposed an algorithm that finds, among

the schedules that do not exceed a given makespan, the one with the best reliability. Based

on this algorithm we have derived a (1+ε,1) approximation algorithm of the Pareto-front.

For the case of independent non-unitary tasks and uniform processors, we have designed

the CMLT algorithm and proved that it is a ⟨2, 1⟩-approximation. Finally, we derived a

(2 + ε, 1)-approximation of the Pareto-front of the problem.

The above results have highlighted the role of the product {failure rate} × {unitary instruction execution time} (λτ). For general precedence task graphs, based on the importance of this

product, we have shown that it is easy to extend most of the heuristics designed for optimizing

the makespan by taking into account the reliability. Experiments show that we outperform

the other heuristics of the literature both in terms of front quality and resource usage. Finally, for a single chain with two nodes, we have proved that the Pareto-front can

be obtained in polynomial time.


References

[1] R. Koo, S. Toueg, Checkpointing and rollback-recovery for distributed

systems, IEEE Transactions on Software Engineering 13 (1987) 23–31.

doi:http://doi.ieeecomputersociety.org/10.1109/TSE.1987.232562.

[2] A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, F. Cappello, MPICH-V: a Mul-

tiprotocol Fault Tolerant MPI, International Journal of High Performance Computing

and Applications 20 (3) (2006) 319–333.

[3] A. Dogan, F. Ozguner, Matching and Scheduling Algorithms for Minimizing Execution

Time and Failure Probability of Applications in Heterogeneous Computing, IEEE Trans.

Parallel Distrib. Syst. 13 (3) (2002) 308–323.

[4] A. Dogan, F. Ozguner, Bi-objective Scheduling Algorithms for Execution Time-

Reliability Trade-off in Heterogeneous Computing Systems, Comput. J. 48 (3) (2005)

300–314.

[5] M. Hakem, F. Butelle, A Bi-objective Algorithm for Scheduling Parallel Applications

on Heterogeneous Systems Subject to Failures, in: Renpar 17, 2006.

[6] R. L. Graham, E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, Optimization and approximation in deterministic sequencing and scheduling: a survey, Ann. Discrete Math. 5 (1979) 287–326.

[7] M. R. Garey, D. S. Johnson, Computers and Intractability, Freeman, San Francisco,

1979.

[8] T. Gonzalez, O. H. Ibarra, S. Sahni, Bounds for LPT schedules on uniform processors,

SIAM Journal of Computing 6 (1977) 155–166.

[9] D. S. Hochbaum, D. B. Shmoys, A polynomial approximation scheme for scheduling on

uniform processors: Using the dual approximation approach, SIAM Journal on Com-

puting 17 (3) (1988) 539 – 551.


[10] C. Chekuri, M. A. Bender, An efficient approximation algorithm for minimizing

makespan on uniformly related machines., Journal of Algorithms 41 (2001) 212–224.

[11] R. Giroudeau, J. Konig, Multiprocessor Scheduling: Theory and Applications, ARS

publishing, 2007, Ch. Scheduling with Communication Delays.

[12] M. L. Pinedo, Scheduling: Theory, Algorithms, and Systems, 3rd Edition, Springer

Publishing Company, Incorporated, 2008.

[13] A. Girault, E. Saule, D. Trystram, Reliability versus performance for critical applica-

tions, Journal of Parallel and Distributed Computing 69 (3) (2009) 326–336.


[14] A. Benoit, L.-C. Canon, E. Jeannot, Y. Robert, Reliability of task graph schedules with

transient and fail-stop failures: complexity and algorithms, Journal of Scheduling.

[15] S. Shatz, J. Wang, Task allocation for maximizing reliability of distributed computer

systems, IEEE Transactions on Computers 41 (9) (1992) 1156–1169.

[16] A. Benoit, Y. Robert, A. Rosenberg, F. Vivien, Static worksharing strategies for hetero-

geneous computers with unrecoverable interruptions, Parallel Computing 37 (8) (2011)

365 – 378.

[17] D. B. Shmoys, E. Tardos, Scheduling unrelated machines with costs, in: Proceedings

of the Fourth Annual ACM/SIGACT-SIAM Symposium on Discrete Algorithms, 1993,

pp. 448–454.

[18] X. Besseron, S. Bouguerra, T. Gautier, E. Saule, D. Trystram, Fault tolerance and availability awareness in computational grids, Fundamentals of Grid Computing, Chapman

and Hall/CRC Press, 2009, Ch. 5.

[19] I. Sardina, C. Boeres, L. de A. Drummond, An efficient weighted bi-objective schedul-

ing algorithm for heterogeneous systems, in: H.-X. Lin, M. Alexander, M. Forsell,


A. Knupfer, R. Prodan, L. Sousa, A. Streit (Eds.), Euro-Par 2009 – Parallel Process-

ing Workshops, Vol. 6043 of Lecture Notes in Computer Science, Springer Berlin /

Heidelberg, 2010, pp. 102–111.

[20] J. J. Dongarra, E. Jeannot, E. Saule, Z. Shi, Bi-objective scheduling algorithms for

optimizing makespan and reliability on heterogeneous systems, in: Proc. of SPAA,

2007, pp. 280–288.

[21] E. Jeannot, E. Saule, D. Trystram, Bi-Objective Approximation Scheme for Makespan

and Reliability Optimization on Uniform Parallel Machines, in: The 14th International

Euro-Par Conference on Parallel and Distributed Computing (Euro-Par 2008), Las Pal-

mas de Gran Canaria, Spain, 2008.

[22] J. Y.-T. Leung (Ed.), Handbook of Scheduling. Algorithms, Models and Performance

Analysis, Chapman & Hall/CRC, 2004.

[23] C. H. Papadimitriou, M. Yannakakis, On the approximability of trade-offs and optimal

access of web sources, in: Proc. of FOCS, 2000, pp. 86–92.

[24] A. Legrand, Y. Robert, Algorithmique Parallele, Dunod, 2005.

[25] H. Topcuoglu, S. Hariri, M.-Y. Wu, Task scheduling algorithms for heterogeneous pro-

cessors, 8th IEEE Heterogeneous Computing Workshop (HCW’99) (1999) 3–14.

[26] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, V. Grunert da Fonseca, Performance

Assessment of Multiobjective Optimizers: An Analysis and Review, IEEE Transactions

on Evolutionary Computation 7 (2) (2003) 117–132.

[27] S. Shatz, J.-P. Wang, M. Goto, Task allocation for maximizing reliability of dis-

tributed computer systems, Computers, IEEE Transactions on 41 (9) (1992) 1156 –1168.

doi:10.1109/12.165396.


[28] H. Oh, S. Ha, A static scheduling heuristic for heterogeneous processors, in: L. Bouge,

P. Fraigniaud, A. Mignotte, Y. Robert (Eds.), Euro-Par, Vol. II, Vol. 1124 of Lecture

Notes in Computer Science, Springer, 1996, pp. 573–577.

[29] M. Maheswaran, H. J. Siegel, A dynamic matching and scheduling algorithm for hetero-

geneous computing systems, in: Heterogeneous Computing Workshop, 1998, pp. 57–69.

URL http://computer.org/proceedings/hcw/8365/83650057abs.htm

[30] G. Sih, E. Lee, A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures, IEEE Transactions on Parallel and Distributed Systems 4 (2).

[31] L.-C. Canon, E. Jeannot, Evaluation and Optimization of the Robustness of DAG Sched-

ules in Heterogeneous Environments, IEEE Transactions on Parallel and Distributed

Systems 21 (4) (2010) 532–546.

[32] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 2nd

Edition, The MIT Press, 2001.
