Energy-Efficient Fault-Tolerant Scheduling of Reliable Parallel Applications on Heterogeneous Distributed Embedded Systems

Guoqi Xie, Member, IEEE, Yuekun Chen, Xiongren Xiao, Cheng Xu, Renfa Li, Senior Member, IEEE, and Keqin Li, Fellow, IEEE

Abstract: Dynamic voltage and frequency scaling (DVFS) is a well-known energy consumption optimization technique in embedded systems, and dynamically scaling down the voltage of a chip has been developed to achieve energy-efficient optimization. However, this operation may lead to a sharp rise in transient failures of processors and consequently weaken the reliability of systems. The reliability goal is an important functional safety requirement and must be satisfied for safety-critical applications. In this study, we aim to implement energy-efficient fault-tolerant scheduling for a reliable parallel application on heterogeneous distributed embedded systems, where the parallel application is described by a directed acyclic graph (DAG). An energy-efficient scheduling with a reliability goal (ESRG) algorithm is presented to reduce the energy consumption while satisfying the reliability goal for the parallel application. Because ESRG cannot reach the application's reliability goal when that goal exceeds a certain threshold, we further propose an energy-efficient fault-tolerant scheduling with a reliability goal (EFSRG) algorithm to reduce the energy consumption while satisfying the reliability goal based on an active replication scheme. Experimental results confirm that the proposed EFSRG algorithm reduces more energy consumption than other approaches under different scale conditions.

Index Terms: Directed acyclic graph (DAG), dynamic voltage and frequency scaling (DVFS), energy-efficient, fault-tolerant scheduling, heterogeneous distributed embedded systems

G. Xie, Y. Chen, X. Xiao, C. Xu, and R. Li are with the College of Computer Science and Electronic Engineering, Hunan University, Key Laboratory for Embedded and Network Computing of Hunan Province, Hunan 410082, China. E-mail: {xgqman, xxr, chengxu, lirenfa}@hnu.edu.cn, [email protected].
K. Li is with the College of Computer Science and Electronic Engineering, Hunan University, Key Laboratory for Embedded and Network Computing of Hunan Province, Hunan 410082, China, and the Department of Computer Science, State University of New York, New Paltz, NY 12561. E-mail: [email protected].
Manuscript received 31 Jan. 2017; revised 17 Apr. 2017; accepted 31 May 2017. Date of publication 2 June 2017; date of current version 6 Sept. 2018. (Corresponding author: Xiongren Xiao.) Recommended for acceptance by D. Zhu, M. Shafique, M. Lin, and S. Pasricha.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSUSC.2017.2711362
IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. 3, NO. 3, JULY-SEPTEMBER 2018, p. 167
2377-3782 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

1.1 Background

Energy consumption management is crucial in embedded system design because energy dissipation affects the development and use of the system itself and the living environment of people. For energy conservation and environmental protection, various adaptive management techniques have been established to maximize energy efficiency. A well-known energy consumption optimization technique is dynamic voltage and frequency scaling (DVFS), also called dynamic frequency scaling (DFS), dynamic speed scaling (DSS), or dynamic power scaling (DPS), which achieves energy-efficient scheduling by simultaneously scaling down the supply voltage and frequency of a processor while tasks are running [1], [2], [3], [4], [5], [6], [7], [8]. Currently, mainstream manufacturers, such as Intel, ARM, and AMD, provide processors that support DVFS technologies, including enhanced Intel SpeedStep for Intel [9], PowerNow for AMD [10], and intelligent energy manager and adaptive voltage scaling for ARM [11].

Although dynamically scaling down the voltage of a chip has been developed to achieve energy-efficient optimization, this operation may lead to a sharp rise in transient failures of processors and consequently weaken the reliability of systems [2], [12], [13], [14], [15], [16]. Reliability is defined as the probability that a schedule successfully completes its execution [17], [18], [19]. Reliability must be considered for a safety-critical application; otherwise, the risk of disastrous consequences is high [17]. Fault-tolerance by primary-backup replication, in which a primary task has zero, one, or multiple backup tasks, is an important reliability enhancement mechanism. In general, the primary and its backups are collectively referred to as replicas. Although replication-based fault-tolerance is an important reliability enhancement mechanism [18], [19], [20], [21], [22], no application can be 100 percent reliable in practice. If an application satisfies its certified reliability goal (also called reliability requirement, reliability assurance, or reliability constraint in some studies), then it is considered reliable according to functional safety standards, such as ISO 26262 for automotive systems, DO-178B for avionics systems, and IEC 61508 for all kinds of industrial software systems [17], [23]. For example, ISO 26262 specifies different exposure levels (Table B.2, Annex B of Part 3) [24]. Exposure means the relative expected frequency of the operational conditions in which the injury can possibly happen. For instance, the reliability goals 0.99, >0.9, and <=0.9 respectively correspond to low-, medium-, and high-risk probabilities learned from ISO 26262 [24]. Hence, if the reliability goal of an application is 0.9, then the application becomes reliable only when its actual reliability exceeds 0.9. Consequently, fault-tolerance can be employed to satisfy the reliability goal.

1.2 Motivation

Interdependencies between reliability and power should be considered. On the one hand, the reliability goal can be satisfied through fault-tolerance to guarantee that the risk is controlled at an acceptable level for a safety-critical application. On the other hand, fault-tolerance does not come for free and generally involves power/energy overheads because power/energy is a first-class system resource. Low-power dependable computing is therefore necessary.

Energy-efficient scheduling with a reliability goal for an application with independent tasks has been well studied [1], [2], [25]. However, applications in systems are increasingly parallel, and tasks in an application exhibit evident data dependencies and precedence constraints [7], [12], [13], [26], [27], [28], [29]. Examples of parallel applications are Gaussian elimination and fast Fourier transform [26]. A parallel application with precedence-constrained tasks is described at a high level by a directed acyclic graph (DAG) [26], [27], [29], [30]. In a DAG-based parallel application, nodes represent tasks, and edges represent communication messages between tasks. In [13], a shared recovery-based frequency assignment technique is proposed to reduce energy consumption with a reliability goal for a parallel application on a uniprocessor system. However, multiprocessors have been used in high-performance embedded systems, such as image recognition, automotive control, and human body interaction plus gesture control [29]. Reliability and energy management methods have been studied in homogeneous multiprocessor, manycore, and multicore systems [31], [32], [33]. Particularly, heterogeneous distributed embedded systems have emerged to satisfy additional functional and non-functional requirements [7], [34].

1.3 Our Contributions

A development life cycle of safety-critical systems usually contains analysis, design, implementation, and test phases. In this study, we aim to reduce the energy consumption of a reliable parallel application on heterogeneous distributed embedded systems during the design phase [17]. Our contributions are summarized as follows:

(1) We present an energy-efficient scheduling with a reliability goal (ESRG) algorithm to reduce energy consumption while satisfying the reliability goal of a DAG-based parallel application on heterogeneous embedded systems without using fault-tolerance. The problem is solved by dividing it into three sub-problems: prioritizing tasks, satisfying the reliability goal, and reducing energy consumption.

(2) Because ESRG cannot reach the reliability goal of an application when that goal exceeds a certain threshold, we further present an energy-efficient fault-tolerant scheduling with a reliability goal (EFSRG) algorithm to reduce energy consumption while satisfying the reliability goal of a DAG-based parallel application on heterogeneous embedded systems based on an active replication scheme. The problem is also solved by dividing it into three sub-problems: prioritizing tasks, satisfying the reliability goal, and reducing energy consumption.

(3) Experiments on real parallel applications, including fast Fourier transform and Gaussian elimination, are conducted at different scales. Experimental results confirm that the proposed EFSRG algorithm reduces more energy consumption than other approaches under different scale conditions.

The rest of this paper is organized as follows. Section 2 reviews related research. Section 3 presents related models and the problem statement. Sections 4 and 5 propose the ESRG and EFSRG algorithms, respectively. Section 6 verifies the ESRG and EFSRG algorithms. Section 7 concludes this study.

2 RELATED WORK

This section mainly reviews recent related research on energy consumption, reliability, and their relationship for a DAG-based parallel application.

DVFS-based energy-efficient design techniques have been used for parallel applications with precedence-constrained tasks. Zong et al. [35] considered energy-aware duplication scheduling algorithms for a parallel application on homogeneous systems. Lee and Zomaya [36] presented energy-conscious scheduling to implement joint reduction between the schedule length and energy consumption of a parallel application on heterogeneous systems. Li [4], [5], [6] studied the problems of reducing the schedule length with an energy consumption constraint and reducing energy consumption with a schedule length constraint for an application with precedence-constrained sequential tasks [4] and precedence-constrained parallel tasks (i.e., a parallel application) [5], [6] on homogeneous systems. We [7], [8] explored the problem of reducing schedule length with an energy consumption constraint for a parallel application on heterogeneous systems. Huang et al. [37] investigated the problem of reducing energy consumption with a schedule length constraint for a parallel application on heterogeneous systems by reclaiming the slack time for each task on its fixed assigned processor. Tang et al. [38] examined the same problem as that reported in [37] by switching off inefficient processors to reduce static energy consumption based on slack time reclamation. Li [3] summarized the algorithms, analysis, and performance evaluation of energy-efficient task scheduling on multiple heterogeneous systems.

According to ISO 26262, random hardware failures (i.e., transient failures) occur unpredictably during the life of a hardware element, but these failures follow a probability distribution [24]. Shatz and Wang [39] described a widely accepted reliability model and demonstrated that the transient failures of each hardware element are characterized by a constant failure rate per time unit $\lambda$. The reliability during a time interval $t$ is $e^{-\lambda t}$. The occurrence of failures follows a constant-parameter Poisson law [2], [12], [13], [17], [18], [19]. Reliability-aware design techniques and algorithms usually aim to reduce certain objectives while simultaneously satisfying the reliability goal. Higher reliability can result in a longer schedule length or larger energy consumption of a parallel application, and the problem of optimizing schedule length (or energy consumption) and reliability is considered a typical bi-criteria or Pareto optimization problem [20], [21], [22], [40], [41]. We [17] solved the problem of reducing the resource consumption cost of a reliable parallel application on heterogeneous distributed embedded systems without using fault-tolerance. Yi et al. [42] presented the DAG_Heu algorithm to reduce the resource cost of a parallel application with a timing constraint and a reliability goal on heterogeneous systems. Zhao et al. presented the MaxRe [18] and RR [19] algorithms to reduce resource consumption while satisfying the reliability goal of a parallel application on heterogeneous systems. However, [17], [18], [19], [42] only aim to reduce resource consumption cost, which refers to the resource usage of processors when tasks are running.

Energy consumption is closely related to reliability. Zhao et al. [12] established a relationship model between energy consumption and reliability and solved the problem of maximizing the reliability of a parallel application with deadline and energy consumption constraints on a uniprocessor. On the basis of [12], Zhao et al. further solved the problem of reducing the energy consumption of a parallel application with a deadline constraint and reliability preservation on a uniprocessor [13]. In [31], Guo et al. proposed reliability-aware power management schemes to save energy while guaranteeing a certain level of system reliability on homogeneous multiprocessors. In [32], Salehi et al. proposed a power-efficient reliability management method through dynamic redundancy and voltage scaling under variations on homogeneous manycore processors. In [33], Salehi et al. proposed an N-modular redundancy (NMR) technique to achieve high reliability with low energy overhead for hard real-time applications on homogeneous multicore processors. This study focuses on heterogeneous distributed embedded systems, where the communication between tasks mapped to different processors is performed through message passing over the bus, and some works have investigated reliability and energy management on these platforms. In [14], Zhang et al. analyzed the problem of maximizing the reliability of a parallel application with an energy consumption constraint on heterogeneous systems. In [15], Zhang et al. further assessed the joint optimization between the energy consumption and reliability of a parallel application with a deadline constraint on heterogeneous systems. In [16], Zhang et al. also investigated the bi-objective optimization between the energy consumption and reliability of a parallel application in heterogeneous systems. None of the above works uses fault-tolerance. In [43], Tang et al. proposed a heuristic reliability-energy aware scheduling algorithm to achieve a good tradeoff among performance, reliability, and energy consumption with lower complexity on heterogeneous distributed systems. However, the aforementioned works either focus on maximizing the reliability of a parallel application with an energy consumption constraint or trade off reliability against energy consumption without using fault-tolerance, whereas this study focuses on reducing energy consumption with a reliability goal using fault-tolerance.

3 MODELS

Table 1 gives the important notations and their definitions used in this study.

3.1 Application Model

We consider a distributed architecture where several processors are mounted on the same controller area network (CAN) bus, as shown in Fig. 1 [17], [34]. Each processor contains a central processing unit (CPU), random-access memory (RAM) and non-volatile memory, and a network interface card. A task executed completely in one processor sends messages to all its successor tasks, which may be located in different processors. For example, task $n_1$ is executed on processor $u_1$. It then sends a message $m_{1,2}$ to its successor task $n_2$ located in $u_6$ (see Fig. 1). Let $U = \{u_1, u_2, \ldots, u_{|U|}\}$ represent a set of heterogeneous processors, where $|U|$ represents the size of set $U$. For any set $X$, this study uses $|X|$ to denote its size.

A parallel application running on processors is represented by a DAG $G = (N, W, M, C)$ [7], [8], [17], [36].

(1) $N$ represents a set of nodes in $G$, and each node $n_i \in N$ represents a task. $pred(n_i)$ represents the set of the immediate predecessor tasks of $n_i$. $succ(n_i)$ represents the set of the immediate successor tasks of $n_i$. The task that has no predecessor task is denoted as $n_{entry}$, and the task that has no successor task is denoted as $n_{exit}$. If an application has multiple $n_{entry}$ or multiple $n_{exit}$ tasks, then a dummy entry or exit task with zero-weight dependencies is added into the graph.

(2) $W$ is an $|N| \times |U|$ matrix, where $w_{i,k}$ denotes the worst case execution time (WCET) of $n_i$ running on $u_k$ with the maximum frequency. Each task $n_i \in N$ has different WCET values on different processors due to the heterogeneity of processors. The WCET of a task is the maximum execution time among all possible real execution time values when the task is executed on a specific processor with the maximum frequency. All the WCETs of the tasks are known and determined through a WCET analysis method during the analysis phase [17].

(3) The communication between tasks mapped to different processors is performed through message passing over the bus. $M$ is a set of communication edges, and each edge $m_{i,j} \in M$ represents the communication message from $n_i$ to $n_j$. Accordingly, $c_{i,j} \in C$ represents the worst case response time (WCRT) of $m_{i,j}$ if $n_i$ and $n_j$ are not assigned to the same processor. The WCRT of a message is the maximum response time among all possible real response time values when the message is transmitted on a specific hardware platform. If $n_i$ and $n_j$ are assigned to the same processor, then the communication time is 0. All the WCRTs of the messages are also known and determined through a WCRT analysis method during the analysis phase [17].

TABLE 1
Important Notations in This Study

Notation                          Definition
$c_{i,j}$                         WCRT between the tasks $n_i$ and $n_j$
$w_{i,k}$                         WCET of the task $n_i$ running on the processor $u_k$ with the maximum frequency
$f_{k,low}$                       Lowest energy-efficient frequency of the processor $u_k$
$<u_k, f_{k,v}>$                  Processor and frequency combination
$n_{s(j)}$                        $j$th assigned task of the application
$E_d(n_i, u_k, f_{k,v})$          Dynamic energy consumption of the task $n_i$ on the processor $u_k$ with the frequency $f_{k,v}$
$R(G)$                            Actual reliability of the application $G$
$SL(G)$                           Actual schedule length of the application $G$
$\lambda_{k,v}$                   Failure rate of the processor $u_k$ with the frequency $f_{k,v}$
$R(n_i, u_k, f_{k,v})$            Reliability of the task $n_i$ executed on the processor $u_k$ with the frequency $f_{k,v}$
$R(n_i)$                          Actual reliability of the task $n_i$
$u_{pr(n_i^b)}$                   Assigned processor of the replica $n_i^b$
$f_{pr(n_i^b),hz(n_i^b)}$         Assigned frequency of the replica $n_i^b$ on the processor $u_{pr(n_i^b)}$
$R_{min}(n_i)$                    Minimum reliability value of the task $n_i$
$R_{max}(n_i)$                    Maximum reliability value of the task $n_i$
$R_{min}(G)$                      Minimum reliability value of the application $G$
$R_{max}(G)$                      Maximum reliability value of the application $G$
$R_{goal}(G)$                     Reliability goal of the application $G$
$R_{goal}(n_i)$                   Reliability goal of the task $n_i$

Fig. 1. Distributed embedded system platform.

Fig. 2 shows an example of a DAG-based parallel application. The example consists of 10 tasks executed on three processors $\{u_1, u_2, u_3\}$. The weight 18 of the edge (Fig. 2) between $n_1$ and $n_2$ represents the communication time denoted as $c_{1,2}$ if $n_1$ and $n_2$ are not assigned to the same processor. Table 2 is a matrix of the WCETs with respect to the maximum frequency in Fig. 2. The weight 14 of $n_1$ and $u_1$ in Table 2 represents the WCET denoted by $w_{1,1} = 14$. The same task has different WCETs on different processors because of the heterogeneity of the processors.
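As an illustration of the application model, the following Python sketch stores the WCET matrix of Table 2 and the one communication weight stated in the text ($c_{1,2} = 18$). The function names `wcet`, `comm_time`, and `predecessors` are hypothetical helpers introduced here, not part of the paper, and the remaining edges of Fig. 2 are deliberately left out because they are not given in the text.

```python
from typing import Dict, List, Tuple

# WCET matrix W from Table 2: W[i][k] is the WCET of task n_(i+1) on
# processor u_(k+1) at the maximum frequency.
W: List[List[int]] = [
    [14, 16, 9],   # n1
    [13, 19, 18],  # n2
    [11, 13, 19],  # n3
    [13, 8, 17],   # n4
    [12, 13, 10],  # n5
    [13, 16, 9],   # n6
    [7, 15, 11],   # n7
    [5, 11, 14],   # n8
    [18, 12, 20],  # n9
    [21, 7, 16],   # n10
]

# Communication weights C: (i, j) -> WCRT c_(i,j) of message m_(i,j).
# Only c_(1,2) = 18 is stated in the text; other edges of Fig. 2 are omitted.
C: Dict[Tuple[int, int], int] = {(1, 2): 18}


def wcet(task: int, proc: int) -> int:
    """Return w_(task,proc) using the paper's 1-based indices."""
    return W[task - 1][proc - 1]


def comm_time(i: int, j: int, proc_i: int, proc_j: int) -> int:
    """Communication time of m_(i,j): zero when both tasks share a processor."""
    if proc_i == proc_j:
        return 0
    return C[(i, j)]


def predecessors(task: int) -> List[int]:
    """Immediate predecessors pred(n_task), derived from the known edges."""
    return sorted(i for (i, j) in C if j == task)
```

For instance, `wcet(1, 1)` gives the value $w_{1,1} = 14$ quoted above, and `comm_time(1, 2, 1, 3)` returns 18 because $n_1$ and $n_2$ sit on different processors.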

3.2 Power and Energy Models

Given that the relationship between voltage and frequency is almost linear, voltage and frequency are scaled down by DVFS to save energy. Similar to [2], [7], [8], [12], [13], our study employs the term frequency change to denote the simultaneous change in voltage and frequency. For the DVFS-capable system, a system-level power model widely used in [2], [7], [8], [12], [13] is also utilized, where the power at frequency $f$ is expressed as follows:

$$P(f) = P_s + g(P_{ind} + P_d) = P_s + g(P_{ind} + C_{ef} f^m).$$

$P_s$ represents the static power and can be removed only by powering off the entire system. $P_{ind}$ represents the frequency-independent dynamic power and can be removed by switching the system into a sleep state. $P_d$ represents the frequency-dependent dynamic power. $g$ represents the system states and indicates whether dynamic powers are currently being consumed in the system. When the system is active, $g = 1$; otherwise, $g = 0$. $C_{ef}$ represents the effective switching capacitance, and $m$ represents the dynamic power exponent, which should be no smaller than 2. Both $C_{ef}$ and $m$ are processor-dependent constants.

Turning a system on or off incurs an excessive overhead, so $P_s$ is always consumed and cannot be managed [2], [7], [8], [12], [13]. Similar to previous studies, the present study focuses on managing the dynamic power (i.e., $P_{ind}$ and $P_d$). Because of $P_{ind}$, a reduced $P_d$ does not always reduce energy consumption: a lower frequency stretches the execution time, during which $P_{ind}$ is still consumed. Therefore, a minimum energy-efficient frequency $f_{ee}$ exists [2], [7], [8], [12], [13] and is expressed as follows:

$$f_{ee} = \sqrt[m]{\frac{P_{ind}}{(m-1) C_{ef}}}. \tag{1}$$

If the frequency of a processor is assumed to vary from a minimum available frequency $f_{min}$ to the maximum frequency $f_{max}$, then the lowest energy-efficient frequency to execute a task should be

$$f_{low} = \max(f_{min}, f_{ee}). \tag{2}$$

Therefore, any of the actual effective frequencies $f_h$ should lie in the range $f_{low} \le f_h \le f_{max}$.
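Eqs. (1) and (2) can be checked numerically with a small Python sketch. The parameter values below ($P_{ind} = 0.05$, $C_{ef} = 1.0$, $m = 3$, $f_{min} = 0.2$) are hypothetical and chosen only for illustration; `energy_per_unit_work` is an assumed helper that divides the dynamic power by the frequency, which is the quantity that $f_{ee}$ minimizes.

```python
def energy_efficient_frequency(p_ind: float, c_ef: float, m: float) -> float:
    """Eq. (1): f_ee = (P_ind / ((m - 1) * C_ef)) ** (1 / m)."""
    return (p_ind / ((m - 1.0) * c_ef)) ** (1.0 / m)


def lowest_frequency(f_min: float, p_ind: float, c_ef: float, m: float) -> float:
    """Eq. (2): f_low = max(f_min, f_ee)."""
    return max(f_min, energy_efficient_frequency(p_ind, c_ef, m))


def energy_per_unit_work(f: float, p_ind: float, c_ef: float, m: float) -> float:
    """Dynamic power divided by frequency, i.e., energy per unit of work at f."""
    return (p_ind + c_ef * f ** m) / f


# Hypothetical processor parameters (not taken from the paper).
P_IND, C_EF, M, F_MIN = 0.05, 1.0, 3.0, 0.2

f_ee = energy_efficient_frequency(P_IND, C_EF, M)
f_low = lowest_frequency(F_MIN, P_IND, C_EF, M)
```

With these assumed values, $f_{ee}$ lies above $f_{min}$, so $f_{low} = f_{ee}$; running below $f_{ee}$ would cost more dynamic energy per unit of work, which is why frequencies under $f_{low}$ are excluded.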

Each of the $|U|$ heterogeneous processors has its own power parameters. Accordingly, we define the frequency-independent dynamic power set

$$\{P_{1,ind}, P_{2,ind}, \ldots, P_{|U|,ind}\},$$

frequency-dependent dynamic power set

$$\{P_{1,d}, P_{2,d}, \ldots, P_{|U|,d}\},$$

effective switching capacitance set

$$\{C_{1,ef}, C_{2,ef}, \ldots, C_{|U|,ef}\},$$

dynamic power exponent set

$$\{m_1, m_2, \ldots, m_{|U|}\},$$

lowest energy-efficient frequency set

$$\{f_{1,low}, f_{2,low}, \ldots, f_{|U|,low}\},$$

and actual effective frequency set

$$\left\{ \begin{array}{l} \{f_{1,low}, f_{1,a}, f_{1,b}, \ldots, f_{1,max}\}, \\ \{f_{2,low}, f_{2,a}, f_{2,b}, \ldots, f_{2,max}\}, \\ \quad \vdots \\ \{f_{|U|,low}, f_{|U|,a}, f_{|U|,b}, \ldots, f_{|U|,max}\} \end{array} \right\}.$$

Fig. 2. Motivating example of a DAG-based parallel application with 10 tasks [7], [8], [17], [26], [27], [30].

TABLE 2
WCETs of Tasks on Different Processors with the Maximum Frequency of the Parallel Application in Fig. 2 [7], [8], [17], [26], [27], [30]

Task        $u_1$   $u_2$   $u_3$
$n_1$        14      16       9
$n_2$        13      19      18
$n_3$        11      13      19
$n_4$        13       8      17
$n_5$        12      13      10
$n_6$        13      16       9
$n_7$         7      15      11
$n_8$         5      11      14
$n_9$        18      12      20
$n_{10}$     21       7      16


Then, let $E_d(n_i, u_k, f_{k,v})$ represent the processor-generated dynamic energy consumption of task $n_i$ on processor $u_k$ with frequency $f_{k,v}$. It is calculated as

$$E_d(n_i, u_k, f_{k,v}) = \left(P_{k,ind} + C_{k,ef} \times (f_{k,v})^{m_k}\right) \times w_{i,k} \times \frac{f_{k,max}}{f_{k,v}}. \tag{3}$$

In this study, the overheads of frequency transitions are disregarded because they take a negligible amount of time (e.g., 10 μs to 150 μs [36]).
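A minimal Python sketch of Eq. (3) follows. The WCET $w_{1,1} = 14$ comes from Table 2, while the power parameters ($P_{ind} = 0.03$, $C_{ef} = 0.8$, $m = 2.9$, $f_{max} = 1.0$) are hypothetical values assumed for illustration only.

```python
def dynamic_energy(w_max: float, f: float, f_max: float,
                   p_ind: float, c_ef: float, m: float) -> float:
    """Eq. (3): dynamic energy of one task executed at frequency f."""
    power = p_ind + c_ef * f ** m      # P_ind + C_ef * f^m
    exec_time = w_max * f_max / f      # WCET stretched by f_max / f
    return power * exec_time


# w_(1,1) = 14 is taken from Table 2; the power parameters are hypothetical.
W_11, F_MAX = 14.0, 1.0
P_IND, C_EF, M = 0.03, 0.8, 2.9

e_full = dynamic_energy(W_11, F_MAX, F_MAX, P_IND, C_EF, M)
e_half = dynamic_energy(W_11, 0.5 * F_MAX, F_MAX, P_IND, C_EF, M)
```

Under these assumptions, halving the frequency doubles the execution time but lowers the power enough that `e_half` is smaller than `e_full`, which is the saving that DVFS exploits.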

3.3 Reliability and Fault-Tolerance Models

For a non-DVFS-enabled system, Shatz and Wang first proposed modeling the reliability of a processor with a Poisson distribution [39]. This idea has been widely accepted in previous studies [1], [12], [13], [14], [15], [39]. $\lambda_k$ represents the failure rate per time unit of the processor $u_k$. The reliability of $n_i$ executed on $u_k$ within its WCET is calculated by using the following expression:

$$R(n_i, u_k) = e^{-\lambda_k w_{i,k}}. \tag{4}$$

In a DVFS-capable system, different frequencies yield various failure rates according to relevant research summaries [12], [13], [14], [15]. Hence, we let $\lambda_{k,max}$ represent the failure rate of the processor $u_k$ with the maximum frequency; the failure rate $\lambda_{k,v}$ of $u_k$ with the frequency $f_{k,v}$ is then calculated as follows:

$$\lambda_{k,v} = \lambda_{k,max} \times 10^{\frac{d (f_{k,max} - f_{k,v})}{f_{k,max} - f_{k,min}}}, \tag{5}$$

where $d$ is a constant that represents the sensitivity of failure rates to voltage scaling.

We then build the relationship between task reliability and frequency according to Eqs. (4) and (5), that is, the reliability of the task $n_i$ executed on the processor $u_k$ with the frequency $f_{k,v}$ is calculated as follows:

$$R(n_i, u_k, f_{k,v}) = e^{-\lambda_{k,v} \times w_{i,k} \times \frac{f_{k,max}}{f_{k,v}}} = e^{-\lambda_{k,max} \times 10^{\frac{d (f_{k,max} - f_{k,v})}{f_{k,max} - f_{k,min}}} \times w_{i,k} \times \frac{f_{k,max}}{f_{k,v}}}. \tag{6}$$

In Eq. (6), reliability increases monotonically with frequency on the same processor. Therefore, dynamically scaling down the voltage and frequency to reduce dynamic energy consumption lowers reliability.
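The frequency dependence in Eqs. (5) and (6) can be sketched in Python as follows. The failure rate $\lambda_{k,max} = 0.0005$, the sensitivity $d = 2$, and the frequency range $[0.3, 1.0]$ are hypothetical values assumed for illustration; only the WCET of 14 is taken from Table 2.

```python
import math


def failure_rate(f: float, f_min: float, f_max: float,
                 lam_max: float, d: float) -> float:
    """Eq. (5): failure rate of a processor running at frequency f."""
    return lam_max * 10.0 ** (d * (f_max - f) / (f_max - f_min))


def task_reliability(w_max: float, f: float, f_min: float, f_max: float,
                     lam_max: float, d: float) -> float:
    """Eq. (6): reliability of a task with WCET w_max executed at frequency f."""
    lam = failure_rate(f, f_min, f_max, lam_max, d)
    return math.exp(-lam * w_max * f_max / f)


# Hypothetical processor parameters; w = 14 is w_(1,1) from Table 2.
F_MIN, F_MAX, LAM_MAX, D, W_11 = 0.3, 1.0, 0.0005, 2.0, 14.0

r_max_freq = task_reliability(W_11, F_MAX, F_MIN, F_MAX, LAM_MAX, D)
r_half_freq = task_reliability(W_11, 0.5, F_MIN, F_MAX, LAM_MAX, D)
r_min_freq = task_reliability(W_11, F_MIN, F_MIN, F_MAX, LAM_MAX, D)
```

In this sketch, lowering the frequency hurts reliability twice: the failure rate grows by up to a factor of $10^d$, and the task is exposed to it for a longer execution time.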

Two main types of primary-backup replication schemes exist: passive replication [44], [45], [46] and active replication [18], [19], [21], [22]. In the passive scheme, whenever a processor fails, the task is rescheduled to proceed on a backup processor. In the checkpoint and restart scheme, which is considered an improved version of the passive scheme, the system is restarted after a processor crashes and continues from the checkpoint just as if no failure had occurred [19]. In the active replication scheme, each task is simultaneously replicated on several processors, and a task succeeds if at least one of its replicas is completed. This study uses the active replication scheme because it can directly shield the failed tasks in the process, and the failure recovery time is close to zero [20], [21], [22].

    We define $num_i$ ($num_i \leq |U|$) as the number of replicas of $n_i$ based on the active replication scheme. Considering that the checkpoint and restart scheme is a passive scheme, we cannot assign two replicas of the same task to the same processor in active replication. Hence, the replica set of $n_i$ is $\{n_i^1, \ldots, n_i^b, \ldots, n_i^{num_i}\}$, where $n_i^1$ is the primary replica and the other replicas are backups. As long as one replica of $n_i$ is successfully completed, no failure is observed for $n_i$, and the reliability of $n_i$ is updated to the following:

    $$R(n_i) = 1 - \prod_{b=1}^{num_i} \left(1 - R\left(n_i^b, u_{pr(n_i^b)}, f_{pr(n_i^b),hz(n_i^b)}\right)\right), \tag{7}$$

    where $u_{pr(n_i^b)}$ and $f_{pr(n_i^b),hz(n_i^b)}$ represent the assigned processor and frequency, respectively, of the replica $n_i^b$.
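Eq. (7) can be illustrated with a short Python sketch. The function name is hypothetical; the two input values are the per-replica reliabilities that Table 6 lists for n1 on u1 at frequency 0.47 and on u3 at frequency 0.29.

```python
from math import prod


def replicated_reliability(replica_reliabilities):
    """Eq. (7): a task fails only if every one of its replicas fails."""
    return 1.0 - prod(1.0 - r for r in replica_reliabilities)


# Two replicas of n1 with per-replica reliabilities 0.869299 and 0.756305 (Table 6).
r_n1 = replicated_reliability([0.869299, 0.756305])
print(round(r_n1, 6))
```

With a single replica the function returns that replica's reliability unchanged, so the non-replicated case is covered by the same formula.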

    Considering that the reliability of the application is the product of the reliability values of all tasks [12], [13], [14], [15], we denote the reliability value of an application as follows:

    $$R(G) = \prod_{n_i \in N} R(n_i). \tag{8}$$

    The dynamic energy consumption of the task $n_i$ is the sum of the dynamic energy consumptions of all its replicas:

    $$E_d(n_i) = \sum_{b=1}^{num_i} E_d\left(n_i^b, u_{pr(n_i^b)}, f_{pr(n_i^b),hz(n_i^b)}\right). \tag{9}$$

    Then, the dynamic energy consumption of the application is the sum of the dynamic energy consumptions of all the tasks:

    $$E_d(G) = \sum_{i=1}^{|N|} E_d(n_i). \tag{10}$$

    Let $E_s(G)$ represent the processor-generated static energy consumption of the application $G$, which is calculated by

    $$E_s(G) = \sum_{k=1}^{|U|} \left(P_{k,s} \times SL(G)\right), \tag{11}$$

    where $SL(G)$ represents the generated schedule length of the application $G$. In other words, static energy consumption is always present and is proportional to the schedule length of the application.

    Considering that the application's total energy consumption $E_{total}(G)$ is the sum of its static energy consumption $E_s(G)$ and dynamic energy consumption $E_d(G)$, $E_{total}(G)$ is calculated by

    $$E_{total}(G) = E_d(G) + E_s(G). \tag{12}$$

    3.4 Reliability Goal

    The minimum and maximum reliability values of the task $n_i$ can be obtained by traversing all the processors, and these values are calculated as

    $$R_{\min}(n_i) = \min_{u_k \in U} R(n_i, u_k, f_{k,low}), \tag{13}$$

    and

    $$R_{\max}(n_i) = 1 - \prod_{u_k \in U} \left(1 - R(n_i, u_k, f_{k,\max})\right), \tag{14}$$

    respectively. Considering that the reliability of the application $G$ is the product of the reliability values of all the tasks (Eq. (8)), we calculate the minimum and maximum reliability values of $G$ by using the following equations:

    XIE ET AL.: ENERGY-EFFICIENT FAULT-TOLERANT SCHEDULING OF RELIABLE PARALLEL APPLICATIONS ON HETEROGENEOUS... 171

    $$R_{\min}(G) = \prod_{n_i \in N} R_{\min}(n_i), \tag{15}$$

    and

    $$R_{\max}(G) = \prod_{n_i \in N} R_{\max}(n_i). \tag{16}$$

    If the reliability goal $R_{goal}(G)$ can be satisfied, then the application is reliable. Note that $R_{goal}(G)$ should be larger than or equal to $R_{\min}(G)$; otherwise, $R_{goal}(G)$ is always satisfied. $R_{goal}(G)$ should also be less than or equal to $R_{\max}(G)$; otherwise, $R_{goal}(G)$ can never be satisfied. Hence, this study assumes that $R_{goal}(G)$ lies between $R_{\min}(G)$ and $R_{\max}(G)$:

    $$R_{\min}(G) \leq R_{goal}(G) \leq R_{\max}(G). \tag{17}$$
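The bounds in Eqs. (13) to (17) can be sketched in Python as follows. The function names and all numeric inputs are hypothetical; each task is described by one pair per processor holding its reliability at the lowest and at the maximum frequency.

```python
from math import prod


def task_bounds(options):
    """Return (R_min, R_max) of one task per Eqs. (13) and (14).

    options: one (r_at_f_low, r_at_f_max) pair per processor.
    """
    r_min = min(r_low for r_low, _ in options)                   # one replica, worst processor
    r_max = 1.0 - prod(1.0 - r_high for _, r_high in options)    # replicas on all processors
    return r_min, r_max


def application_bounds(per_task_options):
    """Eqs. (15) and (16): products of the per-task bounds."""
    bounds = [task_bounds(opts) for opts in per_task_options]
    return prod(b[0] for b in bounds), prod(b[1] for b in bounds)


# Hypothetical application with two tasks on two processors.
tasks = [
    [(0.80, 0.99), (0.90, 0.995)],
    [(0.85, 0.98), (0.70, 0.99)],
]
r_min_g, r_max_g = application_bounds(tasks)
goal_is_meaningful = r_min_g <= 0.95 <= r_max_g  # Eq. (17) for a goal of 0.95
```

A goal below `r_min_g` would be met by any schedule, and a goal above `r_max_g` could not be met even with full replication at maximum frequency.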

    3.5 Problem Description

    The problem addressed in this study can be formally described as follows. We are given a parallel application $G$ and a heterogeneous processor set $U$ whose processors support different frequency levels. The problem involves assigning replicas to processor and frequency combinations for each task while reducing the total energy consumption and ensuring that the obtained reliability of the application $R(G)$ is larger than or equal to its reliability goal $R_{goal}(G)$. The objective is to determine the processor and frequency combinations of all tasks to reduce

    $$E_{total}(G) = E_d(G) + E_s(G),$$

    subject to the constraint

    $$R(G) = \prod_{n_i \in N} R(n_i) \geq R_{goal}(G).$$

    The problem of mapping tasks to multiprocessors is NP-hard [47]. Therefore, we apply heuristic list scheduling to solve this problem. List scheduling is a well-known method for a DAG-based parallel application [26], [27], [34], and it includes two phases. The first phase arranges tasks in descending order of priority (prioritizing tasks), and the second phase allocates each task to an appropriate processor (allocating tasks). In this study, allocating tasks is divided into two sub-problems: satisfying the reliability goal and reducing the dynamic energy consumption.

    4 ENERGY-EFFICIENT SCHEDULING WITH RELIABILITY GOAL

    Considering that state-of-the-art studies do not use fault-tolerance to implement joint and bi-objective optimization between dynamic energy consumption and reliability of a parallel application [14], [15], [16], this section first presents energy-efficient scheduling with a reliability goal that does not involve fault-tolerance, for ease of understanding.

    4.1 Prioritizing Tasks

    As stated in the problem description, the first step is prioritizing tasks, which is an important problem for DAG list scheduling on heterogeneous distributed systems. Typical task prioritizing schemes include the upward rank value [26], the optimistic cost table (OCT) [48], and the heterogeneous selection value (HSV) [27]. There is no proof that any one of these schemes is better than the others. Ordering tasks according to the descending order of their upward rank values ($rank_u$, Eq. (18)) is considered the de facto task prioritizing criterion for DAG list scheduling on heterogeneous distributed systems because it has been widely used in energy-efficient scheduling [7], [8], [36], [37], [38], [49] and reliability-aware scheduling [17], [18], [19]. Considering that this study focuses on energy-efficient fault-tolerant scheduling, we also use the upward rank value as the task prioritizing criterion. Thus, the tasks are arranged in descending order of $rank_u$, which is obtained as follows:

    $$rank_u(n_i) = \overline{w_i} + \max_{n_j \in succ(n_i)} \left\{ c_{i,j} + rank_u(n_j) \right\}, \tag{18}$$

    where $\overline{w_i}$ is the average WCET of the task $n_i$ at the maximum frequencies and is calculated as follows:

    $$\overline{w_i} = \left( \sum_{k=1}^{|U|} w_{i,k} \right) / |U|.$$

    Table 3 shows the upward rank values of all the tasks in Fig. 2. A task $n_i$ is ready to be assigned only when all of its predecessors have been assigned. Assume that two tasks $n_i$ and $n_j$ satisfy $rank_u(n_i) > rank_u(n_j)$. If no precedence constraint exists between $n_i$ and $n_j$, then $n_i$ does not necessarily have to be assigned before $n_j$. Following the descending order of $rank_u$ in Table 3, the assignment order of the tasks in $G$ is $\{n_1, n_3, n_4, n_2, n_5, n_6, n_9, n_7, n_8, n_{10}\}$.
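The upward rank of Eq. (18) is a recursion over successors, which the following Python sketch makes explicit. The small diamond-shaped DAG and its costs are hypothetical and are not the motivating example of Fig. 2.

```python
from functools import lru_cache


def upward_ranks(wcet, comm):
    """Eq. (18): rank_u(n_i) = average WCET of n_i + max over successors of (c_ij + rank_u(n_j)).

    wcet: {task: [WCET on each processor]}; comm: {(i, j): communication time}.
    """
    succ = {t: [] for t in wcet}
    for (i, j) in comm:
        succ[i].append(j)

    @lru_cache(maxsize=None)
    def rank(t):
        avg = sum(wcet[t]) / len(wcet[t])
        # An exit task has no successors, so its rank is just its average WCET.
        return avg + max((comm[(t, s)] + rank(s) for s in succ[t]), default=0.0)

    return {t: rank(t) for t in wcet}


# Hypothetical 4-task DAG on two processors: a -> b, a -> c, b -> d, c -> d.
wcet = {"a": [4, 6], "b": [3, 5], "c": [8, 6], "d": [2, 2]}
comm = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 3, ("c", "d"): 1}
ranks = upward_ranks(wcet, comm)
order = sorted(ranks, key=ranks.get, reverse=True)  # descending rank_u
```

Because a predecessor's rank always exceeds the rank of each of its successors when costs are positive, sorting by descending rank never places a task before one of its predecessors.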

    Two types of fault-tolerant scheduling exist for an end-to-end distributed function, namely, the strict schedule and the general schedule [50]. In the strict schedule, each task should wait for the completion of all the replicas of its predecessors before starting its execution. In the general schedule, the execution of each task can start as soon as one replica of each predecessor has successfully completed. In other words, the strict schedule is equivalent to compile-time scheduling, whereas the general schedule is equivalent to run-time scheduling. In this study, we discuss the strict schedule because we focus on the design phase of the life cycle.

    We can calculate the actual start time (AST) and actual finish time (AFT) of each task according to the task assignments produced by the proposed algorithms. Given that the strict schedule is used, the AST and AFT of the replica $n_i^b$ are calculated as follows:

    $$\begin{cases} AST\left(n_{entry}^b, u_{pr(n_{entry}^b)}\right) = 0, \\ AST\left(n_i^b, u_{pr(n_i^b)}\right) = \max\left\{ avail[k],\ \max_{n_h \in pred(n_i),\, a \in [1, num_h]} \left\{ AFT(n_h^a) + c'_{h,i} \right\} \right\}, \end{cases} \tag{19}$$

    and

    $$AFT\left(n_i^b, u_{pr(n_i^b)}\right) = AST\left(n_i^b, u_{pr(n_i^b)}\right) + w_{i,pr(n_i^b)} \times \frac{f_{pr(n_i^b),\max}}{f_{pr(n_i^b),hz(n_i^b)}}. \tag{20}$$

    Here, $avail[k]$ is the earliest available time when the ECU $u_k$ is ready for task execution, and $c'_{h,i}$ represents the WCRT of the message between $n_h^a$ and $n_i^b$. If $n_h^a$ and $n_i^b$ are allocated to the same ECU, then $c'_{h,i} = 0$; otherwise, $c'_{h,i} = c_{h,i}$.
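The strict-schedule timing rules of Eqs. (19) and (20) can be sketched as two small Python functions. The function names, the data layout, and the numbers in the example are hypothetical.

```python
def start_time(proc, avail, pred_replicas, comm):
    """Eq. (19): a replica starts only after ALL replicas of ALL predecessors have finished.

    avail: {processor: earliest free time}
    pred_replicas: (pred_task, pred_proc, pred_finish) for every predecessor replica
    comm: {pred_task: message time to the current task}; zero on the same processor
    """
    ready = max(
        (finish + (0.0 if p == proc else comm[task]) for task, p, finish in pred_replicas),
        default=0.0,  # an entry task has no predecessors
    )
    return max(avail.get(proc, 0.0), ready)


def finish_time(ast, w, f, f_max=1.0):
    """Eq. (20): finish time = start time + WCET stretched by f_max / f."""
    return ast + w * f_max / f


# Hypothetical case: predecessor "p" has replicas on u1 (finished at 10) and u2 (finished at 14).
avail = {"u1": 10.0, "u2": 14.0}
preds = [("p", "u1", 10.0), ("p", "u2", 14.0)]
ast = start_time("u1", avail, preds, comm={"p": 3.0})
aft = finish_time(ast, w=6.0, f=0.5)
```

In this example the replica on u1 must wait for the slower predecessor replica on u2 plus its message time, which is why strict schedules tend to lengthen when low-frequency replicas are added.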

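    TABLE 3
    Upward Rank Values for Tasks of the Motivating Parallel Application

    | Task          | n1  | n2 | n3 | n4 | n5 | n6   | n7   | n8   | n9   | n10  |
    |---------------|-----|----|----|----|----|------|------|------|------|------|
    | $rank_u(n_i)$ | 108 | 77 | 80 | 80 | 69 | 63.3 | 42.7 | 35.7 | 44.3 | 14.7 |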

    172 IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. 3, NO. 3, JULY-SEPTEMBER 2018

    Then, the actual schedule length of the application is the AFT of the replica of the exit task $n_{exit}$ that has the maximum AFT among all replicas of $n_{exit}$. That is, we have

    $$SL(G) = \max_{b \in [1, num_{exit}]} AFT\left(n_{exit}^b, u_{pr(n_{exit}^b)}\right). \tag{21}$$

    4.2 Satisfying Reliability Goal

    The second step is to obtain the reliability goal of each task so that heuristic scheduling can be implemented. The reliability value of an application is the product of the reliability values of its tasks. We let $\sqrt[|N|]{R_{goal}(G)}$ be the upper bound on the reliability goal of each task, namely,

    $$R_{up\_goal}(n_i) = \sqrt[|N|]{R_{goal}(G)}. \tag{22}$$

    Hence, if the reliability value of every task is at least this upper bound, then the reliability value of the application is at least its reliability goal.

    Then, we apply the following heuristic strategy: assume that the task to be assigned is $n_{s(y)}$, where $n_{s(y)}$ represents the $y$th assigned task, $\{n_{s(1)}, n_{s(2)}, \ldots, n_{s(y-1)}\}$ represents the set of assigned tasks, and $\{n_{s(y+1)}, n_{s(y+2)}, \ldots, n_{s(|N|)}\}$ denotes the set of unassigned tasks. To ensure that the reliability goal of the application remains satisfiable at each task assignment, we presuppose that each task in $\{n_{s(y+1)}, n_{s(y+2)}, \ldots, n_{s(|N|)}\}$ attains the upper bound on its reliability goal (Eq. (22)). Hence, when assigning $n_{s(y)}$, we express the reliability value of the application as follows:

    $$R(G) = \prod_{x=1}^{y-1} R(n_{s(x)}) \times R(n_{s(y)}) \times \prod_{z=y+1}^{|N|} R_{up\_goal}(n_{s(z)}).$$

    Then, the actual $R(G)$ must be larger than or equal to $R_{goal}(G)$ according to the problem statement. That is, we have

    $$\prod_{x=1}^{y-1} R(n_{s(x)}) \times R(n_{s(y)}) \times \prod_{z=y+1}^{|N|} R_{up\_goal}(n_{s(z)}) \geq R_{goal}(G).$$

    Therefore, the actual reliability value of the task $n_{s(y)}$ should satisfy the following constraint:

    $$R(n_{s(y)}) \geq \frac{R_{goal}(G)}{\prod_{x=1}^{y-1} R(n_{s(x)}) \times \prod_{z=y+1}^{|N|} R_{up\_goal}(n_{s(z)})}.$$

    To this end, we let the reliability goal of the task $n_{s(y)}$ be

    $$R_{goal}(n_{s(y)}) = \frac{R_{goal}(G)}{\prod_{x=1}^{y-1} R(n_{s(x)}) \times \prod_{z=y+1}^{|N|} R_{up\_goal}(n_{s(z)})}. \tag{23}$$

    Therefore, the reliability goal of the application is transferred to each task. As long as each task satisfies its reliability goal, that is,

    $$R(n_{s(y)}) \geq R_{goal}(n_{s(y)}), \tag{24}$$

    the reliability goal of the application is also satisfied.
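The transfer of the application's reliability goal to individual tasks (Eqs. (22) and (23)) can be sketched in Python as follows. The function names are hypothetical; the inputs are the goal 0.95, the ten tasks of the motivating application, and the actual reliability 0.994926 that Table 5 reports for the first assigned task.

```python
def upper_bound_goal(r_goal_app, n_tasks):
    """Eq. (22): the |N|-th root of the application's reliability goal."""
    return r_goal_app ** (1.0 / n_tasks)


def task_goal(r_goal_app, n_tasks, assigned_reliabilities):
    """Eq. (23): goal of the next task, given the actual reliabilities of the tasks
    assigned so far and the upper bound presupposed for every task still unassigned."""
    r_up = upper_bound_goal(r_goal_app, n_tasks)
    remaining = n_tasks - len(assigned_reliabilities) - 1  # tasks after the current one
    denom = r_up ** remaining
    for r in assigned_reliabilities:
        denom *= r
    return r_goal_app / denom


goal_first = task_goal(0.95, 10, [])            # first task: equals the upper bound
goal_second = task_goal(0.95, 10, [0.994926])   # second task, after the first achieved 0.994926
```

Because the first task slightly exceeded its goal, the goal handed to the second task is slightly lower than the upper bound; any surplus reliability is passed on to later tasks in this way.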

    4.3 Reducing Dynamic Energy Consumption

    Given that the reliability goal of the application is transferred to each task, dynamic energy consumption reduction is also transferred to each task. The strategy for reducing dynamic energy consumption is then as follows: traverse all the processor and frequency combinations and select the combination with the minimum dynamic energy consumption that satisfies the reliability goal of the task.

    On the basis of the reliability goal of each task (Eq. (23)), we present the heuristic algorithm ESRG, described in Algorithm 1, to reduce dynamic energy consumption while satisfying the reliability goal of the application.

    Algorithm 1. The ESRG Algorithm

    Input: $G = (N, W, M, C)$, $U$, $R_{goal}(G)$
    Output: $R(G)$, $E_{total}(G)$ and its related values
    1: Sort the tasks in a list downward_task_list by descending order of $rank_u$ values;
    2: while (there are tasks in downward_task_list) do
    3:   $n_i \leftarrow$ downward_task_list.out();
    4:   Calculate $R_{goal}(n_i)$ using Eq. (23);
    5:   while ($R(n_i) < R_{goal}(n_i)$) do
    6:     for (each processor $u_k \in U$) do
    7:       for (each frequency $f_{k,v}$ from $f_{k,low}$ to $f_{k,\max}$) do
    8:         Calculate $R(n_i, u_k, f_{k,v})$ for the task $n_i$;
    9:         if ($R(n_i, u_k, f_{k,v}) \geq R_{goal}(n_i)$) then
    10:          Calculate $E_d(n_i, u_k, f_{k,v})$ using Eq. (3);
    11:          if ($E_d(n_i, u_k, f_{k,v}) < E_d(n_i)$) then
    12:            $E_d(n_i) \leftarrow E_d(n_i, u_k, f_{k,v})$;
    13:            $R(n_i) \leftarrow R(n_i, u_k, f_{k,v})$;
    14:            break;
    15:          end if
    16:        end if
    17:      end for
    18:    end for
    19:  end while
    20: end while
    21: Calculate $R(G)$ using Eq. (8);
    22: Calculate $E_s(G)$ using Eq. (11);
    23: Calculate $E_d(G)$ using Eq. (10);
    24: Calculate $E_{total}(G)$ using Eq. (12);
    25: Calculate $SL(G)$ using Eq. (21);

    The main idea of ESRG is that the reliability goal of the application is transferred to each task by presupposing that each unassigned task attains the upper bound on its reliability goal. Each task then selects the single processor and frequency combination with the minimum dynamic energy consumption that satisfies its reliability goal. The main details are explained as follows:

    (1) In Line 1, ESRG sorts the tasks in a list downward_task_list by descending order of $rank_u$ values (prioritizing tasks).

    (2) In Lines 2-20, ESRG iteratively schedules each task of the application according to the task priorities.

    (3) In Line 4, ESRG obtains the reliability goal of the current task by applying Eq. (23) before the task is assigned.

    (4) In Lines 5-19, ESRG traverses all processor and frequency combinations to select the combination with the minimum dynamic energy consumption that satisfies the task's reliability goal.

    (5) In Lines 21-25, ESRG calculates the actual reliability value, static energy consumption, dynamic energy consumption, total energy consumption, and schedule length of the application, respectively.

    Specifically, once the lowest frequency that satisfies the reliability goal has been found on a processor, the remaining higher frequencies of this processor can be skipped (Line 14). The reason is that reliability increases monotonically with frequency on the same processor according to Eq. (6), and dynamic energy consumption also increases monotonically with frequency at or above the energy-efficient frequency according to Eq. (3). Hence, reliability and dynamic energy consumption rise together on the same processor, and higher frequencies, which generate higher dynamic energy consumption, can be skipped.
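The per-task selection step of ESRG (Lines 5-19 of Algorithm 1) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the reliability model assumes d = 1 with f_min equal to f_low, the dynamic energy expression is an assumed form of Eq. (3), which is not reproduced in this section, and the WCETs and power parameters in the example are hypothetical.

```python
import math


def esrg_select(goal, processors):
    """Return (processor, frequency, reliability, energy) with the least dynamic energy
    among single-replica choices that meet `goal`, or None if no choice meets it.

    processors: {name: dict(w, lam_max, f_low, p_ind, c_ef, m)}; f_max is 1, precision 0.01.
    """
    best = None
    for name, p in processors.items():
        for v in range(round(p["f_low"] * 100), 101):       # ascending frequencies
            f = v / 100
            lam = p["lam_max"] * 10.0 ** ((1.0 - f) / (1.0 - p["f_low"]))  # assumes d = 1
            r = math.exp(-lam * p["w"] / f)                                # Eq. (6)
            if r >= goal:
                # Assumed form of Eq. (3): (P_ind + C_ef * f^m) * w / f.
                e = (p["p_ind"] + p["c_ef"] * f ** p["m"]) * p["w"] / f
                if best is None or e < best[3]:
                    best = (name, f, r, e)
                break  # higher frequencies on this processor only cost more energy
    return best


# Hypothetical parameters for one task on three processors.
processors_n1 = {
    "u1": dict(w=14, lam_max=0.0005, f_low=0.26, p_ind=0.03, c_ef=0.8, m=2.9),
    "u2": dict(w=16, lam_max=0.0002, f_low=0.27, p_ind=0.03, c_ef=0.7, m=2.5),
    "u3": dict(w=9, lam_max=0.0009, f_low=0.29, p_ind=0.03, c_ef=1.0, m=2.5),
}
choice = esrg_select(0.95 ** 0.1, processors_n1)
```

With these inputs only u2 can reach the goal with a single replica, so the search settles on the lowest feasible frequency of u2; a goal that no single replica can reach makes the function return `None`, which is the situation that motivates replication.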

    4.4 Time Complexity of the ESRG Algorithm

    The time complexity of the ESRG algorithm is analyzed as follows:

    (1) Determining the reliability of the application must traverse all tasks, which can be performed within $O(|N|)$ time (Lines 2-20).

    (2) Calculating the reliability goal of the current task must traverse all tasks, which can be conducted within $O(|N|)$ time (Line 4).

    (3) Selecting the processor and frequency combination with the minimum dynamic energy consumption that satisfies the reliability goal must traverse all processor and frequency combinations, which can be done within $O(|U| \times |F|)$ time, where $|F|$ represents the maximum number of discrete frequencies from the lowest to the maximum frequency (Lines 5-19).

    Therefore, the time complexity of the ESRG algorithm is $O(|N|^2 \times |U| \times |F|)$, and ESRG satisfies the reliability goal without fault-tolerance at this low time complexity.

    4.5 Example of the ESRG Algorithm

    We assume that the power parameters of all processors are known and shown in Table 4, where the maximum frequency $f_{k,\max}$ of each processor is 1 and the frequency precision is set to 0.01. Thus, we can obtain the lowest energy-efficient frequency $f_{k,low}$ of each processor according to Eq. (2). We can also calculate that the minimum and maximum reliability values are $R_{\min}(G) = 0.879238$ and $R_{\max}(G) = 0.999998$ according to Eqs. (15) and (16), respectively, for the motivating parallel application. We then set the reliability goal of $G$ as $R_{goal}(G) = 0.95$.

    Example 1. Table 5 shows the processor and frequency combination assignments for the tasks of the motivating parallel application using the ESRG algorithm. Each row shows the assigned processor and frequency combination, the corresponding reliability value, and the dynamic energy consumption. For example, when assigning $n_1$, its reliability goal is $\sqrt[10]{0.95} = 0.994884$; its assigned processor and frequency combination is then $u_2$ and 0.89, the actual reliability value is 0.994926, and the final dynamic energy consumption is 10.1229. Next, the reliability goal of $n_3$ is $\frac{0.95}{0.994926 \times 0.994884^8} = 0.994841$ as calculated by Eq. (23); the assigned processor and frequency combination of $n_3$ is $u_2$ and 0.84, the actual reliability value is 0.994886, and the final dynamic energy consumption is 7.6249. The remaining tasks follow the same pattern as $n_3$. Finally, the actual reliability value, schedule length, and dynamic energy consumption of the application $G$ are 0.950111, 140.32, and 70.1648 (calculated by Eqs. (8), (21), and (10)), respectively.

    The static energy consumption of the application is related to its schedule length according to Eq. (11), which sums over the three processors. Therefore, we have $E_s(G) = 0.005 \times 140.32 \times 3 = 2.1048$. Finally, the total energy consumption calculated by Eq. (12) is $E_{total}(G) = 70.1648 + 2.1048 = 72.2696$.
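The totals of Example 1 can be re-derived from the per-task values in Table 5 with a few lines of Python. The static power of 0.005 per processor is the value used in the example's calculation and is assumed here to apply to each of the three processors; the variable names are illustrative.

```python
from math import prod

# Per-task results reported in Table 5: (actual reliability, dynamic energy).
table5 = {
    "n1": (0.994926, 10.1229), "n3": (0.994886, 7.6249), "n4": (0.994877, 3.9311),
    "n2": (0.994917, 12.7454), "n5": (0.994886, 7.6249), "n6": (0.994926, 10.1229),
    "n9": (0.994849, 6.8227), "n7": (0.994924, 4.9121), "n8": (0.994901, 2.9881),
    "n10": (0.994861, 3.2697),
}
schedule_length = 140.32
static_power = [0.005, 0.005, 0.005]  # assumed P_{k,s} of u1, u2, u3

r_app = prod(r for r, _ in table5.values())                 # Eq. (8)
e_dynamic = sum(e for _, e in table5.values())               # Eq. (10)
e_static = sum(p * schedule_length for p in static_power)    # Eq. (11)
e_total = e_dynamic + e_static                               # Eq. (12)
```

Small differences in the last digit relative to the reported totals are expected because the table entries are already rounded.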

    TABLE 4
    Power and Failure Parameters of Processors ($u_1$, $u_2$, and $u_3$)

    | $u_k$ | $P_{k,s}$ | $P_{k,ind}$ | $C_{k,ef}$ | $m_k$ | $f_{k,low}$ | $f_{k,\max}$ | $\lambda_{k,\max}$ |
    |-------|-----------|-------------|------------|-------|-------------|--------------|--------------------|
    | $u_1$ | 0.03      | 0.005       | 0.8        | 2.9   | 0.26        | 1.0          | 0.0005             |
    | $u_2$ | 0.03      | 0.005       | 0.7        | 2.5   | 0.27        | 1.0          | 0.0002             |
    | $u_3$ | 0.03      | 0.005       | 1.0        | 2.5   | 0.29        | 1.0          | 0.0009             |

    TABLE 5
    Processor and Frequency Combination Assignments for Tasks of the Motivating Parallel Application Using ESRG

    | $n_i$    | Task's reliability goal $R_{goal}(n_i)$ | Assigned processor and frequency combination $\langle u_{pr(n_i)}, f_{pr(n_i),hz(n_i)} \rangle$ | AST    | AFT    | Actual reliability $R(n_i)$ | Dynamic energy consumption $E_d(n_i)$ |
    |----------|----------|--------------------------------|--------|--------|----------|---------|
    | $n_1$    | 0.994884 | $\langle u_2, 0.89 \rangle$    | 0      | 17.98  | 0.994926 | 10.1229 |
    | $n_3$    | 0.994841 | $\langle u_2, 0.84 \rangle$    | 17.98  | 33.45  | 0.994886 | 7.6249  |
    | $n_4$    | 0.994839 | $\langle u_2, 0.73 \rangle$    | 33.45  | 44.41  | 0.994877 | 3.9311  |
    | $n_2$    | 0.994846 | $\langle u_2, 0.93 \rangle$    | 44.41  | 64.84  | 0.994917 | 12.7454 |
    | $n_5$    | 0.994813 | $\langle u_2, 0.84 \rangle$    | 64.84  | 80.32  | 0.994886 | 7.6249  |
    | $n_6$    | 0.994811 | $\langle u_2, 0.89 \rangle$    | 80.32  | 98.3   | 0.994926 | 10.1229 |
    | $n_9$    | 0.994768 | $\langle u_2, 0.82 \rangle$    | 98.3   | 112.93 | 0.994849 | 6.8227  |
    | $n_7$    | 0.994803 | $\langle u_1, 0.91 \rangle$    | 56.45  | 64.15  | 0.994924 | 4.9121  |
    | $n_8$    | 0.994763 | $\langle u_1, 0.83 \rangle$    | 113.3  | 119.32 | 0.994901 | 2.9881  |
    | $n_{10}$ | 0.994745 | $\langle u_2, 0.7 \rangle$     | 130.32 | 140.32 | 0.994861 | 3.2697  |

    $R(G) = 0.950111 > R_{goal}(G) = 0.95$, $SL(G) = 140.32$, $E_d(G) = 70.1648$

    5 ENERGY-EFFICIENT FAULT-TOLERANT SCHEDULING WITH RELIABILITY GOAL

    5.1 Limitations of ESRG

    Although ESRG can reduce energy consumption while satisfying the reliability goal of the application, it has the following limitations:

    (1) The reliability goal of the application cannot be reached if it exceeds a certain threshold. The reason is that only one replica exists for each task, such that the maximum reachable reliability value of the application using ESRG is

    $$R_{reach\_esrg}(G) = \prod_{n_i \in N} \max_{u_k \in U} R(n_i, u_k, f_{k,\max}). \tag{25}$$

    If the reliability goal $R_{goal}(G)$ lies in $(R_{reach\_esrg}(G), R_{\max}(G)]$, namely,

    $$R_{reach\_esrg}(G) < R_{goal}(G) \leq R_{\max}(G),$$

    then $R_{goal}(G)$ cannot be satisfied by ESRG. For example, the maximum reachable reliability value of the motivating application is $R_{reach\_esrg}(G) = 0.974335$; if the reliability goal is 0.98, then it cannot be satisfied.

    (2) Even if the reliability goal $R_{goal}(G)$ lies in $[R_{\min}(G), R_{reach\_esrg}(G)]$, namely,

    $$R_{\min}(G) \leq R_{goal}(G) \leq R_{reach\_esrg}(G),$$

    $R_{goal}(G)$ may still not be satisfied. The reason is that if the upper bound on the reliability goal is too high, then the maximum reachable reliability values of some tasks may in practice be less than this upper bound (refer to Sections 6.3 and 6.4 for additional details). Using fault-tolerance to enhance reliability can solve this problem.

    (3) The energy consumption reduction may be limited. For example, the energy consumption of the motivating application is $E_{total}(G) = 72.2696$ using ESRG. We can see that all the tasks are assigned relatively high frequency values (refer to Sections 6.2, 6.3, and 6.4 for additional details).

    The above problems can be solved by fault-tolerance because it enlarges the space of processor and frequency combinations that can be selected while satisfying the reliability goal. Therefore, in the following, we use fault-tolerance to reduce energy consumption while satisfying the reliability goal.

    5.2 Energy-Efficient Fault-Tolerant Scheduling

    The non-fault-tolerant scheduling algorithm ESRG selects one processor and frequency combination for each task. In contrast, fault-tolerant scheduling with the active replication scheme selects multiple processor and frequency combinations with the minimum dynamic energy consumption. To implement fault-tolerant scheduling with low time complexity, we still consider a heuristic method. An intuitive approach is to assign each task to all the processors (i.e., full replication) at their lowest energy-efficient frequency values. For example, the replicas of $n_1$ are assigned to $u_1$, $u_2$, and $u_3$ with the lowest energy-efficient frequency values 0.26, 0.27, and 0.29, respectively. Although this yields a dynamic energy consumption of only 10.0013 for $n_1$, the reliability of $n_1$ calculated by Eq. (7) is merely 0.993571, which cannot satisfy the reliability goal of 0.994884 for $n_1$.
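The full-replication example for n1 can be checked with a short Python sketch that applies Eq. (7) and Eq. (9) to the three lowest-frequency combinations. The per-replica reliability and energy values are the ones Table 6 lists for these combinations; the variable names are illustrative.

```python
from math import prod

# Replicas of n1 at the lowest energy-efficient frequencies:
# (processor, frequency) -> (replica reliability, replica dynamic energy), from Table 6.
replicas = {
    ("u1", 0.26): (0.763967, 2.4817),
    ("u2", 0.27): (0.888235, 3.9417),
    ("u3", 0.29): (0.756305, 3.5779),
}
r_n1 = 1.0 - prod(1.0 - r for r, _ in replicas.values())  # Eq. (7)
e_n1 = sum(e for _, e in replicas.values())                # Eq. (9)
goal_n1 = 0.994884
meets_goal = r_n1 >= goal_n1
```

The combined reliability falls just short of the goal, which shows why the frequencies of some replicas have to be raised rather than simply replicating everywhere at the lowest frequency.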

    The problem of energy-efficient fault-tolerant scheduling with a reliability goal still has three phases: prioritizing tasks, satisfying the reliability goal, and reducing the dynamic energy consumption. Similar to the ESRG algorithm, task prioritizing follows the descending order of $rank_u$ values, and the reliability goal is satisfied by transferring the reliability goal of the application to each task. Therefore, the core of energy-efficient fault-tolerant scheduling is to quantitatively select, with low time complexity, the processor and frequency combinations that generate less dynamic energy consumption for each task until its reliability goal is satisfied.

    Before proposing the fault-tolerant scheduling algorithm, we use the task $n_1$ to explain how to quantitatively select its processor and frequency combinations:

    (1) The reliability goal of $n_1$ is 0.994884, as calculated by Eq. (23).

    (2) We list all the possible processor and frequency combination assignments of $n_1$ in ascending order of $E_d(n_i^b, u_{pr(n_i^b)}, f_{pr(n_i^b),hz(n_i^b)})$, as shown in Table 6, where each row shows the replica's dynamic energy consumption value, processor, frequency, replica's reliability value, task's reliability value, and task's energy value of the possible assignment.

    (3) We first select the first combination assignment $\langle u_1, 0.26 \rangle$. In this assignment, 2.4817 is the minimum dynamic energy consumption value for the processor $u_1$. However, the reliability of $n_1$ is 0.763967, which cannot satisfy the reliability goal of 0.994884. Therefore, we must further explore the next assignments.

    (4) We then consider the second assignment, which is also on $u_1$. Given that $u_1$ can hold only one replica of $n_1$ under the active replication scheme, we must delete the previous assignment and select the current one. The reliability of $n_1$ then becomes 0.777776, which still cannot satisfy the reliability goal of 0.994884. Therefore, we must explore the next assignments again.

    (5) We continue through the remaining assignments, repeatedly replacing the frequency of $u_1$ until it reaches 0.47. The next assignment then moves to $u_3$, where the minimum energy-efficient frequency is 0.29. Both assignments are therefore selected, and the reliability value of the task under these two assignments reaches 0.968149. However, the reliability goal of 0.994884 is still not satisfied, so the exploration must continue.

    (6) Finally, the assignments $u_1$ at 0.51, $u_3$ at 0.41, and $u_2$ at 0.27 in Table 6 are selected because the reliability value of $n_1$ is then 0.997758, which exceeds the reliability goal of 0.994884. The remaining assignments can be skipped without verification.

    Similar to $n_1$, each remaining task of the application also quantitatively selects the processor and frequency combinations that generate minimum dynamic energy consumption until its reliability goal is satisfied.

    TABLE 6
    Possible Assignments of the Task $n_1$

    | $E_d(n_1, u_k, f_{k,v})$ | $u_k$ | $f_{k,v}$ | $R(n_1, u_k, f_{k,v})$ | $R(n_1)$ | $E_d(n_1)$ |
    |--------|-------|------|----------|----------|---------|
    | 2.4817 | $u_1$ | 0.26 | 0.763967 | 0.763967 | 2.4817  |
    | 2.4863 | $u_1$ | 0.27 | 0.777776 | 0.777776 | 2.4863  |
    | ...    | ...   | ...  | ...      | ...      | ...     |
    | 3.5617 | $u_1$ | 0.47 | 0.869299 | 0.869299 | 3.5617  |
    | 3.5779 | $u_3$ | 0.29 | 0.756305 | 0.968149 | 7.1397  |
    | ...    | ...   | ...  | ...      | ...      | ...     |
    | 3.6372 | $u_3$ | 0.34 | 0.788597 | 0.972369 | 7.1989  |
    | 3.6520 | $u_1$ | 0.48 | 0.873037 | 0.97316  | 7.2892  |
    | 3.6636 | $u_3$ | 0.35 | 0.794596 | 0.973921 | 7.3156  |
    | 3.6940 | $u_3$ | 0.36 | 0.800447 | 0.974664 | 7.346   |
    | 3.7283 | $u_3$ | 0.37 | 0.806153 | 0.975389 | 7.3803  |
    | 3.7451 | $u_1$ | 0.49 | 0.876676 | 0.976094 | 7.4734  |
    | 3.7661 | $u_3$ | 0.38 | 0.811715 | 0.97678  | 7.5112  |
    | 3.8074 | $u_3$ | 0.39 | 0.817137 | 0.977449 | 7.5525  |
    | 3.8410 | $u_1$ | 0.5  | 0.880218 | 0.978096 | 7.6483  |
    | 3.8518 | $u_3$ | 0.4  | 0.880218 | 0.978729 | 7.6928  |
    | 3.8993 | $u_3$ | 0.41 | 0.827566 | 0.979346 | 7.7403  |
    | 3.9396 | $u_1$ | 0.51 | 0.883666 | 0.979940 | 7.8389  |
    | 3.9417 | $u_2$ | 0.27 | 0.888235 | 0.997758 | 11.7806 |
    | ...    | ...   | ...  | ...      | ...      | ...     |
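The selection procedure walked through above can be sketched as a greedy loop in Python. This is a minimal illustration rather than the authors' implementation; the function name is hypothetical, and the candidate list is a subset of the rows of Table 6 (rows that are later replaced on the same processor do not affect the outcome).

```python
from math import prod


def efsrg_select(candidates, goal):
    """Greedy replica selection for one task.

    candidates: (energy, processor, frequency, reliability) tuples, visited in
    ascending order of energy. A processor holds at most one replica, so a later
    candidate on the same processor replaces the earlier one.
    Returns (assignments, task_reliability, task_energy), or None if the goal is missed.
    """
    chosen = {}
    for energy, proc, freq, rel in sorted(candidates):
        chosen[proc] = (freq, rel, energy)
        task_rel = 1.0 - prod(1.0 - r for _, r, _ in chosen.values())   # Eq. (7)
        if task_rel >= goal:
            task_energy = sum(e for _, _, e in chosen.values())         # Eq. (9)
            return {p: f for p, (f, _, _) in chosen.items()}, task_rel, task_energy
    return None


# A subset of the rows of Table 6 for n1.
rows = [
    (2.4817, "u1", 0.26, 0.763967), (2.4863, "u1", 0.27, 0.777776),
    (3.5617, "u1", 0.47, 0.869299), (3.5779, "u3", 0.29, 0.756305),
    (3.8993, "u3", 0.41, 0.827566), (3.9396, "u1", 0.51, 0.883666),
    (3.9417, "u2", 0.27, 0.888235),
]
assignments, r_n1, e_n1 = efsrg_select(rows, goal=0.994884)
```

The loop stops at the first candidate whose addition lifts the task's reliability to the goal, so every candidate with higher energy is skipped without being evaluated.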

    5.3 The EFSRG Algorithm

    Based on the exploration procedure explained in the preceding section, we propose the heuristic algorithm EFSRG, described in Algorithm 2, to reduce the energy consumption while satisfying the application's reliability goal.

    The main idea of EFSRG is that the reliability goal of the application is still transferred to each task. Each task quantitatively selects, with low time complexity, multiple processor and frequency combinations that generate less dynamic energy consumption until its reliability goal is satisfied. The main details are explained as follows:

    (1) In Line 1, EFSRG sorts the tasks in a list downward_task_list in descending order of $rank_u$ values (prioritizing tasks).

    (2) In Lines 2-27, EFSRG iteratively schedules each task of the application according to the task priorities.

    (3) In Line 4, EFSRG calculates the reliability goal of the current task using Eq. (23).

    (4) In Lines 5-12, EFSRG calculates and sorts the possible assignments in the list poss_assi_list($n_i$) in ascending order of $E_d(n_i, u_k, f_{k,v})$ values.

    (5) In Lines 13-26, EFSRG quantitatively selects the replicas and available processors that generate minimum dynamic energy consumption for the current task until its reliability goal is satisfied with low time complexity. The procedure has been explained in detail in Section 5.2.

    (6) In Lines 28-32, EFSRG calculates the actual reliability value, static energy consumption, dynamic energy consumption, total energy consumption, and schedule length of the application, respectively.

    5.4 Time Complexity of the EFSRG Algorithm

    The time complexity of the EFSRG algorithm is analyzed as follows:

    (1) Calculating the reliability of the application must traverse all tasks, which can be done within $O(|N|)$ time (Lines 2-27).

    (2) Calculating the reliability goal of the current task must traverse all tasks, which can be done within $O(|N|)$ time (Line 4).

    (3) Calculating and sorting the possible assignments in the list poss_assi_list($n_i$) in ascending order of $E_d(n_i, u_k, f_{k,v})$ values for the current task can be done within $O(|U|^2 \times |F|^2)$ time (Lines 5-12).

    (4) The maximum number of iterations of the quantitative selection procedure for the current task is $|U| \times |F|$ (Lines 13-26), and each iteration must calculate the reliability of the current task by traversing the existing replicas (Line 20). Therefore, this process can be done within $O(|U|^2 \times |F|)$ time.

    Algorithm 2. The EFSRG Algorithm

    Input: $G = (N, W, M, C)$, $U$, $R_{goal}(G)$
    Output: $R(G)$, $E_{total}(G)$ and its related values
    1: Sort the tasks in a list downward_task_list by descending order of $rank_u$ values;
    2: while (there are tasks in downward_task_list) do
    3:   $n_i \leftarrow$ downward_task_list.out();
    4:   Calculate $R_{goal}(n_i)$ using Eq. (23);
    5:   poss_assi_list($n_i$) $\leftarrow$ NULL;
    6:   for (each processor $u_k \in U$) do
    7:     for (each frequency $f_{k,v}$ from $f_{k,low}$ to $f_{k,\max}$) do
    8:       Calculate $R(n_i, u_k, f_{k,v})$ for the task $n_i$;
    9:       Calculate $E_d(n_i, u_k, f_{k,v})$ using Eq. (3);
    10:    end for
    11:  end for
    12:  Sort the possible assignments in the list poss_assi_list($n_i$) by ascending order of $E_d(n_i, u_k, f_{k,v})$ values;
    13:  Define the actual assignments in the list act_assi_list($n_i$);
    14:  while ($R(n_i) < R_{goal}(n_i)$) do
    15:    for (each replica $n_i^b \in$ poss_assi_list($n_i$)) do
    16:      if ($u_{pr(n_i^b)}$ exists in act_assi_list($n_i$)) then
    17:        Remove the old $u_{pr(n_i^b)}$ assignment from act_assi_list($n_i$);
    18:      end if
    19:      Add the new $u_{pr(n_i^b)}$ assignment into act_assi_list($n_i$);
    20:      Calculate $R(n_i)$ based on the actual assignments in the list act_assi_list($n_i$);
    21:      if ($R(n_i) \geq R_{goal}(n_i)$) then
    22:        Calculate $E_d(n_i)$ based on the actual assignments in the list act_assi_list($n_i$);
    23:        break;
    24:      end if
    25:    end for
    26:  end while
    27: end while
    28: Calculate $R(G)$ using Eq. (8);
    29: Calculate $E_s(G)$ using Eq. (11);
    30: Calculate $E_d(G)$ using Eq. (10);
    31: Calculate $E_{total}(G)$ using Eq. (12);
    32: Calculate $SL(G)$ using Eq. (21);

    Considering that (2), (3), and (4) are not nested in the algorithm, the time complexity of the EFSRG algorithm is $O(|N| \times |U|^2 \times |F|^2 + |N|^2 \times |U|^2 \times |F|)$, which is higher than the $O(|N|^2 \times |U| \times |F|)$ of the ESRG algorithm. Therefore, using fault-tolerance increases the time complexity. However, our experiments will show that EFSRG generally consumes less energy than ESRG and is not time consuming in practice (refer to Section 6.5 for additional details).

    5.5 Example of the EFSRG Algorithm

    Example 2. The same parameter values as in the aforementioned example are used. Table 7 shows the processor and frequency combination assignments for the tasks of the motivating parallel application using the EFSRG algorithm. Each row shows the selected processor and frequency combinations, the actual reliability value, and the dynamic energy consumption of each task. The reliability goal of the first task $n_1$ using EFSRG is the same as that using ESRG, shown in Table 5. However, the processor and frequency combination assignments are different. When using EFSRG, the replicas of $n_1$ are assigned to three processors with low frequency values (illustrated in Table 6), such that the actual reliability value and dynamic energy consumption of $n_1$ are also different. Finally, the actual reliability value, schedule length, and dynamic energy consumption of the application $G$ are 0.9502, 379.05, and 65.7395 (calculated by Eqs. (8), (21), and (10)), respectively.

    The static energy consumption of the application is $E_s(G) = 0.005 \times 379.05 \times 3 = 5.6858$. Finally, the total energy consumption calculated by Eq. (12) is $E_{total}(G) = 65.7395 + 5.6858 = 71.4253$.

    6 EXPERIMENTS

    6.1 Experimental Metrics and Parameter ValuesThe performance metrics for comparison are the actual reli-ability valueRðGÞ (Eq. (8)) and the total energy consumptionEtotalðGÞ (Eq. (12)) of application. The compared algorithmswith our proposed ESRG and EFSRG are the state-of-the-artMRCRG algorithm [17] because all of them have the sameparallel application model and aim to reduce certain perfor-mance while satisfying the reliability goal of the application.As mentioned earlier, MRCRG aims to reduce resource costby presupposing that each unassigned task is assigned to theprocessor with the maximum reliability. Therefore, we canextend MRCRG to the reducing energy consumption with

    reliability goal (MECRG) algorithm as long as the objectiveof reducing the resource cost to reducing energy consump-tion. Then MECRG can directly compare with the proposedalgorithms in this paper. Finally, the sole difference betweenESRG and MECRG is that they have individual pre-assign-ment reliability values for unassigned tasks.

    TABLE 7
    Processor and Frequency Combination Assignments for Tasks of the Motivating Parallel Application Using EFSRG

    | $n_i$ | $R_{goal}(n_i)$ | Frequency on $u_1$ | Frequency on $u_2$ | Frequency on $u_3$ | AST / AFT on $u_1$ | AST / AFT on $u_2$ | AST / AFT on $u_3$ | $R(n_i)$ | $E_d(n_i)$ |
    |---|---|---|---|---|---|---|---|---|---|
    | $n_1$ | 0.994884 | 0.51 | 0.27 | 0.41 | 0 / 21.95 | 0 / 27.45 | 0 / 59.26 | 0.999144 | 11.7806 |
    | $n_3$ | 0.990641 | 0.52 | 0.27 | - | 59.26 / 80.41 | 71.26 / 119.41 | - | 0.995776 | 6.3776 |
    | $n_4$ | 0.989754 | 0.26 | 0.42 | - | 80.41 / 130.41 | 119.41 / 138.46 | - | 0.994812 | 4.5906 |
    | $n_2$ | 0.989825 | 0.6 | 0.744 | - | 130.41 / 152.08 | 138.46 / 208.83 | - | 0.995154 | 9.271 |
    | $n_5$ | 0.989557 | 0.48 | 0.27 | - | 152.08 / 177.08 | 208.83 / 256.97 | - | 0.994391 | 6.3329 |
    | $n_6$ | 0.990047 | 0.51 | - | 0.35 | 177.08 / 202.57 | - | 73.26 / 98.97 | 0.990136 | 7.3217 |
    | $n_9$ | 0.994794 | 0.36 | 0.45 | - | 256.97 / 306.97 | 269.97 / 296.64 | - | 0.995017 | 7.1694 |
    | $n_7$ | 0.994661 | 0.77 | 0.27 | - | 202.57 / 211.66 | 296.64 / 352.2 | - | 0.999027 | 7.3762 |
    | $n_8$ | 0.990536 | 0.69 | - | - | 211.66 / 218.91 | - | - | 0.990539 | 2.1938 |
    | $n_{10}$ | 0.988973 | - | 0.71 | - | - | 369.2 / 379.05 | - | 0.99509 | 3.3258 |

    $R_{goal}(n_i)$: task's reliability goal; $R(n_i)$: actual reliability; $E_d(n_i)$: dynamic energy consumption; AST / AFT: actual start time / actual finish time of the task assignment.
    $R(G) = 0.9502 > R_{goal}(G) = 0.95$, $SL(G) = 379.05$, $E_d(G) = 65.7395$

    Fig. 3. Example of real parallel applications.

    Processor and application parameters taken from [12], [19] are as follows: $10\ \text{ms} \le w_{i,k} \le 100\ \text{ms}$, $10\ \text{ms} \le c_{i,j} \le 100\ \text{ms}$, $0.03 \le P_{k,ind} \le 0.07$, $0.8 \le C_{k,ef} \le 1.2$, $2.5 \le m_k \le 3.0$, $P_{k,s} = 0.001$, and $f_{k,max} = 1$ GHz. All frequencies are discrete, and the precision is 0.1 GHz. The aforementioned values are generated with uniform distribution. All parallel applications are executed on a simulated heterogeneous platform with 64 processors, implemented in Java on a standard desktop computer (2.6 GHz Intel CPU and 4 GB memory). To verify the effectiveness and realism of the algorithms, we use two real parallel applications (fast Fourier transform and Gaussian elimination) to compare the results of all the algorithms [14], [19], [26], because fast Fourier transform and Gaussian elimination are two typical parallel applications with high and low parallelism, respectively. Both of them have been implemented in embedded systems [51], [52].
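The parameter ranges listed above can be sampled as in the following minimal sketch. It is written in Python for brevity (the paper's simulator is implemented in Java) and is only an illustration under stated assumptions: the function names are hypothetical, and the minimum frequency `f_min` is not given in this section, so the value 0.3 GHz used in the example is a placeholder.

```python
import random

def sample_processor(rng):
    """Draw one processor's power parameters from the ranges quoted in the text."""
    return {
        "P_ind": rng.uniform(0.03, 0.07),  # frequency-independent dynamic power
        "C_ef": rng.uniform(0.8, 1.2),     # effective switching capacitance
        "m": rng.uniform(2.5, 3.0),        # dynamic power exponent
        "P_s": 0.001,                      # static power (fixed)
        "f_max": 1.0,                      # maximum frequency in GHz (fixed)
    }

def frequency_levels(f_min, f_max=1.0, step=0.1):
    """Discrete frequency levels from f_min to f_max with 0.1 GHz precision."""
    count = round((f_max - f_min) / step)
    return [round(f_min + i * step, 1) for i in range(count + 1)]

def sample_wcet(num_tasks, num_procs, rng):
    """Execution times w[i][k] in ms, uniform in [10, 100], one row per task."""
    return [[rng.uniform(10.0, 100.0) for _ in range(num_procs)]
            for _ in range(num_tasks)]

rng = random.Random(0)                     # fixed seed for repeatable runs
processors = [sample_processor(rng) for _ in range(64)]
wcet = sample_wcet(223, 64, rng)           # e.g., the FFT application with r = 32
levels = frequency_levels(0.3)             # f_min = 0.3 GHz is an assumed placeholder
```

Communication times $c_{i,j}$ would be drawn the same way for each edge of the DAG.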

    (1) A new parameter $r$ is used as the size of the fast Fourier transform application. The total number of tasks is $|N| = (2 \times r - 1) + r \times \log_2 r$, where $r = 2^y$ for some integer $y$ [26]. Fig. 3a shows an example of the fast Fourier transform application with $r = 4$. Notably, $r$ exit tasks exist in a fast Fourier transform application of size $r$. To fit the application model of this study, we add a virtual exit task, and the last $r$ tasks are set as the immediate predecessor tasks of the virtual exit task.

    (2) A new parameter $r$ is used as the matrix size of the Gaussian elimination application, and the total number of tasks is $|N| = \frac{r^2 + r - 2}{2}$ [26]. Fig. 3b shows an example of the Gaussian elimination parallel application with $r = 5$.
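The two task-count formulas can be checked against the application sizes used in the experiments of this section. The sketch below is illustrative and the function names are hypothetical.

```python
def fft_task_count(r):
    """Tasks in an FFT application of size r: (2r - 1) + r * log2(r), r a power of two."""
    if r < 1 or r & (r - 1):
        raise ValueError("r must be a positive power of two")
    log2_r = r.bit_length() - 1            # exact log2 for powers of two
    return (2 * r - 1) + r * log2_r

def gaussian_task_count(r):
    """Tasks in a Gaussian elimination application with matrix size r: (r^2 + r - 2) / 2."""
    if r < 1:
        raise ValueError("r must be positive")
    return (r * r + r - 2) // 2            # r^2 + r is always even, so this is exact

for r in (32, 128, 256):
    print("FFT", r, fft_task_count(r))
for r in (21, 47, 71):
    print("Gaussian elimination", r, gaussian_task_count(r))
```

These sizes give 223, 1151, and 2559 tasks for the fast Fourier transform and 230, 1127, and 2555 tasks for Gaussian elimination, which matches the values of $|N|$ stated for Experiments 1 to 6.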

    Note that the plotted values in each experiment are obtained by executing one run of the algorithms for one application. Many applications with the same parameter values and scales were tested and showed relatively stable results.

    6.2 Small-Scale Parallel Applications for Varying Reliability Goals

    We first consider small-scale parallel applications with about 230 tasks. Two experiments are conducted, and their details are as follows:

    Experiment 1. This experiment compares the actual reliability values and the total energy consumptions of a small-scale fast Fourier transform application with $r = 32$ (i.e., $|N| = 223$) for varying reliability requirements. $R_{goal}(G)$ is changed from 0.91 to 0.99 in 0.01 increments.

    Fig. 4a shows the actual reliability values of the small-scale fast Fourier transform application for different reliability goals. All the algorithms satisfy the given reliability goals in all cases. Specifically, all the reliability values are close to the given reliability goals. A small difference is that EFSRG obtains a slightly larger reliability value than MECRG and ESRG. The reason is that obtaining tighter reliability values by fault-tolerance is a combinatorial optimization problem, whereas EFSRG uses a heuristic search. Thus, the EFSRG-generated results deviate somewhat from those of MECRG and ESRG.

    Fig. 4b shows the energy consumptions of the small-scale fast Fourier transform application for different reliability goals. EFSRG clearly generates the minimum energy consumptions, followed by ESRG and MECRG, when the reliability goal exceeds 0.95. In detail, the EFSRG-generated energy consumptions are only 66.38-68.48 percent of those generated by MECRG in all the cases. EFSRG outperforms ESRG by 44.7 percent in saving energy when the reliability goal is 0.99. The two main reasons are as follows:

    (1) MECRG yields the worst results because it presupposes that each unassigned task is assigned to the processor and frequency combination with the maximum reliability. Such a pre-assignment is too pessimistic and leads to unfair reliability usage among tasks, which limits the energy reduction.

    (2) EFSRG yields the best results because it has more space than MECRG and ESRG in which to select processor and frequency combinations that satisfy the reliability goal.

    Experiment 2. This experiment compares the actual reliability values and the total energy consumptions of a small-scale Gaussian elimination application with $r = 21$ (i.e., $|N| = 230$) for varying reliability requirements. The total number of tasks for the Gaussian elimination application is similar to that of the fast Fourier transform application. $R_{goal}(G)$ is also changed from 0.91 to 0.99 in 0.01 increments.

    Figs. 5a and 5b show the same regular pattern for the actual reliability values and final energy consumptions as Figs. 4a and 4b. The values in Experiment 1 and Experiment 2 are basically the same. The results indicate that the degree of parallelism does not affect the ranges of the actual reliability values or the total number of replicas at approximately equal scales.

    6.3 Middle-Scale Parallel Applications for Varying Reliability Goals

    We then consider middle-scale parallel applications with approximately 1,150 tasks. Two experiments are performed, and their details are as follows:

    Experiment 3. This experiment compares the actual reliability values and the total energy consumptions of a middle-scale fast Fourier transform application with $r = 128$ (i.e., $|N| = 1151$) for varying reliability requirements. $R_{goal}(G)$ is also changed from 0.91 to 0.99 in 0.01 increments.

    Fig. 6a shows the actual reliability values of the middle-scale fast Fourier transform application for different reliability requirements. In contrast to the small-scale application in Fig. 4a, MECRG and ESRG yield reliability values of 0 when the reliability goals exceed 0.97 and 0.95, respectively. Correspondingly, the energy consumptions are null, as shown in Fig. 6b. The reasons are as follows:

    (1) MECRG does not use fault-tolerance, so its maximum reachable reliability calculated by Eq. (25) is 0.977021. Therefore, if the reliability goal exceeds 0.97, MECRG no longer works.

    (2) ESRG does not use fault-tolerance either. Hence, ESRG also does not work if the reliability goal exceeds 0.97. However, ESRG additionally fails when the reliability goals are 0.96 and 0.97 because ESRG presupposes that each unassigned task is assigned to the processor and frequency combination with the upper bound on the reliability goal calculated by Eq. (22). If the upper bound on the reliability goal is too high, then the maximum reachable reliability values of some tasks may in practice be less than the upper bound on the reliability goal $R_{up\_goal}(G)$. This phenomenon is the most remarkable drawback of ESRG, although ESRG consumes less energy than MECRG when the reliability goal is satisfied.

    Fig. 4. Results of the small-scale fast Fourier transform application on different reliability requirements (Experiment 1).

    Fig. 5. Results of the small-scale Gaussian elimination application on different reliability requirements (Experiment 2).

    Fortunately, EFSRG eliminates the unpredictability of ESRG in a fault-tolerant manner and further reduces energy consumption by enlarging the processor and frequency combination space. EFSRG outperforms MECRG and ESRG in terms of saving energy on average by 66 and 32 percent, respectively.
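The reachability limit discussed for MECRG and ESRG can be illustrated with a short sketch. Only the value 0.977021 and the swept goals come from the text. The helper `even_split_task_goal` is a hypothetical illustration of how demanding a per-task goal becomes when an application goal is split evenly over $|N|$ tasks. It is an assumption made for this example and is not claimed to be Eq. (22).

```python
# Maximum reachable reliability without fault tolerance, as quoted in the text
# for the middle-scale fast Fourier transform application (Eq. (25)).
R_MAX_WITHOUT_FT = 0.977021

def reachable_goals(goals, r_max):
    """Return the goals that a non-fault-tolerant schedule could still meet."""
    return [g for g in goals if g <= r_max]

def even_split_task_goal(app_goal, num_tasks):
    """Hypothetical per-task goal: the |N|-th root of the application goal."""
    return app_goal ** (1.0 / num_tasks)

# Reliability goals swept in the experiments: 0.91 to 0.99 in 0.01 increments.
goals = [round(0.91 + 0.01 * i, 2) for i in range(9)]
feasible = reachable_goals(goals, R_MAX_WITHOUT_FT)
task_goal = even_split_task_goal(0.96, 1151)

print("reachable without fault tolerance:", feasible)
print(f"even-split per-task goal for R_goal(G) = 0.96, |N| = 1151: {task_goal:.6f}")
```

With 1,151 tasks, even a goal of 0.96 translates under this even split into a per-task reliability very close to 1, which suggests why a high pre-assigned bound can exceed what some tasks can reach on any single processor and frequency combination.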

    Experiment 4. This experiment compares the actual reliability values and the total energy consumptions of a middle-scale Gaussian elimination application with $r = 47$ (i.e., $|N| = 1127$) for varying reliability requirements. $R_{goal}(G)$ is also changed from 0.91 to 0.99 in 0.01 increments.

    The results are shown in Figs. 7a and 7b. Experiment 4 shows patterns and values similar to those of Experiment 2 in actual reliability values and total energy consumptions for all the algorithms. Therefore, the results of Experiment 2 and Experiment 4 further indicate that the degree of parallelism does not affect the actual reliability values and total energy consumptions.

    6.4 Large-Scale Parallel Applications for Varying Reliability Goals

    Small-scale and middle-scale parallel applications show different scenarios, in which some higher reliability goals cannot be satisfied with MECRG and ESRG, so we should also examine the behavior on a large scale. Therefore, we continue to use fast Fourier transform and Gaussian elimination applications in the experiments.

    Experiment 5. This experiment compares the actual reliability values and the total energy consumptions of a large-scale fast Fourier transform application with $r = 256$ (i.e., $|N| = 2559$) for varying reliability requirements. $R_{goal}(G)$ is also changed from 0.91 to 0.99 in 0.01 increments. The results are shown in Figs. 8a and 8b.

    Experiment 6. This experiment compares the actual reliability values and the total energy consumptions of a large-scale Gaussian elimination application with $r = 71$ (i.e., $|N| = 2555$) for varying reliability requirements. $R_{goal}(G)$ is also changed from 0.91 to 0.99 in 0.01 increments. The results are shown in Figs. 9a and 9b.

    The following observations can be made from Figs. 8 and 9:

    (1) Fast Fourier transform and Gaussian elimination applications show nearly consistent results at the same scale, as can be seen by comparing Figs. 8a and 8b with Figs. 9a and 9b.

    (2) ESRG is completely ineffective for large-scale applications, and its results are null. MECRG remains valid when the reliability goals are less than the maximum reachable reliability. EFSRG is always effective, and its generated energy consumptions are always the lowest.

    6.5 Computation Time Values on Different Scales

    The proposed EFSRG algorithm can be time consuming in some cases because its time complexity is higher than those of MECRG and ESRG. Therefore, we list the computation time values of the algorithms in the preceding experiments. Table 8 shows the minimum, average, and maximum computation time values of all the algorithms on different scales of various applications. The computation time values are small for all three algorithms. In particular, MECRG and ESRG complete within 1 second. Although EFSRG takes about 3.5 seconds for large-scale applications, this duration is short and thus fully acceptable. Therefore, EFSRG is not time consuming in practice.
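The two timing claims above can be checked directly against the maxima reported in Table 8. The sketch below only restates the table data, and its names are illustrative.

```python
# Maximum computation times in milliseconds, copied from Table 8 in the order:
# small FFT, small GE, middle FFT, middle GE, large FFT, large GE.
# None marks the large-scale runs for which ESRG produced no result.
MAX_MS = {
    "MECRG": [26, 26, 336, 257, 889, 419],
    "ESRG": [69, 141, 304, 267, None, None],
    "EFSRG": [123, 106, 1539, 1674, 3422, 2240],
}

def worst_case_ms(times):
    """Largest reported time, ignoring configurations without a result."""
    return max(t for t in times if t is not None)

worst = {name: worst_case_ms(times) for name, times in MAX_MS.items()}
for name, ms in worst.items():
    print(f"{name}: worst case {ms} ms ({ms / 1000:.3f} s)")
```

The worst cases for MECRG and ESRG stay below 1,000 ms, and the worst case for EFSRG stays below 3,500 ms, consistent with the statements in the text.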

    7 CONCLUSIONS

    In this study, two energy-efficient scheduling algorithms, ESRG and EFSRG, are proposed for reliable DAG-based parallel applications in a heterogeneous distributed embedded system. Both ESRG and EFSRG solve the problem by dividing it into three subproblems: prioritizing tasks, satisfying the reliability goal, and reducing energy consumption. The ESRG algorithm aims to reduce the energy consumption while satisfying the reliability goal of the application by presupposing that each unassigned task is assigned the upper bound on its reliability goal. A fault-tolerant EFSRG algorithm is further proposed to reduce the energy consumption while satisfying the reliability goal of the application, thereby eliminating the unreachability of high reliability goals under ESRG. Our experiments on real parallel applications of different scales confirm that EFSRG reduces more energy than ESRG and existing approaches. Although the time complexity of EFSRG is higher than those of ESRG and existing approaches because of fault-tolerance, our experiments show that the proposed EFSRG algorithm is not time consuming in practice. Hence, the proposed EFSRG algorithm can effectively facilitate an energy-efficient design for reliable parallel applications in heterogeneous distributed embedded systems. A future research direction is to develop energy-efficient fault-tolerant scheduling that considers the reliability goal and timing constraints together for real-time embedded applications on these platforms.

    Fig. 6. Results of the middle-scale fast Fourier transform application on different reliability requirements (Experiment 3).

    Fig. 7. Results of the middle-scale Gaussian elimination application on different reliability requirements (Experiment 4).

    Fig. 8. Results of the large-scale fast Fourier transform application on different reliability requirements (Experiment 5).

    Fig. 9. Results of the large-scale Gaussian elimination application on different reliability requirements (Experiment 6).

    ACKNOWLEDGMENTS

    The authors would like to express their gratitude to the anonymous reviewers whose constructive comments have helped to improve the manuscript. This work was partially supported by the National Key Research and Development Plan of China under Grant No. 2016YFB0200405, the National Natural Science Foundation of China under Grant Nos. 61672217, 61432005, 61379115, 61402170, 61370097, 61502162, and 61502405, the CERNET Innovation Project under Grant No. NGII20161003, and the China Postdoctoral Science Foundation under Grant No. 2016M592422.

    REFERENCES

    [1] M. Lin, Y. Pan, L. T. Yang, M. Guo, and N. Zheng, “Scheduling co-design for reliability and energy in cyber-physical systems,” IEEE Trans. Emerg. Topics Comput., vol. 1, no. 2, pp. 353–365, Dec. 2013.

    [2] D. Zhu and H. Aydin, “Reliability-aware energy management for periodic real-time tasks,” IEEE Trans. Comput., vol. 58, no. 10, pp. 1382–1397, Oct. 2009.

    [3] K. Li, “Energy-efficient task scheduling on multiple heterogeneous computers: Algorithms, analysis, and performance evaluation,” IEEE Trans. Sustain. Comput., vol. 1, no. 1, pp. 7–19, Jan.-Jun. 2017.

    [4] K. Li, “Scheduling precedence constrained tasks with reduced processor energy on multiprocessor computers,” IEEE Trans. Comput., vol. 61, no. 12, pp. 1668–1681, Dec. 2012.

    [5] K. Li, “Power and performance management for parallel computations in clouds and data centers,” J. Comput. Syst. Sci., vol. 82, no. 2, pp. 174–190, Mar. 2016.

    [6] K. Li, “Energy and time constrained task scheduling on multiprocessor computers with discrete speed levels,” J. Parallel Distrib. Comput., vol. 95, pp. 15–28, Sep. 2016.

    [7] X. Xiao, G. Xie, R. Li, and K. Li, “Minimizing schedule length of energy consumption constrained parallel applications on heterogeneous distributed systems,” in Proc. 14th IEEE Int. Symp. Parallel Distrib. Process. Appl., 2016, pp. 1471–1476.

    [8] G. Xie, X. Xiao, R. Li, and K. Li, “Schedule length minimization of parallel applications with energy consumption constraints using heuristics on heterogeneous distributed systems,” Concurrency Comput.-Practice Experience, Oct. 2016, doi: 10.1002/cpe.4024.

    [9] “Enhanced Intel SpeedStep technology for the Intel Pentium M processor,” Mar. 2014. [Online]. Available: http://download.intel.com/design/network/papers/30117401.pdf

    [10] “AMD PowerNow! technology informational white paper,” Nov. 2000. [Online]. Available: http://www.amd-k6.com/wp-content/uploads/2012/07/24404a.pdf

    [11] K. Flautner, D. Flynn, and M. Rives, “A combined hardware-software approach for low-power SoCs: Applying adaptive voltage scaling and intelligent energy management software,” in Proc. High-Performance Syst. Des. Conf., 2003, pp. 1–17.

    [12] B. Zhao, H. Aydin, and D. Zhu, “On maximizing reliability of real-time embedded applications under hard energy constraint,” IEEE Trans. Ind. Inform., vol. 6, no. 3, pp. 316–328, Aug. 2010.

    [13] B. Zhao, H. Aydin, and D. Zhu, “Shared recovery for energy efficiency and reliability enhancements in real-time applications with precedence constraints,” ACM Trans. Des. Autom. Electron. Syst., vol. 18, no. 2, pp. 99–109, Mar. 2013.

    [14] L. Zhang, K. Li, Y. Xu, J. Mei, F. Zhang, and K. Li, “Maximizing reliability with energy conservation for parallel task scheduling in a heterogeneous cluster,” Inf. Sci., vol. 319, no. C, pp. 113–131, Oct. 2015.

    [15] L. Zhang, K. Li, K. Li, and Y. Xu, “Joint optimization of energy efficiency and system reliability for precedence constrained tasks in heterogeneous systems,” Int. J. Electr. Power Energy Syst., vol. 78, pp. 499–512, Jun. 2016.

    [16] L. Zhang, K. Li, C. Li, and K. Li, “Bi-objective workflow scheduling of the energy consumption and reliability in heterogeneous computing systems,” Inf. Sci., vol. 379, pp. 241–256, 2017.

    [17] G. Xie, Y. Chen, Y. Liu, Y. Wei, R. Li, and K. Li, “Resource consumption cost minimization of reliable parallel applications on heterogeneous embedded systems,” IEEE Trans. Ind. Inform., vol. PP, no. 99, p. 1, Dec. 2016.

    [18] L. Zhao, Y. Ren, Y. Xiang, and K. Sakurai, “Fault-tolerant scheduling with dynamic number of replicas in heterogeneous systems,” in Proc. 12th IEEE Int. Conf. High Performance Comput. Commun., 2010, pp. 434–441.

    [19] L. Zhao, Y. Ren, and K. Sakurai, “Reliable workflow scheduling with less resource redundancy,” Parallel Comput., vol. 39, no. 10, pp. 567–585, Jul. 2013.

    [20] A. Girault and H. Kalla, “A novel bicriteria scheduling heuristics providing a guaranteed global system failure rate,” IEEE Trans. Depend. Secure Comp., vol. 6, no. 4, pp. 241–254, Oct.–Dec. 2009.

    [21] A. Benoit, M. Hakem, and Y. Robert, “Fault tolerant scheduling of precedence task graphs on heterogeneous platforms,” in Proc. 22nd IEEE Int. Parallel Distrib. Process., 2008, pp. 1–8.

    [22] A. Benoit and M. Hakem, “Optimizing the latency of streaming applications under throughput and reliability constraints,” in Proc. 45th Int. Conf. Parallel Process., 2009, pp. 325–332.

    [23] J. Machrouh, et al., “Cross domain comparison of system assurance,” in Proc. Embedded Real Time Softw. Syst., Toulouse, France, pp. 1–3, 2012.

    [24] Road Vehicles-Functional Safety, ISO 26262, 2011.

    [25] Z. Li, L. Wang, S. Li, S. Ren, and G. Quan, “Reliability guaranteed energy-aware frame-based task set execution strategy for hard real-time systems,” J. Syst. Softw., vol. 86, no. 12, pp. 3060–3070, Dec. 2013.

    [26] H. Topcuoglu, S. Hariri, and M.-Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 260–274, Mar. 2002.

    [27] G. Xie, R. Li, and K. Li, “Heterogeneity-driven end-to-end synchronized scheduling for precedence constrained tasks and messages on networked embedded systems,” J. Parallel Distrib. Comput., vol. 83, pp. 1–12, Sep. 2015.

    [28] G. Xie, L. Liu, L. Yang, and R. Li, “Scheduling trade-off of dynamic multiple parallel workflows on heterogeneous distributed computing systems,” Concurrency Comput.-Practice Experience, vol. 29, pp. 1–18, 2017.

    [29] G. Xie, G. Zeng, L. Liu, R. Li, and K. Li, “Mixed real-time scheduling of multiple DAGs-based applications on heterogeneous multi-core processors,” Microprocess. Microsyst., vol. 47, pp. 93–103, Nov. 2016.

    TABLE 8
    Computation Time Values (Unit: ms) on Different Applications and Scales Using Different Algorithms

    | Scale | Algorithm | FFT minimum | FFT average | FFT maximum | GE minimum | GE average | GE maximum |
    |---|---|---|---|---|---|---|---|
    | Small | MECRG | 15 | 20 | 26 | 16 | 20 | 26 |
    | Small | ESRG | 16 | 28 | 69 | 17 | 30 | 141 |
    | Small | EFSRG | 57 | 80 | 123 | 60 | 77 | 106 |
    | Middle | MECRG | 122 | 188 | 336 | 114 | 143 | 257 |
    | Middle | ESRG | 115 | 168 | 304 | 110 | 128 | 267 |
    | Middle | EFSRG | 313 | 583 | 1539 | 325 | 578 | 1674 |
    | Large | MECRG | 395 | 538 | 889 | 357 | 393 | 419 |
    | Large | ESRG | - | - | - | - | - | - |
    | Large | EFSRG | 1058 | 1943 | 3422 | 954 | 1479 | 2240 |

    FFT: fast Fourier transform; GE: Gaussian elimination.

    180 IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. 3, NO. 3, JULY-SEPTEMBER 2018


    [30] M. A. Khan, “Scheduling for heterogeneous systems using constrained critical paths,” Parallel Comput., vol. 38, no. 4, pp. 175–193, May 2012.

    [31] Y. Guo, D. Zhu, and H. Aydin, “Reliability-aware power management for parallel real-time applications with precedence constraints,” in Proc. Int. Green Comput. Conf. and Workshops, 2011, pp. 1–8.

    [32] M. Salehi, et al., “DRVS: Power-efficient reliability management through dynamic redundancy and voltage scaling under variations,” in Proc. IEEE/ACM Int. Symp. Low Power Electron. Des., 2015, pp. 225–230.

    [33] M. Salehi, A. Ejlali, and B. M. Al-Hashimi, “Two-phase low-energy n-modular redundancy for hard real-time multi-core systems,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 5, pp. 1497–1510, May 2016.

    [34] G. Xie, G. Zeng, L. Liu, R. Li, and K. Li, “High performance real-time scheduling of multiple mixed-criticality functions in heterogeneous distributed embedded systems,” J. Syst. Architect., vol. 70, pp. 3–14, Oct. 2016.

    [35] Z. Zong, A. Manzanares, X. Ruan, and X. Qin, “EAD and PEBD: Two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters,” IEEE Trans. Comput., vol. 60, no. 3, pp. 360–374, Mar. 2011.

    [36] Y. C. Lee and A. Y. Zomaya, “Energy conscious scheduling for distributed computing systems under different operating conditions,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 8, pp. 1374–1381, Aug. 2011.

    [37] Q. Huang, S. Su, J. Li, P. Xu, K. Shuang, and X. Huang, “Enhanced energy-efficient scheduling for parallel applications in cloud,” in Proc. 12th IEEE/ACM Int. Symp. Cluster, Cloud Grid Comput., 2012, pp. 781–786.

    [38] Z. Tang, L. Qi, Z. Cheng, K. Li, S. U. Khan, and K. Li, “An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment,” J. Grid Comput., vol. 14, no. 1, pp. 55–74, Mar. 2016.

    [39] S. M. Shatz and J. P. Wang, “Models and algorithms for reliability-oriented task-allocation in redundant distributed-computer systems,” IEEE Trans. Rel., vol. 38, no. 1, pp. 16–27, Apr. 1989.

    [40] A. Doğan and F. Özgüner, “Biobjective scheduling algorithms for execution time–reliability trade-off in heterogeneous computing systems,” Comput. J., vol. 48, no. 3, pp. 300–314, Mar. 2005.

    [41] J. J. Dongarra, E. Jeannot, E. Saule, and Z. Shi, “Bi-objective scheduling algorithms for optimizing makespan and reliability on heterogeneous systems,” in Proc. 19th ACM Int. Symp. Parallel Algorithms Architectures, 2007, pp. 280–288.

    [42] J. Yi, Q. Zhuge, J. Hu, S. Gu, M. Qin, and H. M. Sha, “Reliability-guaranteed task assignment and scheduling for heterogeneous multiprocessors considering timing constraint,” J. Signal Process. Syst., vol. 81, no. 3, pp. 1–17, Dec. 2015.

    [43] X. Tang and W. Tan, “Energy-efficient reliability-aware scheduling algorithm on heterogeneous systems,” Sci. Program., vol. 2016, pp. 1–13, Mar. 2016.

    [44] X. Qin, H. Jiang, and D. R. Swanson, “An efficient fault-tolerant scheduling algorithm for real-time tasks with precedence constraints in heterogeneous systems,” in Proc. 31st Int. Conf. Parallel Process., 2002, pp. 360–368.

    [45] X. Qin and H. Jiang, “A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems,” Parallel Comput., vol. 32, no. 5, pp. 331–356, Jun. 2006.

    [46] Q. Zheng, B. Veeravalli, and C.-K. Tham, “On the design of fault-tolerant scheduling strategies using primary-backup approach for computational grids with low replication costs,” IEEE Trans. Comput., vol. 58, no. 3, pp. 380–393, Mar. 2009.

    [47] J. D. Ullman, “NP-complete scheduling problems,” J. Comput. Syst. Sci., vol. 10, no. 3, pp. 384–393, Jun. 1975.

    [48] H. Arabnejad and J. G. Barbosa, “List scheduling algorithm for heterogeneous systems by an optimistic cost table,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 3, pp. 682–694, Mar. 2014.

    [49] G. Xie, J. Jiang, Y. Liu, R. Li, and K. Li, “Minimizing energy consumption of real-time parallel applications on heterogeneous systems,” IEEE Trans. Ind. Inform., vol. 13, no. 3, pp. 1068–1078, Jun. 2017.

    [50] A. Benoit, L.-C. Canon, E. Jeannot, and Y. Robert, “Reliability of task graph schedules with transient and fail-stop failures: Complexity and algorithms,” J. Scheduling, vol. 15, no. 5, pp. 615–627, Oct. 2012.

    [51] J. Hascoet, J.-F. Nezan, A. Ensor, and B. D. de Dinechin, “Implementation of a fast Fourier transform algorithm onto a manycore processor,” in Proc. Conf. Design Architectures Signal Image Process., 2015, pp. 1–7.

    [52] T. Mladenov, S. Nooshabadi, and K. Kim, “Implementation and evaluation of raptor codes on embedded systems,” IEEE Trans. Comput., vol. 60, no. 12, pp. 1678–1691, Dec. 2011.

    Guoqi Xie received the PhD degree in computer science and engineering from Hunan University, China, in 2014. He was a postdoctoral researcher with Nagoya University, Japan, from 2014 to 2015. Since 2015, he has been working as a postdoctoral researcher with Hunan University. He received the best paper award from ISPA 2016. His major interests include embedded and real-time systems, parallel and distributed systems, software engineering, and methodology. He is a member of the IEEE, the ACM, and the CCF.

    Yuekun Chen is currently working toward the PhD degree at Hunan University. Her research interests include energy-efficient computing, reliability-aware computing, and software engineering.

    Xiongren Xiao is working toward the PhD degree and is an assistant professor in the College of Computer Science and Electronic Engineering, Hunan University, China. His main research interests include energy-efficient computing, reliability-aware computing, and embedded systems. He is a member of the ACM and the CCF.

    Cheng Xu is a full professor in the College of Computer Science and Electronic Engineering, Hunan University, China. His major research interests include embedded systems and cyber-physical systems. He is a member of the ACM and the CCF.

    Renfa Li is a professor of computer science and electronic engineering, and the dean of the College of Computer Science and Electronic Engineering, Hunan University, China. He is the director of the Key Laboratory for Embedded and Network Computing, Hunan Province, China. He is also an expert committee member of the National Supercomputing Center, Changsha, China. His major interests include computer architectures, embedded computing systems, cyber-physical systems, and Internet of things. He is a member of the council of the CCF, and a senior member of the IEEE and the ACM.

    Keqin Li is a SUNY distinguished professor of computer science. His current research interests include parallel computing and high-performance computing, distributed computing, energy-efficient computing and communication, heterogeneous computing systems, cloud computing, big data computing, CPU-GPU hybrid and cooperative computing, multicore computing, storage and file systems, wireless communication networks, sensor networks, peer-to-peer file sharing systems, mobile computing, service computing, Internet of things, and cyber-physical systems. He has published more than 470 journal articles, book chapters, and refereed conference papers, and has received several best paper awards. He is currently serving or has served on the editorial boards of the IEEE Transactions on Parallel and Distributed Systems, the IEEE Transactions on Computers, the IEEE Transactions on Cloud Computing, the IEEE Transactions on Services Computing, and the IEEE Transactions on Sustainable Computing. He is a fellow of the IEEE.
