Hybrid Heuristics for Optimizing Energy Consumption in ...

Hybrid Heuristics for Optimizing EnergyConsumption in Embedded Systems

Maha IDRISSI AOUAD1, Rene SCHOTT2 and Olivier ZENDRA1

1 INRIA Nancy - Grand Est / LORIA. 615, Rue du Jardin Botanique,54600 Villers-Les-Nancy, France

{Maha.IdrissiAouad, Olivier.Zendra}@inria.fr2 IECN - LORIA, Nancy-Universite, Universite Henri Poincare.

54506 Vandoeuvre-Les-Nancy, [email protected]

Abstract. Memory energy reduction becomes crucial for many embed-ded systems designers. In this paper, we propose Hybrid Heuristics formemory management which are, to the best of our knowledge, new origi-nal alternatives to the best known existing heuristic (BEH ). Our HybridHeuristics outperform BEH. In fact, our Hybrid Heuristics manage toconsume nearly from 76% up to 98% less memory energy than BEH indifferent configurations. In addition our Hybrid Heuristics are easy toimplement and do not require list sorting (contrary to BEH).

Keywords: Energy consumption reduction, Genetic algorithms, hybridheuristics, memory allocation management, optimizations, Tabu Search.

1 Introduction

Embedded systems become more and more energy greedy due to technologyevolution. Indeed, these systems must integrate multiple complex functionali-ties (videophone, Internet, etc.) which needs bigger battery and memory. Thesesystems are mainly our daily life objects, such as: cell phones, laptops, DVDplayers, console games, etc. Depending on that, memory will become the majorenergy consumer in an embedded system (see trends in [17]). Hence, reducingmemory energy consumption of embedded systems is of great importance. Todo so, numerous options to save energy, hence increase autonomy, exist. Thesevarious approaches can be classified in two main categories: hardware optimiza-tions and software optimizations. Hardware techniques fall beyond the scope ofthis paper, but a large amount of literature about them is available [11].

In this paper, we will tackle the issue of optimal energy usage in real-timeembedded systems from a software point of view, working on the memory man-agement and focusing on software, compiler-assisted techniques. In order to re-duce memory energy consumption, most authors rely on Scratch-Pad Memories(SPMs) rather than caches. A large variety of chips with SPM is available inthe market [2, 8]. Consequently, many authors have tried to benefit from the

advantages of SPMs and various related research directions have been investi-gated. These techniques, synthesized in [6], try to optimally allocate applicationcode and/or data to SPM in order to reduce the energy consumption of embed-ded systems. The interested reader can look at [7] for a comprehensive list ofreferences. Cache memory is random access memory (RAM ) that a computermicroprocessor can access more quickly than it can access regular RAM. As themicroprocessor processes data, it looks first in the cache memory and if it findsthe data there (from a previous reading of data), it does not have to do themore time-consuming reading of data from larger memory [22]. Although cachememories, help a lot with program speed, they are not the most appropriatefor embedded systems. In fact, caches increase the system size and its energycost because of cache area plus managing logic. In contrast, SPMs have inter-esting features. SPM also known as local store in computer terminology, is ahigh-speed internal memory used for temporary storage of calculations, data,and other work in progress. It can be considered as similar to an L1 cache inthat it is the memory next closest to the ALU’s after the internal registers, withexplicit instructions to move data from and to main memory. Like cache, there-fore, SPM consists of small, fast SRAM, but the main difference is that SPMis directly and explicitly managed at the software level, either by the developeror by the compiler, whereas cache requires extra dedicated circuits. Its softwaremanagement makes it more predictable as we avoid cache miss cases which isan important feature in real-time embedded systems. Compared to cache, SPMthus has several advantages [16]. SPM requires up to 40% less energy and 34%less area than cache [4]. Further, the run-time with an SPM using a simple staticknapsack-based [4] allocation algorithm is 18% better as compared to a cache.[5] show the effectiveness of using an SPM in a memory architecture where a sav-ing about 35% in energy consumption is achieved when compared to a memoryarchitecture without an SPM. [1] use statistical methods and the IndependentReference Model (IRM ) to prove that SPMs, with an optimal mapping based onaccess probabilities, will always outperform the direct-mapped cache, irrespec-tive of the layout influencing the cache behavior. Additionally, manufacturingSPM cost is lower.

The rest of the paper is organized as follows. Section 2 gives some relatedworks and describes some classical existing heuristics. Section 4 presents ourHybrid Heuristics to find the optimized memory data allocation reducing energyconsumption. Section 5 gives the memory energy consumption model we usedin order to estimate the energy consumed by our different heuristics. Section6 shows the various experimental results obtained. Finally, Section 7 concludesand gives some perspectives.

2 Related Works and Classical Heuristics

The correct choice of an optimized memory allocation is of great importance.The approaches presented in this section try to determine the most appropriatememory allocation reducing energy consumption according to memory type (ac-

cess speed, energy cost, large number of miss access cases, etc.) and applicationbehavior. In these techniques, because of the reduced SRAM size, one tries tooptimally allocate data in it in order to realize energy savings. An approach isto place interesting data in memory with a low access cost (SPM ) whereas theother data will be placed in a memory with low storage cost (DRAM ). Thesedata could be the most frequently accessed/used data [3, 21, 24, 25] or it couldbe data that is the most cache-conflict prone [18, 23].

In order to determine interesting data, these methods use profile data togather memory access frequency information. This information can be collectedeither statistically by analyzing the source code of the application or dynamicallyby profiling the application (number of times data is accessed, data size, accessfrequency, etc.). Thus, most of the authors use one of the following strategies:

Allocate data into SPM by size: the smaller data are allocated into SPMas there is space available else they are allocated in DRAM. This method hasthe advantage of being simple to implement since it only considers the size of thedata but has the disadvantage of allocating the largest data in DRAM. Theselargest data could be often accessed, which will imply a very few energy economy.

Allocate data into SPM by number of accesses: the most frequentlyaccessed/used data are allocated into SPM as there is space available else theyare allocated in DRAM. This strategy is optimal than the previous one, since themost frequently accessed/used data will be allocated in a memory that consumesless energy and therefore will achieve more savings as explained and demon-strated in [20, 21]. However, we can note granularity problems in some casessuch as a structure which only one part is often accessed/used.

Allocate data memory into SPM by number of accesses and size(BEH ): this is somehow a combination of the two previous strategies. The ideahere is to combine their advantages. If we consider the example of a structure inwhich only a part is the most frequently accessed/used, we take into account theaverage number of access to this structure. This avoids granularity problems.Here, data are sorted according to their ratio (access number/size) in descend-ing order. The data with the highest ratio is allocated first into SPM as there isspace available. Otherwise it is allocated in DRAM. This heuristic uses a sortingmethod which can be computationally expensive for a large amount of data. Ad-ditionally, this sorting method will not work very well in a dynamic perspectivewhere the maximum capacity of the SPM is not known in advance. This is, sofar, the best known existing heuristic (BEH ).

In the rest of this paper, we will refer to the strategy BEH as a basis for ourmemory energy optimizations.

3 Optimization Problem

Our problem is a combinatorial optimization problem like the famous knapsackproblem [13]. Suppose memory is a big knapsack and data are items. We wantto fill this knapsack that can hold a total weight of W with some combination

of items from a list of N possible items each with weight wi and value vi so thatthe value of the items packed into the knapsack is maximized. This problemhas a single linear constraint, a linear objective function which sums the valuesof the items in the knapsack, and the added restriction that each item will bein the knapsack or not. If N is the total number of items, then there are 2N

subsets of the item collection. So an exhaustive search for a solution to thisproblem generally takes exponential running time. Therefore, the obvious bruteforce approach is infeasible. In this paper, we propose Hybrid Heuristics basedon TS and GAs to solve this problem.

4 Hybrid Heuristics

In this section, we propose different heuristics based on Tabu Search (TS ) [10]and on Genetic algorithms (GAs) [19] to solve the problem of optimizing thememory data allocation. These heuristics are an alternative to BEH presented inSection 2. Each of these heuristics is explained in details in the subsections below.However, we give before a brief explanation on how TS and GA approaches areimplemented. For both approaches, we use the same following encoding : if N isthe total number of data, then a solution is just a finite sequence s of N termssuch that s[n] is either 0 or the size of the nth data. s[n] = 0 if and only if thenth data is not selected in the solution. This solution must satisfy the constraintof not exceeding the maximum SPM capacity (i.e.

∑Ni=1 s[i] ≤ C).

TS: The memory (SPM) is filled such that it can hold a maximum capacityof C with a combination of data from a list of N possible data each with sizesizei and access number ani so that the access number of the data allocatedinto SPM is maximized. In the implemented TS algorithm, an initial solution isfirst generated randomly. A maximum number of iterations and a lifespan on thetabu list are also set. Initially, the optimal solution equals the initial solution,the optimal access number is the access number of the initial solution and thetabu list is empty. As long as the number of iterations is not exceeded, repeat:

– The eth neighborhood of the current solution is generated.– A new matrix containing the neighboring vectors is computed.– Based on the solutions contained in this matrix, a vector of corresponding

current size values and a vector of corresponding current access numbervalues are calculated.

– Best solution are kept from neighborhood.– The tabu list is updated to make a transition back to the old solution im-

possible for a period.– Finally, update is performed if this new access number is better than the

existing optimal access number.

GA: An individual is a single solution. A population is a collection of in-dividuals. The two important aspects of population used in GAs are the initialpopulation generation and the population size. Ideally, the initial population

should have a gene pool as large as possible in order to be able to explore thewhole search space. Hence, to achieve this, the initial population is, in most ofthe cases, chosen randomly. Other key elements of GA are listed below:

– Fitness: the value of an objective function. It not only indicates how goodthe solution is, but also corresponds to how close the chromosome is to theoptimal one. At each stage (generation), the solution points are evaluatedfor fitness (according to how much of the memory capacity they fill), andthe best and worst performers are identified.

– Crossover: the process of taking two parent solutions and producing fromthem a child. After the selection process, the population is enriched with bet-ter individuals. Here, when conditions are ripe for breeding, the best solutionmates with a random (non-extreme) solution, and the offspring replaces theworst one. The child inherits a part of each parent’s genes. We used thefollowing crossover techniques and probability:• Single Point Crossover: the two mating chromosomes are cut once at

a randomly selected point and the sections after the cuts are exchanged.• Two Points Crossover: two points are chosen randomly and the con-

tents between these points are exchanged between two mated parents.• Crossover Probability (Pc): describes how often crossover will be per-

formed. If there is no crossover, offspring are exact copies of parents. IfPc = 1, then all offspring are made by crossover. If Pc = 0, whole newgeneration is made from exact copies of chromosomes from old popula-tion. Crossover is made in hope that new chromosomes will contain goodparts of old chromosomes and therefore will be better.

– Mutation: prevents the algorithm to be trapped in local optima. It is seenas a background operator to maintain genetic diversity in the population byvarying the gene pool. The genes of a solution are just its sequence terms.In our case, a mutation of a solution is a random change of up to half of itscurrent genes. A gene change sets a term to 0 if the term is currently nonzero,and sets a term to the corresponding data size if the term is currently zero.• Mutation Probability (Pm): decides how often parts of chromosome

will be mutated. If there is no mutation, offspring are generated immedi-ately after crossover without any change. If Pm = 1, whole chromosomeis changed, if Pm = 0, nothing is changed. Mutation should not occurvery often, because then GA will in fact change to random search.

– Search Termination: various stopping conditions exist: after a specifiedtime has elapsed, if there is no change to the population’s best fitness for aspecified number of generations or during an interval of time, etc. Here, westop GA when the maximum number of generations is reached.

TS - GA In this heuristic, we start by applying TS first. Then, we apply GA.In order to achieve this, nothing is changed in the way TS is implemented. Incontrast, for GA, instead of considering the solution which satisfy the most thefitness from the initial population as the solution to improve, we consider thebest solution found by TS. That way, if GA finds a better solution than the onefound by TS, GA takes its solution. Else, GA keeps the solution found by TS.

GA - TS In this heuristic, we start by applying GA first. Then, we apply TS.To do so, nothing is changed in the way GA is implemented. In contrast, forTS, instead of generating an initial solution randomly, we take the best solutionfound by GA as the initial solution of TS and then we try to improve it. Thatway, if TS finds a better solution than the one found by GA, TS takes its solution.Else, TS keeps the solution found by GA.

GA Hybrid In this heuristic, we consider the GA approach. This time, insteadof applying all habitual GA operators (crossover and mutation), we replace themutation operator by TS. This heuristic is more elaborate than the two previousones, in the sense that it does not implement a simple combination. By replacingthe mutation operator by TS, we look for optimizing the solution given by GA. Infact, TS is much more armed to point out an interesting solution than mutationwith its random way of selection.

5 Memory Energy Estimation Model

In order to compute the energy cost of the system for each configuration, wepropose in this section an energy consumption estimation model for our consid-ered memory architecture composed by an SPM, a DRAM and an instructioncache memory. In our model, we distinguish between the two cache write poli-cies: Write-Through (WT ) and Write-Back (WB). In a WT cache, every write tothe cache causes a synchronous write to DRAM. Alternatively, in a WB cache,writes are not immediately mirrored to DRAM. Instead, the cache tracks whichof its locations have been written over and then, it marks these locations asdirty. The data in these locations is written back to DRAM when those data areevicted from the cache [22]. The energy estimation model is presented below:

E = Etspm + Etic + Etdram

E = Nspmr ∗ Espmr (1)

+ Nspmw ∗ Espmw (2)

+

Nicr∑k=1

[hik ∗ Eicr + (1− hik) ∗ [Edramr + Eicw

+(1−WPi) ∗DBik ∗ (Eicr + Edramw)]] (3)

+

Nicw∑k=1

[WPi ∗ Edramw + hik ∗ Eicw + (1−WPi) ∗

(1− hik) ∗ [Eicw + DBik ∗ (Eicr + Edramw)]] (4)

+ Ndramr ∗ Edramr (5)

+ Ndramw ∗ Edramw (6)

Lines (1) and (2) represent respectively the total energy consumed duringa reading and during a writing from/into SPM. Lines (3) and (4) represent

respectively the total energy consumed during a reading and during a writingfrom/into instruction cache. When, lines (5) and (6) represent respectively thetotal energy consumed during a reading and during a writing from/into DRAM.The various terms used in this energy model are explained in Table 1.

Table 1. List of terms.

Term Meaning

Etspm Total energy consumed in SPM.

Etic Total energy consumed in instruction cache.

Etdram Total energy consumed in DRAM.

Espmr Energy consumed during a reading from SPM.

Espmw Energy consumed during a writing into SPM.

Nspmr Reading access number to SPM.

Nspmw Writing access number to SPM.

Eicr Energy consumed during a reading from instruction cache.

Eicw Energy consumed during a writing into instruction cache.

Nicr Reading access number to instruction cache.

Nicw Writing access number to instruction cache.

Edramr Energy consumed during a reading from DRAM.

Edramw Energy consumed during a writing into DRAM.

Ndramr Reading access number to DRAM.

Ndramw Writing access number to DRAM.

WPi The considered cache write policy: WT or WB.In case of WT, WPi = 1 else, in case of WB then WPi = 0.

DBik Dirty Bit used in case of WB to indicate during the accessk if the instruction cache line has been modified before(DBi = 1) or not (DBi = 0).

hik Type of the access k to the instruction cache.In case of cache hit, hik = 1. In case of cache miss, hik = 0.

6 Experimental Results

For our experiments, we consider a memory architecture composed by an SPM,a DRAM and an instruction cache memory. We performed experiments witheleven benchmarks from six different suites. Table 2 gives a description of them.

In order to compute the energy cost of the system for each configuration, weused our developed energy consumption estimation model presented in Section 5.This model is based on the OTAWA framework [9] to collect information aboutnumber of accesses and on the energy consumption estimation tool CACTI [26]in order to collect information about energy per access to each kind of memory.OTAWA (Open Tool for Adaptative WCET Analysis) is a framework of C++classes dedicated to static analyses of programs in machine code and to the com-putation of Worst Case Execution Time (WCET ). OTAWA is freely available(under the LGPL license) and is designed to support different architectures like

Table 2. List of Benchmarks.

Benchmark Suite Description

Sha MiBench [12] The secure hash algorithm that produces a 160-bitmessage digest for a given input.

Bitcount MiBench Tests the bit manipulation abilities of a processor bycounting the number of bits in an array of integers.

Fir SNU-RT Finite impulse response filter (signal processingalgorithms) over a 700 items long sample.

Jfdctint SNU-RT Discrete-cosine transformation on 8x8 pixel block.

Adpcm Malardalen Adaptive pulse code modulation algorithm.

Cnt Malardalen Counts non-negative numbers in a matrix.

Compress Malardalen Data compression using lzw.

Djpeg Mediabenchs JPEG decoding.

Gzip Spec 2000 Compression.

Nsichneu Wcet Benchs Simulate an extended Petri net. Automaticallygenerated code with more than 250 if-statements.

Statemate Wcet Benchs Automatically generated code.

PowerPC, ARM or M68HC. In our case, we focus on PowerPC architectures.Our presented Hybrid Heuristics and the BEH strategy have been implementedwith the C language on a PC Intel Core 2 Duo, with a 2.66 GHz processor and3 Gbytes of memory running under Mandriva Linux 2008.

As the standard benchmarks contain uniform data leading to a big numberof local minima, and in order to put some trouble in the BEH strategy andsee if it gives the best solution, we decided to modify slightly our benchmarks.Concretely, our modification consists in adding one variable to each benchmark.This variable performs an output (so did not change the benchmarks features)and is big enough to provide energy savings if it is chosen for an SPM allocation.We referred to a modified benchmark as benchmarkCE.

In [15], we have shown that TS alone, performs as well as BEH on the stan-dard benchmarks presented in Table 2. But, with the modified benchmarks, bothTS and BEH gave the same energy savings as with the standard benchmarksso both did not give the optimal solution. In contrast, in [14], we have shownthat GA alone, outperforms BEH in terms of energy savings with the modifiedbenchmarks but did’nt do as well as BEH with the standard benchmarks. So, bycombining both TS and GA, we look for optimizing the energy usage neverthelesswe are considering the standard and the modified benchmarks. Hence, we applyeach heuristic on both standard and modified benchmarks. In all experiments,30 different executions for each heuristic are generated as the solution given dif-fers from an execution to another. Heuristic Mean refers to the average resultsobtained on 30 executions of the considered heuristic. In contrast, Heuristic Bestrefers to the best solution obtained from the thirty executions performed on theconsidered heuristic. For BEH, the solution founded does not change from arunning to another one. In the following, as the shapes of curves obtained whencomparing BEH and Hybrid Heuristics on the benchmarks assuming the WT or

WB cache policy are slightly the same, just the results obtained with the WBcache policy are plotted. Nearly, the same amounts are recorded for the WTmode. One can show that EWTmode 6= EWBmode.

TS - GA We start by investigating the performances of combining TS firstwith GA considering the standard benchmarks. TSGA1 represents the resultsobtained for Pm = 0.1, Pc = 0.5 and for a single point crossover. TSGA2 repre-sents the results obtained for Pm = 0.1, Pc = 0.5 and for a two points crossover.Figure 1 presents the results obtained when comparing BEH and our TSGAHeuristics on the standard benchmarks assuming the WB cache policy.

Fig. 1. Energy consumed by standard benchmarks with WB mode.

As it can be seen from this figure, TSGA Heuristics achieve same energy sav-ings as BEH on most of benchmarks. For some others, at worst, TSGA1 Meanconsumes 6,65% (Djpeg) more energy than BEH, TSGA1 Best consumes 1,03%(Gzip) more energy than BEH, TSGA2 Mean consumes 8,60% (Djpeg) moreenergy than BEH and TSGA2 Best consumes 2,40% (Djpeg) more energy thanBEH in the WB mode. It has been shown that saving more energy is not possibleas BEH already gives the optimal solution thanks to a developed backtrackingalgorithm.

Thus, we continue by investigating the performances of combining TS firstwith GA considering the modified benchmarks. In this case, TSGA1 and TSGA2give similar results. We therefore call TSGA (1,2) the common value of theirexecutions. Figure 2 presents the results obtained when comparing BEH and ourTSGA Heuristic on the modified benchmarks assuming the WB cache policy.

Fig. 2. Energy consumed by modified benchmarks with WB mode.

As it can be seen from this figure, TSGA (1,2) achieves better performancesthan BEH on energy savings on the modified benchmarks. In fact, these resultsshow that TSGA (1,2) consumes from 76.23% (StatemateCE) up to 98.92%(ShaCE) less energy than BEH in the WB mode. For BEH, although we usedthe modified benchmarks, we still obtain the same energy savings as with thestandard benchmarks. The BEH strategy didn’t give the optimal solution any-more as one could expect as proven by our developed backtracking algorithm,but TSGA (1,2) gives it.

GA - TS Now, we investigate the performances of combining GA first with TSconsidering the standard benchmarks. GATS3 represents the results obtainedfor Pm = 0.1, Pc = 0.5 and for a single point crossover. GATS4 represents theresults obtained for Pm = 0.1, Pc = 0.5 and for a two points crossover. Figure3 presents the results obtained when comparing BEH and our GATS Heuristicson the standard benchmarks assuming the WB cache policy.

As it can be seen from this figure, GATS Heuristics achieve same energy sav-ings as BEH on most of benchmarks. For some others, at worst, GATS3 Meanconsumes 3,81% (Gzip) more energy than BEH, GATS3 Best consumes 2,20%(Gzip) more energy than BEH, GATS4 Mean consumes 3,68% (Gzip) more en-ergy than BEH and GATS4 Best consumes 2,07% (Gzip) more energy than BEHin the WB mode. It has been shown that saving more energy is not possible asBEH already gives the optimal solution thanks to a developed backtracking al-gorithm.

Thus, we continue by investigating the performances of combining TS firstwith GA considering the modified benchmarks. In this case, GATS3 and GATS4


give similar results. We therefore call GATS (3,4) the common value of theirexecutions. Figure 4 presents the results obtained when comparing BEH and ourTSGA Heuristic on the modified benchmarks assuming the WB cache policy.


As it can be seen from this figure, GATS (3,4) achieves better performancesthan BEH on energy savings on the modified benchmarks. In fact, these results

show that GATS (3,4) consumes from 76.23% (StatemateCE) up to 98.92%(ShaCE) less energy than BEH in the WB mode. For BEH, although we usedthe modified benchmarks, we still obtain the same energy savings as with thestandard benchmarks. The BEH strategy didn’t give the optimal solution any-more as one could expect as proven by our developed backtracking algorithm,but GATS (3,4) gives it.

GA Hybrid We continue by investigating the energy performances of the GAHybrid Heuristic considering the standard benchmarks. GA Hybrid5 representsthe results obtained for Pm = 0.1, Pc = 0.5 and for a single point crossover.GA Hybrid6 represents the results obtained for Pm = 0.1, Pc = 0.5 and fora two points crossover. In this case, GA Hybrid5 and GA Hybrid6 give similarresults. We therefore call GA Hybrid (5,6) the common value of their executions.Figure 5 presents the results obtained when comparing BEH and our GA HybridHeuristic on the standard benchmarks assuming the WB mode.


As it can be seen from this figure, GA Hybrid (5,6) achieves same energysavings as BEH on most of benchmarks. For some others, at worst, GA Hybrid(5,6) consumes 7,67% (Djpeg) more energy than BEH in the WB mode. It hasbeen shown that saving more energy is not possible as BEH already gives theoptimal solution thanks to a developed backtracking algorithm.

Thus, we continue by investigating the performances of the GA HybridHeuristic considering the modified benchmarks. Figure 6 presents the results

obtained when comparing BEH and our GA Hybrid Heuristic on the modifiedbenchmarks assuming the WB cache policy.


As it can be seen from this figure, GA Hybrid (5,6) achieves better energysavings than BEH on the modified benchmarks. In fact, these results show thatGA Hybrid (5,6) consumes from 76.23% (StatemateCE) up to 98.92% (ShaCE)less energy than BEH in the WB mode. For BEH, although we used the mod-ified benchmarks, we still obtain the same energy savings as before. The BEHstrategy didn’t give the optimal solution anymore as one could expect as provenby our developed backtracking algorithm, but GA Hybrid (5,6) gives it.

Results analysis First, concerning the standard benchmarks, results show thatwhen we are looking for the optimization of the average results obtained throughall the executions, it is better to consider the TSGA Heuristic. In fact, thiscombinatorial heuristic is the one which consumes less mean energy than all otherHybrid Heuristics when compared to BEH. In contrast, when we are looking fora best solution without taking care of the average results, it is better to considerthe GATS Heuristic. In fact, this combinatorial heuristic is the one which findsthe best solution for a reduced energy consumption when compared to BEH.

Second, concerning the modified benchmarks, the fact that BEH didn’t givethe optimal solution anymore is normal. This is due to the fact that BEH is asort of access number/size of data as we explain in Section 2. The variable weadd in each benchmark has a given access number/size (this ratio depends onthe data profiling of each benchmark) so that this variable is not a priority in

the sorting made by the BEH method. This is done on purpose so that when itwill be the turn of this variable to be treated by BEH, the remaining space inthe SPM will not be enough to take this variable and hence it will be allocatedin main memory. Whereas, an optimal solution would be to start by allocatingthis variable first into the SPM. In contrast, all our Hybrid Heuristics are robustenough to overcome this problem and find the optimal solution as it can benoticed. In fact, results show that all the proposed Hybrid Heuristics consumefrom 76.23% (StatemateCE) up to 98.92% (ShaCE) less energy than BEH.

7 Conclusion and Perspectives

In this paper, we have proposed new Hybrid Heuristics for reducing memoryenergy consumption in embedded systems which are more efficient than BEH.In fact, our Hybrid Heuristics manage to consume nearly from 76% up to 98%less memory energy than BEH in different memory configurations. In additionour Hybrid Heuristics are easy to implement and do not require list sorting(contrary to BEH). In future work, we plan to investigate the same problemwith relaxing the memory constraints by considering random memory sizes onone hand. On the other hand, we plan to explore other evolutionary heuristics(Markov Decision Processes, Simulated Annealing, ANT method, etc.).

8 Acknowledgements

This work is financed by the french national research agency (ANR) in the FutureArchitectures program.

References

1. J. Absar and F. Catthoor. Analysis of scratch-pad and data-cache performanceusing statistical methods. In ASP-DAC, pages 820–825, 2006.

2. M. Adiletta, M. Rosenbluth, D. Bernstein, G. Wolrich, and H. Wilkinson. TheNext Generation of Intel IXP Network Processors. Intel Technology Journal, 6(3),Aug. 2002.

3. O. Avissar, R. Barua, and D. Stewart. An optimal memory allocation schemefor scratch-pad-based embedded systems. Transaction. on Embedded ComputingSystems., 1(1):6–26, 2002.

4. R. Banakar, S. Steinke, B. Lee, M. Balakrishnan, and P. Marwedel. Scratchpadmemory: design alternative for cache on-chip memory in embedded systems. InCODES, pages 73–78, New York, NY, USA, 2002. ACM Press.

5. H. Ben Fradj, A. El Ouardighi, C. Belleudy, and M. Auguin. Energy aware mem-ory architecture configuration. volume 33, pages 3–9. ACM SIGARCH ComputerArchitecture News, 2005.

6. L. Benini and G. De Micheli. System-level power optimization: Techniques andtools. In ISLPED-99:ACM/IEEE, pages 288–293, 1999.

7. L. Benini and G. De Micheli. System-level power optimization: techniques andtools. IEEE Design and Test, 17(2):74–85, 2000.

8. D. Brash. The ARM architecture Version 6 (ARMv6). In ARM Ltd., January2002. White Paper.

9. H. Casse and C. Rochange. OTAWA, Open Tool for Adaptative WCET Analysis.In DATE, Nice, April 2007. Poster session.

10. M. Gendreau. An introduction to tabu search, volume 57. Kluwer Academic Pub-lishers, Boston, MA, 2003.

11. R. Graybill and R. Melhem. Power aware computing. Kluwer Academic Publishers,Norwell, MA, USA, 2002.

12. M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B.Brown. Mibench: A free, commercially representative embedded benchmark suite.In WWC ’01: Proceedings of the Workload Characterization, pages 3–14, Washing-ton, DC, USA, 2001. IEEE Computer Society.

13. U. Pferschy H. Kellerer and D. Pisinger. Knapsack Problems. Springer, Berlin,Germany, 2004.

14. M. Idrissi Aouad, R. Schott, and O. Zendra. Genetic Heuristics for ReducingMemory Energy Consumption in Embedded Systems. Submitted.

15. M. Idrissi Aouad, R. Schott, and O. Zendra. A Tabu Search Heuristic for Scratch-Pad Memory Management. In ICSET’2010: Proceedings of International Confer-ence on Software Engineering and Technology, 2010. To appear.

16. M. Idrissi Aouad and O. Zendra. A Survey of Scratch-Pad Memory ManagementTechniques for low-power and -energy. In 2nd ECOOP Workshop on Implemen-tation, Compilation, Optimization of Object-Oriented Languages, Programs andSystems (ICOOOLPS’2007), July 2007.

17. ITRS. System drivers, 2007. http://www.itrs.net/Links/2007ITRS/2007 Chapters/2007 SystemDrivers.pdf.

18. P. R. Panda, N. Dutt, and A. Nicolau. Efficient utilization of scratch-pad memoryin embedded processor applications. In DATE, 1997.

19. S. N. Sivanandam and S. N. Deepa. Introduction to Genetic Algorithms. SpringerPublishing Company, Incorporated, 2007.

20. J. Sjodin, B. Froderberg, and T. Lindgren. Allocation of global data objects inon-chip ram. In Workshop on Compiler and Architectural Support for EmbeddedComputer Systems. ACM, December 1998.

21. S. Steinke, L. Wehmeyer, B. Lee, and P. Marwedel. Assigning program and dataobjects to scratchpad for energy reduction. In DATE, page 409. IEEE ComputerSociety, 2002.

22. A. Tanenbaum. Architecture de l’ordinateur 5e edition. November 2005.23. D. N. Truong, F. Bodin, and A. Seznec. Improving cache behavior of dynamically

allocated data structures. In IEEE PACT, pages 322–329, 1998.24. S. Udayakumaran, B. Narahari, and R. Simha. Application specific memory parti-

tioning for low power. In ACM COLP 2002 (Compiler and Operating Systems forLow Power). ACM Press, 2002.

25. L. Wehmeyer, U. Helmig, and P. Marwedel. Compiler-optimized usage of parti-tioned memories. In WMPI, 2004.

26. S.J.E. Wilton and N.P. Jouppi. Cacti: An enhanced cache access and cycle timemodel. IEEE Journal of Solid-State Circuits, 1996.

Hybrid Heuristics for Optimizing Energy Consumption in ...

Documents

Transcript of Hybrid Heuristics for Optimizing Energy Consumption in ...