
IEEE 2011 Seventh International Conference on Computational Intelligence and Security (CIS), Sanya, Hainan, China (2011.12.3-2011.12.4)

Energy-efficient Multi-task Scheduling based on MapReduce for Cloud Computing

Xiaoli Wang School of Computer Science and Technology

Xidian University Xi’an, Shaanxi, China

E-mail: [email protected]

Yuping Wang School of Computer Science and Technology

Xidian University Xi’an, Shaanxi, China

E-mail: [email protected]

Abstract—To address the low energy efficiency of cloud computing data centers, we approach the problem from the perspective of server energy efficiency and propose a new energy-efficient multi-task scheduling model based on MapReduce, Google's massive data processing framework. To solve this model, we design a practical encoding and decoding method for individuals and construct an overall energy efficiency function of the servers as the fitness value of each individual. In addition, a local search operator is introduced to accelerate convergence and enhance the search ability of the algorithm. Finally, experiments show that the proposed algorithm is effective and efficient.

Keywords-Energy-efficient; multi-task; scheduling; Cloud computing; MapReduce

I. INTRODUCTION

Cloud computing [1] is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Although cloud computing is favored as a new business model for services such as on-demand self-service, broad network access and rapid elasticity, it also faces new challenges. One of the most prominent is the energy efficiency of data centers.

According to Amazon's CEMS project [2], based on a 3-year amortization schedule for servers and a 15-year amortization schedule for other infrastructure, the monthly capital investment of a data center is as illustrated in Figure 1. As the figure shows, energy-related costs amount to 41.62% of the total. In other words, the investment needed to build a cloud computing data center goes not only to purchasing thousands of servers, but also to the power distribution and cooling infrastructure and to the energy bill of all these facilities. To illustrate the importance of energy consumption in data centers, we introduce the concept of power usage effectiveness (PUE), which was developed by a consortium called The Green Grid.

Definition 1 (Power Usage Effectiveness [3]). Power usage effectiveness is the ratio of the total amount of power used by a data center facility to the power delivered to its computing equipment. It measures how efficiently a data center uses its power:

$$\mathrm{PUE}=\frac{\text{Total Facility Power}}{\text{IT Equipment Power}}$$

where the IT Equipment Power is the power delivered to the critical load, i.e., the servers in the data center, while the Total Facility Power also includes, besides the servers, the other facilities that consume energy, chiefly the power distribution and cooling infrastructure. A PUE of 2.0 means that for every watt delivered to the servers, another watt is dissipated in the cooling system and power distribution. In its report [4] to the U.S. Congress, the Environmental Protection Agency (EPA) expected that equipment efficiency improvements alone, under current practices, could result in a PUE of 1.9 by 2011. Beyond that, the EPA predicted that "state-of-the-art" data centers could reach a PUE of 1.2. Google has since claimed that its data centers, averaged across all sites, have surpassed the EPA's most optimistic scenario [5], a claim that has of course met with skepticism from other cloud computing providers [2].
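The definition and the PUE = 2.0 example reduce to simple arithmetic. The Python sketch below is only an illustration: the function names and the 2 MW / 1 MW facility are hypothetical. `it_share` is the fraction of facility power that actually reaches the servers, i.e. 1/PUE.

```python
def pue(total_facility_power_w: float, it_equipment_power_w: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_power_w <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_power_w / it_equipment_power_w


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts spent on cooling and power distribution per watt delivered to IT equipment."""
    return pue_value - 1.0


def it_share(pue_value: float) -> float:
    """Fraction of the total facility power that is delivered to IT equipment."""
    return 1.0 / pue_value


# Hypothetical facility: 2 MW drawn in total, 1 MW delivered to the servers.
p = pue(2_000_000, 1_000_000)
print(p, overhead_per_it_watt(p), it_share(p))  # 2.0 1.0 0.5
```

At the EPA's state-of-the-art target of 1.2, `it_share(1.2)` is about 0.83, i.e. roughly 83% of the power would reach the servers.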

Figure 1. Monthly costs of the data center (pie chart: Servers $2,997,090; Power & Cooling Infrastructure $1,296,902; Power $1,042,440; Other Infrastructure $284,686)
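The 41.62% figure can be reproduced from the four monthly costs of Figure 1 by counting the power bill and the power and cooling infrastructure as energy-related. A short Python check; the dictionary follows the segment labels of Figure 1 as listed above.

```python
# Monthly data center costs from Figure 1, in US dollars.
monthly_costs = {
    "Servers": 2_997_090,
    "Power & Cooling Infrastructure": 1_296_902,
    "Power": 1_042_440,
    "Other Infrastructure": 284_686,
}

# Energy-related costs: the energy bill plus the infrastructure that distributes power and removes heat.
ENERGY_RELATED = ("Power & Cooling Infrastructure", "Power")

total = sum(monthly_costs.values())
energy = sum(monthly_costs[name] for name in ENERGY_RELATED)
share = energy / total

print(f"total = ${total:,}, energy-related = ${energy:,} ({share:.2%})")
# total = $5,621,118, energy-related = $2,339,342 (41.62%)
```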

To reduce the energy consumption of data centers and improve their energy efficiency, many researchers have carried out related studies, e.g., [6-10]. Overall, efforts can be made in three areas:

(1) Reduce power loss during distribution. However, statistics from Amazon's CEMS project show that for a data center with a PUE of 1.7, the overall power distribution loss accounts for only 8% of total energy consumption. Even with better technology, the reduction cannot exceed 8% [2].

(2) Reduce the energy consumed by the cooling system. One example is Google's “free cooling” approach: Google claims that there is no cooling equipment in its data centers in Belgium [11], because the Belgian climate supports free cooling almost year-round. If the weather gets hot, Google says it will turn off equipment in Belgium as needed and shift the computing load to other data centers. Although free cooling can reduce the energy consumed by the cooling system, it has a key prerequisite: the provider must have sufficient financial and technical strength to run several data centers around the world, with data backed up across those data centers and computing load migrated among them seamlessly. This is hardly possible for the majority of cloud computing providers.

2011 Seventh International Conference on Computational Intelligence and Security

978-0-7695-4584-4/11 $26.00 © 2011 IEEE

DOI 10.1109/CIS.2011.21

57

(3) Improve the energy efficiency of servers. In a data center with a PUE of 2.0, only 50% of the power reaches the servers. It is therefore critical whether the servers actually use all of that energy to complete workload. The low energy utilization of a server is mainly due to idle time caused by low CPU utilization: even at a very low load, such as 10% CPU utilization, the power consumed is over 50% of the peak power [12]. Thus, the energy efficiency of the servers plays an important role in the overall energy efficiency of the data center.

This paper focuses on improving the energy efficiency of servers through appropriate scheduling strategies. We propose a new energy-efficient multi-task scheduling model based on MapReduce. As the basis of our model, Section II reviews Google's MapReduce framework; Section III gives the mathematical description of the problem and the corresponding model; Section IV designs a genetic algorithm to solve the model; finally, the simulation experiments in Section V show that the proposed algorithm is effective and efficient.

II. MAPREDUCE FRAMEWORK

MapReduce [13] is Google's massive data processing framework. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Figure 2 shows the overall flow of a MapReduce operation. When the user program calls the MapReduce function, the following sequence of actions occurs:

Figure 2. Overall flow of a MapReduce operation

Step1. The MapReduce library first splits the input files into M pieces of typically 64 megabytes (MB) per piece.

Step2. The master picks idle workers and assigns each one a map task or a reduce task.

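Step3. A worker that is assigned a map task parses key/value pairs out of the corresponding input data and passes each pair to the user-defined Map function.

Step4. The intermediate key/value pairs produced by the Map function are buffered and periodically written to the worker's local disk; the locations of these pairs on the local disk are passed back to the master.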

Step5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers.

Step6. The reduce worker passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.

Step7. When all map tasks and reduce tasks have been completed, the master wakes up the user program.
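To see how the intermediate pairs of Steps 3-4 are grouped by key before Step6, the flow can be imitated in a single process. This is a simplified Python sketch, not Google's implementation: there is no master, no worker processes, no local disk and no remote procedure call, and `run_mapreduce`, the word-count functions and the hash partitioning into `num_reducers` regions are illustrative choices.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def run_mapreduce(splits: Iterable[str],
                  map_fn: Callable[[str], Iterable[tuple[str, int]]],
                  reduce_fn: Callable[[str, list[int]], int],
                  num_reducers: int = 2) -> dict[str, int]:
    """Single-process imitation of the map -> partition -> reduce flow."""
    # Map phase: every split yields intermediate pairs, which are partitioned by key
    # into num_reducers regions, one region per reduce task.
    regions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in map_fn(split):
            regions[hash(key) % num_reducers][key].append(value)
    # Reduce phase: each reduce task passes every key and its grouped values to reduce_fn.
    output: dict[str, int] = {}
    for region in regions:
        for key in sorted(region):
            output[key] = reduce_fn(key, region[key])
    return output


def word_count_map(text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word of the split."""
    for word in text.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: list[int]) -> int:
    """Reduce: add up all counts emitted for one word."""
    return sum(counts)


splits = ["the map task", "the reduce task", "the master"]
print(run_mapreduce(splits, word_count_map, word_count_reduce))
```

The printed dictionary order depends on Python's per-process string hashing, but the counts do not.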

III. ENERGY-EFFICIENT MULTI-TASK SCHEDULING MODEL BASED ON MAPREDUCE

The problem of improving server energy efficiency cannot be solved simply by balancing the load among servers so that every server's CPU utilization reaches 100%. Instead, each server has an optimal performance-energy point [12]. Energy consumption per task is influenced by the CPU utilization of the server. When CPU utilization is low, idle power is not amortized effectively, and hence the energy per task is high. At high CPU utilization, on the other hand, energy consumption is high because of contention for resources among tasks, which degrades performance and lengthens execution time. The typical variation of energy per task with CPU utilization can therefore be expected to follow a “U”-shaped curve. Accordingly, we assume that the servers achieve maximum energy efficiency when every server runs at its optimal performance-energy point.
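A toy model makes the “U” shape concrete. The Python sketch below rests on assumptions of our own, not on measurements from [12]: power grows linearly from an idle floor of 50% of peak, and above a knee at 50% utilization, contention slows tasks down quadratically. Energy per task is power divided by useful throughput, so it first falls as idle power is amortized and then rises once contention dominates; the minimum of the curve plays the role of an optimal point.

```python
def energy_per_task(u, p_peak=1.0, idle_fraction=0.5, knee=0.5, contention=8.0):
    """Energy per task at CPU utilization u under a hypothetical server model."""
    if not 0.0 < u <= 1.0:
        raise ValueError("utilization must lie in (0, 1]")
    # Linear power model: a fixed idle floor plus a part proportional to utilization.
    power = p_peak * (idle_fraction + (1.0 - idle_fraction) * u)
    # Above the knee, contention for shared resources stretches every task.
    slowdown = 1.0 + contention * max(0.0, u - knee) ** 2
    # Useful work per unit time grows with u but is divided by the slowdown.
    throughput = u / slowdown
    return power / throughput


grid = [i / 100 for i in range(5, 101)]      # utilizations 0.05, 0.06, ..., 1.00
best = min(grid, key=energy_per_task)
print(f"optimal point of this toy model: u = {best:.2f}")
for u in (0.1, best, 1.0):
    print(f"u = {u:.2f}: energy per task = {energy_per_task(u):.3f}")
```

Changing `knee` or `contention` moves the minimum, which mirrors the idea that different servers have different optimal points.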

Problem description: Assume that there are N servers, that the current CPU utilization of server k is $CS_k$, and that its optimal point is $CO_k$. There are F projects $A=\{A_1,A_2,\ldots,A_F\}$; the input data of project $A_q$ is $D_q$, which is divided into $m_q$ splits, so there are $m=\sum_{q=1}^{F}m_q$ splits in total. To ensure data reliability, each split is stored on three different servers. We use a $3\times m$ matrix P to represent the storage locations, where element $p_{ji}$ ($j=1,2,3$) indicates the j-th server storing split i. Suppose that the CPU utilization required by each map task of project $A_q$ is $CM_q$, that required by each reduce task is $CR_q$, and that project $A_q$ has $r_q$ reduce tasks.

The problem is how to assign these $v=\sum_{q=1}^{F}m_q+\sum_{q=1}^{F}r_q$ tasks to the N servers so that the energy efficiency of all servers is maximized. We give the following single-objective optimization model for the energy-efficient multi-task scheduling problem:

$$\min\ \sum_{k=1}^{N}\Big(CO_k-\Big(CS_k+\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q\Big)\Big)^2$$

s.t. (1) for a scheduling scheme $S=(s_1,s_2,\ldots,s_v)$,

$$\begin{cases} s_i\in\{p_{1i},p_{2i},p_{3i}\}, & i=1,2,\ldots,m,\\ s_i\in[1,N], & i=m+1,m+2,\ldots,v.\end{cases}$$


(2) $$NM_k^q=\Big|\Big\{s_i \;\Big|\; s_i=k,\ i=\sum_{j=0}^{q-1}m_j+1,\ \sum_{j=0}^{q-1}m_j+2,\ \ldots,\ \sum_{j=0}^{q-1}m_j+m_q\Big\}\Big|,$$

where $m_0=0$, $k=1,2,\ldots,N$ and $q=1,2,\ldots,F$.

(3) $$NR_k^q=\Big|\Big\{s_i \;\Big|\; s_i=k,\ i=m+\sum_{j=0}^{q-1}r_j+1,\ m+\sum_{j=0}^{q-1}r_j+2,\ \ldots,\ m+\sum_{j=0}^{q-1}r_j+r_q\Big\}\Big|,$$

where $r_0=0$, $k=1,2,\ldots,N$ and $q=1,2,\ldots,F$.

(4) $$CS_k+\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q\le 1,$$

where $k=1,2,\ldots,N$, $CM_q\in[0,1]$, $CR_q\in[0,1]$ and $q=1,2,\ldots,F$.

Constraint (1) states that if map task i is assigned to server $s_i$, then this server must store the corresponding input split, while a reduce task may be assigned to any server. Constraints (2) and (3) count the numbers of map tasks $NM_k^q$ and reduce tasks $NR_k^q$ of project $A_q$ assigned to server k, respectively. Constraint (4) requires that the CPU utilization of every server not exceed 100% after task scheduling.
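The objective and constraint (4) are easy to evaluate once the counts $NM_k^q$ and $NR_k^q$ are known. A minimal Python sketch, assuming nested lists indexed from 0 (index 0 stands for server 1 and project 1); the example counts are made up, and only the CM/CR values are borrowed from Comparison 2 in Section V.

```python
# Example data (hypothetical): 2 servers, 2 projects; index 0 = server 1 / project 1.
cs = [0.5, 0.2]                 # current utilization CS_k
co = [0.9, 0.5]                 # optimal points CO_k
cm = [0.005, 0.004]             # CM_q per map task
cr = [0.0015, 0.002]            # CR_q per reduce task
nm = [[40, 20], [30, 20]]       # NM_k^q: map tasks of project q on server k
nr = [[10, 5], [0, 10]]         # NR_k^q: reduce tasks of project q on server k


def server_loads(cs, nm, nr, cm, cr):
    """CS_k + sum_q NM_k^q * CM_q + sum_q NR_k^q * CR_q for every server k."""
    return [
        cs[k]
        + sum(nm[k][q] * cm[q] for q in range(len(cm)))
        + sum(nr[k][q] * cr[q] for q in range(len(cr)))
        for k in range(len(cs))
    ]


def objective(co, loads):
    """Model objective: sum of squared gaps between optimal points and utilizations."""
    return sum((co_k - load) ** 2 for co_k, load in zip(co, loads))


def satisfies_capacity(loads):
    """Constraint (4): no server may exceed 100% CPU utilization."""
    return all(load <= 1.0 for load in loads)


loads = server_loads(cs, nm, nr, cm, cr)
print([round(x, 4) for x in loads], round(objective(co, loads), 6), satisfies_capacity(loads))
# [0.805, 0.45] 0.011525 True
```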

IV. AN ENERGY-EFFICIENT MULTI-TASK SCHEDULING ALGORITHM BASED ON MAPREDUCE

Task scheduling is an NP-hard problem, and genetic algorithms, which are based on evolutionary theory, are well suited to complex optimization problems. We now describe the energy-efficient multi-task scheduling algorithm in detail.

A. Encoding and decoding

In a genetic algorithm, the encoding method is of great significance. We adopt integer coding and use a vector $S=(s_1,s_2,\ldots,s_v)$ as an individual to represent a scheduling scheme, where $s_i$ is the server to which task i is assigned. To compute its fitness value, the individual must first be decoded. The decoding method is as follows:

Algorithm 4-1

Step1. Let $NM_k^q=0$ and $NR_k^q=0$, where $k=1,2,\ldots,N$ and $q=1,2,\ldots,F$. Empty the sets $M_k$ and $R_k$.

Step2. For each element $s_i$ of individual S, set $k=s_i$, $m_0=0$ and $r_0=0$. For each project $A_q$, $q=1,2,\ldots,F$: if $i\in\{\sum_{j=0}^{q-1}m_j+1,\ \sum_{j=0}^{q-1}m_j+2,\ \ldots,\ \sum_{j=0}^{q-1}m_j+m_q\}$, increase $NM_k^q$ by 1 and put i into set $M_k$; otherwise, if $i\in\{m+\sum_{j=0}^{q-1}r_j+1,\ m+\sum_{j=0}^{q-1}r_j+2,\ \ldots,\ m+\sum_{j=0}^{q-1}r_j+r_q\}$, increase $NR_k^q$ by 1 and put i into set $R_k$.
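Algorithm 4-1 translates directly into code. The Python sketch below is one possible rendering under assumptions: servers are numbered 1..N, tasks 1..v are ordered as in constraints (2) and (3), and instead of scanning the index ranges project by project it finds the project with a binary search over cumulative task counts, which yields the same assignment.

```python
from bisect import bisect_left
from itertools import accumulate


def decode(S, m_counts, r_counts, num_servers):
    """Algorithm 4-1: per-server task counts NM, NR and task sets M, R.

    S[i-1] is the (1-based) server of task i; tasks 1..m are map tasks grouped by
    project, tasks m+1..v are reduce tasks grouped by project in the same order.
    Returned lists are 0-based: NM[k-1][q-1] holds NM_k^q.
    """
    F, m = len(m_counts), sum(m_counts)
    NM = [[0] * F for _ in range(num_servers)]
    NR = [[0] * F for _ in range(num_servers)]
    M = [[] for _ in range(num_servers)]
    R = [[] for _ in range(num_servers)]
    map_ends = list(accumulate(m_counts))                 # last map task of each project
    reduce_ends = [m + e for e in accumulate(r_counts)]   # last reduce task of each project
    for i, server in enumerate(S, start=1):
        k = server - 1
        if i <= m:                                        # map task of project q+1
            q = bisect_left(map_ends, i)
            NM[k][q] += 1
            M[k].append(i)
        else:                                             # reduce task of project q+1
            q = bisect_left(reduce_ends, i)
            NR[k][q] += 1
            R[k].append(i)
    return NM, NR, M, R


# Two projects with m = (3, 2) splits and r = (1, 2) reduce tasks, so v = 8; 3 servers.
NM, NR, M, R = decode([1, 2, 1, 3, 3, 2, 1, 1], [3, 2], [1, 2], 3)
print(NM, NR, M, R)
# [[2, 0], [1, 0], [0, 2]] [[0, 2], [1, 0], [0, 0]] [[1, 3], [2], [4, 5]] [[7, 8], [6], []]
```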

B. Modified operator

Since the CPU utilization of each server cannot exceed 100% after task scheduling, and population initialization cannot guarantee this, newly generated individuals may need to be modified. The modified operator proceeds as follows:

Algorithm 4-2

Step1. Decode individual S according to Algorithm 4-1.

Step2. For $k=1,2,\ldots,N$: if $CS_k+\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q>1$, go to Step4; otherwise, move on to the next server, and stop once $k>N$.

Step4. If $CO_k-CS_k\le 0$, let $cut=\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q$; otherwise, let $cut=CS_k+\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q-CO_k$.

Step5. Remove excess map tasks: for $x=1,2,\ldots,NM_k$, where $NM_k=\sum_{q=1}^{F}NM_k^q$, take the x-th map task i from set $M_k$. There exists an integer $s\in[1,F]$ satisfying $\sum_{q=0}^{s-1}m_q+1\le i\le\sum_{q=0}^{s}m_q$ (with $m_0=0$), i.e., task i belongs to project $A_s$. If $cut-CM_s<0$, go to Step6; otherwise, reassign task i to a new server w that satisfies $w\ne k$ and $w\in\{p_{1i},p_{2i},p_{3i}\}$. Let $s_i=w$, $x=x+1$ and $cut=cut-CM_s$.

Step6. Remove excess reduce tasks: for $x=1,2,\ldots,NR_k$, where $NR_k=\sum_{q=1}^{F}NR_k^q$, take the x-th reduce task i from set $R_k$. There exists an integer $s\in[1,F]$ satisfying $m+\sum_{q=0}^{s-1}r_q+1\le i\le m+\sum_{q=0}^{s}r_q$ (with $r_0=0$). If $cut-CR_s<0$, go to Step1; otherwise, reassign this task to a new server w that satisfies $w\in[1,N]$ and $w\ne k$. Let $s_i=w$, $x=x+1$ and $cut=cut-CR_s$.
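A compact Python sketch of the modified operator is given below. It approximates Algorithm 4-2 under assumptions of our own: tasks use 0-based indices with the m map tasks first, `replicas[i]` lists the servers $p_{1i},p_{2i},p_{3i}$ holding the split of map task i, the new server w is drawn at random (the steps do not say how w is chosen), a tolerance `EPS` absorbs floating-point rounding, and `max_rounds` guards against the operator cycling.

```python
import random

EPS = 1e-9  # tolerance for floating-point rounding in load comparisons


def task_demands(m_counts, r_counts, cm, cr):
    """CPU demand of each task, ordered as in the encoding: map tasks, then reduce tasks."""
    demands = []
    for q, count in enumerate(m_counts):
        demands += [cm[q]] * count
    for q, count in enumerate(r_counts):
        demands += [cr[q]] * count
    return demands


def loads_of(S, cs, demands):
    """Utilization of every server for schedule S (S holds 1-based server ids)."""
    loads = list(cs)
    for i, server in enumerate(S):
        loads[server - 1] += demands[i]
    return loads


def repair(S, cs, co, demands, m, replicas, rng, max_rounds=100):
    """Sketch of Algorithm 4-2: shed tasks from every server whose load exceeds 1."""
    S = list(S)
    n = len(cs)
    for _ in range(max_rounds):
        loads = loads_of(S, cs, demands)                     # Step1: decode
        overloaded = [k for k in range(n) if loads[k] > 1.0 + EPS]
        if not overloaded:                                   # Step2: every server feasible
            return S
        for k in overloaded:
            # Step4: shed everything assigned if the server already sat at or above
            # its optimal point, otherwise only the load above the optimal point.
            cut = loads[k] - cs[k] if co[k] - cs[k] <= 0 else loads[k] - co[k]
            for want_map in (True, False):                   # Step5 (maps), Step6 (reduces)
                mine = [i for i, s in enumerate(S) if s == k + 1 and (i < m) == want_map]
                for i in mine:
                    if cut - demands[i] < -EPS:
                        break
                    # Map tasks may only move to another server holding their split.
                    pool = replicas[i] if want_map else range(1, n + 1)
                    S[i] = rng.choice([w for w in pool if w != k + 1])
                    cut -= demands[i]
    raise RuntimeError("schedule still infeasible after max_rounds")


# Example: 3 servers, 4 map tasks (CM = 0.2) and 2 reduce tasks (CR = 0.1), all on server 1.
cs, co = [0.6, 0.1, 0.1], [0.9, 0.7, 0.5]
demands = task_demands([4], [2], [0.2], [0.1])
replicas = [(1, 2, 3)] * 4
fixed = repair([1] * 6, cs, co, demands, m=4, replicas=replicas, rng=random.Random(0))
print(fixed, [round(x, 2) for x in loads_of(fixed, cs, demands)])
```

In the example, server 1 starts at a load of 1.6 and is brought back to its optimal point of 0.9; which of servers 2 and 3 receive the moved tasks depends on the random seed.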

C. Crossover operator

We adopt a multi-point crossover operator for the evolution of individuals. Taking two projects ($F=2$) as an example, the crossover process is as follows:

Algorithm 4-3

Step1. Let the crossover probability be pc. For each individual in the population, generate a real number $q\in[0,1]$; if $q\le pc$, put this individual into the mating pool pl.

Step2. Select two individuals $S_1$ and $S_2$ from pl without replacement. Generate four random integers $c_1\in[1,m_1]$, $c_2\in[m_1+1,m]$, $c_3\in[m+1,m+r_1]$ and $c_4\in[m+r_1+1,v]$ as the crossover points. Generate new individuals $S_3$ and $S_4$ as follows:

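$$\begin{cases} S_1=(s^1_1,\ldots,s^1_{c_1},s^1_{c_1+1},\ldots,s^1_{c_2},s^1_{c_2+1},\ldots,s^1_{c_3},s^1_{c_3+1},\ldots,s^1_{c_4},s^1_{c_4+1},\ldots,s^1_v)\\ S_2=(s^2_1,\ldots,s^2_{c_1},s^2_{c_1+1},\ldots,s^2_{c_2},s^2_{c_2+1},\ldots,s^2_{c_3},s^2_{c_3+1},\ldots,s^2_{c_4},s^2_{c_4+1},\ldots,s^2_v)\end{cases}$$

$$\begin{cases} S_3=(s^1_1,\ldots,s^1_{c_1},s^2_{c_1+1},\ldots,s^2_{c_2},s^1_{c_2+1},\ldots,s^1_{c_3},s^2_{c_3+1},\ldots,s^2_{c_4},s^1_{c_4+1},\ldots,s^1_v)\\ S_4=(s^2_1,\ldots,s^2_{c_1},s^1_{c_1+1},\ldots,s^1_{c_2},s^2_{c_2+1},\ldots,s^2_{c_3},s^1_{c_3+1},\ldots,s^1_{c_4},s^2_{c_4+1},\ldots,s^2_v)\end{cases}$$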

Step3. Modify and locally search individuals $S_3$ and $S_4$ according to Algorithm 4-2 and Algorithm 4-5, respectively.
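Algorithm 4-3 can be sketched in Python as follows, assuming $F=2$ as in the text and 1-based crossover points as in Step2. Step3 (repair and local search) is left out to keep the example short, and the function names are illustrative.

```python
import random


def four_point_crossover(s1, s2, m1, m, r1, rng):
    """Algorithm 4-3, Step2, for F = 2: swap genes c1+1..c2 and c3+1..c4 (1-based)."""
    v = len(s1)
    # One crossover point inside each gene block: maps of A_1, maps of A_2,
    # reduces of A_1, reduces of A_2.
    c1 = rng.randint(1, m1)
    c2 = rng.randint(m1 + 1, m)
    c3 = rng.randint(m + 1, m + r1)
    c4 = rng.randint(m + r1 + 1, v)
    s3, s4 = list(s1), list(s2)
    # 1-based positions c+1..d correspond to the 0-based slice [c:d].
    for lo, hi in ((c1, c2), (c3, c4)):
        s3[lo:hi] = s2[lo:hi]
        s4[lo:hi] = s1[lo:hi]
    return s3, s4


def crossover_population(population, pc, m1, m, r1, rng):
    """Steps 1-2: build the mating pool with probability pc, then pair without replacement."""
    pool = [ind for ind in population if rng.random() <= pc]
    rng.shuffle(pool)
    offspring = []
    for a, b in zip(pool[0::2], pool[1::2]):
        offspring.extend(four_point_crossover(a, b, m1, m, r1, rng))
    return offspring


# Toy sizes: m1 = 3, m2 = 2 (m = 5), r1 = 2, r2 = 3 (v = 10).
rng = random.Random(7)
child3, child4 = four_point_crossover([1] * 10, [2] * 10, m1=3, m=5, r1=2, rng=rng)
print(child3, child4)
```

Because each gene position always refers to the same task, swapping segments never moves a map task off its admissible servers, so constraint (1) survives crossover when both parents satisfy it; only the capacity constraint (4) may need the modified operator afterwards.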

D. Mutation operator

We use a single-point mutation operator for the evolution of individuals. The mutation process is as follows:

Algorithm 4-4

Step1. Let the mutation probability be pm. For individual S, generate a real number $q\in[0,1]$. If $q\le pm$, go to Step2; otherwise, stop.

Step2. Randomly generate an integer $i\in[1,v]$. If $i\le m$, reassign this task to a new server w that satisfies $w\ne s_i$ and $w\in\{p_{1i},p_{2i},p_{3i}\}$, and let $s_i=w$; otherwise, randomly generate an integer $k\in[1,N]$ that satisfies $k\ne s_i$, and let $s_i=k$. Modify and locally search the newly generated individual according to Algorithm 4-2 and Algorithm 4-5, respectively.
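A minimal Python sketch of Algorithm 4-4, assuming 0-based task indices with the m map tasks first and `replicas[i]` holding the three storage servers of map task i. The follow-up modification and local search are left to the caller.

```python
import random


def mutate(S, m, replicas, num_servers, pm, rng):
    """Sketch of Algorithm 4-4: single-point mutation; returns a new list."""
    S = list(S)
    if rng.random() > pm:                 # Step1: mutate only with probability pm
        return S
    i = rng.randrange(len(S))             # Step2: pick one gene (0-based task index)
    if i < m:                             # map task: move to another server holding its split
        choices = [w for w in replicas[i] if w != S[i]]
    else:                                 # reduce task: move to any other server
        choices = [w for w in range(1, num_servers + 1) if w != S[i]]
    S[i] = rng.choice(choices)
    return S


rng = random.Random(3)
print(mutate([1, 2, 3, 1, 2], m=3, replicas=[(1, 2, 3)] * 3, num_servers=4, pm=1.0, rng=rng))
```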

E. Local search operator

To accelerate convergence and enhance the search ability of the proposed algorithm, we design the following local search operator.

Algorithm 4-5

Step1. Let f be the fitness value of S, and let $S'=S$.

Step2. Decode individual $S'$ according to Algorithm 4-1.

Step3. Among all the servers, let k be the server with the highest CPU utilization. If $CO_k-CS_k<0$, let $cut=\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q$; otherwise, let $cut=CS_k+\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q-CO_k$.

Step4. Remove excess map tasks: for $x=1,2,\ldots,NM_k$, take the x-th map task i from set $M_k$. There exists an integer $s\in[1,F]$ satisfying $\sum_{q=0}^{s-1}m_q+1\le i\le\sum_{q=0}^{s}m_q$ (with $m_0=0$). If $cut-CM_s<0$, go to Step5; otherwise, reassign task i to a new server w that satisfies $w\ne k$ and $w\in\{p_{1i},p_{2i},p_{3i}\}$. Let $s'_i=w$ and $cut=cut-CM_s$.

Step5. Remove excess reduce tasks: for $x=1,2,\ldots,NR_k$, take the x-th reduce task i from set $R_k$. There exists an integer $s\in[1,F]$ satisfying $m+\sum_{q=0}^{s-1}r_q+1\le i\le m+\sum_{q=0}^{s}r_q$ (with $r_0=0$). If $cut-CR_s<0$, compute the fitness value $f'$ of individual $S'$; otherwise, reassign task i to a new server w that satisfies $w\in[1,N]$ and $w\ne k$. Let $s'_i=w$ and $cut=cut-CR_s$.

Step6. If $f'<f$, let $S=S'$ and $f=f'$, and go to Step2; otherwise, let $S'=S$ and decode individual $S'$ according to Algorithm 4-1.

Step7. Among all the servers, let k be the server with the lowest CPU utilization, and let $add=CO_k-\Big(CS_k+\sum_{q=1}^{F}NM_k^q\times CM_q+\sum_{q=1}^{F}NR_k^q\times CR_q\Big)$.

Step8. Add map tasks: denote the set of all map tasks that can be assigned to server k (i.e., whose input splits are stored on server k) as $MM_k$. Take a map task $p\in MM_k$ with $s'_p\ne k$, and find the integer $s\in[1,F]$ satisfying $\sum_{q=0}^{s-1}m_q+1\le p\le\sum_{q=0}^{s}m_q$. If $add-CM_s<0$, go to Step9; otherwise, let $s'_p=k$ and $add=add-CM_s$, and go to Step8.

Step9. Add reduce tasks: take a task $p\in[m+1,v]$ with $s'_p\ne k$, and find the integer $s\in[1,F]$ satisfying $m+\sum_{q=0}^{s-1}r_q+1\le p\le m+\sum_{q=0}^{s}r_q$. If $add-CR_s<0$, go to Step10; otherwise, let $s'_p=k$ and $add=add-CR_s$, and go to Step9.

Step10. Compute the fitness value $f'$ of $S'$. If $f'<f$, then $S'$ is better than S; let $S=S'$ and $f=f'$, and go to Step7; otherwise, stop.
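The two phases of the local search, shedding load from the busiest server and then topping up the least busy one, can be sketched in Python as below. This is a simplified rendering under assumptions of our own: 0-based task indices with map tasks first, `demands[i]` equal to the $CM_q$ or $CR_q$ of task i, `replicas[i]` the storage servers of map task i, a random choice of destination server, a rounding tolerance `EPS` and an iteration cap. Because the fitness is the model's objective, a candidate is kept only when it lowers the value.

```python
import random

EPS = 1e-9  # tolerance for floating-point rounding


def loads_of(S, cs, demands):
    """Utilization of every server for schedule S (S holds 1-based server ids)."""
    loads = list(cs)
    for i, server in enumerate(S):
        loads[server - 1] += demands[i]
    return loads


def fitness(S, cs, co, demands):
    """Model objective (lower is better): squared gaps to the optimal points."""
    return sum((o - u) ** 2 for o, u in zip(co, loads_of(S, cs, demands)))


def offload(S, cs, co, demands, m, replicas, rng):
    """Steps 2-5: move tasks off the most utilized server; returns a candidate S'."""
    S2, n = list(S), len(cs)
    loads = loads_of(S2, cs, demands)
    k = max(range(n), key=loads.__getitem__)
    cut = loads[k] - cs[k] if co[k] - cs[k] < 0 else loads[k] - co[k]
    for want_map in (True, False):
        for i in [i for i, s in enumerate(S2) if s == k + 1 and (i < m) == want_map]:
            if cut - demands[i] < -EPS:
                break
            pool = replicas[i] if want_map else range(1, n + 1)
            S2[i] = rng.choice([w for w in pool if w != k + 1])
            cut -= demands[i]
    return S2


def fill(S, cs, co, demands, m, replicas):
    """Steps 7-9: pull tasks onto the least utilized server up to its optimal point."""
    S2, n = list(S), len(cs)
    loads = loads_of(S2, cs, demands)
    k = min(range(n), key=loads.__getitem__)
    add = co[k] - loads[k]
    for want_map in (True, False):
        for i in range(len(S2)):
            if S2[i] == k + 1 or (i < m) != want_map:
                continue
            if want_map and k + 1 not in replicas[i]:
                continue                     # the split is not stored on server k
            if add - demands[i] < -EPS:
                break
            S2[i] = k + 1
            add -= demands[i]
    return S2


def local_search(S, cs, co, demands, m, replicas, rng, max_iters=1000):
    """Sketch of Algorithm 4-5: keep a candidate only if it lowers the objective."""
    f = fitness(S, cs, co, demands)
    moves = (lambda s: offload(s, cs, co, demands, m, replicas, rng),
             lambda s: fill(s, cs, co, demands, m, replicas))
    for move in moves:                        # Steps 2-6, then Steps 7-10
        for _ in range(max_iters):
            candidate = move(S)
            f_new = fitness(candidate, cs, co, demands)
            if f_new >= f:
                break
            S, f = candidate, f_new
    return S, f


# Example: 2 servers, 4 map tasks of demand 0.1 whose splits are stored on both servers.
best, f_best = local_search([1, 1, 1, 1], cs=[0.1, 0.1], co=[0.3, 0.5],
                            demands=[0.1] * 4, m=4, replicas=[(1, 2)] * 4,
                            rng=random.Random(0))
print(best, round(f_best, 4))  # [2, 2, 1, 1] 0.04
```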

F. An energy-efficient multi-task scheduling algorithm based on MapReduce

Algorithm 4-6

Step1. Initialization. Generate an initial population P. Modify each individual according to Algorithm 4-2 and compute its fitness value. Set the generation number $t=0$.

Step2. Crossover. Execute crossover by Algorithm 4-3; denote the offspring set as $P_1$ and compute the fitness value of each individual.

Step3. Mutation. Execute mutation on $P_1$ by Algorithm 4-4; denote the offspring set as $P_2$ and compute the fitness value of each individual.

Step4. Elitist strategy. Sort the individuals in $P\cup P_1\cup P_2$ by fitness value, select the best k individuals directly into the next generation, and select the others from $P\cup P_1\cup P_2$ by the roulette wheel method.

Step5. If the stopping criterion is not met, let $t=t+1$ and go to Step2; otherwise, stop.
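The control flow of Algorithm 4-6 can be expressed as a small generic skeleton. The Python sketch below takes the operators as parameters and runs them on a toy problem rather than on the scheduling model. Two points are assumptions, since the text does not specify them: because the fitness is minimized, roulette-wheel weights are taken as 1/(1+f), and the toy run uses a population of 20, 2 elites and 50 generations instead of the X = 100, k = 5 and t = 1000 of Section V.

```python
import random


def next_generation(candidates, fitness, elite_count, size, rng):
    """Step4: keep the best individuals, fill the rest by roulette wheel (minimization)."""
    survivors = sorted(candidates, key=fitness)[:elite_count]
    weights = [1.0 / (1.0 + fitness(ind)) for ind in candidates]   # assumed transformation
    survivors += rng.choices(candidates, weights=weights, k=size - elite_count)
    return survivors


def genetic_algorithm(init, crossover, mutate, fitness, size, elite_count, generations, rng):
    """Skeleton of Algorithm 4-6; the operators are supplied by the caller."""
    population = [init(rng) for _ in range(size)]                  # Step1
    for _ in range(generations):                                   # Step5 loop
        p1 = crossover(population, rng)                            # Step2
        p2 = [mutate(ind, rng) for ind in p1]                      # Step3
        population = next_generation(population + p1 + p2, fitness,
                                     elite_count, size, rng)       # Step4
    return min(population, key=fitness)


# Toy problem: minimize sum((x - 3)^2) over vectors of five digits.
def toy_fitness(ind):
    return sum((x - 3) ** 2 for x in ind)


def toy_crossover(pop, rng):
    pool = [ind for ind in pop if rng.random() <= 0.6]            # pc = 0.6
    rng.shuffle(pool)
    kids = []
    for a, b in zip(pool[0::2], pool[1::2]):
        c = rng.randint(1, len(a) - 1)                            # one-point crossover
        kids += [a[:c] + b[c:], b[:c] + a[c:]]
    return kids


def toy_mutate(ind, rng):
    ind = list(ind)
    if rng.random() <= 0.02:                                      # pm = 0.02
        ind[rng.randrange(len(ind))] = rng.randint(0, 9)
    return ind


rng = random.Random(42)
best = genetic_algorithm(lambda r: [r.randint(0, 9) for _ in range(5)],
                         toy_crossover, toy_mutate, toy_fitness,
                         size=20, elite_count=2, generations=50, rng=rng)
print(best, toy_fitness(best))
```

Elitism guarantees that the best fitness never gets worse from one generation to the next, whatever the roulette wheel picks.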


V. EXPERIMENTS AND ANALYSIS

A. Parameter values

Suppose that there are 200 servers in a data center and 2 projects to be processed. The data sizes of the projects are 500 GB and 750 GB, respectively, which means $m_1=8000$ and $m_2=12000$. Suppose that the numbers of reduce tasks required by the two projects are $r_1=180$ and $r_2=270$, respectively. Under a 3-year amortization schedule for servers, different servers may have different optimal performance-energy points depending on how long they have been in use. Here we assume that 1/3 of the servers have been used for one year, with an optimal point of 0.9; another 1/3 have been used for two years, with an optimal point of 0.7; and the remaining servers have an optimal point of 0.5. Random real numbers in [0, 0.35] are taken as the servers' initial CPU utilization values. In addition, we set some special initial server states as follows:

$CS_5=0.5$; $CS_{25}=0.7$; $CS_{45}=0.9$; $CS_{75}=0.5$; $CS_{95}=0.7$; $CS_{115}=0.9$; $CS_{145}=0.5$; $CS_{165}=0.7$; $CS_{195}=0.9$.
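The split counts follow from the 64 MB split size of Section II if 1 GB is taken as 1024 MB; this unit convention is our assumption, since the text does not state it. Adding the reduce tasks gives the length v of each individual in the genetic algorithm.

```python
MB_PER_GB = 1024   # assumed binary units: 1 GB = 1024 MB
SPLIT_MB = 64      # typical MapReduce split size (Section II, Step1)


def num_splits(size_gb: int) -> int:
    """Number of 64 MB splits for an input of size_gb gigabytes, rounded up."""
    return -(-size_gb * MB_PER_GB // SPLIT_MB)   # ceiling division on integers


m1, m2 = num_splits(500), num_splits(750)
r1, r2 = 180, 270
v = m1 + m2 + r1 + r2          # chromosome length: one gene per task
print(m1, m2, v)               # 8000 12000 20450
```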

We adopt the following parameter values for the genetic algorithm: population size $X=100$; crossover probability $pc=0.6$; mutation probability $pm=0.02$; elitist number $k=5$; and stopping criterion $t=1000$.

B. Simulation results and comparisons

We conduct three sets of comparative experiments between the proposed algorithm and the general MapReduce-based load balancing method.

Comparison 1: Set $CM_1=0.0055$, $CM_2=0.0046$, $CR_1=0.0017$ and $CR_2=0.0022$. The experimental results of the proposed algorithm are shown in Figure 3(a), while those of the load balancing method are shown in Figure 3(b).

Figure 3(a) shows that the proposed algorithm effectively schedules multiple tasks on the servers according to each server's optimal performance-energy point. For the 5th, 75th and 145th servers, which have the same initial CPU utilization of 0.5, the proposed algorithm assigns tasks only to the 5th and 75th servers, while the 145th server stays in its original state, since the optimal points of these three servers are 0.9, 0.7 and 0.5, respectively. Similarly, for the 25th, 95th and 165th servers, with the same initial CPU utilization of 0.7, the proposed algorithm assigns tasks only to the 25th server. Likewise, for the 45th, 115th and 195th servers, with the same initial CPU utilization of 0.9, the proposed algorithm assigns no tasks to any of them.

Comparison 2: Suppose that the input data is relatively small. Set $CM_1=0.005$, $CM_2=0.004$, $CR_1=0.0015$ and $CR_2=0.002$. The experimental results of the proposed algorithm are shown in Figure 4(a), while those of the load balancing method are shown in Figure 4(b).

Figure 4(a) shows that even when the input data is relatively small, the proposed algorithm effectively schedules multiple tasks on the servers according to each server's optimal performance-energy point. Although the CPU utilizations of the servers cannot all reach their optimal points after scheduling, each server's CPU utilization is brought as close as possible to its optimal point.

Comparison 3: Suppose that the input data is relatively large. Set $CM_1=0.006$, $CM_2=0.005$, $CR_1=0.002$ and $CR_2=0.0025$. The experimental results of the proposed algorithm are shown in Figure 5(a), while those of the MapReduce-based load balancing method are shown in Figure 5(b).

Figure 5(a) shows that even when the input data to be processed is relatively large, the proposed algorithm effectively schedules multiple tasks on the servers according to each server's optimal performance-energy point. Although the CPU utilizations of the servers exceed their optimal points after scheduling, each server's CPU utilization is kept as close as possible to its optimal point.

VI. CONCLUSION

This paper focuses on improving the energy efficiency of servers through appropriate scheduling strategies. We propose a new energy-efficient multi-task scheduling model based on MapReduce. To solve it, we design a practical encoding and decoding method for individuals and construct an overall energy efficiency function of the servers as the fitness value of each individual. In addition, a local search operator is introduced to accelerate convergence and enhance the search ability of the algorithm. Finally, experiments show that the proposed algorithm is effective and efficient.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 60873099), the PhD Programs Foundation of the Ministry of Education of China (No. 20090203110005) and the Fundamental Research Funds for the Central Universities (No. k50510030014).

REFERENCES

[1] Mell P, Grance T. The NIST definition of cloud computing[J]. National Institute of Standards and Technology, 2009, 53(6).

[2] Hamilton J. Cooperative expendable micro-slice servers (CEMS): low cost, low power servers for internet-scale services[C]. Citeseer.

[3] Belady C. The Green Grid Data Center Power Efficiency Metrics: PUE and DCiE[J]. White paper: Metrics & Measurements, 2007.

[4] ENERGY S. Report to Congress on Server and Data Center Energy Efficiency Public Law 109-431[J]. Public law, 2007, 109: 431.

[5] Efficiency measurements. http://www.google.com/corporate/datacenter/efficiency-measurements.html.

[6] Beloglazov A, Buyya R. Energy efficient allocation of virtual machines in cloud data centers[C]. IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010.

[7] Berl A, Gelenbe E, Di Girolamo M, et al. Energy-efficient cloud computing[J]. The Computer Journal, 2010, 53(7): 1045.

[8] Buyya R, Beloglazov A, Abawajy J. Energy-Efficient management of data center resources for cloud computing: A vision, architectural elements, and open challenges[J]. arXiv preprint arXiv:1006.0308, 2010.

[9] Baliga J, Ayre R W A, Hinton K, et al. Green cloud computing: Balancing energy in processing, storage, and transport[J]. Proceedings of the IEEE, 2011, 99(1): 149-167.

[10] Barroso L A, Hölzle U. The datacenter as a computer: An introduction to the design of warehouse-scale machines[J]. Synthesis Lectures on Computer Architecture, 2009, 4(1): 1-108.

[11] Miller R. Google's Chiller-less Data Center[J]. Datacenterknowledge.com, 2009.

[12] Srikantaiah S, Kansal A, Zhao F. Energy aware consolidation for cloud computing[C]. USENIX Association, 2008.

[13] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.

Figure 3.(a) Comparative experiment 1 (proposed algorithm). Figure 3.(b) Comparative experiment 1 (load balancing method). [Each panel plots the CPU utilization of each server (0 to 1) against the server No. (0 to 200), with series Initial state, Project 1 and Project 2.]

Figure 4.(a) Comparative experiment 2 (proposed algorithm). Figure 4.(b) Comparative experiment 2 (load balancing method). [Each panel plots the CPU utilization of each server (0 to 1) against the server No. (0 to 200), with series Initial state, Project 1 and Project 2.]

Figure 5.(a) Comparative experiment 3 (proposed algorithm). Figure 5.(b) Comparative experiment 3 (load balancing method). [Each panel plots the CPU utilization of each server (0 to 1) against the server No. (0 to 200), with series Initial state, Project 1 and Project 2.]
