402 … · 402 IEEEJOURNALOFSELECTEDTOPICSINSIGNALPROCESSING,VOL.10,NO.2,MARCH2016...



402 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 10, NO. 2, MARCH 2016

Cooling-Aware Energy and Workload Management in Data Centers via Stochastic Optimization

Tianyi Chen, Xin Wang, Senior Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE

Abstract—While the quest of end users for fast and convenient Internet services grows steadily, energy-hungry data centers correspondingly expand in both numbers and scale - a fact that raises global warming and climate change concerns. In addition, high penetration of renewables, development of energy-efficient cooling facilities, and flexibility of distributed storage units, all call for a system-wide energy and workload management policy for future sustainable data centers. As implementing offline management policies is practically infeasible due to complexity and the lack of future information, real-time management schemes are considered here under a systematic framework. Leveraging stochastic optimization tools, a unified management approach is proposed allowing data centers to adaptively respond to intermittent availability of renewables, variability of cooling efficiency, information technology (IT) workload shift, and energy price fluctuations under long-term quality-of-service (QoS) requirements. Meanwhile, it is rigorously established that when storage devices have sufficiently high capacity, or, the difference between electricity purchase and selling prices is small, the proposed algorithm yields a feasible and near-optimal management strategy without knowing the distributions of the independently and identically distributed (i.i.d.) workload, renewable, and electricity price processes. Numerical results further demonstrate that the proposed algorithm works well not only for i.i.d. processes, but also in real-data scenarios, where the underlying randomness is highly correlated over time.

Index Terms—Cooling-aware, cost minimization, data center, distributed storage, renewable generation, stochastic optimization.

NOMENCLATURE

A. Indices, numbers, and sets

, Number and index of time slots.
, Number and index of ‘must-serve’ workloads.
, Number and index of delay-tolerant workloads.

Manuscript received May 04, 2015; revised September 09, 2015; accepted October 17, 2015. Date of publication November 12, 2015; date of current version February 11, 2016. Work in this paper was supported in part by the U.S. National Science Foundation under Grants 1509040, 1508993, 1509005, 1423316, 1442686, and 1202135, in part by the China Recruitment Program of Global Young Experts, in part by the Program for New Century Excellent Talents in University, in part by the Innovation Program of Shanghai Municipal Education Commission, and in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant 2012ZX03001013. The guest editor coordinating the review of this paper and approving it for publication was Prof. Jean-Christophe Pesquet.

T. Chen and G. B. Giannakis are with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]; georgios@umn.edu).

X. Wang is with the Key Laboratory for Information Science of Electromagnetic Waves (MoE), Department of Communication Science and Engineering, Fudan University, Shanghai 200433, China, and also with the Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTSP.2015.2500189

, Number and index of batteries.
Iteration index of the dual subgradient ascent.
Set of the total scheduling horizon.
Feasible set of primal variables in partial Lagrangian per slot .

B. Constants

Demand of th ‘must-serve’ workloads per slot .
Total demand of th delay-tolerant workloads per slot .
, Vectors collecting and .
Vector collecting all the random variables at time .
Maximum parallelization of th delay-tolerant workloads.
Data center total IT capacity per time slot.
QoS threshold of delay-tolerant workloads.
, Parameters of outside-air and chilled-water cooling.
, IT rack and outside air temperatures.
Maximum capacity of outside-air cooling.
Auxiliary variable in cooling equations.
, , Initial, minimum and maximum energy levels of battery .
, Minimum and maximum (dis)charging power at battery .
Capacity of conventional generator (CG).
, Maximum ramping-up/down rates of CG.
Auxiliary variable introduced by the ramping constraints.
Optimality loss introduced by the ramping constraints.
Auxiliary variable introduced by the optimality gap.
Renewable generation per slot .
, , Buying, selling energy prices and CG cost at slot .
, Auxiliary variables of energy prices.
Ratio of selling price over buying price.
, Selected and minimum stepsizes.

C. Decision variables

Allocated th delay-tolerant workloads per time .
Vector collecting .
Data center total IT demand at slot .

1932-4553 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


CHEN et al.: COOLING-AWARE ENERGY AND WORKLOAD MANAGEMENT IN DATA CENTERS 403

Amount of IT demand allocated for chilled-water cooling per time .
Amount of IT demand allocated for outside-air cooling per time .
State of charge (SoC) in battery at the beginning of slot .
(Dis)charging power from battery at slot .
Energy output of CG per slot .
Total energy consumption per slot .
Total energy supply per slot .
Vector collecting .
Vector collecting all primal variables at time .
Matrix collecting all primal variables.
, Lagrange multipliers.
, Stochastic estimates of Lagrange multipliers.
, Vectors collecting and .
, Vectors collecting and .
, Vectors collecting and .
, , Limiting time averages.

D. Functions

Outside-air cooling power consumption.
Chilled-water cooling power consumption.
Data center cooling power consumption.
Microgrid energy transaction cost.
Revenue from delay-tolerant workloads.
Control policy from the random state to the optimization variable per slot .
Operational net-cost per slot .
Partial Lagrangian function.
Dual function.

Notation: Boldface lower (upper) case letters represent column vectors (matrices); denotes the expectation operator; and denotes the -norm of vector . Inequalities for vectors, e.g., , are defined entry-wise; denotes the entry-wise division of vectors and . ; the indicator function is 1 if the event is true and 0 otherwise; and stands for vector and matrix transposition.

I. INTRODUCTION

In the past decade, major improvements have been seen in the worldwide Internet systems. Along with the ever-increasing demand for Internet applications, data centers nowadays are rapidly proliferating all over the world. However, as thousands of millions of end users enjoy the convenience of Internet services, such as social networks, video, and content distribution networks, these data centers incur huge electricity bills. According to [1], data centers consume about 1.3% of the worldwide electricity supply currently, and this is expected to be 8% by 2020. Reducing the electricity cost is thus of great interest, and improving the sustainability and efficiency of data centers is equally essential. In addition, two thirds of the worldwide electricity is generated by conventional generators, such as gas plants, which have a much larger carbon footprint than renewable generators such as wind turbines and solar panels [2]. A surprising fact is that Google's carbon emission in 2010 was almost equivalent to that emitted by 280,000 cars [3].

As a consequence, integrating renewable energy sources (RES) into the existing data center power supply systems has gained popularity both in academic and industry research over the past three years [4]–[8]. For instance, Apple has been building its sustainable data centers powered by 100% RES with zero greenhouse gas emissions across the country [8]. However, the challenge of integrating renewables is that their high penetration leads to variations in the traditional “supply follows demand” motto, and the benefits of RES can only be harvested by appropriately mitigating their inherently high variability, which also motivates advancing distributed storage units [9]–[12]. But these works deal with energy supply side management by leveraging the distributed storage units, without workload scheduling. A few recent works deal with information technology (IT) workload management for data centers from a demand response (DR) perspective [13]–[16]. Yet, these works either ignore the power supply structure, or, do not consider cooling power consumption, despite the fact that a substantial amount of energy in data centers goes to their cooling systems [17].

Cooling structures were accounted for in the joint energy and workload management of [18] and [19]. Assuming that the future workload and RES information is known a-priori, [18] investigated energy and workload management offline. On one hand, the computational complexity in [18] can become prohibitively high as the scheduling horizon grows large. On the other hand, future RES and IT workloads are generally hard to predict accurately. Online energy and workload management was addressed in [19], with a simplified single source cooling structure and power supply structure. In addition, neither [18] nor [19] considered a two-way energy trading mechanism for the data center to potentially sell its surplus energy to the market at a fair price in order to reduce operating costs.

In this paper, we consider a practical data center design consisting of power supply, cooling, and IT operating systems. The power supply system comprises a conventional generator, RES, distributed energy storage units, and a mechanism to perform two-way energy trading with the external electricity market. While the cooling system combines two subsystems with different cooling coefficients,1 the IT operating system can intelligently schedule the workloads under QoS constraints. In this context, we develop an online energy and workload management approach, which dynamically makes instantaneous decisions without a-priori knowledge of any statistics of the underlying random workload, renewable, and electricity price processes. To this end, the intended task is formulated as an infinite time horizon optimization problem aiming to minimize the time-average operational net-cost. Targeting a low-complexity online solution, we adopt relaxation techniques to decouple the decision variables across time. Then leveraging Lagrange relaxation and stochastic approximation techniques, we develop a novel online control algorithm. Based on the revealed characteristics of the optimal schedules, we formally establish that when the storage device has sufficiently high capacity, or, when the difference between electricity purchase and selling prices is small, the proposed algorithm yields a feasible and near-optimal resource management strategy for the original problem.

The rest of the paper is organized as follows. The system models are described in Section II. The proposed dynamic management scheme is developed in Section III. Performance guarantees of the resultant algorithm are established in Section IV. Numerical results are provided in Section V, followed by concluding remarks in Section VI.

1 Cooling coefficient is the power consumed for cooling divided by the IT demand [18], thus representing the cooling efficiency.

Fig. 1. A smart-grid powered sustainable data center.

II. SYSTEM MODELS

Consider a data center composed of three subsystems: the IT system, the cooling system coping with the heat generated by the IT system, and the power supply system supporting IT and cooling equipment; see Fig. 1.

A. Workload Model

In general, workloads in data centers fall under two categories: delay-sensitive (or ‘must-serve’) and delay-tolerant workloads [13]. The first category includes voice and multimedia services, as well as real-time user requests, which have to be served usually within a few seconds. Delay-tolerant workloads include HTTP and email deliveries that can be scheduled to run when the energy cost is low, or, when the system workload is low. This second category provides ample optimization opportunities for workload management adaptive to the time-varying amounts of RES and cooling supply.

Consider an infinite scheduling horizon, indexed by the set , and suppose that there are types of ‘must-serve’ workloads with the central operator having to allocate IT capacity per slot for type . On the other hand, suppose that there are classes of delay-tolerant workloads, where workloads in class have total demand at slot and maximum parallelization . With denoting the IT capacity allocated to the delay-tolerant workloads in class at slot , it must hold that

(1)

and the total IT demand (consumption) at slot is given by

(2)

Supposing that the total IT capacity is , the per-slot IT demand should clearly satisfy

(3)

In order to accommodate QoS requirements, a limiting time-average constraint is also introduced to bound the fraction of pending delay-tolerant requests; that is,

(4)

where is a prescribed threshold. We will assume that unserved requests or their fractions will be automatically requested in the ensuing slot(s).

With , and likewise for and , assume for simplicity that random processes are independently and identically distributed (i.i.d.) across time. Under (1)–(4), the IT system variables to optimize are .
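As a minimal sketch of how constraints (1)–(4) might be checked numerically, the Python snippet below assumes that (1) bounds each class allocation between zero and its maximum parallelization, that (2) adds the must-serve capacity to the allocated delay-tolerant capacity, that (3) caps this total by the IT capacity, and that (4) bounds the time-averaged unserved fraction of delay-tolerant demand. These forms are assumptions inferred from the prose, and all function and argument names are hypothetical.

```python
from typing import Sequence


def total_it_demand(must_serve: Sequence[float], allocated: Sequence[float]) -> float:
    """Assumed form of (2): must-serve capacity plus allocated delay-tolerant capacity."""
    return sum(must_serve) + sum(allocated)


def slot_is_feasible(must_serve: Sequence[float], allocated: Sequence[float],
                     max_parallel: Sequence[float], capacity: float) -> bool:
    """Assumed forms of (1) and (3) for a single slot."""
    within_parallelization = all(
        0.0 <= b <= m for b, m in zip(allocated, max_parallel)
    )
    return within_parallelization and total_it_demand(must_serve, allocated) <= capacity


def average_pending_fraction(demands: Sequence[Sequence[float]],
                             allocations: Sequence[Sequence[float]]) -> float:
    """Empirical counterpart of the quantity bounded in (4), under the assumption
    that the pending fraction per slot is 1 - served / demanded."""
    fractions = []
    for demand_t, alloc_t in zip(demands, allocations):
        total = sum(demand_t)
        served = sum(min(a, d) for a, d in zip(alloc_t, demand_t))
        fractions.append(0.0 if total == 0 else 1.0 - served / total)
    return sum(fractions) / len(fractions)
```

A finite-horizon average only approximates the limiting average in (4); comparing it with the QoS threshold is therefore indicative rather than a proof of feasibility.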

B. Cooling Structure

Along with the increasing density of IT equipment in data centers, a considerable amount of electricity is consumed by the cooling system that generally operates in two modes [20], [18]: outside-air (OA) and chilled-water (CW) cooling.

The energy usage of outside-air cooling is mainly the power consumed by blowers, which can be approximated as a cubic function of the blower speed [21]. From basics of heat transfer and the general fan laws, it turns out that the blower speed under tight control is proportional to the IT demand [22]. As a result, the outside-air cooling power consumption can be modeled as a convex function of , namely

(5)

where depends on the temperature difference between the (hot) exhaust air temperature from the IT racks and the outside air temperature . The maximum capacity of outside-air cooling in (5) can be modeled as , with proportional to the maximal outside air mass flow rate. Clearly, the cooling efficiency of outside-air cooling is greatly affected by the air temperature. As a consequence, this approach is usually complemented by more stable cooling resources, such as chillers.

The chilled-water cooling model here is built on the actual measurement of an operational chiller whose power consumption can be approximated as [23]

(6)

where is again the IT demand in (2), and is a constant depending on the specific chiller characteristics.

Clearly, the two approaches have different cooling efficiencies and capacities, which provides the possibility to optimize the power consumption for cooling by properly combining these decoupled sources. In particular, for a given , there is an optimal allocation between air- and water-based cooling. Let and denote the amounts of IT demand allocated for water and air cooling, respectively. The optimal cooling power consumption is (cf. (5) and (6) with )

(7)

Letting , the convex problem in (7) can be solved in closed form

otherwise (8)

with the optimal demands split between cooling models as

otherwise (9)

Note that and , and thus , as well as in (8) are random. And it is worth stressing that although we adopt a specific cooling model here, our approach applies to any nondecreasing and convex function in (8).
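To illustrate the kind of closed-form split that (7)–(9) describe, the sketch below assumes a cubic outside-air cost `k * x**3` (blower power cubic in a speed proportional to the IT demand) and a linear chilled-water cost `gamma * x`. Both functional forms and the names `k`, `gamma`, and `oa_cap` are assumptions made for this example and may differ from the typeset (5) and (6). Under these assumptions, outside air is used until its marginal cost `3 * k * x**2` reaches `gamma`, unless the demand or the outside-air capacity binds first.

```python
import math


def split_cooling(d: float, k: float, gamma: float, oa_cap: float):
    """Split IT demand d between outside-air (OA) and chilled-water (CW) cooling.

    Assumed model: OA power = k * x**3, CW power = gamma * x, with x <= oa_cap.
    Returns (d_oa, d_cw, total_cooling_power).
    """
    if d < 0 or k <= 0 or gamma < 0 or oa_cap < 0:
        raise ValueError("expected d >= 0, k > 0, gamma >= 0, oa_cap >= 0")
    # Unconstrained optimum: marginal OA cost equals marginal CW cost.
    x_star = math.sqrt(gamma / (3.0 * k))
    d_oa = min(d, oa_cap, x_star)
    d_cw = d - d_oa
    power = k * d_oa ** 3 + gamma * d_cw
    return d_oa, d_cw, power
```

For example, with `k = 1/3`, `gamma = 1`, and `oa_cap = 2`, a demand of 3 units is split into 1 unit of outside-air cooling and 2 units of chilled-water cooling, whereas a demand of 0.5 units is served entirely by outside air.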

C. Power Supply Model

Consider a data center supplied by a RES-integrated microgrid consisting of a conventional generator (CG) (e.g., fuel generator), an on-site renewable generator (RG) (e.g., wind or solar), and distributed energy storage units (e.g., batteries) [24], [25]. The distributed storage units in this model can include batteries deployed at renewable generators, batteries in electric vehicles, and uninterruptible power supply (UPS) units inside the data center itself; see e.g., [26]. Since the considered energy management task is within a geographically small area (e.g., a microgrid around a data center), the cost of moving energy is deemed negligible.

Let denote the energy output of the CG per slot upper bounded by ; that is,

(10)

The change of the CG energy outputs in two consecutive slots is bounded by the following so-termed ramping constraints:

(11a)

where and are known maximum ramping-up and ramping-down rates. In particular, if , the ramping constraints can be compactly expressed as

(11b)

where reflects tightness of the ramping requirements.

The renewable energy generated from the on-site RG per slot is assumed i.i.d. across slots to simplify performance analysis. But as will be seen in our simulated tests, the proposed algorithm remains operational for non-i.i.d. processes too, without any modification. Yet, performance guarantees in the non-i.i.d. case require more elaborate multi-slot Lyapunov drift techniques along the lines of [27].

Let and denote the initial amount of stored energy and the state of charge (SoC) in the -th storage unit at the beginning of time slot . Each unit has finite capacity . Furthermore, for reliability purposes, it may be required to ensure that a minimum energy level is maintained at all times2; this necessitates the two-sided inequalities

(12)

Let denote the power delivered to or drawn from the -th storage unit (battery) at slot , which amounts to either charging or discharging . Hence, the stored energy obeys the dynamic equation

(13)

The amount of power (dis)charged is bounded by

(14)

where and are set by physical limits.

Overall, the total consumption of the data center per slot includes the IT demand , the cooling power consumption , and the charged power ; that is,

(15)

Likewise, the total energy supply per slot is given by

(16)

2 Storage devices become unreliable with high depth-of-discharge (DoD), i.e., the percentage of maximum charge removed during a discharge cycle; hence, a minimum level can avoid high DoD. Such a level can also support the data center operation in the event of a grid outage.


Besides the IT variables , under constraints (10)–(16), the power supply variables to optimize are CG and battery power amounts , where .
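The storage and generator constraints above can be sketched in a few lines of Python. The snippet assumes a lossless battery whose stored energy simply adds the (signed) charging power in each slot, which is one plausible reading of (13), and it treats the ramping limits in (11a) as bounds on the slot-to-slot change of the CG output. The function names and this lossless model are assumptions of the example.

```python
def clip_battery_power(soc: float, power: float, soc_min: float, soc_max: float,
                       p_min: float, p_max: float) -> float:
    """Project a requested (dis)charging power onto the range allowed by the
    assumed forms of (12) and (14); positive values charge, negative discharge.
    Assumes soc_min <= soc <= soc_max on entry."""
    lowest = max(p_min, soc_min - soc)    # cannot discharge below the minimum level
    highest = min(p_max, soc_max - soc)   # cannot charge above the capacity
    return min(max(power, lowest), highest)


def step_soc(soc: float, power: float) -> float:
    """Assumed lossless form of the SoC recursion (13)."""
    return soc + power


def ramp_ok(previous: float, current: float, ramp_up: float, ramp_down: float,
            capacity: float) -> bool:
    """Assumed forms of the CG capacity bound (10) and ramping limits (11a)."""
    within_capacity = 0.0 <= current <= capacity
    within_ramp = -ramp_down <= current - previous <= ramp_up
    return within_capacity and within_ramp
```

Clipping the request before applying `step_soc` keeps the state of charge inside its bounds by construction, which is the coupling across slots that the later relaxation removes.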

D. Cost-Revenue Model

In addition to the internal energy resources (namely, CG, RG, storage units), the data center can resort to the external energy markets in an on-demand manner. With a two-way energy trading facility, the data center can buy energy from the external energy markets when in a deficit , or, sell energy to the markets in the case of a surplus . Clearly, the shortage energy purchased by the data center is ; while the surplus energy that can be sold is . Both the shortage and surplus energies are non-negative, and at most one of them is positive per time slot .

Let denote the per-unit CG cost at slot . Suppose that the energy can be bought from the external energy markets at price , while the energy is sold to them at price per slot. Notwithstanding, we shall always set to avoid less relevant buy-and-sell activities of the data center for profit. Again, we will suppose for simplicity that the prices are random i.i.d. over time. Per slot , the energy transaction cost for the data center is therefore

(17)

Note that a linear cost of CG is introduced only to simplify the proofs in Section IV. Any convex and Lipschitz continuous cost could replace the linear one and lead to similar results.

Since the revenue from ‘must-serve’ workloads is fixed, we account only for the revenue from the delay-tolerant workloads. Specifically, the revenue per slot is given by

(18)

where is the revenue per unit of workloads in class , and captures the total revenue of class- delay-tolerant workloads earned per slot . (Here too, any concave function could replace the linear combination in (18).)

At this point, it is instructive to collect all sources of randomness into the state vector defined as

(19)

and also the optimization variable into the vector

(20)

where the last equality denotes the control strategy that depends on the state to output the settings per slot.
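A short Python sketch can make the two-way trading cost concrete. It assumes that (17) charges the linear CG cost, adds the buying price times the shortage, and subtracts the selling price times the surplus, with the selling price not exceeding the buying price. The second function rewrites the same cost with an absolute value, which is one way to see that it is convex in the consumption-supply gap when the selling price is no larger than the buying price. The function names and this exact form are assumptions of the example.

```python
def transaction_cost(consumption: float, supply: float, cg_output: float,
                     buy: float, sell: float, cg_cost: float) -> float:
    """Assumed form of (17): CG cost + purchases of shortage - sales of surplus."""
    if sell > buy:
        raise ValueError("selling price is assumed not to exceed buying price")
    gap = consumption - supply
    shortage = max(gap, 0.0)
    surplus = max(-gap, 0.0)
    return cg_cost * cg_output + buy * shortage - sell * surplus


def transaction_cost_abs(consumption: float, supply: float, cg_output: float,
                         buy: float, sell: float, cg_cost: float) -> float:
    """Same cost written as a nonnegative multiple of |gap| plus a linear term."""
    gap = consumption - supply
    return (cg_cost * cg_output
            + 0.5 * (buy - sell) * abs(gap)
            + 0.5 * (buy + sell) * gap)
```

With a buying price of 5 and a selling price of 3, a shortage of 3 units costs 15, while a surplus of 3 units earns 9; both functions return the same values.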

III. DYNAMIC ENERGY AND WORKLOAD MANAGEMENT

Based on the models of Section II, we pursue in this section optimal power and workload management of a data center, starting with the operational net-cost per slot that is given by [cf. (17) and (18)]

(21)

Random process is generally nonstationary. Besides , the nonstationarity of is also due to the time-varying , which affects (dis)charging decisions . However, as numerical tests will also corroborate, can be safely assumed mean ergodic in several practical settings; that is, limiting time averages involving will be henceforth assumed to exist in the appropriate sense.3

Over the scheduling horizon, the central operator of the data center seeks an optimal schedule for flexible workloads , CG energy generation , and battery charging energy , in order to minimize the limiting time-averaged net-cost, subject to IT operation constraints as well as energy generation and storage constraints. Concretely written, we wish to solve

(22a)

(22b)
(22c)
(22d)
(22e)
(22f)
(22g)
(22h)
(22i)
(22j)

where the instantaneous constraints (22g)-(22j) involving random variables are understood to hold almost surely.

For the net-cost , we can establish the following.

Lemma 1: Viewed as a deterministic function, per realization is jointly convex in .

Proof: With and , it follows readily from (17) that

Since , it is clear that is a convex and nondecreasing function of . Recall that and . Given that is convex, it is easy to see that is jointly convex in [28, Chapter 3.2]. As is an affine transformation of , it follows that is jointly convex in ; and so is .

A. Problem Relaxation

As the cost in (22a) is convex per Lemma 1 and all the constraints are linear, problem (22) is a convex program. However, it is still impossible to solve due to the infinite time horizon. Furthermore, the battery SoC dynamic (22b) and the CG ramping constraints (22f) couple the optimization variables over the infinite time horizon. This renders traditional solvers, such as dynamic programming, intractable.

3 Depending on so-termed mixing conditions assumed, the convergence of limits can be in probability, mean-square sense, or, almost surely (as).

To turn (22) into a tractable form, we adopt queue-based relaxation techniques [9], [29], [30], by recognizing that SoC dynamics in (22b) can be viewed as charge-based queue recursions; see also [31]. For the random state , we assume that mean ergodicity holds in the appropriate sense e.g., almost surely (as), meaning

(23)

(24)

(25)

where expectations are over the distribution of , and the possible randomness of the control policy.

Instead of the original problem (22), we thus aim at the functional optimization problem

(26a)

(26b)
(26c)

where denotes the mapping (function) from the random state to the vector of optimization variables.

Comparing (26) with (22), constraints (22b), (22c) have been replaced by the time-average constraints (22j), and variables have been eliminated. In addition, the time-coupled ramping constraints (22f) are removed and the QoS constraints (22j) are re-written compactly. We contend that (26) is a relaxed version of (22). To recognize this, take any schedule that satisfies (22b), (22c) in (22). Then summing (22b) over time and taking expectation yields , . Since both and are bounded due to (22c), dividing both sides by and taking limits as , implies (22j). As constraints (22f) are simply ignored in (26), it is clear that any feasible schedule for (22) is also feasible for (26). This implies that (26) is a relaxation of (22), which in turn establishes that .

With the time-coupled constraints relaxed, (26) appears more tractable than (22). Specifically, it can be shown that the optimal solution to (26) is achieved by a time-invariant (generally stationary) control policy that chooses per-slot variables purely as a function (possibly randomized) of the current state , regardless of the storage energy [27, Theorem 4.5]. As a consequence, a stochastic dual subgradient solver is developed for (26) next, which under proper initialization yields a feasible and near-optimal solution of (22).
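The telescoping argument behind the relaxation can be checked numerically. The sketch below assumes the lossless recursion in which the stored energy adds the signed charging power each slot, draws an arbitrary feasible charging sequence, and compares the time-averaged charging power with the bound (capacity minus minimum level) divided by the horizon. The parameter values and the random charging rule are illustrative assumptions only.

```python
import random


def simulate_average_charge(horizon: int, soc0: float = 5.0,
                            soc_min: float = 1.0, soc_max: float = 10.0,
                            p_min: float = -2.0, p_max: float = 2.0,
                            seed: int = 0):
    """Return (time-averaged charging power, final SoC) for a random schedule
    that respects the SoC and (dis)charging bounds in every slot."""
    rng = random.Random(seed)
    soc = soc0
    total_power = 0.0
    for _ in range(horizon):
        lowest = max(p_min, soc_min - soc)
        highest = min(p_max, soc_max - soc)
        power = rng.uniform(lowest, highest)  # any feasible choice works here
        soc += power
        total_power += power
    return total_power / horizon, soc
```

Because the sum of the charging powers equals the final minus the initial stored energy, the average returned for a horizon of `T` slots is at most `(soc_max - soc_min) / T` in magnitude and therefore vanishes as `T` grows, which is why a schedule feasible for the SoC recursion also satisfies a zero time-average charging constraint.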

B. Lagrange Dual Approach

Consider the feasible set arising due to the instantaneousconstraints of (26) as

Let and denote theLagrange multipliers associated with the constraints (26b) and(26c), respectively. With the compact notation ,and , the partial Lagrangian of (26) is

(27)

while the Lagrange dual function is given by

(28)

and the dual problem of (26) is: .For the dual problem, a standard subgradient iteration can be

employed to obtain the optimal , namely

(29a)(29b)

where is the iteration index; is a constant stepsize; and and denote the subgradients of (28) with respect to and , expressed as

(30a)(30b)

where and denote the primal variables given by the minimization of (27) over for . Due to the linearity of the limiting average and the expectation in , , and , these operations can be interchanged with the minimization of in (27). Accordingly, and can be found by solving the following minimization over the infinite horizon [cf. (21)]

(31)

Note that will be obtained from (31) as well, but it may be infeasible for the original problem (22) since the ramping constraint (22f) is not included in the feasible set .

Since is a convex set and the objective is a convex function of , the minimization in (31) is a convex program that can be efficiently solved to obtain the minimizer . The multiplier iterations (29) are guaranteed to converge to a neighborhood of the optimal multipliers for the dual problem [32, Section 6.3].

A challenge associated with (30) is computing and per iteration . This requires performing (high-dimensional) integration over the unknown multivariate distribution function of ; and approximately, finding the corresponding limiting time-averages in (23)–(25), both of which are impractical. To circumvent this impasse, a stochastic subgradient approach is devised next to find the stochastic estimates ‘on-the-fly’ [30], [33].
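As a minimal illustration of the batch dual iteration pattern described above, the following Python sketch runs projected dual subgradient ascent on a toy scalar problem: minimize x^2 subject to x >= 1. This toy problem is not the data-center problem (26); the problem, the function name `dual_subgradient_ascent`, and the stepsize value are assumptions chosen only to show how the multiplier moves along the constraint violation evaluated at the Lagrangian minimizer.

```python
def dual_subgradient_ascent(stepsize=0.5, iterations=200):
    """Projected dual subgradient ascent for: minimize x^2 subject to 1 - x <= 0.

    Lagrangian: L(x, lam) = x^2 + lam * (1 - x), minimized at x = lam / 2.
    The dual subgradient at lam is the constraint value 1 - x(lam).
    """
    lam = 0.0  # initial multiplier (assumed)
    for _ in range(iterations):
        x = lam / 2.0                 # primal minimizer of the Lagrangian
        subgradient = 1.0 - x         # constraint violation at that minimizer
        lam = max(0.0, lam + stepsize * subgradient)  # projected ascent step
    return lam, lam / 2.0


if __name__ == "__main__":
    lam_star, x_star = dual_subgradient_ascent()
    print(lam_star, x_star)
```

For this toy problem the dual function is lam - lam^2/4, so the iterates approach lam = 2 and x = 1. In the paper's setting the subgradients involve expectations and limiting averages, which is exactly what makes the batch form impractical.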


Algorithm 1 Online Power and Workload Management

1: Initialize: with a proper and stepsize
2: for do
3:   Acquire , and find as in (32)
4:   Solve (33) to obtain instantaneous schedule
5:   Perform online operations based on in (33)
6:   Update Lagrange multipliers via (32)
7: end for

C. Stochastic Subgradient Solver

Consider dropping the expectations in (30) and merging indices and , to arrive at the corresponding stochastic iterations [cf. (29)]

(32a)

(32b)

where , and denote the stochastic estimates of the Lagrange multipliers in (29); and . Given , variables and are obtained by solving for [cf. (31)]

(33)

In words, (32) constitutes an online approximation of the batch iterations (29) based on the instantaneous decisions per slot . This stochastic approach is made possible thanks to the decoupling of optimization variables across time in (26).

Different from (31), here the ramping constraints (22f) are added back in (33). Yet, is not an optimization variable here, but it is treated as a constant determined from the previous slot . Clearly, (33) is a convex problem per slot , which can be efficiently solved in polynomial time by existing solvers [34]. The proposed (modified) stochastic subgradient solver is summarized in Algorithm 1. With the addition of (22f) in (33), the online energy and workload schedule provided by Algorithm 1 is guaranteed to satisfy the physical ramping constraints. Interestingly, it can be shown that the proposed algorithm with proper initialization also yields a schedule that satisfies the storage constraints (22b), (22c), and offers a near-optimal solution of the original problem (22).

It is worth mentioning that the proposed stochastic solver incurs affordably low computational complexity. Per slot , the worst-case complexity of solving (33) is by interior-point methods [34], while updating (32) requires just linear complexity .
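The online pattern of Algorithm 1 (observe the state, solve a per-slot problem given the current multipliers, then update the multipliers with the instantaneous constraint violation) can be sketched on a toy problem. The sketch below is not problem (33): it assumes a single decision x_t in [0, 1] bought at a random price, with a long-term requirement that the time-average of x_t be at least `demand`. All names and parameter values are hypothetical.

```python
import random


def online_dual_subgradient(num_slots=20000, stepsize=1.0, demand=0.5, seed=0):
    """Toy online stochastic dual subgradient scheme.

    Per slot, buy x_t in [0, 1] at a random price p_t; the long-term
    constraint is that the time-average of x_t is at least `demand`.
    """
    rng = random.Random(seed)
    lam = 0.0              # stochastic multiplier estimate
    total_cost = 0.0
    total_served = 0.0
    for _ in range(num_slots):
        price = rng.uniform(50.0, 100.0)   # observe the current random state
        # Per-slot problem: minimize price * x - lam * x over 0 <= x <= 1.
        # The objective is linear in x, so the minimizer sits at an endpoint.
        x = 1.0 if lam > price else 0.0
        total_cost += price * x
        total_served += x
        # Multiplier update driven by the instantaneous constraint violation.
        lam = max(0.0, lam + stepsize * (demand - x))
    return total_cost / num_slots, total_served / num_slots, lam


if __name__ == "__main__":
    avg_cost, avg_served, final_lam = online_dual_subgradient()
    print(avg_cost, avg_served, final_lam)
```

In this toy setting the multiplier acts as a price threshold that the iteration learns without knowing the price distribution. For comparison, the best stationary policy with known statistics serves whenever the price is below the median of 75, at an expected cost of 31.25 per slot.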

IV. PERFORMANCE GUARANTEES

To arrive at our main analytical claim, we first establish the optimality gap of the proposed Algorithm 1.

A. Optimality Gap

To begin with, introduce the definition

(34)

where as in (33). Compared with (33), (22f) is absent from (34); hence, it clearly holds that .

Upon defining and [cf. (17)], the following lemma can be established.

Lemma 2: The optimal value of problem (33) satisfies

where .

Proof: See Appendix A.

Lemma 2 shows that inclusion of the ramping constraints in subproblem (33) will only incur a bounded optimality loss of the stochastic subgradient solver. The proof follows the steps in [24, Theorem 1.2]. Based on this, we can subsequently build on the stochastic optimization techniques in [9], [29], [30] to establish the following lemma.

Lemma 3: If state is i.i.d. over slots, then the limiting time-average net-cost incurred by the proposed online algorithm satisfies

where the constant , and is the optimal value of (22) under any feasible control.

Proof: See Appendix B.

Lemma 3 asserts that the proposed Algorithm 1 converges asymptotically to a region with optimality gap smaller than . The gap approaches a constant as the stepsize . In addition, can become negligible when the ramping constraints are loose, meaning as approaches 1.

B. Feasibility Guarantee

Lemma 3 established that the proposed scheme can achieve a near-optimal objective value for (22). However, since Algorithm 1 is modified from a stochastic solver of the relaxed (26), it does not guarantee that the resultant dynamic control policy is a feasible one for (22). In the sequel, we will establish that Algorithm 1 indeed yields a feasible policy for (22) when it is properly initialized. To this end, we first need the following lemma.

Lemma 4: With , the real-time battery (dis)charging decisions returned by the proposed online algorithm obey: i) , if ; or, ii) , if .

Proof: See Appendix C.

Lemma 4 reveals a salient structure of the optimal solution for problem (33). Such a structure can be justified by the economic interpretation of the Lagrange multipliers. Specifically, can be viewed as the stochastic instantaneous charging price. For high prices , the optimal decision is to discharge the battery as much as possible, i.e., . Conversely, the battery units can afford full charge , if the price is low; i.e., .

Relying on the solution structure revealed by Lemma 4, we can subsequently establish the following lemma.

Lemma 5: If the stepsize satisfies , where

then the proposed algorithm guarantees that the Lagrange multipliers satisfy , .

Proof: See Appendix D.

Consider now the linear mapping

(35)

It can be readily seen from Lemma 5 that holds for all and ; i.e., (22c) are always satisfied under the proposed online scheme. With the battery (dis)charging dynamics (22b) naturally performed and the ramping constraint (22f) taken into account by the online decision, the feasibility of the control actions can be maintained for the original problem, provided that we select a stepsize .
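The role of a linear mapping between multiplier and state of charge (SoC) can be illustrated numerically. The sketch below assumes, purely for illustration, an unprojected multiplier update of the form lam_{t+1} = lam_t + stepsize * b_t alongside the SoC recursion soc_{t+1} = soc_t + b_t. These forms, the function name, and all values are assumptions standing in for (32a), (22b), and (35), whose exact expressions are not reproduced here, and the SoC is deliberately left unbounded since only the invariance is being checked.

```python
import random


def max_mapping_deviation(num_slots=1000, stepsize=0.1, soc0=5.0, lam0=-1.0, seed=0):
    """Check that lam_t - stepsize * soc_t stays constant under the assumed updates."""
    rng = random.Random(seed)
    soc, lam = soc0, lam0
    offset = lam0 - stepsize * soc0   # constant of the assumed linear mapping
    max_deviation = 0.0
    for _ in range(num_slots):
        b = rng.uniform(-1.0, 1.0)    # hypothetical (dis)charging amount
        soc += b                      # assumed SoC dynamics
        lam += stepsize * b           # assumed multiplier update
        deviation = abs(lam - (stepsize * soc + offset))
        max_deviation = max(max_deviation, deviation)
    return max_deviation


if __name__ == "__main__":
    print(max_mapping_deviation())   # stays at floating-point rounding level
```

Under these assumed forms, an interval of width D for the multiplier corresponds to an SoC interval of width D divided by the stepsize. This is consistent with the statement that feasibility requires a sufficiently large stepsize, and that larger battery capacity tolerates a smaller one.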

C. Main Theorem

Based on Lemmas 3 and 5, we are able to reach the following main result.

Theorem 1: Upon setting , , and selecting a stepsize , the proposed algorithm yields a feasible dynamic control scheme for (22), which is near-optimal in the sense that

where , and are specified by Lemmas 3 and 5.

Clearly, the minimum optimality gap between Algorithm 1 and the offline scheduling is given by

The asymptotically optimal solution can be attained as (meaning that the ramping constraints are loose), and is very small when the maximum difference between buying and selling prices approaches zero, or, the battery capacities are very large. This makes sense intuitively because as approaches zero, purchasing extra power to charge the batteries will always make profit, and when batteries have large capacity, the upper bounds in (22b) are not in effect. In these cases, with a proper initialization, the proposed online policy using any will be feasible for (22), and the optimal will be reached as close as possible.

TABLE I
POWER SUPPLY PARAMETERS

TABLE II
DATA CENTER COOLING AND OPERATING PARAMETERS

Remark 1: Readers familiar with optimization based on Lyapunov functions can recognize similarities between the stochastic dual subgradient based solver proposed here, and the Lyapunov optimization tools in [9], [29]. However, there are differences between the two methods that can be summarized as follows.

D1) The Lyapunov optimization solver relies on the so-called “virtual queues” to ensure that long-term average constraints are met, where the tuning parameter in [9], [29] corresponds to the inverse of the stepsize in the stochastic optimization setup. In contrast, “virtual queues” emerge naturally as Lagrange multiplier iterations in our stochastic optimization setup.

D2) Leveraging duality and online signal processing techniques, the stochastic dual subgradient iteration is also easy to interpret. The Lagrange multiplier for instance, can be viewed as the instantaneous charging price, which reveals the intuition behind real-time (dis)charging decisions, as discussed after Lemma 4. Weak duality is also utilized to prove Lemma 3. Finally, the dual subgradient iteration carries over results established for the least mean-square (LMS) algorithm - arguably the “workhorse” of adaptive schemes - to the problem at hand; e.g., LMS with constant stepsize only converges to the optimal Lagrange multiplier in the mean [35]. Consequently, a large stepsize leads to severe hovering around the equilibrium point, which incurs considerable loss of optimality.
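The hovering effect of a constant stepsize mentioned in D2) can be reproduced with a scalar LMS filter. The sketch below is a generic LMS illustration and not part of the proposed algorithm; the signal model (unit-variance Gaussian input, true weight 2, noise standard deviation 0.5), the function name, and the stepsizes are assumptions.

```python
import random


def lms_steady_state_msd(stepsize, num_samples=20000, seed=0):
    """Scalar LMS with constant stepsize; returns the mean-square deviation
    of the weight from its true value over the second half of the run."""
    rng = random.Random(seed)
    w_true, w = 2.0, 0.0
    squared_deviations = []
    for t in range(num_samples):
        x = rng.gauss(0.0, 1.0)                  # input sample
        d = w_true * x + rng.gauss(0.0, 0.5)     # noisy desired response
        w += stepsize * x * (d - w * x)          # LMS update
        if t >= num_samples // 2:                # skip the transient
            squared_deviations.append((w - w_true) ** 2)
    return sum(squared_deviations) / len(squared_deviations)


if __name__ == "__main__":
    for mu in (0.01, 0.2):
        print(mu, lms_steady_state_msd(mu))
```

The larger stepsize tracks faster but fluctuates more around the true weight in steady state, which mirrors the tradeoff between convergence speed and optimality gap discussed for the multiplier iterations.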

V. NUMERICAL EVALUATION

In this section, simulated tests are presented to demonstrate the merits of the proposed approach and justify the analytical claims of Section IV.

A. Experiment Setup

The Matlab-based modeling package CVX 2.1 [34] and the solver SDPT3 [38] are used to solve the optimization problems involved. The considered system includes one data center, one conventional generator, one renewable generator, and distributed energy storage units. The power supply limits and the corresponding parameters are listed in Table I. The data center operating limits and the cooling parameters are listed in Table II. Each type- ‘must-serve’ workload and class- delay-tolerant workload arrive according to a Poisson process with average IT demand 10 kWh/slot and 5 kWh/slot, respectively.

Two cases are considered for the energy market prices and the available renewables. In Case A (i.i.d. case), the purchase price is uniformly distributed within [50, 100] $/MWh, and samples of the renewable supply are generated from a Weibull distributed wind speed and a wind-speed-to-wind-power mapping with maximum capacity [39].
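A minimal sketch of how Case A inputs of this kind could be sampled is given next. Only the uniform price range [50, 100] $/MWh comes from the text. The Weibull scale and shape, the cut-in, rated, and cut-out speeds, the cubic power curve, and the capacity argument are hypothetical placeholders, since the excerpt does not state the values or the mapping used in [39].

```python
import random


def wind_power(speed, capacity, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Hypothetical wind-speed-to-power curve (speeds in m/s)."""
    if speed < cut_in or speed >= cut_out:
        return 0.0                       # turbine idle or shut down
    if speed >= rated:
        return capacity                  # output saturates at capacity
    # Cubic ramp between cut-in and rated speed.
    return capacity * (speed ** 3 - cut_in ** 3) / (rated ** 3 - cut_in ** 3)


def sample_case_a(num_slots, capacity, seed=0):
    """Draw i.i.d. purchase prices and renewable supply samples."""
    rng = random.Random(seed)
    prices = [rng.uniform(50.0, 100.0) for _ in range(num_slots)]   # $/MWh
    # weibullvariate(alpha, beta): alpha is the scale, beta the shape (assumed values).
    speeds = [rng.weibullvariate(8.0, 2.0) for _ in range(num_slots)]
    renewables = [wind_power(v, capacity) for v in speeds]
    return prices, renewables


if __name__ == "__main__":
    prices, renewables = sample_case_a(num_slots=720, capacity=100.0)
    print(min(prices), max(prices), max(renewables))
```

Here 720 slots matches the 30-day hourly horizon used in the tests, while the capacity of 100 is an arbitrary illustrative value.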


Fig. 2. Hourly real-time wind power generation connected to PJM grids during Jan. 01–30, 2015 [36]; and day-ahead electricity prices in New York during Jan. 01–30, 2015 [37].

In Case B (real-data case), the purchase prices are re-scaled from the day-ahead hourly electricity prices for large general services in New York during Jan. 01–30, 2015 [37], while the renewable supply is a re-scaled version of the real-time hourly wind generation connected to the PJM grids over the same period [36]. The trend of energy purchase prices and renewable supply is shown in Fig. 2. Note that energy market prices and renewable energy generation here are highly correlated over time. While our performance analysis is carried out for the i.i.d. case, the proposed algorithm readily applies to this non-i.i.d. setup.

For both cases, the selling price is set to with , and the CG generation cost is set to the average market price . Finally, slot duration is an hour with the entire time-horizon equal to 30 days (i.e., 720 slots), and the stepsize is chosen as [cf. Theorem 1] by default.

B. Benchmarks

To benchmark performance of the proposed algorithm, four baseline schemes are tested.

1) ALG 1 (Renewable-aware, no cooling optimization, two-way trade, workload scheduling, with storage): ALG 1 is similar to the proposed algorithm except that no cooling optimization is performed.

2) ALG 2 (Renewable-aware, no cooling optimization, two-way trade, no workload scheduling, with storage): ALG 2 is based on the approach in [9], where renewable energy is taken into account, but neither cooling optimization nor workload scheduling is carried out.

3) ALG 3 (Renewable-oblivious, no cooling optimization, two-way trade, no workload scheduling, without storage): ALG 3 is widely used in practice to minimize only the energy transaction cost without any consideration of workload management, renewable energy, cooling optimization or storage.

4) Optimal: Assuming all needed statistics of randomness are known a-priori, the offline optimal algorithm is also introduced to solve (22) over the entire horizon slots. This optimal algorithm cannot work in practice due to the lack of future information.

Note that [9] does not account for real-time two-way energy transaction, workload management, and cooling optimization.

Fig. 3. Comparison of average net-costs.

Fig. 4. Average net-cost versus and .

For fair comparison, chilled-water cooling is utilized to calculate the final net-cost for ALGs 1–3, while two-way energy transaction is also allowed.

C. Case A

In Fig. 3, the proposed Algorithm 1 is compared with ALGs 1–3, and also against the offline optimal benchmark, in terms of the average net-cost. Within 720 iterations (time slots) the proposed algorithm converges to a much lower net-cost than ALGs 1–3. The net-costs of ALGs 1–3 are about 33%, 37% and 95% larger than that of the proposed algorithm, respectively. Intuitively speaking, this is because the proposed algorithm takes both cooling optimization and workload management into account. It also leverages the renewable energy and energy storage units to hedge against future fluctuation of workload demands and energy prices. These advantages cannot be fully exploited by ALGs 1–3. On the other hand, without any future information, the proposed online algorithm incurs only 5% optimality loss compared with the offline optimal approach.

Fig. 4 demonstrates the impact of battery capacity and ramping parameter on algorithm performance. For a fixed , a larger results in a smaller average net-cost and a smaller optimality gap. This is consistent with Lemma 3 and also intuitive since a larger implies a looser ramping constraint, which endows the proposed algorithm with more freedom to purchase cheaper energy from CG. For a fixed , the optimality gap decreases as increases, as a larger allows the algorithm to choose a smaller stepsize [cf. Lemma 5].

To further delineate the tradeoff between the battery feasibility and the algorithm optimality, Figs. 5 and 6 depict the average net-cost and the battery SoC evolution for different stepsizes .


Fig. 5. Average net-cost versus .

Fig. 6. The battery state-of-charge versus stepsize .

With the same parameters, the proposed algorithm converges faster with a larger stepsize (i.e., ), but incurs lower net-cost with a small stepsize (i.e., ). This is precisely consistent with Lemma 3 in the sense that the optimality gap is proportional to the stepsize . However, recall that arbitrarily small stepsize may affect feasibility of the proposed online scheme [cf. Lemma 5]. In Fig. 6, it turns out that the SoC is always feasible when . In contrast, if the stepsize is selected as , which does not satisfy the stepsize condition in Lemma 5, then exceeds its physical upper bound immediately.

The evolution of energy purchase prices , selling prices , Lagrange multipliers , as well as the real-time battery SoC and battery (dis)charging amount are shown in Fig. 7. It can be seen that when at , 12, while when at , 2, 3, 6, 7, 11. Notice that when at , 5, 8, 10, one must resort to solving (33) numerically to obtain , since the sufficient conditions for (dis)charging actions in Lemma 4 are not satisfied. Clearly, the Lagrange multiplier is in fact a mapping of the real-time battery SoC [cf. (35)]. Such mapping relationships are also true for the slots and .

The long-term QoS ratio [cf. (4)] of the proposed algorithm is depicted in Fig. 8, where the QoS ratio of the proposed algorithm quickly converges to the threshold as the number of iterations increases. This corroborates our assertion that time-average constraints (22j) are asymptotically satisfied by leveraging the stochastic subgradient strategy [30].

Fig. 7. Schedule of battery power .

Fig. 8. QoS ratio of delay-tolerant workloads in class 1.

Fig. 9. Comparison of average net-cost and IT consumption.

D. Case B

Fig. 9 compares the average net-cost and IT consumption [cf. (2)] of the proposed algorithm and ALGs 1–3. It can be seen that the proposed algorithm reduces the net-cost by 15%–47%, while all algorithms have similar average IT consumption. The result is expected since the proposed algorithm optimizes the cooling efficiency and intelligently schedules IT workloads according to current energy prices and task revenues. In contrast, ALG 1 ignores cooling consumption and thus underestimates the total power demand, which results in accommodating more delay-tolerant workloads than the proposed algorithm. ALG 2 incurs a higher net-cost since it does not consider cooling consumption and workload management, whereas ALG 3 is oblivious to not only cooling consumption and workload management but also renewable energy and storage units. At the same time, the proposed algorithm only exhibits 14% optimality loss, compared with the ideally optimal algorithm having all future information available. Note that smaller optimality loss can be expected when larger batteries are deployed in this setup [cf. Fig. 4].

Fig. 10. Comparison of cooling and IT consumptions.

Fig. 11. Comparison of IT revenue and IT consumption.

The average cooling energy consumption and IT revenue are compared with the average IT consumption in Figs. 10 and 11, respectively. Clearly, the proposed algorithm reduces the cooling energy consumption by almost 35%, while it has only 1% less IT consumption than ALGs 1–3. Further, it is shown that by using combined cooling sources, the average cooling coefficient of the proposed algorithm is around 0.13, which is more efficient than simple chilled-water cooling with a constant coefficient 0.2. This implies that by integrating cooling optimization with workload management, the proposed algorithm can use less energy to serve the same amount of IT consumption. Furthermore, Fig. 11 shows that by incorporating workload management, the proposed algorithm can earn 5% more IT revenue with the same IT consumption than other algorithms without workload management.

Fig. 12. Power schedule of the proposed algorithm.

Fig. 12 depicts the average power schedule of the proposed algorithm over a 24-hour period, and the trend of energy purchase prices is also shown to illustrate the resultant online policy. One observation is that the hourly power consumption closely reflects the instantaneous energy purchase price . Specifically, the proposed method tends to consume more power when is lower (12 AM to 5 AM), and less power when is higher (7 AM to 10 AM, and 5 PM to 9 PM). Moreover, the lower energy purchase price in the proposed method encourages purchasing more energy from the external grid market, while the peak of results in a higher power usage from the CG.

VI. CONCLUSIONS

Real-time energy and workload management of IT, cooling, and power supply systems in current sustainable data centers was considered in this paper. Taking into account the variability of workloads, renewables, and electricity market prices, a stochastic optimization problem was formulated to minimize the limiting average net-cost of the data center. Relying on the stochastic subgradient method, an online algorithm was developed to obtain feasible decisions ‘on-the-fly.’ It was established that the novel iterations can asymptotically attain near-optimal resource schedules without knowing the distributions of the underlying stochastic processes. Extensive numerical tests corroborated the effectiveness and merits of the proposed scheme.

Building on this work, promising future directions include modeling more practical storage units with energy leakage, incorporating energy transfer costs from storage units, and also accounting for the power transmission network structure.

APPENDIX

A. Proof of Lemma 2

Let denote the optimal solution for (33), and the optimal solution for (34). Construct a vector . Note that the ramping constraint in (33) is only relevant to . Recall that satisfies the constraints (22d), (22e) and (22g), (22i). Upon selecting any , will be in the feasible set of (33). Let denote the value of objective function for the feasible solution . It clearly holds that , since is a feasible solution but not necessarily the minimizer of (33). As a consequence, we deduce that [cf. definitions of and in Lemma 1]

where the last inequality follows from the triangle inequality. Consider the next three cases.

c1) If , then simply let (i.e., ). It is then clear that .

c2) If , then pick in to arrive at

where the last equality holds because .

c3) If , then select in . Similarly, we have

where the last equality is due to .

Combining cases c1)–c3), it readily follows that .
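The three-case construction in this proof (keep the unconstrained choice when it already meets the ramping limits, otherwise move it to the nearest end of the ramping window) can be sketched as a clipping operation. The symmetric window around the previous output, the function names, and the numbers below are assumptions for illustration; the exact form of (22f) and of the resulting bound is not reproduced here.

```python
def clip_to_ramp_window(target, previous, ramp_limit):
    """Nearest point to `target` within [previous - ramp_limit, previous + ramp_limit]."""
    low, high = previous - ramp_limit, previous + ramp_limit
    if target < low:        # target ramps down too fast: use the lower end
        return low
    if target > high:       # target ramps up too fast: use the upper end
        return high
    return target           # target already satisfies the ramping limits


def ramp_deviation(target, previous, ramp_limit):
    """Distance between the unconstrained target and its clipped version."""
    return abs(clip_to_ramp_window(target, previous, ramp_limit) - target)


if __name__ == "__main__":
    print(clip_to_ramp_window(10.0, 4.0, 3.0), ramp_deviation(10.0, 4.0, 3.0))
```

The deviation equals max(0, |target - previous| - ramp_limit), so it shrinks to zero as the window widens. This is in line with the observation that the extra optimality loss becomes negligible when the ramping constraints are loose.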

B. Proof of Lemma 3

Squaring the update in (32a) yields

where the last inequality follows from constraints (22d).

Likewise, squaring the update in (32b) implies [cf. the definition of in Section I]

which leads to

(36)

Upon adding [cf. (33)], and taking expectations on both sides of (36), we find

Summing both sides of the last inequality over and dividing both sides by , we arrive at

from which letting yields


where inequality (a) follows from Lemma 2; equality (b) follows from the definition of the Lagrangian in (27) with denoting the optimal primal variables given by (33); equality (c) comes from the definition of the dual function; inequality (d) follows from the weak duality [cf. (26a)]; and inequality (e) holds since (26) is a relaxation of (22).

C. Proof of Lemma 4

Algorithm 1 solves the real-time problem (33) per slot . In particular, are obtained by solving [cf. (21)]

Consider the following two cases.

i) If , then

It is easy to see that

ifif

ii) If , then

Similarly, it holds that

ifif .

Combining cases i) and ii), one deduces that if per slot , , then . Likewise, if , then , and the lemma follows readily.
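The threshold structure argued in this proof can be checked numerically on a simplified per-slot problem. The sketch below assumes a net-cost of the form (transaction cost) + lam * b, where energy is bought at `buy_price` and sold at `sell_price` (with buy_price >= sell_price >= 0) and b is the battery charging amount (negative for discharging). This cost model, the sign convention for the multiplier, and all names and values are assumptions, since the exact conditions of Lemma 4 are not reproduced here.

```python
def battery_decision(lam, buy_price, sell_price, b_min, b_max):
    """Threshold rule: returns the (dis)charging amount when a sufficient
    condition holds, or None when the problem must be solved numerically."""
    if lam > -sell_price:   # cost increases in b at every grid exchange level
        return b_min        # discharge as much as possible
    if lam < -buy_price:    # cost decreases in b at every grid exchange level
        return b_max        # charge as much as possible
    return None


def per_slot_cost(b, lam, net_load, buy_price, sell_price):
    """Assumed per-slot cost: grid transaction cost plus lam * b."""
    grid = net_load + b     # energy exchanged with the grid
    transaction = buy_price * grid if grid >= 0 else sell_price * grid
    return transaction + lam * b


def brute_force_decision(lam, net_load, buy_price, sell_price, b_min, b_max, steps=200):
    """Grid search over b, used only to cross-check the threshold rule."""
    candidates = [b_min + i * (b_max - b_min) / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda b: per_slot_cost(b, lam, net_load, buy_price, sell_price))


if __name__ == "__main__":
    for lam in (-40.0, -70.0, -90.0):
        print(lam,
              battery_decision(lam, 80.0, 60.0, -5.0, 5.0),
              brute_force_decision(lam, 2.0, 80.0, 60.0, -5.0, 5.0))
```

When the multiplier falls between the two thresholds, the rule is inconclusive and the decision depends on the net load. This matches the remark in the numerical tests that (33) must then be solved numerically.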

D. Proof of Lemma 5

The argument proceeds by induction. First, set , and suppose that this holds for . We will show that the bounds hold for , as well as for subsequent instances. Consider the following three cases.

c1) If , then it follows from Lemma 4 that

holds considering the facts and .

c2) If , then , since .

c3) If , then Lemma 4 implies that

, where the last step follows because , , and the fact under c2).

REFERENCES

[1] J. Koomey, “Growth in data center electricity use 2005 to 2010,” a report by Analytical Press, 2011.

[2] P. X. Gao, A. R. Curtis, B. Wong, and S. Keshav, “It's not easy being green,” in Proc. ACM SIGCOMM, Helsinki, Finland, Aug. 2012, vol. 42, no. 4, pp. 211–222.

[3] E. All, “Light-duty automotive technology, carbon dioxide emissions, fuel economy trends: 1975 through 2010,” Ann Arbor 2013 [Online]. Available: http://www.epa.gov/oms/fetrends.htm

[4] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. H. Andrew, “Greening geographical load balancing,” IEEE/ACM Trans. Networking, vol. 23, no. 2, pp. 657–671, Apr. 2015.

[5] M. A. Adnan and R. K. Gupta, “Workload shaping to mitigate variability in renewable power use by data centers,” in Proc. IEEE Int. Conf. Cloud Comput., Anchorage, AK, USA, Jun. 2014, pp. 96–103.

[6] A. Rahman, X. Liu, and F. Kong, “A survey on geographic load balancing based data center power management in the smart grid environment,” IEEE Commun. Surveys Tutorials, vol. 16, no. 1, pp. 214–233, 2014.

[7] A. Wierman, Z. Liu, I. Liu, and H. Mohsenian-Rad, “Opportunities and challenges for data center demand response,” in Proc. IEEE Green Comput. Conf., Dallas, TX, USA, Nov. 2014, pp. 1–10.

[8] Apple Environmental Responsibility [Online]. Available: http://www.apple.com/environment/

[9] R. Urgaonkar, B. Urgaonkar, M. Neely, and A. Sivasubramaniam, “Optimal power cost management using stored energy in data centers,” in Proc. ACM SIGMETRICS, San Jose, CA, Jun. 2011, pp. 221–232.

[10] W. Deng, F. Liu, H. Jin, C. Wu, and X. Liu, “Multigreen: Cost-minimizing multi-source datacenter power supply with online control,” in Proc. ACM Int. Conf. Future Energy Syst., Berkeley, CA, USA, May 2013, pp. 149–160.

[11] Y. Guo and Y. Fang, “Electricity cost saving strategy in data centers by using energy storage,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 6, pp. 1149–1160, Jun. 2013.

[12] L. Yu, T. Jiang, and Y. Cao, “Energy cost minimization for distributed Internet data centers in smart microgrids considering power outages,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 1, pp. 120–130, Jan. 2015.

[13] Y. Yao, L. Huang, A. Sharma, L. Golubchik, and M. Neely, “Data centers power reduction: A two time scale approach for delay tolerant workloads,” in Proc. INFOCOM, Orlando, FL, USA, Mar. 2012, pp. 1431–1439.

[14] H. Wang, J. Huang, X. Lin, and H. Mohsenian-Rad, “Exploring smart grid and data center interactions for electric power load balancing,” in Proc. ACM SIGMETRICS, Pittsburgh, PA, USA, Dec. 2013, vol. 41, no. 3, pp. 89–94.

[15] Z. Liu, A. Wierman, Y. Chen, B. Razon, and N. Chen, “Data center demand response: Avoiding the coincident peak via workload shifting and local generation,” Elsevier Perform. Eval., vol. 70, no. 10, pp. 770–791, Oct. 2013.

[16] J. Luo, L. Rao, and X. Liu, “Temporal load balancing with service delay guarantees for data center energy cost optimization,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 3, pp. 775–784, Mar. 2014.

[17] E. Samadiani, Y. Joshi, and F. Mistree, “The thermal design of a next generation data center: A conceptual exposition,” in Proc. IEEE Int. Conf. Thermal Iss. Emerging Tech.: Theory Applicat., Cairo, Egypt, Jan. 2007, pp. 93–102.


[18] Z. Liu, Y. Chen, C. Bash, A. Wierman, D. Gmach, Z. Wang, M. Marwah, and C. Hyser, “Renewable and cooling aware workload management for sustainable data centers,” in Proc. ACM SIGMETRICS, London, U.K., Jun. 2012, vol. 40, no. 1, pp. 175–186.

[19] Y. Guo, Y. Gong, Y. Fang, P. P. Khargonekar, and X. Geng, “Energy and network aware workload management for sustainable data centers with thermal storage,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 8, pp. 2030–2042, Aug. 2014.

[20] Active Power, “Data center thermal runaway. A review of cooling challenges in high density mission critical environments,” White Paper 2007 [Online]. Available: www.edsenerji.com.tr/dokuman_indir/16/

[21] R. Zhou, Z. Wang, A. McReynolds, C. E. Bash, T. W. Christian, and R. Shih, “Optimization and control of cooling microgrids for data centers,” in Proc. IEEE Conf. Thermal Thermomech. Phenom. Electron. Syst., San Diego, CA, USA, May 2012, pp. 338–343.

[22] J. P. Holman, Heat Transfer, 8th ed. Columbus, OH, USA: McGraw-Hill, 1996.

[23] C. Patel, R. Sharma, C. Bash, and A. Beitelmal, “Energy flow in the information technology stack,” in Proc. IMECE, Chicago, IL, USA, Nov. 2006.

[24] S. Sun, M. Dong, and B. Liang, “Joint supply, demand, energy storage management towards microgrid cost minimization,” in Proc. IEEE SmartGridCom, Venice, Italy, Nov. 2014, pp. 109–114.

[25] D. Rastler, “Electricity energy storage technology options,” White Paper 2010 [Online]. Available: www.epri.com

[26] S. Li, M. Brocanelli, W. Zhang, and X. Wang, “Integrated power management of data centers and electric vehicles for energy and regulation market participation,” IEEE Trans. Smart Grid, vol. 5, no. 5, pp. 2283–2294, Sep. 2014.

[27] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synth. Lectures Commun. Netw., vol. 3, no. 1, pp. 1–211, 2010.

[28] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge Univ. Press, 2004.

[29] S. Lakshminaryana, H. V. Poor, and T. Quek, “Cooperation and storage tradeoffs in power grids with renewable energy resources,” IEEE J. Sel. Areas Commun., vol. 32, no. 7, pp. 1386–1397, Jul. 2014.

[30] A. G. Marques, L. M. Lopez-Ramos, G. B. Giannakis, J. Ramos, and A. J. Caamaño, “Optimal cross-layer resource allocation in cellular networks using channel-and queue-state information,” IEEE Trans. Veh. Technol., vol. 61, no. 6, pp. 2789–2807, Jul. 2012.

[31] N. Gatsis and A. G. Marques, “A stochastic approximation approach to load shedding in power networks,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 6464–6468.

[32] D. P. Bertsekas, Convex Optimization Theory. Belmont, MA, USA: Athena Scientific, 2009.

[33] J.-C. Pesquet and A. Repetti, “A class of randomized primal-dual algorithms for distributed optimization,” J. Nonlinear Convex Anal. 2014, arXiv preprint:1406.6404, to be published.

[34] CVX: Matlab Software for Disciplined Convex Programming Version 2.1. Sep. 2012 [Online]. Available: http://cvxr.com/cvx

[35] V. Solo, “Averaging analysis of adaptive algorithms made simple,” in System Identification, Environmental Modelling, Control System Design. London, U.K.: Springer-Verlag, 2012, pp. 115–131.

[36] Pennsylvania-New Jersey-Maryland Interconnection (PJM) Hourly Real-Time Wind Generation, Jan. 2015 [Online]. Available: http://www.pjm.com/markets-and-operations/ops-analysis.aspx

[37] Hourly Electric Supply Charges in New York, Jan. 2015 [Online]. Available: https://www.nationalgridus.com/

[38] K. C. Toh, M. J. Todd, and R. H. Tutuncu, “SDPT3 - A Matlab software package for semidefinite programming,” Optimizat. Meth. Software, vol. 11, pp. 545–581, 2009.

[39] Y. Zhang, N. Gatsis, and G. B. Giannakis, “Robust energy management for microgrids with high-penetration renewables,” IEEE Trans. Sustain. Energy, vol. 4, no. 4, pp. 944–953, Oct. 2013.

Tianyi Chen received the B.Eng. degree (with highest honors) in communication science and engineering from Fudan University, China, in 2014. Since August 2014, he has been with SPiNCOM, working toward his Ph.D. degree in the Dept. of ECE at the University of Minnesota. His research interests lie in network optimization with applications to green communications, and sustainable data centers. He received a National Scholarship from China in 2013, and the UMN ECE Department Fellowship in 2014.

Xin Wang (SM’09) received the B.Sc. and M.Sc. degrees from Fudan University, Shanghai, China, in 1997 and 2000, respectively, and the Ph.D. degree from Auburn University, Auburn, AL, USA, in 2004, all in electrical engineering.

From September 2004 to August 2006, he was a Postdoctoral Research Associate with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis. In August 2006, he joined the Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA, as an Assistant Professor and, in August 2010, was promoted and became an Associate Professor. He is currently a Professor with the Department of Communication Science and Engineering, Fudan University. His research interests include stochastic network optimization, energy-efficient communications, cross-layer design, and signal processing for communications. He served as an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS. He currently serves as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and as an Editor for the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY.

Georgios B. Giannakis (F’97) received his Diploma in Electrical Engr. from the Ntl. Tech. Univ. of Athens, Greece, 1981. From 1982 to 1986 he was with the Univ. of Southern California (USC), where he received his M.Sc. in electrical engineering, 1983, M.Sc. in mathematics, 1986, and Ph.D. in electrical engr., 1986. Since 1999 he has been a Professor with the Univ. of Minnesota, where he now holds an ADC Chair in Wireless Telecommunications in the ECE Department, and serves as director of the Digital Technology Center.

His general interests span the areas of communications, networking and statistical signal processing - subjects on which he has published more than 385 journal papers, 650 conference papers, 22 book chapters, two edited books and two research monographs (h-index 114). Current research focuses on learning from Big Data, wireless cognitive radios, and network science with applications to social, brain, and power networks with renewables. He is the (co-) inventor of 25 patents issued, and the (co-) recipient of 8 best paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received Technical Achievement Awards from the SP Society (2000), from EURASIP (2005), a Young Faculty Teaching Award, the G.W. Taylor Award for Distinguished Research from the University of Minnesota, and the IEEE Fourier Technical Field Award (2015). He is a Fellow of EURASIP, and has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SP Society.