
Greenhead: Virtual Data Center Embedding Across Distributed Infrastructures

    Ahmed Amokrane, Mohamed Faten Zhani, Rami Langar, Raouf Boutaba, Guy Pujolle

LIP6 / UPMC - University of Pierre and Marie Curie, 4 Place Jussieu, 75005 Paris, France
University of Waterloo, 200 University Ave. W., Waterloo, ON, Canada

Email: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]


Abstract—Cloud computing promises to provide on-demand computing, storage and networking resources. However, most cloud providers simply offer virtual machines (VMs) without bandwidth and delay guarantees, which may hurt the performance of the deployed services. Recently, some proposals suggested remediating this limitation by offering Virtual Data Centers (VDCs) instead of VMs only. However, they have only considered the case where VDCs are embedded within a single data center. In practice, infrastructure providers should have the ability to provision requested VDCs across their distributed infrastructure in order to achieve multiple goals, including revenue maximization, operational cost reduction, energy efficiency and green IT, or simply to satisfy geographic location constraints of the VDCs.

In this paper, we propose Greenhead, a holistic resource management framework for embedding VDCs across geographically distributed data centers connected through a backbone network. The goal of Greenhead is to maximize the cloud provider's revenue while ensuring that the infrastructure is as environment-friendly as possible. To evaluate the effectiveness of our proposal, we conducted extensive simulations of four data centers connected through the NSFNet topology. Results show that Greenhead improves the request acceptance ratio and revenue by up to 40% while ensuring high usage of renewable energy and a minimal carbon footprint.

Index Terms—Green Computing, Energy Efficiency, Cloud Computing, Virtual Data Center, Distributed Infrastructure

    1 INTRODUCTION

Cloud computing has recently gained significant popularity as a cost-effective model for hosting large-scale online services in large data centers. In a cloud computing environment, an Infrastructure Provider (InP) partitions the physical resources inside each data center into virtual resources (e.g., Virtual Machines (VMs)) and leases them to Service Providers (SPs) in an on-demand manner. On the other hand, a service provider uses those resources to deploy its service applications, with the goal of serving its customers over the Internet.

Unfortunately, current InPs like Amazon EC2 [1] mainly offer resources in terms of virtual machines without providing any performance guarantees in terms of bandwidth and propagation delay. The lack of such guarantees significantly affects the performance of the deployed services and applications [2].

To address this limitation, recent research proposals urged cloud providers to offer resources to SPs in the form of Virtual Data Centers (VDCs) [3]. A VDC is a collection of virtual machines, switches and routers that are interconnected through virtual links. Each virtual link is characterized by its bandwidth capacity and its propagation delay. Compared to traditional VM-only offerings, VDCs are able to provide better isolation of network resources, and thereby improve the performance of service applications.
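To make the notion concrete, a VDC request of this form can be modeled as a small annotated graph. The sketch below is illustrative only; the field names are our own and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualLink:
    src: str                # source VM
    dst: str                # destination VM
    bandwidth_mbps: float   # bandwidth capacity bw(e)
    delay_ms: float         # propagation delay d(e)

@dataclass
class VDCRequest:
    # VM name -> resource demands (CPU cores, memory, disk)
    vms: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

vdc = VDCRequest()
vdc.vms["web"] = {"cpu": 2, "mem_gb": 4, "disk_gb": 50}
vdc.vms["db"] = {"cpu": 4, "mem_gb": 16, "disk_gb": 200}
vdc.links.append(VirtualLink("web", "db", bandwidth_mbps=100, delay_ms=5))
```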

Despite its benefits, offering VDCs as a service introduces a new challenge for cloud providers called the VDC embedding problem, which aims at mapping virtual resources (e.g., virtual machines, switches, routers) onto the physical infrastructure. So far, few works have addressed this problem [2], [4], [5], and they only considered the case where all the VDC components are allocated within the same data center. Distributed embedding of VDCs is particularly appealing for SPs as well as InPs. In particular, a SP uses its VDC to deploy various services that operate together in order to respond to end users' requests. As shown in Fig. 1, some services may need to be in the proximity of end-users (e.g., Web servers), whereas others may not have such location constraints and can be placed in any data center (e.g., MapReduce jobs).

On the other hand, InPs can also benefit from embedding VDCs across their distributed infrastructure. In particular, they can take advantage of the abundant resources available in their data centers and achieve various objectives, including maximizing revenue, reducing costs and improving the infrastructure's sustainability.

In this paper, we propose a management framework able to orchestrate VDC allocation across a distributed cloud infrastructure. The main objectives of such a framework can be summarized as follows.

- Maximize revenue. Certainly, the ultimate objective of an infrastructure provider is to increase its revenue by maximizing the amount of leased resources and the number of embedded VDC requests. However, embedding VDCs requires satisfying different constraints, namely the capacity and location constraints. Obviously, the embedding scheme must ensure that the capacity of the infrastructure is never exceeded. In addition, it must satisfy the location constraints imposed by SPs.

Digital Object Identifier 10.1109/TCC.2013.5 2168-7161/13/$31.00 2013 IEEE

IEEE TRANSACTIONS ON CLOUD COMPUTING

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Fig. 1: Example of VDC deployment over a distributed infrastructure

- Reduce backbone network workload. To cope with the growing traffic demand between data centers, infrastructure providers tend to build their own proprietary wide-area backbone networks to interconnect their facilities (e.g., Google G-scale network [6]). In this context, one key objective when embedding VDCs is to minimize the traffic within the backbone network. Indeed, it has been reported recently that the cost of building an inter-data center network is much higher than the intra-data center network cost, and accounts for 15% of the total infrastructure cost [7]. In addition, according to several studies [8], wide-area data transport is bound to be the major contributor to data transport costs. Hence, it is crucial to reduce the backbone network traffic and place highly communicating VMs within the same data center whenever possible.

- Reduce data center operational costs. Reducing data centers' operational costs is a critical objective of any infrastructure provider, as it impacts its budget and growth. This can be achieved through minimizing energy costs, which constitute a significant portion of the total operational expenditure. To this end, two key techniques can be adopted: (1) placing more workload into the most energy-efficient data centers, and (2) taking advantage of the difference in electricity prices between the locations of the infrastructure facilities. In particular, energy-efficient data centers can be identified by their Power Usage Effectiveness (PUE), and favored to host more virtual machines. Furthermore, InPs can achieve more savings by considering the fluctuation of electricity prices over time and the price difference between the locations of the data centers. Hence, VMs can be placed such that the total electricity cost is minimized.

- Reduce the carbon footprint. Recent research has reported that, in 2012, the carbon footprint of data centers around the world accounted for 0.25% of worldwide carbon emissions, which constitutes 10% of Information and Communication Technologies (ICT) emissions [9]. As a result, InPs are facing a lot of pressure to operate on renewable sources of energy (e.g., solar and wind power) to make their infrastructure more green and environment-friendly. Based on these observations, an efficient VDC embedding scheme should maximize the usage of renewables and take into account their availability, which depends on the data center's geographical location, the time of day (e.g., day and night for solar power) as well as the weather conditions (e.g., wind, atmospheric pressure). Furthermore, whenever power from the electric grid is used, the VDC embedding scheme has to minimize the infrastructure's carbon footprint. In that case, the placement of the VMs is critical, since the carbon footprint per watt of power varies from location to location.

In this paper, we propose Greenhead, a resource management framework for VDC embedding across a distributed infrastructure. Greenhead aims at maximizing the InP's revenue by minimizing energy costs, while ensuring that the infrastructure is as environment-friendly as possible. To reduce the complexity of the problem, we propose a two-step approach. We first divide a VDC request into partitions such that the inter-partition bandwidth demand is minimized and the intra-partition bandwidth is maximized. The aim of such partitioning is to embed VMs exchanging high volumes of data in the same data center. This significantly reduces the traffic carried by the backbone, and thereby improves the request acceptance ratio. We then propose a simple yet efficient algorithm for assigning partitions to data centers based on electricity prices, data centers' PUEs, availability of renewables and the carbon footprint per unit of power.

To the best of our knowledge, this is the first effort to address the VDC embedding problem over a distributed infrastructure taking into account energy efficiency as well as environmental considerations.

The remainder of this paper is organized as follows. In Section 2, we present related work relevant to ours. We then describe the proposed management framework in Section 3. We provide a mathematical formulation of the VDC embedding problem across a distributed infrastructure in Section 4. Section 5 presents a detailed description of the proposed algorithms for VDC partitioning and embedding. Section 6 discusses the simulation results showing the effectiveness of Greenhead. We finally provide concluding remarks in Section 7.


    2 RELATED WORK

In this section, we survey relevant research in the literature. We classify previous work into three related topics, namely: VDC embedding within a single data center, virtual network embedding, and workload scheduling across geographically distributed data centers.

VDC embedding within a single data center. So far, only a few works have addressed the VDC embedding problem. For instance, Guo et al. [5] proposed a data center network virtualization architecture called SecondNet that incorporates a greedy algorithm to allocate resources to VDCs. Ballani et al. [2] proposed two abstractions for VDCs, namely a virtual cluster and an oversubscribed virtual cluster. They developed Oktopus, an implementation of those abstractions that uses a greedy algorithm for mapping virtual resources to a tree-like physical topology. Finally, Zhani et al. [4] presented VDC Planner, a resource management framework for data centers that leverages dynamic VM migration to improve the acceptance ratio of VDCs, and thereby increases the InP's revenue. They also proposed a VDC consolidation algorithm to minimize the number of active physical servers during low-demand periods. Unfortunately, the above proposals cannot be directly applied to allocate resources in multiple data centers due to the large size of the resulting topology. In addition, in a distributed environment, different considerations should be taken into account, such as the carbon footprint of the data centers and the variability of electricity prices over time and between different locations.

The only work we are aware of that addressed the multi-data center embedding problem is that of Xin et al. [10]. They proposed an algorithm that uses minimum k-cut to split a request into partitions before assigning them to different locations. However, this work only aimed at load balancing the workload through request partitioning, without considering other objectives like revenue maximization, backbone network usage optimization, energy efficiency and green IT. Furthermore, it does not consider constraints on VM placement.

Virtual network embedding. Virtual network embedding has been extensively studied in the literature. It basically aims at embedding virtual nodes (mainly routers) and links on top of a physical backbone substrate. Current proposals have addressed the embedding problem either in a single domain (i.e., a backbone owned and managed by a single InP) or in multiple domains (i.e., multiple networks managed by different InPs).

In the single-domain case, the InP tries to embed the virtual networks while aiming to achieve multiple objectives, including: (1) minimizing the embedding cost [11], (2) maximizing the acceptance ratio and revenue [12], [13], and (3) improving energy efficiency [14], [15].

In the multi-domain case, the request is provisioned across multiple domains belonging to different InPs. Houidi et al. [16] proposed a centralized approach where the SP first splits the request using Max-Flow Min-Cut based on prices offered by different InPs, then decides where to place the partitions. Chowdhury et al. [17] proposed a distributed embedding solution called PolyVine. In PolyVine, the virtual network request is sent to a single InP, which tries to allocate as many resources as possible in its own network before forwarding the un-embedded nodes and links to a neighboring provider. The process continues recursively until the whole request is embedded.

The above proposals on virtual network embedding cannot be directly applied to the VDC embedding problem for many reasons. While a virtual network can be made of tens of nodes (mainly routers), a VDC, expected to be similar to a real data center, may comprise thousands of nodes of different types (e.g., VMs, virtual switches and routers). There is, therefore, a definite need for developing new solutions able to embed large-scale VDCs and to consider the diversity of resources. Finally, previous works do not take advantage of the variability of electricity prices between different locations and also ignore environmental considerations.

Workload placement in geographically distributed data centers. Several works have addressed the problem of workload placement in geographically distributed data centers. They either aimed at reducing energy costs [18]-[20], at minimizing the carbon footprint [21], [22], or both [23].

Generally, energy costs are cut down by taking advantage of the variability of electricity prices between different data centers, and even at the same location over time. The carbon footprint is reduced by following the renewables available during some periods of the day. For instance, Zhang et al. [18] used a model predictive control framework for service placement. Services are dynamically placed in data centers and migrated according to demand and price fluctuations over time, while considering the migration cost and the latency between services and end-users. Qureshi et al. [20] addressed the problem of replica placement and request routing in Content Distribution Networks (CDNs). They aimed at reducing electricity costs by dynamically placing data at locations with low electricity prices. Gao et al. [24] addressed the same problem, but aimed at minimizing energy costs, carbon footprint and the delay between end-users and the location of the data. Liu et al. proposed a framework for workload assignment and dynamic workload migration between data centers that minimizes the latency between end-users and services while following renewables and avoiding using power from the electricity grid [21], [22]. Le et al. [25] proposed a workload assignment framework


    Fig. 2: VDC embedding across multiple data centers

across multiple data centers that minimizes the costs of energy consumed by IT and cooling equipment, depending on the fluctuations of electricity prices and the variability of the data centers' PUEs.

However, the main limitation of the aforementioned proposals is that they ignore the communication patterns and the data exchanged between the VMs. This makes such approaches inapplicable to embedding VDCs, since we also need to consider bandwidth and delay requirements between the VDC components.

In summary, our work differs from traditional virtual network and VDC embedding proposals since it considers resource allocation for VDCs across the whole infrastructure, including the data centers and the backbone network connecting them. It also differs from service placement works since we provision all types of resources, including computing, storage and notably networking (i.e., bandwidth and delay).

To the best of our knowledge, this work is the first effort to address VDC embedding across a distributed infrastructure while considering the usage of the backbone network, the variability of electricity prices, energy efficiency as well as environmental considerations.

    3 SYSTEM ARCHITECTURE

In this work, we consider a distributed infrastructure consisting of multiple data centers located in different regions and interconnected through a backbone network (see Fig. 2). The entire infrastructure (including the backbone network) is assumed to be owned and managed by the same infrastructure provider. Each data center may operate on on-site renewable energy (e.g., wind, solar) and resorts to the electricity grid only when its on-site renewable energy becomes insufficient. Unfortunately, renewables are not always available, as they depend on the data center location, the time of day and external weather conditions. While renewable energy has no carbon footprint, energy from the grid is usually produced by burning coal, oil and gas, generating high levels of carbon emissions. As a result, whenever electricity is drawn from the grid, the cloud provider has to pay a penalty proportional to the generated carbon emissions. The generated carbon depends on the source of power used by the electric grid supplier, which could be a renewable source, a conventional one, or a mix of both. Furthermore, it is also worth noting that grid electricity prices differ between regions, and they even vary over time in countries with deregulated electricity markets.

As shown in Fig. 2, a SP sends the VDC request specifications to the InP, which has the responsibility of allocating the required resources. Naturally, the cloud provider will make use of its distributed infrastructure with the objective of maximizing its revenue and minimizing energy costs and carbon footprint; this is where our proposed management framework, Greenhead, comes into play. Greenhead is composed of two types of management entities: (1) a central controller that manages the entire infrastructure, and (2) a local controller deployed at each data center to manage the data center's internal resources.

The central management entity includes five components, as depicted in Fig. 2:

The Partitioning Module is responsible for splitting a VDC request into partitions such that the inter-partition bandwidth is minimized. The aim of this module is to reduce the number of virtual links provisioned between data centers. Each partition is supposed to be entirely embedded into a single data center. The motivation behind such partitioning will be explained in Section 5.1.

The Partition Allocation Module is responsible for assigning partitions to data centers based on runtime statistics collected by the Monitoring Module. It ensures that all partitions are embedded while achieving cost effectiveness, energy efficiency and green IT objectives, such as reducing energy costs from the power grid and maximizing the use of renewable sources of energy.

The Inter-data center Virtual Link Allocation Module allocates virtual links in the backbone network. Those virtual links connect VMs that have been assigned to different data centers.

The Monitoring Module is responsible for gathering different statistics from the data centers. The collected information includes PUE, resource utilization, outdoor temperature, electricity price and the amount of available renewable energy.

The VDC Information Base contains all information about the embedded VDCs, including their partitions and their mappings onto the data centers and the backbone network.

Regarding the local controller at each data center, its main role is to manage the resources within the data center itself. Specifically, it allocates resources for a partition of a VDC as requested by the central controller. If the embedding is not possible (for example, due to unavailability of resources), the local controller notifies the central controller. Subsequently, the Partition Allocation Module will attempt to find another data center able to embed the rejected partition. It is worth noting that different resource allocation schemes can be deployed locally at the data centers (e.g., VDC Planner [4], SecondNet [5], Oktopus [2]). Finally, each local controller has to periodically report statistics, including PUE, temperature, resource usage and availability of renewables, to the central controller.

    4 PROBLEM FORMULATION

In this section, we formally define the VDC embedding problem across multiple data centers as an Integer Linear Program (ILP). Table 1 describes the notations used in our ILP model.

We assume that time is divided into slots [1, . . . , T]. The metrics characterizing each data center (e.g., PUE, electricity price) are measured at the beginning of each time slot and are considered constant during the corresponding time slot. Thus, for readability, we omit the time reference in all variables defined in the remainder of this section.

The physical infrastructure is represented by a graph G(V ∪ W, E), where V denotes the set of data centers and W the set of nodes of the backbone network. The set of edges E represents the physical links in the backbone network. Each link is characterized by its bandwidth capacity and propagation delay.

A VDC request j is represented by a graph G_j(V_j, E_j). Each vertex v ∈ V_j corresponds to a VM, characterized by its CPU, memory and disk requirements. Each edge e′ ∈ E_j is a virtual link that connects a pair of VMs. It is characterized by its bandwidth demand bw(e′) and propagation delay

TABLE 1: Table of notations

Notation | Meaning
PUE_i | PUE of data center i
α_i | Electricity price in data center i
η_i | On-site renewable power cost in data center i
RN_i | Residual renewable power in data center i
C_i | Carbon footprint per unit of power from the power grid in data center i
ω_i | Cost per ton of carbon in data center i
z_{ik}^j | Boolean variable indicating whether data center i satisfies the location constraint of VM k of VDC j
x_{ik}^j | Boolean variable indicating whether VM k is assigned to data center i
f_{e,e′} | Boolean variable indicating whether the physical link e ∈ E is used to embed the virtual link e′ ∈ E_j
D_i^j | Cost of embedding the VDC request j in data center i
P_{i,IT}^j | Power consumed only by IT equipment (i.e., servers and switches) in data center i
P_i^j | Total power consumed in data center i
σ_r | Price per unit of resource type r
σ_b | Price per unit of bandwidth
c_p | Cost per unit of bandwidth in the backbone network

d(e′). Furthermore, each VDC j has a lifetime T_j. We assume the revenue generated by VDC j, denoted by R_j, to be proportional to the amount of CPU, memory and bandwidth required by its VMs and links. Let R denote the set of resource types offered by each node (i.e., CPU, memory and disk). The revenue generated by VDC j can be written as follows:

R_j = \sum_{v \in V_j} \sum_{r \in R} C_j^r(v) \sigma_r + \sum_{e′ \in E_j} bw(e′) \sigma_b    (1)

where C_j^r(v) is the capacity of VM v belonging to VDC j in terms of resource r, and σ_r and σ_b are the selling prices of a unit of resource type r and a unit of bandwidth, respectively.
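As a concrete illustration, Eq. (1) reduces to two weighted sums. The sketch below uses our own data layout and unit prices, which are illustrative assumptions, not values from the paper:

```python
def revenue(vms, links, price_r, price_bw):
    """Eq. (1): revenue of one VDC request.

    vms:   {vm_name: {resource: capacity}}   (CPU, memory, disk)
    links: {(src, dst): bandwidth}           (virtual links)
    price_r / price_bw play the roles of sigma_r and sigma_b.
    """
    node_part = sum(cap * price_r[res]
                    for caps in vms.values()
                    for res, cap in caps.items())
    link_part = sum(bw * price_bw for bw in links.values())
    return node_part + link_part

# Hypothetical two-VM request with one 100 Mbps virtual link.
vms = {"web": {"cpu": 2, "mem": 4}, "db": {"cpu": 4, "mem": 16}}
links = {("web", "db"): 100}
r = revenue(vms, links, {"cpu": 0.05, "mem": 0.01}, 0.002)
```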

Furthermore, we assume that each VM v ∈ V_j may have a location constraint; that is, it can only be embedded in a particular set of data centers. To model this constraint, we define the binary variable

z_{ik}^j = 1 if VM k of VDC j can be embedded in data center i, and 0 otherwise.

The problem of embedding a given VDC j across the infrastructure involves two steps.

First, assign each VM k ∈ V_j to a data center. Hence, we define the decision variable

x_{ik}^j = 1 if VM k of VDC j is assigned to data center i, and 0 otherwise.

Second, embed every virtual link belonging to E_j, either in the backbone network if it connects two VMs assigned to different data centers, or within the same data center otherwise. To do so, we define the virtual link allocation variable

f_{e,e′} = 1 if the physical link e ∈ E is used to embed the virtual link e′ ∈ E_j, and 0 otherwise.

Finally, the ultimate objective of the InP when embedding a VDC request is to maximize its profit, defined as the difference between the revenue (denoted by R_j) and the total embedding cost, which consists of the embedding cost in the data centers (denoted by D_j) plus the embedding cost in the backbone network (denoted by P_j). Hence, our problem can be formulated as an ILP with the following objective function:

Maximize  R_j - (D_j + P_j)    (2)

    Subject to the following constraints (3)-(8):

A VM has to be assigned to a data center that satisfies its location constraints:

x_{ik}^j \le z_{ik}^j,  ∀k ∈ V_j, ∀i ∈ V    (3)

A VM is assigned to one and only one data center:

\sum_{i \in V} x_{ik}^j = 1,  ∀k ∈ V_j    (4)

The capacity constraint of the backbone network links should not be exceeded:

\sum_{e′ \in E_j} f_{e,e′} \cdot bw(e′) \le sbw(e),  ∀e ∈ E    (5)

where sbw(e) is the residual bandwidth of the backbone network link e.

The required propagation delay for every virtual link allocated in the backbone should be satisfied:

\sum_{e \in E} f_{e,e′} \cdot d(e) \le d(e′),  ∀e′ ∈ E_j    (6)

The flow conservation constraint, given by:

f_{e1,e′} - f_{e2,e′} = x_{d(e1) d(e′)}^j - x_{s(e2) s(e′)}^j,  ∀e1, e2 ∈ E with d(e1) = s(e2), ∀e′ ∈ E_j    (7)

where s(e) and d(e) denote the source and destination of link e, respectively. Furthermore, the central controller should also

ensure that each data center is able to accommodate the VMs and virtual links assigned to it. To model this constraint, let G_i^j(V_i^j, E_i^j) denote a partition of G_j, where V_i^j and E_i^j are the sets of VMs and virtual links belonging to VDC j and assigned to data center i. They can be written as

V_i^j = {k ∈ V_j | x_{ik}^j = 1}
E_i^j = {e′ ∈ E_j | s(e′) ∈ V_i^j and d(e′) ∈ V_i^j}

We define the function

Embed_i(G_i^j) = 1 if data center i can accommodate V_i^j and E_i^j, and 0 otherwise.

Hence, to ensure that data center i can host the assigned VMs and links, we should satisfy:

x_{ik}^j \le Embed_i(G_i^j),  ∀k ∈ V_j, ∀i ∈ V    (8)

Let us now focus on the expression of the embedding costs D_j and P_j in the data centers and the backbone network, respectively. Recall that these costs are part of the objective function.

- The cost of embedding in the data centers. In this work, we evaluate the request embedding cost in the data centers in terms of energy and carbon footprint costs. To do so, we first evaluate the amount of power required to embed the partition G_i^j in a data center i, denoted by P_i^j.

Let P_{i,IT}^j denote the amount of power consumed only by IT equipment (i.e., servers and switches) in order to accommodate G_i^j (expressed in kilowatts). This amount of power depends mainly on the local allocation scheme, the current mapping and the availability of resources at data center i. The power consumed at data center i by IT equipment and other supporting systems (e.g., cooling) to accommodate the partition G_i^j can be computed as

P_i^j = P_{i,IT}^j \times PUE_i    (9)

where PUE_i is the power usage effectiveness of data center i. The mix of power used in data center i is given by

P_i^j = P_{i,L}^j + P_{i,D}^j    (10)

where P_{i,L}^j and P_{i,D}^j denote, respectively, the on-site renewable power consumed and the amount of power purchased from the grid. Note that the amount of on-site consumed power should not exceed the amount of produced power, which is captured by P_{i,L}^j \le RN_i, where RN_i is the amount of residual renewable power in data center i expressed in kilowatts.

Hence, the embedding cost (expressed in dollars) of the partition G_i^j in data center i can be written as

D_i^j = P_{i,L}^j \eta_i + P_{i,D}^j (\alpha_i + \omega_i C_i)    (11)

where η_i is the on-site renewable power cost in data center i expressed in dollars per kilowatt-hour ($/kWh), α_i is the electricity price in data center i expressed in dollars per kilowatt-hour ($/kWh), C_i is the carbon footprint per unit of power drawn from the grid in data center i expressed in tons of carbon per kWh (t/kWh), and ω_i is the cost per ton of carbon ($/t). Note that η_i includes the upfront investment, maintenance and operational costs.

Finally, the total embedding cost of request j in all available data centers can be written as follows:

D_j = \sum_{i \in V} D_i^j    (12)

- The cost of embedding in the backbone network. Virtual links between VMs that have been assigned to different data centers should be embedded in the backbone network. Let P_j denote the cost incurred by the InP in order to accommodate those virtual links. We assume that it is proportional to their bandwidth requirements and to the length of the physical paths onto which they are mapped. It is given by:

P_j = \sum_{e′ \in E_j} \sum_{e \in E} f_{e,e′} \cdot bw(e′) \cdot c_p    (13)

where c_p is the cost incurred by the InP per unit of bandwidth allocated in the backbone network.
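Since f_{e,e′} selects the physical links on each virtual link's route, Eq. (13) reduces to summing bandwidth over path lengths. A minimal sketch, assuming each inter-data-center virtual link is given directly as its path of physical links:

```python
def backbone_cost(paths, bw, c_p):
    """Eq. (13): backbone embedding cost.

    paths: {virtual_link: [physical links on its route]}
    bw:    {virtual_link: bandwidth demand}
    c_p:   cost per unit of bandwidth in the backbone
    """
    return sum(len(path) * bw[vl] * c_p for vl, path in paths.items())

# Hypothetical virtual link routed over three backbone hops.
cost = backbone_cost({("web", "db"): ["e1", "e2", "e3"]},
                     {("web", "db"): 100}, 0.001)
```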

The above embedding problem can be seen as a combination of the bin packing problem and the multi-commodity flow problem, which are both known to be NP-hard. In addition, in order to use an ILP solver, one should know the embedding costs of all possible partitions of the VDC graph in all data centers. This means that each local controller has to provide the central management framework with the embedding cost of every possible partition. This may result in a large computational overhead, not only at the local controllers but also at the central controller, since the number of possible partitions can be significant, especially for large-scale VDC requests. Therefore, a solution that is both efficient and scalable is required.

    In the next section, we present our solution. It first divides the VDC request into partitions such that the inter-partition bandwidth is minimized; minimizing the inter-partition bandwidth reduces the bandwidth usage within the backbone network. Once the partitioning is completed, we use a greedy algorithm that places the obtained partitions in data centers based on location constraints and on embedding costs that account for the energy consumption, carbon footprint, electricity prices and PUEs of the different facilities. Finally, the algorithm optimally connects the partitions through virtual links across the backbone network.

    5 VDC PARTITIONING AND EMBEDDING

    As mentioned earlier, our solution consists of two stages: (1) VDC partitioning, and (2) partition embedding. In the following, we present these two stages.

    5.1 VDC Partitioning

    Before starting the embedding process, the VDC partitioning module splits the VDC request into partitions such that the inter-partition bandwidth is minimized. This minimizes the bandwidth usage inside the backbone network.

    Our motivation stems from three main observations: (i) the cost of the inter-data center network accounts for 15% of the total cost, which is much higher than the cost of the intra-data center network [7], (ii) wide-area transit bandwidth is more expensive than building and maintaining the internal network of a data center [26], and (iii) the inter-data center network might become a bottleneck, which will eventually reduce the

    Algorithm 1 Location-Aware Louvain Algorithm (LALA)

    1: IN: Gj(Vj, Ej): the VDC request to partition
    2: repeat
    3:    Put every node of Gj in its own partition
    4:    Save the initial modularity
    5:    while nodes moved between partitions do
    6:       for all v ∈ Gj do
    7:          Find the partition P such that moving v from its partition to P:
    8:             - yields the maximum modularity increase
    9:             - does not put two nodes with different location constraints in P
    10:         if such a partition P exists then
    11:            Move v to partition P
    12:         end if
    13:      end for
    14:   end while
    15:   if current modularity > initial modularity then
    16:      End ← false
    17:      Change Gj to be the graph of partitions
    18:   else
    19:      End ← true
    20:   end if
    21: until End

    acceptance ratio of VDC requests. Hence, to reduce operational costs and avoid a potential inter-data center bottleneck, it is highly recommended to reduce the inter-data center traffic [8].

    The VDC partitioning problem reduces to the weighted graph partitioning problem, which is known to be NP-hard [27]. Hence, we propose to use the Louvain algorithm [28]. We chose the Louvain algorithm because it is a heuristic that automatically determines the number of partitions and has a low time complexity of O(n log n). Furthermore, it has been shown to provide good results [28].

    The objective of the Louvain algorithm is to maximize the modularity, an index between −1 and 1 that compares the intra-partition density (i.e., the sum of the link weights inside partitions) to the inter-partition density (the sum of the weights of links between partitions). In fact, graphs with high modularity have dense connections (i.e., a high sum of weights) between the nodes within partitions, but sparse connections across partitions.

    In a nutshell, the original Louvain algorithm proceeds as follows. Initially, every node is considered as a partition. The algorithm then considers each node and tries to move it into the same partition as one of its neighbors. The neighboring node is chosen such that the gain in modularity is maximal. Then a new graph is built by considering the partitions found during the first phase as nodes and by collapsing inter-partition links into one link (the weight of the new link is equal to the sum of the original link weights). The same process is applied recursively to the new graph until no improvement in the modularity is possible. For more details on the original version of the Louvain algorithm, please refer to [28].

    However, one should note that this algorithm is not directly applicable to the VDC partitioning problem


    Algorithm 2 Greedy VDC Embedding Across Data Centers

    1: IN: G(V ∪ W, E), GjM(VjM, EjM)
    2: OUT: the assignment of each partition in VjM to a data center, and the embedding in the backbone network of the links between partitions assigned to different data centers
    3: for all i ∈ V do
    4:    ToDC[i] ← {}
    5: end for
    6: for all v ∈ VjM do
    7:    Sv ← {i ∈ V / i satisfies the location constraints of v}
    8: end for
    9: for all v ∈ VjM do
    10:   i ← the s ∈ Sv with the smallest cost getCost(s, v) such that LinksEmbedPossible(s, v) = true
    11:   if no such data center is found then
    12:      return FAIL
    13:   end if
    14:   ToDC[i] ← ToDC[i] ∪ {v}
    15:   for all k ∈ N(v) do
    16:      if k ∈ ToDC[i] then
    17:         ToDC[i] ← ToDC[i] ∪ {evk}
    18:      else
    19:         if ∃ l ≠ i ∈ V / k ∈ ToDC[l] then
    20:            Embed evk in G using the shortest path
    21:         end if
    22:      end if
    23:   end for
    24: end for
    25: return ToDC

    since it does not take into account location constraints. Indeed, two VMs with different location constraints should not be assigned to the same data center, and hence they have to belong to different partitions. However, the Louvain algorithm may not separate them, which results in infeasible solutions. To address this limitation, we modified the Louvain algorithm to take location constraints into account in the partitioning process. The resulting heuristic algorithm, called the Location-Aware Louvain Algorithm (LALA), is described in Algorithm 1. Basically, LALA prevents moving a node from one partition to another whenever the location constraint would be violated.

    Note that, unlike previous approaches in the literature, where the number of partitions is known in advance [10] or where partitioning relies on detecting star-shaped structures [29], LALA determines the number of partitions, as well as their shape and size, based on the modularity.

    Once the VDC partitioning is completed, the second step is to assign the partitions to the data centers in such a way as to minimize the operational costs as well as the carbon footprint, and to provision virtual links across the backbone network to connect them. In what follows, we describe the partition placement algorithm.

    5.2 Partition Embedding Problem

    Once a request Gj(Vj, Ej) is partitioned, the resulting partitions, connected through virtual links, can be seen as a multigraph GjM(VjM, EjM), where VjM is the set of nodes (partitions) and EjM is the set of virtual links connecting them. The next step is to embed this multigraph in the infrastructure.

    Note that, at this stage, we could use the ILP formulation introduced in Section 4 by replacing the VDC request Gj with its graph of partitions GjM. However, even though the VDC partitioning process significantly reduces the number of components (partitions rather than VMs) to be embedded, the above ILP is still NP-hard. Therefore, we propose a simple yet efficient heuristic algorithm to solve it.

    Algorithm 2 describes the proposed partition embedding algorithm. For each partition v ∈ VjM, we build the list of data centers able to host it based on the location constraints (lines 6-8). The idea is to start by assigning the location-constrained partitions, then select the most cost-effective data centers that satisfy these constraints. For each partition v ∈ VjM to embed, the central management entity queries the Local Controller of each data center that satisfies the location constraints to get the embedding cost of v. The cost is returned by the remote call getCost(s, v), which includes both power and carbon footprint costs, as described in equation (11). The next step is to select the data center that will host the partition v (lines 10-14). The selected data center is the one that incurs the lowest embedding cost (provided by getCost(s, v)) and where it is possible to embed virtual links between v and all previously embedded partitions (denoted by N(v)), so that the bandwidth and delay requirements of all virtual links are satisfied (achieved when LinksEmbedPossible(s, v) = true). Furthermore, links between the partition v and partitions assigned to different data centers are embedded in the backbone network using the shortest path algorithm (lines 19-21).

    If the whole multigraph is successfully embedded, Algorithm 2 provides the mapping of all the partitions to the data centers as well as the mapping of the virtual links that connect them in the backbone network. The complexity of this algorithm is O(|VjM| × |V|), where |VjM| is the number of partitions and |V| is the number of data centers.
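The greedy placement loop of Algorithm 2 can be sketched as follows. Here get_cost stands in for the remote call getCost(s, v) and links_embeddable for LinksEmbedPossible(s, v); the names and data layout are illustrative assumptions, not the paper's implementation.

```python
def greedy_place(partitions, allowed, get_cost, links_embeddable):
    """Place each partition in the admissible data center with the lowest
    embedding cost; fail if any partition has no admissible data center."""
    placement = {}
    for v in partitions:
        # Data centers that satisfy v's location constraints and can
        # still embed the virtual links toward already-placed partitions.
        candidates = [s for s in allowed[v]
                      if links_embeddable(s, v, placement)]
        if not candidates:
            return None                      # request is rejected (FAIL)
        placement[v] = min(candidates, key=lambda s: get_cost(s, v))
    return placement

# Toy run: P1 is pinned to California, P0 may go anywhere; the cheapest
# admissible data center wins for each partition.
costs = {("NY", "P0"): 5.0, ("CA", "P0"): 3.0, ("CA", "P1"): 2.0}
placement = greedy_place(
    ["P0", "P1"],
    {"P0": ["NY", "CA"], "P1": ["CA"]},
    get_cost=lambda s, v: costs[(s, v)],
    links_embeddable=lambda s, v, placement: True)
```

The loop visits each partition once and scans each data center once, matching the O(|VjM| × |V|) complexity stated above (ignoring the cost of the remote calls themselves).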

    6 PERFORMANCE EVALUATION

    In order to evaluate the performance of Greenhead, we ran extensive simulations using a realistic topology and parameters. In the following, we present the settings of the conducted simulations, the performance metrics we evaluated, as well as the obtained results.

    6.1 Simulation Settings

    - Physical infrastructure:

    We consider a physical infrastructure of 4 data centers situated in four different states: New York, Illinois, California and Texas. The data centers are connected through the NSFNet topology as a backbone

    IEEE TRANSACTIONS ON CLOUD COMPUTING

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 7/27/2019 virtual embedding.pdf

    9/14

    9

    [Figure: four panels, (a) New York, (b) California, (c) Illinois, (d) Texas, each plotting normalized values of the electricity price, available renewables, carbon footprint per unit of power and cost per ton of carbon over 48 hours.]

    Fig. 3: Available renewables, electricity price, carbon footprint per unit of power and cost per unit of carbon in the data centers

    network [30]. NSFNet includes 14 nodes located in different cities in the United States. Each data center is connected to the backbone network through the node closest to its location. We assume all NSFNet links have the same capacity of 10 Gbps [8], [31]. As illustrated in Fig. 3, the electricity price, the available renewable energy and the carbon footprint per unit of power drawn from the grid not only depend on the location but are also subject to change over time.

    In our experiments, we simulate two working days (i.e., 48 hours). We use electricity prices reported by the US Energy Information Administration (EIA) for the different locations [32]. The amounts of power generated during the two days are extracted from [33]. In order to evaluate the carbon footprint generated at each data center, we use the values of carbon footprint per unit of power provided in [34]. We also use real solar and wind renewable energy traces collected from different US states [33], and consider an on-site renewable power cost of 0.01 $/kWh in every data center [35], [36]. In order to evaluate the PUEs of the different data centers, we adopted the technique described in [37].

    - VDC requests:

    In our simulations, similarly to previous works [4], [11]-[13], [16], VDCs are generated randomly according to a Poisson process with arrival rate λ. Their lifetime follows an exponential distribution with mean 1/μ. This mimics a real cloud environment where VDCs can be allocated for a particular lapse of time depending on the SP requirements. This is the case for Amazon EC2, for example, where an SP can dynamically create VMs and use them only for a specific duration. The number of VMs per VDC is uniformly distributed between 5 and 10 for small-sized VDCs and between 20 and 100 for large-sized VDCs. Two VMs belonging to the same VDC are directly connected with probability 0.5, with a bandwidth demand uniformly distributed between 10 and 50 Mbps and a delay uniformly distributed between 10 and 100 milliseconds. In addition, in each VDC, a fraction of VMs, denoted by Ploc ∈ [0, 1], is assumed to have location constraints.

    - The baseline approach:

    Since previous proposals on virtual network embedding and VDC embedding are not directly applicable to the studied scenario (see Section 2), we developed a baseline embedding algorithm that does not use VDC partitioning. The baseline algorithm maps a VDC to the physical infrastructure by embedding its VMs and links one by one. In other words, it applies the Greenhead embedding algorithm while considering each single VM as a partition.

    - The simulator:

    We developed a C++ discrete event simulator for the central and local controllers, consisting of about 3000 lines of code. The exchange between the central controller and the local controllers is implemented using remote procedure calls. The results are averaged over many simulation instances for each scenario, with a margin of error of less than 5%. We do not plot confidence intervals for the sake of presentation.

    - Performance Metrics

    In order to compare our approach to the baseline, we evaluate several performance metrics, including the acceptance ratio, the revenue, the energy costs, the carbon footprint and the backbone network utilization. In particular, the acceptance ratio is defined as the ratio of the number of embedded VDCs to the total number of received VDCs (i.e., including both embedded and rejected VDCs). It is given by:

    A_t = U_t / N_t    (14)

    where Ut and Nt are, respectively, the number of VDC requests that have been embedded and the total number of received VDCs up to time t. The instantaneous revenue at a particular time t is given by:

    R(t) = Σ_{j∈Q(t)} R^j    (15)

    where Q(t) is the set of VDC requests embedded in the infrastructure at time t and Rj is as defined in (1). The cumulative revenue up to time t, denoted by CR(t), can then be written as:

    CR(t) = ∫_0^t R(x) dx    (16)
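As a toy numeric illustration of equations (14) and (16), with invented numbers and the integral approximated over one-hour slots:

```python
def acceptance_ratio(embedded, received):
    # Equation (14): A_t = U_t / N_t
    return embedded / received

def cumulative_revenue(hourly_revenue):
    # Equation (16) with a 1-hour step: CR(t) is the sum of the
    # instantaneous revenue R over the elapsed time slots
    return sum(hourly_revenue)

ratio = acceptance_ratio(embedded=30, received=40)
cr = cumulative_revenue([120.0, 150.0, 90.0])
```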

    The instantaneous power, carbon footprint and


    [Figure: four panels, (a) Acceptance ratio, (b) Instantaneous profit, (c) Average link utilization in the backbone network, (d) Used bandwidth per request in the backbone network, each plotted over 48 hours for Greenhead and the baseline.]

    Fig. 5: Greenhead vs. the baseline (λ = 8 requests/hour, 1/μ = 6 hours, Ploc = 0.15, duration = 48 hours)

    backbone network cost is given by:

    C(t) = Σ_{j∈Q(t)} (D_t^j + P^j)    (17)

    where D_t^j is defined in (12). Note that we add the time slot t as a subscript to Dj since we are considering the variations between different time slots. The cumulative cost up to time t can be written as:

    CC(t) = ∫_0^t C(x) dx    (18)

    Naturally, the instantaneous and cumulative profits are given by the difference between the instantaneous revenue and cost, and between the cumulative revenue and cost, respectively.

    Finally, in order to compare the Greenhead resource allocation scheme to other schemes, we define the cumulative objective function at time t as the sum of the objective function values associated with the VDCs embedded at that time. It can be written as:

    B(t) = Σ_{j∈Q(t)} (R^j − (D^j + P^j))    (19)

    where Rj − (Dj + Pj) is the objective function score of embedding VDC j, as defined in equation (2).
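Equation (19) reduces to summing per-request scores; a one-line sketch with invented revenue/cost tuples:

```python
def cumulative_objective(embedded_requests):
    # Equation (19): B(t) = sum over j in Q(t) of Rj - (Dj + Pj),
    # i.e., revenue minus data center and backbone embedding costs.
    return sum(r - (d + p) for r, d, p in embedded_requests)

# Two embedded VDCs: (revenue, data center cost, backbone cost)
b = cumulative_objective([(100.0, 30.0, 10.0), (50.0, 20.0, 5.0)])
```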

    6.2 Simulation results

    Through extensive experiments, we first show the effectiveness of our framework in terms of time complexity, acceptance ratio, revenue and backbone network utilization. Then, we study the utilization of available renewable energy in the different data

    [Figure: cumulative objective function value over 48 hours for Greenhead, the baseline and the optimal solution.]

    Fig. 4: Cumulative objective function obtained with Greenhead, the baseline and the ILP solver

    centers. Finally, we investigate the carbon footprint and discuss how to spur the development of a green infrastructure.

    1) Greenhead provides a near-optimal solution within a reasonable time frame

    First, we compare Greenhead to an optimal solution provided by an ILP solver, as well as to the baseline, in terms of computational time and solution quality (i.e., cumulative objective function). In our first set of simulations, we fixed the arrival rate λ to 8 requests per hour, the average lifetime 1/μ to 6 hours and the fraction of location-constrained VMs Ploc to 0.15. The experiments were conducted on a machine with a 3.4 GHz dual-core processor and 4.00 GB of RAM running Linux Ubuntu. To compute the optimal solution, we developed a C++ implementation of the branch-and-bound algorithm.

    Fig. 4 compares the cumulative objective function (equation (19)) of the aforementioned algorithms for small-sized VDC requests consisting of 5-10 fully connected VMs. We can observe that the mean values obtained for Greenhead are very close to, or even overlap with, the values obtained with the ILP solver. This means that Greenhead provides a solution close to the optimal one. We can also see that Greenhead improves the cumulative objective function value by up to 25% compared to the baseline.

    Table 2 reports the average computation time needed to partition and embed a VDC request. The results show that Greenhead takes a very short time to partition and embed a VDC request (less than one millisecond for small-sized VDCs and up to 31 milliseconds for larger VDCs). By contrast, the ILP solver takes more than 13 seconds for small-sized VDCs. The baseline needs the least computation time, since no partitioning is performed. Note that the results of the optimal solution for large-sized VDCs are not reported, since the solver was not able to find the optimal solution because it ran out of memory.

    2) Greenhead improves backbone network utilization, acceptance ratio and revenue

    In the second set of experiments, we compare Greenhead to the baseline approach in terms of acceptance ratio, instantaneous revenue and backbone network utilization. To do so, we first fixed the


    [Figure: three panels, (a) Acceptance ratio, (b) Instantaneous profit, (c) Average link utilization in the backbone network, plotted against the proportion of location-constrained VMs for Greenhead and the baseline.]

    Fig. 7: Impact of the fraction of location-constrained VMs (λ = 8 requests/hour)

    TABLE 2: Computation time for Greenhead, the baseline and the ILP solver (in milliseconds)

    VDC size     | Greenhead                              | Baseline | ILP Solver
                 | Partitioning | Embedding | Total       |          |
    5-10 VMs     | 0.214        | 0.061     | 0.275       | 0.079    | 13540
    20-100 VMs   | 31.41        | 0.28      | 31.69       | 2.2      | -

    [Figure: two panels, (a) Acceptance ratio, (b) Profit, plotted against the arrival rate (requests/hour) for Greenhead and the baseline.]

    Fig. 6: Acceptance ratio and revenue for different arrival rates (Ploc = 0.10)

    arrival rate λ to 8 requests per hour, the average lifetime 1/μ to 6 hours and the fraction of location-constrained VMs Ploc to 0.15, and we simulated the infrastructure for 48 hours. Results are illustrated in Fig. 5. From this figure, we can see that Greenhead achieves, on average, a 40% higher acceptance ratio than the baseline (Fig. 5(a)) and up to 100% more instantaneous profit (Fig. 5(b)). Although both schemes lead to almost the same average utilization of the backbone network (Fig. 5(c)), they differ in that Greenhead avoids embedding virtual links with high bandwidth demand in the backbone network, thanks to the partitioning algorithm. Hence, it ensures that the embedded requests consume as little bandwidth as possible inside the backbone network. This is confirmed by Fig. 5(d), which compares the average used bandwidth per request inside the backbone network for both schemes. Requests embedded by Greenhead use, on average, 40% less bandwidth in the backbone network than those embedded by the baseline algorithm.

    Figs. 6 and 7 show the performance results when varying the arrival rate and Ploc, respectively. From Fig. 6, we can notice that as the arrival rate

    increases, more requests are embedded, which results in higher revenue. At the same time, the acceptance ratio goes down, since there is no room to accept all the incoming requests. It is also clear from this figure that the acceptance ratio as well as the revenue are always higher for Greenhead than for the baseline.

    However, this benefit is reduced when Ploc = 0, as shown in Fig. 7. In fact, when there are no location constraints, the VDCs can be hosted in any data center, and hence their placement is only driven by the availability of renewables, the electricity price and the carbon footprint. In practice, if the data centers are not overloaded, any particular VDC is entirely hosted in a single data center. This results in low backbone network utilization, as shown in Fig. 7(c). On the other hand, when Ploc = 1, all the VMs have to be placed as required by the SP. As a result, Greenhead is not able to perform any optimization. Finally, when the fraction of constrained VMs is between 0 and 1, Greenhead has more freedom to decide on the placement of the non-constrained VMs. In this case, Greenhead is able to optimize the VDC allocation and significantly improve the acceptance ratio and revenue compared to the baseline.

    [Figure: two panels, (a) Greenhead and (b) the baseline, plotting available renewable power, consumed renewable power and total consumed power over 48 hours.]

    Fig. 8: Power consumption across the infrastructure (λ = 8 requests/hour, Ploc = 0.20)

    3) Maximize renewables usage

    To illustrate how our proposed framework exploits the renewables available in the different data centers, we studied the power consumption across the infrastructure and, in particular, the usage of renewable energy. Fig. 8 shows the total power consumption across the infrastructure for both Greenhead and the baseline approach. It is clear from this figure that Greenhead consumes much more power than the baseline, since it accepts more VDC requests. We can also see that


    [Figure: four panels, (a) Ploc = 0.0, (b) Ploc = 0.1, (c) Ploc = 0.6, (d) Ploc = 1, plotting available renewable power, consumed renewable power and total consumed power over 48 hours.]

    Fig. 9: The utilization of the renewables in all data centers for different fractions of location-constrained nodes Ploc for Greenhead (λ = 8 requests/hour)

    it uses up to 30% more renewable power than the baseline.

    Fig. 9 shows the impact of the fraction of location-constrained VMs on the power consumption across the infrastructure. We can notice that, as the fraction of constrained nodes increases, Greenhead uses more power from the grid. For instance, with Ploc = 0, Greenhead uses 100% of the available renewables. However, as Ploc gets higher, up to 15% of the available renewables are not used. This is due to the fact that the VMs with location constraints can only be embedded in specific data centers, which may not have renewables available. Consequently, more power is drawn from the grid.

    4) Greenhead reduces energy consumption and carbon footprint per request

    Fig. 10 compares the obtained results for both schemes across all studied metrics. We can observe that Greenhead improves the acceptance ratio by up to 40%, which translates into 48% more profit. Furthermore, Greenhead uses up to 15% more renewables and reduces the consumed power per request by 15% compared to the baseline approach. In addition, we can notice that, while Greenhead boosts the profit by up to 48%, it generates the same amount of carbon footprint as the baseline approach.

    [Figure: bar chart comparing Greenhead and the baseline on the acceptance ratio, cumulative profit, backbone network utilization, consumed power per request, utilization of renewables and carbon footprint per request.]

    Fig. 10: Comparison of the average values of the different metrics

    5) A green infrastructure is possible through tuning, at the expense of power cost

    Finally, Fig. 11 shows the impact of varying the cost per unit of carbon (αi = α, ∀i ∈ V) on the carbon

    [Figure: carbon footprint and cost of power from the grid (normalized values) plotted against the cost per ton of carbon ($/t).]

    Fig. 11: The carbon footprint (normalized values) of the whole infrastructure with variable cost per ton of carbon

    footprint in the whole infrastructure as well as the total power cost. In this experiment, λ is set to 8 requests/hour and Ploc to 0.1. From this figure, we can see that a tradeoff between the carbon footprint and the power cost can be achieved. In addition, we can notice that an InP can set a carbon footprint target to reach by choosing the corresponding value of α. For instance, one can reduce the carbon footprint by 12% while increasing the power cost by only 32% by setting α to 80 $/t.

    It is worth noting that, nowadays, the carbon cost is imposed by governments as a carbon tax of between 25 and 30 $/t [38]-[40]. According to Fig. 11, such a cost is not enough to push InPs to reduce their carbon footprint.

    To explain the power cost increase when reducing the carbon footprint, let us examine Fig. 12, which presents the power consumption in the different data centers. From this figure, we can notice that for small values of α (i.e., α ≤ 160 $/t), Greenhead mostly uses the data centers in Illinois and New York. These two data centers have low electricity prices (see Fig. 3) but high carbon footprints per unit of power (0.0006 t/kWh and 0.0005 t/kWh, respectively). However, as α increases, Greenhead shifts to the data center in California, since it has the smallest carbon footprint per unit of power (0.0003 t/kWh) but a higher electricity price (on average, 100% higher than the New York data center).

    Consequently, we can conclude that: (i) to reduce data centers' carbon footprint, governments should consider much higher carbon taxes, and (ii) using


    Greenhead, a socially-responsible InP should consider higher carbon costs, even by artificially increasing them, to force Greenhead to use environment-friendly data centers and thereby reduce the carbon footprint.

    [Figure: proportion of grid power used in the New York, California, Illinois and Texas data centers plotted against the cost per ton of carbon ($/t).]

    Fig. 12: The power from the grid (normalized values) used in different data centers with variable cost per ton of carbon

    7 CONCLUSIONS

    The last few years have witnessed a massive migration of businesses, services and applications to the cloud. Cloud providers take advantage of the worldwide market to deploy geographically distributed infrastructures and enlarge their coverage. However, multiple data centers consume massive amounts of power. Furthermore, their carbon footprint is a rapidly growing fraction of total emissions. In this paper, we proposed Greenhead, a holistic resource management framework for embedding VDCs across a geographically distributed infrastructure. The goal of Greenhead is to find the best trade-off between maximizing revenue, reducing energy costs and ensuring the environmental friendliness of the infrastructure. The key idea of the proposed solution is to conquer the complexity of the problem by partitioning the VDC request based on the bandwidth requirements between the VMs. The partitions and the virtual links connecting them are then dynamically assigned to the data centers and the backbone network of the infrastructure in order to achieve the desired objectives.

    We conducted extensive simulations of four data centers connected through the NSFNet topology. The results show that Greenhead provides near-optimal solutions within a reasonable computational time frame and improves the request acceptance ratio and InP revenue by up to 40%, while ensuring high usage of renewable energy and a minimal footprint per request.

IEEE TRANSACTIONS ON CLOUD COMPUTING

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Ahmed Amokrane received the Engineer and M.Sc. degrees in computer science from Ecole Nationale Superieure d'Informatique (ESI), Algeria, and Ecole Normale Superieure de Cachan (ENS Cachan), France, in 2010 and 2011, respectively. He is currently a Ph.D. candidate at the University Pierre and Marie Curie, Paris, France. His research interests include energy efficient and green networking in wireless networks, data centers and the cloud. He graduated with honors from the ESI and ENS Cachan.

Mohamed Faten Zhani received the engineering and M.S. degrees from the National School of Computer Science, Tunisia, in 2003 and 2005, respectively. He received his Ph.D. in computer science from the University of Quebec in Montreal, Canada, in 2011. Since then, he has been a postdoctoral research fellow at the University of Waterloo. His research interests include virtualization, resource management in cloud computing environments, and network performance evaluation.

Rami Langar is currently an Associate Professor at the LIP6, University of Pierre and Marie Curie - Paris 6, Paris, France. He received the M.S. degree in network and computer science from the University of Pierre and Marie Curie - Paris 6 in 2002, and the Ph.D. degree in network and computer science from Telecom ParisTech, Paris, France, in 2006. In 2007 and 2008, he was with the School of Computer Science, University of Waterloo, Waterloo, ON, Canada, as a Post-Doctoral Research Fellow. His research interests include mobility and resource management in wireless mesh, vehicular ad-hoc and femtocell networks, performance evaluation and quality-of-service support.

Raouf Boutaba received the M.Sc. and Ph.D. degrees in computer science from the University Pierre and Marie Curie, Paris, in 1990 and 1994, respectively. He is currently a professor of computer science at the University of Waterloo and a distinguished visiting professor at the division of IT convergence engineering at POSTECH. His research interests include network, resource and service management in wired and wireless networks. He is the founding editor-in-chief of the IEEE Transactions on Network and Service Management (2007-2010) and is on the editorial boards of other journals. He has received several best paper awards and other recognitions, such as the Premier's Research Excellence Award, the IEEE Hal Sobol Award in 2007, the Fred W. Ellersick Prize in 2008, and the Joe LoCicero and Dan Stokesbury awards in 2009. He is a fellow of the IEEE.

Guy Pujolle received the Ph.D. and These d'Etat degrees in computer science from the University of Paris IX and Paris XI in 1975 and 1978, respectively. He is currently a Professor at Pierre et Marie Curie University - Paris 6, a distinguished invited professor at POSTECH, Korea, a member of the Institut Universitaire de France, and a member of The Royal Physiographical Academy of Lund, Sweden. He spent the period 1994-2000 as Professor and Head of the computer science department of Versailles University. He was also Professor and Head of the MASI Laboratory at Pierre et Marie Curie University (1981-1993), Professor at ENST (1979-1981), and a member of the scientific staff of INRIA (1974-1979). Guy Pujolle is the French representative at the Technical Committee on Networking at IFIP. He is an editor for ACM International Journal of Network Management and Telecommunication Systems, and Editor in Chief of Annals of Telecommunications. Guy Pujolle is a pioneer in high-speed networking, having led the development of the first Gbit/s network to be tested in 1980. He participated in several important patents like DPI or virtual networks. Guy Pujolle is co-founder of QoSMOS (www.qosmos.fr), Ucopia Communications (www.ucopia.com), EtherTrust (www.ethertrust.com), Virtuor (www.VirtuOR.fr), and Green Communications (www.green-communications.fr).
