Towards Trustworthy Resource Scheduling in Clouds


Imad M. Abbadi and Anbang Ruan

Abstract—Managing the allocation of cloud virtual machines at physical resources is a key requirement for the success of clouds. Current implementations of cloud schedulers do not consider the entire cloud infrastructure, nor do they consider the overall user and infrastructure properties. This results in major security, privacy, and resilience concerns. In this paper, we propose a novel cloud scheduler which considers both user requirements and infrastructure properties. We focus on assuring users that their virtual resources are hosted using physical resources that match their requirements, without requiring users to understand the details of the cloud infrastructure. As a proof of concept, we present our prototype, which is built on OpenStack. The prototype implements the proposed cloud scheduler. It also provides an implementation of our previous work on cloud trust management, which provides the scheduler with input about the trust status of the cloud infrastructure.

Index Terms—Access control, computer security, information security, enterprise resource planning.

I. INTRODUCTION

Cloud infrastructure is complex and heterogeneous in nature, with numerous components provided by different vendors [11]. Applications deployed in the cloud might need to interact amongst themselves and, in some cases, depend on other deployed applications. The complexity of the infrastructure and application dependencies creates an environment which requires careful management and raises security and privacy concerns [15], [31]. The central component that manages the allocation of virtual resources to a cloud infrastructure's physical resources is known as the cloud scheduler [27]. Currently available schedulers do not consider users' security and privacy requirements, nor do they consider the properties of the entire cloud infrastructure. For example, a cloud scheduler should consider application performance requirements (e.g., the physical hosting of interdependent application components needs to be within close physical proximity) and user security and privacy requirements (e.g., it should mitigate the threats caused by the multitenant architecture [31] and consider the trust status of the hosting components).

Manuscript received May 30, 2012; revised February 19, 2013; accepted February 19, 2013. Date of publication February 25, 2013; date of current version May 16, 2013. This work was supported by the TClouds project, which is funded by the EU's Seventh Framework Program ([FP7/2007–2013]) under Grant agreement ICT-257243. The associate editor coordinating the review of this manuscript and approving it for publication was William Horne.

The authors are with the Department of Computer Science, University of Oxford, Oxford OX1 3QD, U.K. (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2013.2248726

This paper proposes a trustworthy scheduling algorithm that can automatically manage the cloud infrastructure by considering both user requirements and infrastructure properties and policies. The paper also develops the required trustworthy software agents which automatically manage the collection of the properties of physical resources. Having a trustworthy and timely copy of the infrastructure properties and user requirements is critical for the correct operation of the scheduler. This paper specifically focuses on providing the scheduler with trustworthy input about the trust status of the cloud infrastructure, and it establishes the foundations of planned future work to cover other properties.

OpenStack refers to its cloud scheduler component as the "nova-scheduler" [29]; it identifies the scheduler as the most complex component to develop and states that significant effort is still required to arrive at an appropriate cloud scheduler. To develop the trustworthy scheduler component, it is important to understand how clouds are managed and how they work in practice, which Abbadi et al. have covered in previous work [9], [10]. From this we concluded that establishing trust in clouds requires two mutually dependent elements: a) trustworthy mechanisms and tools to help cloud providers automate the process of managing, maintaining, and securing the infrastructure; and b) methods for cloud users and providers to establish trust in the operation of the infrastructure. Point (a) includes (but is not limited to) supporting the cloud infrastructure with trustworthy self-managed services which automatically manage the cloud infrastructure [9], [11]. Automated self-managed services should provide cloud computing with exceptional capabilities and new features, for example, scale per use, hiding the infrastructure's complexity, and automated reliability, availability, scalability, dependability, and resilience that consider users' security and privacy requirements by design [9]. The proposed cloud scheduler belongs to point (a) and is our first step in providing the trustworthy self-managed services. The previous work in [9], [12] partially covers point (b), which we also prototype and integrate with the proposed scheduler to present a coherent solution.

The paper is organized as follows. Section II provides essential foundations which cover cloud structure and management services. Section III discusses the cloud's compositional chains of trust. Section IV presents our framework architecture and Section V presents our prototype. Finally, Section VI discusses related work and Section VII concludes the paper.

II. BACKGROUND

In this section we briefly summarize the previous work that this paper builds on. Specifically, we highlight work describing the structure of clouds and cloud management services.


Fig. 1. Cloud computing—layering conceptual model.

A. Cloud Structure Overview

This section briefly highlights part of a cloud taxonomy (see [9] for further details). The cloud environment is composed of enormous resources, which are categorized based on their types and deployments across the cloud infrastructure. A resource is a conceptual entity that provides services to other entities. The cloud environment conceptually consists of multiple intersecting layers, as follows: i) Physical Layer—this layer represents the physical resources and their interactions, which constitute a cloud's physical infrastructure; physical layer resources are consolidated to serve the Virtual Layer; ii) Virtual Layer—this layer represents the virtual resources, which are hosted by the Physical Layer; and iii) Application Layer—this layer runs the applications of the cloud's customers, which are hosted using the Virtual Layer resources.

Fig. 1 provides a conceptual model which identifies an entity Layer as the parent of the three cloud layers (i.e., the physical, virtual, and application layers). At an abstract level, a Layer contains Resources which join Domains (i.e., we have physical domains, virtual domains, and application domains). A Domain resembles a container which consists of related resources, and each Domain's resources are managed following the Domain's defined policy. Domains that need to interact with each other within a layer join a Collaborating Domain (i.e., we have physical collaborating domains, virtual collaborating domains, and application collaborating domains). A Collaborating Domain controls the interaction between its member Domains using a defined policy.

The nature of Resources, Domains, Collaborating Domains, and their policies are layer specific. Domains and Collaborating Domains are concepts that help in managing the cloud infrastructure, and in managing resource distribution and coordination in normal operations and incidents. Each of the identified cloud entities has a root of trust which helps in establishing trust in clouds. Subsequent sections clarify the roots of trust in more detail.

B. Virtual Control Center

Currently there are many tools for managing a cloud's virtual resources, e.g., vCenter [38] and OpenStack [27]. For convenience we refer to such tools using a common name, "Virtual Control Center (VCC)", which is a cloud device¹ that manages virtual resources and their interactions with physical resources using a set of software agents. The VCC will play a major role in providing the cloud's automated self-managed services, which are mostly provided manually at the time of writing.

In previous work [13], Abbadi et al. proposed a framework for establishing trust in the operational management of clouds. The framework identifies the challenges and requirements, and addresses the ones which are related to establishing a trustworthy environment at the infrastructure level. This includes the establishment of offline chains of trust amongst clouds' entities. This paper uses the framework because it provides a secure environment for collecting infrastructure properties and enforcing the scheduler policies at client devices. The functions of the framework are provided using two types of software agents: a server software agent that runs at the VCC and a client agent that runs at the cloud's physical resources. We refer to the server software agent of the VCC as a Domain Controller Server Side (DC-S), and we refer to the client software agent as a Domain Controller Client Side (DC-C). The DC-C is in charge of enforcing the DC-S policies at physical resources. The DC-S establishes chains of trust with each DC-C as follows: the DC-S verifies the trustworthiness of each DC-C to continually enforce the domain policies and to only access the domain credentials when the resource execution status is as expected. In turn, the DC-C provides assurance to the DC-S about the trustworthiness of its hosting resource's execution environment when managing the domain and enforcing the domain policies. This provides the assurance that only resources with a trustworthy DC-C can be a member of a domain. See [13] for a detailed discussion of how the offline chains of trust could be established and assured.

III. CLOUD COMPOSITIONAL CHAINS OF TRUST

One of the key properties of a cloud infrastructure is its trustworthiness with regard to managing users' virtual resources on physical resources as agreed in a service level agreement (SLA). Assessing trust levels of clouds is not only beneficial to cloud users, but also helps cloud providers to understand how their infrastructure is operated and managed. However, this is a difficult problem to deal with, considering the dynamic nature and enormous resources of the infrastructure [11], [14], [15], [17], [18], [22], [26], [31]. In Section VI we briefly analyze some of the proposed schemes in this direction; some of those focus on measuring the trustworthiness of the overall cloud infrastructure, while others attempt to establish a chain of trust with a specific resource at a specific point in time. Abbadi [12] analyzed this problem, and argues that it is impractical to measure the trustworthiness of the overall cloud infrastructure (considering its complexity), nor should we measure the trustworthiness of a single component (as cloud dynamism breaks any established chain of trust). Abbadi's method is based on segmenting the infrastructure and measuring the trustworthiness of each segment independently. The boundaries of each segment are based on how the infrastructure is managed in practice.

¹VCC (as in the case of OpenStack) could be deployed at a set of dedicated and collaborating devices that share a common database to support resilience, scalability, and performance.


Specifically, the boundaries are controlled using the domains and the collaborating domains concepts which we outlined earlier.

Abbadi [12] proposed the concept of compositional chains of trust, which provides a single chain of trust representing a group of entities. This is important in clouds as many entities exist as a composition of multiple entities (e.g., a cluster of physical servers, or a cluster of load-balanced application and database servers). Members of such a grouping may have identical or different chains of trust. However, an entity dependent on this grouping should see a single chain of trust representing the trust it has in the grouping. In other words, relying entities will see a single entity, even though that entity is a grouping representing multiple entities. Moreover, the functions proposed to calculate the compositional chains of trust provide different levels of transparency based on the cloud user type (i.e., IaaS, PaaS, or SaaS). Abbadi's paper does not provide a prototype. Our proposed scheduler uses his methods to measure the trustworthiness of the cloud infrastructure, and we also provide a prototype of the related part of Abbadi's compositional chains of trust.

This section covers the part which is related to the physical layer. Virtual and application layers are not covered by the proposed scheduler at this stage, as we focus on the IaaS type of service: in IaaS, the scheduler does not (indeed, must not) interfere with the internal details of individual virtual machines and running applications, in order to maintain users' privacy and security requirements.

A. Types of Chains of Trust

A Chain of Trust (CoT) consists of a set of elements primarily used to establish the trust status of an object. The first element of the CoT (also called the root of trust) should be established from a trusted entity or an entity that is assumed to be trusted, e.g., a trusted third party or a tamper-evident hardware chip (as in the case of a Trusted Platform Module (TPM) [37]). The trust status of the second element in the CoT is measured by the root of trust (i.e., the first element in the CoT). If the verifier trusts the root of trust, then the verifier must also trust the root of trust's measurement of the second element. The second element then measures the trust status of the third element in the CoT. If the second element is trusted, and the second element measures the third element's trust status, then the verifier trusts the measurements of the third element. This process is a simplified example of how a CoT could possibly be built.

Clouds have two types of CoTs: a single resource CoT, and a compositional CoT representing multiple entities (i.e., domains and collaborating domains). A verifier is mainly interested in evaluating compositional CoTs without the need to get involved in understanding the details of the cloud infrastructure. The compositional CoT builds on the CoTs of individual resources. As a result, this section defines both types of CoTs, which includes defining the nature of their roots of trust.

B. A Resource Chain of Trust

As stated earlier, a resource is a conceptual entity that provides services to other entities. Therefore, we begin the discussion by defining the CoT for a single resource (RCoT) as a triple comprising an initial trust function ($\mathit{itf}$), a set of trust functions ($\mathit{stf}$), and a sequence of elements in the chain $\langle x_1, x_2, \dots, x_n \rangle$, where each $x_i$ is an element representing any component (software, hardware, etc.) that contributes to the chain of trust in the corresponding resource. An RCoT requires the following: i) the initial function evaluates to trusted or assumed to be trusted when applied to the first element of the sequence, and ii) every function in the set of trust functions evaluates to true when applied to any two consecutive elements of the sequence. This is formally defined as follows:

$\mathrm{RCoT} = (\mathit{itf}, \mathit{stf}, \langle x_1, \dots, x_n \rangle)$ such that $\mathit{itf}(x_1) = \mathrm{trusted}$ and $\forall i \in \{1, \dots, n-1\},\ \forall \mathit{tf} \in \mathit{stf} : \mathit{tf}(x_i, x_{i+1}) = \mathrm{true}$.

The nature of the root of trust (i.e., the first element in the sequence, $x_1$) is based on the type of the entity and its location within the cloud layers. In the context of the TCG specifications, the RCoT starts from a CRTM (Core Root of Trust for Measurement), which should be stored in a protected location such as the TPM (currently it is protected by the BIOS). Once the CRTM measures the platform's initial state, it stores the result in protected registers inside the TPM (referred to as PCRs). The CRTM represents the first element $x_1$, and the set of trust functions $\mathit{stf}$ contains the TCG roots of trust functions (that is, the RTM, the RTS, and the RTR) and other functions. The initial trust function $\mathit{itf}$ is the one that measures the CRTM itself and stores the result inside the TPM's PCRs.

Unlike the RCoT at the physical layer, the RCoTs at the virtual and application layers have different treatments for their roots of trust. This is because physical resources are the foundation of a virtual resource's roots of trust, which in turn form the foundation of an application resource's roots of trust. In other words, the virtual and application layer RCoTs, considering cloud dynamism, should build on compositional CoTs and not on a specific RCoT. It is outside the scope of this paper to discuss these any further; a detailed discussion can be found in [12].

We defined two operations over RCoTs: i) an extension operation, $\mathit{extend}(\mathrm{RCoT}, x)$, which concatenates an element $x$ to the chain; and ii) a combination operation, $\mathrm{CoT}_1 \oplus \mathrm{CoT}_2$, which returns a set of elements, each representing an input CoT. In such an operation we should properly check for cycles in the trust relation, but cycles cannot arise in our context, and so we do not concern ourselves with a detailed discussion of that aspect of the model. The $\oplus$ operation is idempotent, commutative, and associative, so we will use terms such as $\mathrm{CoT}_1 \oplus \mathrm{CoT}_2 \oplus \mathrm{CoT}_3$ since these are unambiguous.
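To make the definition concrete, the following minimal Python sketch (our own illustration, not part of the paper's prototype; the element names, white-list values, and the hash-based trust function are assumptions) checks the two RCoT conditions over a chain of elements:

```python
# Illustrative sketch of the RCoT definition (hypothetical names).
# An RCoT is (itf, stf, [x1, ..., xn]); it is valid when itf(x1) is
# trusted and every trust function accepts each consecutive pair.

def rcot_is_valid(itf, stf, chain):
    """Return True when the chain satisfies both RCoT conditions."""
    if not chain:
        return False
    # Condition i): the initial trust function vouches for the root.
    if not itf(chain[0]):
        return False
    # Condition ii): every tf in stf accepts consecutive elements.
    return all(tf(a, b) for tf in stf for a, b in zip(chain, chain[1:]))

# Example: the root (a CRTM-like element) is trusted by assumption,
# and a trust function accepts a pair when the successor appears in
# a white-list of expected measurements (simplified; assumed values).
WHITELIST = {"bootloader": "h1", "kernel": "h2"}

itf = lambda root: root == "crtm"                   # assumed root check
stf = [lambda a, b: WHITELIST.get(b) is not None]   # simplified tf

print(rcot_is_valid(itf, stf, ["crtm", "bootloader", "kernel"]))  # True
```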

C. The Physical Layer DCoT and CDCoT

We now define how trust is composed from the members of a particular grouping in clouds. Understanding compositional chains of trust is a vital requirement for establishing trust in clouds. This is because cloud resources at upper layers are served by a collaborating set of resources rather than a specific resource. We identify two types of domain configurations: homogeneous and heterogeneous. In a homogeneous setting all resources are configured uniformly, resulting in identical CoTs.


An example of this is the resources within a physical domain or a virtual domain. Resources of a physical domain are identical and carefully selected, interconnected, and positioned to achieve the domain properties. Similarly, resources of a virtual domain are identical as they represent (as a result of horizontal scalability) a replication of VMs hosting an instance of an application resource. Application domains, on the other hand, are heterogeneous as they are composed of resources having different CoTs. Collaborating domains follow the same concept as domains. For example, domains which are members of a physical collaborating domain are homogeneous as they can serve as backups of each other. Virtual and application layer collaborating domains, on the other hand, are heterogeneous as they serve to identify the interdependencies between their member domains. For further details clarifying the relations between cloud entities and their grouping see [9], [13].

We identify two types of compositional CoTs, namely: a domain chain of trust (DCoT) and a collaborating domains chain of trust (CDCoT). The compositional CoT is composed of two entities: i) a root of trust, and ii) a combination of the CoTs of the entities which are members of the corresponding domain/collaborating domain. Unlike the RCoT, the root of trust of the DCoT/CDCoT attests to the trustworthiness of the way the domain or collaborating domain is managed and operated. We need a root of trust that satisfies two properties: i) its trustworthiness can be measured and assessed at all times, and ii) it can provide strong assurance about the trustworthiness of the way the domains and the collaborating domains are managed and operated.

Considering the above discussion, the two elements of the physical layer DCoT are as follows: i) the combination of all RCoTs that are members of the physical domain—the physical domain is homogeneous, and as a result all RCoTs that are members of the domain are identical (combining identical chains of trust is equal to either chain of trust, as discussed in Section III-B); and ii) the root of trust of the domain—as explained in Section II-B, each resource of a physical domain runs a trustworthy copy of the DC-C that provides assurances of the physical resource state, the VCC runs a trustworthy copy of the DC-S that measures the trustworthiness of the DC-C, and a verifier can independently acquire the CoT of the VCC and assess its trustworthiness. These satisfy our two stated properties of the root of trust of the DCoT/CDCoT. Therefore, we propose the CoT of the VCC, $\mathrm{RCoT}(\mathrm{VCC})$, to act as the root of trust of the physical DCoT.

Assume a homogeneous physical domain, $PD$, consisting of resources $R_1, R_2, \dots, R_n$, such that $\mathrm{RCoT}(R_1) = \mathrm{RCoT}(R_2) = \dots = \mathrm{RCoT}(R_n)$. The $\mathrm{DCoT}(PD)$ is then defined as follows:

$\mathrm{DCoT}(PD) = \mathrm{RCoT}(\mathrm{VCC}) \oplus \mathrm{RCoT}(R_1) \oplus \dots \oplus \mathrm{RCoT}(R_n) = \mathrm{RCoT}(\mathrm{VCC}) \oplus \mathrm{RCoT}(R_1)$

The DC-S, which is part of the $\mathrm{RCoT}(\mathrm{VCC})$, vouches and attests for the trustworthiness of the members of $PD$. The DC-S also provides the assurance that the DC-C can only operate and be a member of a domain when its serving host has a specific value of the RCoT. Therefore, a verifier only needs to attest to the trustworthiness of the $\mathrm{RCoT}(\mathrm{VCC})$ and the DC-C, i.e., an extended CoT which starts from the $\mathrm{RCoT}(\mathrm{VCC})$ and extends to the DC-C. The resources of each physical domain have identical values of DC-Cs when they run as expected². As a result we can redefine the $\mathrm{DCoT}(PD)$ as follows:

$\mathrm{DCoT}(PD) = \mathit{extend}(\mathrm{RCoT}(\mathrm{VCC}), \mathrm{DC\text{-}C})$

After discussing the physical layer DCoT we now move to the physical layer CDCoT. The DC-S and the DC-C manage both physical domains and physical collaborating domains. As a result, an appropriate root of trust of the CDCoT is the same as the root of trust of the DCoT. The root of trust of the CDCoT is already included in the DCoT, thus we can exclude it from the physical CDCoT. Suppose a collaborating domain $CD$ is composed of domains $PD_1, PD_2, \dots, PD_m$ such that $\mathrm{DCoT}(PD_1) = \mathrm{DCoT}(PD_2) = \dots = \mathrm{DCoT}(PD_m)$³. The $\mathrm{CDCoT}(CD)$ is then defined as follows:

$\mathrm{CDCoT}(CD) = \mathrm{DCoT}(PD_1) \oplus \dots \oplus \mathrm{DCoT}(PD_m) = \mathrm{DCoT}(PD_1)$

By substituting the value of $\mathrm{DCoT}(PD_1)$ from the previous equation, we end up with the following:

$\mathrm{CDCoT}(CD) = \mathit{extend}(\mathrm{RCoT}(\mathrm{VCC}), \mathrm{DC\text{-}C})$

The above shows that the physical CDCoT is mainly based on the VCC and the DC-C. The trustworthiness of the VCC can be measured by a verifier, and the trustworthiness of the DC-C can be verified by the VCC. This is the foundation of the physical layer CDCoT, which acts as a foundation for the layer above it (i.e., the virtual layer), as discussed in [12].
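The collapse of identical chains under the idempotent $\oplus$ operation can be demonstrated directly. The sketch below is our own illustration (not the paper's formalism): CoTs are modeled as frozensets for brevity, which drops the ordering of a chain but preserves the idempotent, commutative, associative behavior the derivation relies on; the element names are assumptions.

```python
# Sketch of compositional chains of trust (our illustration).
# combine (⊕) is modeled as set union, which is idempotent,
# commutative, and associative, so combining identical RCoTs
# yields the same chain, as the DCoT/CDCoT derivation uses.

def combine(*cots):
    """⊕ over CoTs, each represented here as a frozenset of elements."""
    out = frozenset()
    for cot in cots:
        out = out | cot
    return out

def extend(cot, element):
    """Concatenate an element (e.g., the DC-C) onto a CoT."""
    return cot | {element}

rcot_vcc = frozenset({"tpm", "bios", "kernel", "dc-s"})   # assumed
rcot_node = frozenset({"tpm", "bios", "kernel", "dc-c"})  # assumed

# Homogeneous physical domain: identical RCoTs collapse under ⊕.
dcot = combine(rcot_vcc, rcot_node, rcot_node, rcot_node)
assert dcot == combine(rcot_vcc, rcot_node)

# A CDCoT over identical domains reduces to a single domain's DCoT,
# which (for these assumed values) equals extend(RCoT(VCC), DC-C).
cdcot = combine(dcot, dcot)
assert cdcot == extend(rcot_vcc, "dc-c")
```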

IV. HIGH LEVEL ARCHITECTURE

Having outlined the cloud taxonomy, the relationship between cloud components, and the compositional chains of trust, we can now cover the scheme framework. We use OpenStack Compute [29] as a management framework to represent the VCC. OpenStack is an open source tool for managing the cloud infrastructure which is under continuous development.

Fig. 2 presents a high level architecture which illustrates the main entities and the general layout of our scheme framework. We use an OpenStack controller node (i.e., the VCC) and an OpenStack nova-compute (i.e., a computing node at the physical layer). The computing node runs a hypervisor which manages a set of VMs. The VCC receives two main inputs: user requirements and infrastructure properties. The VCC manages the user virtual resources based on these inputs. In this section we focus on our introduced components and, in addition, we cover the changes we introduced at the following components of OpenStack (further details about other components can be found at [29]): nova-api, nova-database, nova-scheduler, and nova-compute. We update some of the functions of these components and introduce new functions.

²Resources of different domains would typically have different values of DC-Cs, as different domains have different properties.

³As discussed earlier, each physical collaborating domain has identical physical domains to support cloud properties; however, resources and domains that are members of different collaborating domains are not necessarily identical.


Fig. 2. High level architecture.

A. Nova-Api

Nova-api is a set of command line tools and graphical interfaces which are used by customers when managing their resources in the cloud, and are also used by cloud administrators when managing the cloud virtual infrastructure. We updated the nova-api library to consider the following: i) Infrastructure Properties—the cloud physical infrastructure is very well organized and managed, and its organization and management associate its components with infrastructure properties; examples of such properties include resource reliability and connectivity, resource distribution across the cloud infrastructure, resource RCoTs, redundancy types, and resource clustering and grouping. ii) User Properties—these include technical requirements, service level agreements, and user-centric security and privacy requirements. And iii) Changes—these represent changes in user properties, infrastructure properties, and the infrastructure policy.

The main changes which we introduced at nova-api include the following: i) enable users to manage their requirements; ii) enable administrators to manage the properties and policies of the infrastructure, e.g., associate computing nodes with their domains and collaborating domains; and iii) enable OpenStack to automatically collect the properties of the cloud physical resources through trusted channels—at this stage we specifically focus on collecting the RCoTs. These data are stored in nova-database and are used by our proposed scheduler.

B. Nova-Database

Nova-database is composed of many tables which hold the details of the cloud components, users, projects, and security properties. We extended nova-database in different directions to realize the taxonomy of clouds, and to cover the introduced user security requirements and infrastructure properties. Fig. 3 illustrates our proposed modifications of nova-database in bold format, which are as follows.

Compute_nodes is an existing nova-database table that holds records reflecting computing resources at the physical layer. We updated this table by adding the following fields: RCoT(Physical) and security properties, which hold values of the computing resources' security details.

Physical_Layer_Domain is a new table which holds records of a cloud's physical domains, defines the relationship amongst each domain's resources, and holds the domain metadata. The metadata includes the domain capabilities, the DCoT, and a foreign key pointing to a table which identifies the relative location of the physical domain within a cloud's infrastructure.

Location and Location_Distances. The aim of these tables is to identify all possible locations within a cloud infrastructure, and to define relative distances between pairs of all identified locations. These tables are bound as follows: the compute_nodes table is bound to the physical_layer_domain table, and the physical_layer_domain table is bound to a specific location identifier in the Location table. The latter is bound to the location_distances table, which specifies the distances between a location identifier and all other location identifiers. In this we assume the resources of a physical domain are within close physical proximity, which reflects current deployment scenarios in practice.

Collaborating_PL_Domain is a new table which establishes the concept of collaborating physical domains. Each record identifies a specific backup domain for each physical domain with a priority value. A source domain can have many backup domains. The value of the priority field identifies the order by which physical backup domains could possibly be allocated to serve the source domain's needs. Backup domains are used in maintenance windows, emergencies, load balancing, etc. Backup domains should have the same capabilities and DCoT value as the source physical domain itself.

Instances is an existing OpenStack table representing the running instances at computing nodes. We updated the table by adding the following fields: i) the virtual resource chain of trust RCoT(Virtual), ii) the application resource chain of trust RCoT(Application), and iii) two foreign keys which establish a relationship with the instance's virtual and application domain tables, as defined in the Virtual_Layer_Domain and Application_Layer_Domain tables, respectively. It is outside the scope of this paper to cover the details of these tables as we are mainly focusing on the scheduler requirements.

Services is an existing OpenStack table which binds the virtual layer resources to their hosting resources at the physical layer.

Other tables. OpenStack has many more tables, and we also added more tables which are outside the scope of the discussion of this paper.

Most nova-database records are uploaded automatically using: i) the proposed software agents, ii) the modified nova-api, and/or iii) other OpenStack management tools. Ideally, such records should be securely protected, collected, and managed. Our focus at this stage is on providing a high level architecture design, a running cloud scheduler, and management software agents for attesting to the trustworthiness of OpenStack components. Full automation of the cloud management services is our planned long term objective.
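The described schema extensions can be pictured in code. The sketch below is our reading of Fig. 3 expressed in SQLAlchemy; table and column names and types are assumptions for illustration, and the real nova-database schema differs in detail:

```python
# Hypothetical SQLAlchemy sketch of the described nova-database
# extensions; names and types are assumptions based on Fig. 3.
from sqlalchemy import Column, ForeignKey, Integer, String, Table, MetaData

metadata = MetaData()

location = Table(
    "location", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(255)))

location_distances = Table(
    "location_distances", metadata,
    Column("from_location_id", Integer, ForeignKey("location.id"),
           primary_key=True),
    Column("to_location_id", Integer, ForeignKey("location.id"),
           primary_key=True),
    Column("distance", Integer))  # relative distance between locations

physical_layer_domain = Table(
    "physical_layer_domain", metadata,
    Column("id", Integer, primary_key=True),
    Column("capabilities", String(255)),
    Column("dcot", String(255)),          # domain chain of trust
    Column("location_id", Integer, ForeignKey("location.id")))

collaborating_pl_domain = Table(
    "collaborating_pl_domain", metadata,
    Column("source_domain_id", Integer,
           ForeignKey("physical_layer_domain.id"), primary_key=True),
    Column("backup_domain_id", Integer,
           ForeignKey("physical_layer_domain.id"), primary_key=True),
    Column("priority", Integer))  # allocation order for backup domains

# Fields added to the existing compute_nodes table (sketched here
# as a standalone table for brevity).
compute_nodes = Table(
    "compute_nodes", metadata,
    Column("id", Integer, primary_key=True),
    Column("rcot_physical", String(255)),      # RCoT(Physical)
    Column("security_properties", String(255)),
    Column("domain_id", Integer, ForeignKey("physical_layer_domain.id")))
```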

C. Nova-Scheduler

Nova-scheduler controls the hosting of VMs at physical resources considering user requirements and infrastructure properties.


Fig. 3. Updates on nova-database to include part of the identified clouds relations.

Current implementations of nova-scheduler do not consider the entire cloud infrastructure, nor do they consider the overall user and infrastructure properties. According to the OpenStack documentation, nova-scheduler is still immature and great effort is still required to improve it. We implemented a new scheduler algorithm, ACaaS (Access Control as a Service), which performs the following when allocating a physical resource to host a virtual resource: i) considers the discussed cloud taxonomy, ii) selects a physical resource which has properties that best match the requested user requirements, and iii) ensures that the user requirements are continually maintained. The ACaaS scheduler collaborates with the following software agents (see Fig. 2); a sketch of the resulting matching step follows the list.

• The cloud client agent, DC-C, runs at each OpenStack computing node and performs the following: i) calculates the computing node's RCoT, continually assesses the status of the computing node, and then passes the result over to the DC-S; and ii) manages domain and collaborating domain members based on policies distributed by the DC-S (e.g., a VM can only operate with a known value of a chain of trust and when the hosting physical collaborating domains have a specific value of the CDCoT, as defined by user properties).

• The cloud server agent, DC-S, runs at the OpenStack domain controller and performs the following: i) maintains and manages OpenStack components (including the nova-scheduler) by ensuring they operate the cloud only when they are trusted to behave as expected, ii) manages the membership of the physical and virtual domains, and iii) attests to the DC-C's trustworthiness when its computing node joins a physical domain. The DC-S also intermediates the communication between the DC-C and the nova-scheduler, attests to the trustworthiness of the DC-C's computing node, collects the computing node's RCoT, and then calculates the DCoT and the CDCoT. Finally, it stores the result in an appropriate field in nova-database.
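As a rough illustration of the matching step, the following sketch (our own, not the actual nova-scheduler code; the data shapes and property names are assumptions) views ACaaS-style placement as filtering physical domains by user requirements and picking a trusted host from a qualifying domain:

```python
# Simplified sketch of an ACaaS-style placement decision; data
# shapes and property names are assumptions for illustration.

def matches(user_req, domain):
    """A domain qualifies when every requested property is met."""
    if user_req.get("cdcot") and domain["cdcot"] != user_req["cdcot"]:
        return False
    if user_req.get("location") and domain["location"] != user_req["location"]:
        return False
    return True

def pick_host(user_req, domains):
    """Return a trusted host from the first qualifying domain."""
    for domain in domains:
        if not matches(user_req, domain):
            continue
        for host in domain["hosts"]:
            # Only hosts whose attested RCoT is currently trusted, and
            # not shared if the user restricts multitenancy.
            if host["trusted"] and (
                    not user_req.get("exclusive") or not host["shared"]):
                return host["name"]
    raise LookupError("no host satisfies the user requirements")

domains = [{"cdcot": "cd-a", "location": "dc1",
            "hosts": [{"name": "node1", "trusted": True, "shared": False}]}]
print(pick_host({"cdcot": "cd-a", "exclusive": True}, domains))  # node1
```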

The next section discusses our prototype and how these components collaborate to establish trust in the cloud.

V. PROTOTYPE

Having defined a high level architecture of our scheme, this section describes our prototype. As discussed earlier, this paper focuses on providing a mechanism for trustworthy collection of resources' RCoTs, and on calculating, for each group of resources, their DCoT and CDCoT. It then uses the ACaaS scheduler to match user properties with infrastructure properties. Other infrastructure properties, at this stage, are either collected automatically (such as the capabilities of physical resources) or entered manually (such as the physical location of computing nodes). These properties can be altered by system administrators. Planned future work will focus on extending our framework to establish trustworthy collection/calculation of the other properties. The trust measurements performed by the DC-C identify the building up of a resource's RCoT and its integrity measurements. This section discusses our implementation of the proposed framework; we use an OpenStack controller node as the VCC. Our implementation also includes trust establishment building on remote attestation and secure scheduling.

A. Trust Attestation via the DC-C

Our implementation is based on an open source trusted computing infrastructure which is built on a Linux operating system. Building the RCoT of a computing node starts from the node's TPM and ends at the node's DC-C, as illustrated in Fig. 4. The RCoT building process is started by the platform bootstrapping procedure, which initializes the TPM via a trusted BIOS. The trusted BIOS measures and loads the trusted bootloader [4] (i.e., the Trusted Grub), which measures and loads a Linux kernel. We updated the Linux kernel to ensure that the IBM Integrity Measurement Architecture (IMA) [33] is enabled by default. The IMA measures all critical components before loading them. These include kernel modules, user applications, and associated configuration files. The values of these measurements are irreversibly stored inside the PCRs of the computing node, which are protected by the TPM.


Fig. 4. Compute node architecture.

TABLE I. COMPUTE NODE BOOTSTRAPPING MEASUREMENT LOG

The IMA by default uses PCR #10 to store the measurements.

The TPM driver and the Trusted Core Service Daemon (TCSD) [2] expose the Trusted Computing Services (TCS) to applications. These components constitute the part of the DC-C for collecting and reporting the trust measurements of a resource. The RCoT is, hence, constructed from the CRTM [3], which itself resides in, and is protected by, the Trusted BIOS.

Table I illustrates part of the records of the bootstrapping process for our prototype, as generated by the IMA measurement log. The IMA measurement log is the source for generating Integrity Reports (IRs) which are used, as we discuss later, to determine the genuine properties of a target system during the remote attestation process. The first column in Table I shows the value of PCR #10 after loading the components of the third column. The second column records the hash value of the loaded component. The first record holds the value of the boot_aggregate, which is a combined hash value of the early PCRs (PCR #0 to PCR #7); that is, it possesses the measurement of the Trusted Computing Base (TCB) of a computing node, including the Trusted BIOS, the Trusted Bootloader, and the image of the Linux Kernel together with its initial ram-disk and kernel arguments. Whenever a software component is loaded, the IMA module generates a hash value of the loaded component and then extends it into PCR #10 by invoking the TPM extend command [37]. Such a command updates PCR #10 to reflect the loaded component as follows: $\mathrm{PCR}_{10} \leftarrow \mathrm{SHA\text{-}1}(\mathrm{PCR}_{10} \,\|\, \mathrm{hash}(\mathrm{component}))$. Subsequent rows in the table present the measurement logs of the bootstrapping workflow at our adopted operating system, Ubuntu 11.04. Other OpenStack components are then measured, which include the nova-compute.conf script, the Python executable, the nova-compute executable, supporting libraries, and critical configuration files.
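The extend rule is standard TPM 1.2 behavior and can be simulated directly. The short sketch below (our illustration; the component names and placeholder PCR values are assumptions) reproduces the PCR #10 update and a boot_aggregate-style combination of the early PCRs:

```python
# Simulation of the TPM extend rule used by the IMA:
# PCR_new = SHA-1(PCR_old || H(component)). TPM 1.2 PCRs are
# 20-byte SHA-1 registers; the component names are assumptions.
import hashlib

def extend(pcr, measurement):
    """Return the new PCR value after extending a measurement."""
    return hashlib.sha1(pcr + measurement).digest()

pcr10 = b"\x00" * 20  # PCRs start zeroed at platform reset
for component in [b"boot_aggregate", b"init", b"nova-compute"]:
    pcr10 = extend(pcr10, hashlib.sha1(component).digest())

# boot_aggregate itself combines the platform TCB measurements
# accumulated in the early PCRs into a single hash.
early_pcrs = [b"\x00" * 20] * 8  # placeholder PCR #0-#7 values
boot_aggregate = hashlib.sha1(b"".join(early_pcrs)).digest()

print(pcr10.hex(), boot_aggregate.hex())
```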

To reduce complexity and to focus on practical cloud deployment cases, our prototype turns off all unnecessary services at the base system. As a result, the value of PCR #10 does not change unless a new software module (e.g., a user program, kernel module, or shared library) is loaded on the computing node. The loading process could be either benign (e.g., a security patch) or malicious. In either case, the loaded software module would be measured and added to the log records. Such a measurement changes the value of PCR #10.

Our prototype intentionally filters out the IMA measurements of VMs, i.e., the QEMU program in our prototype. This is because a VM CoT should be built on a compositional CoT; that is, the IMA measurements of a VM should not be considered as part of the TCB of a computing node. The measurements of a VM should rather be controlled by the IaaS cloud user and not the cloud provider, as doing otherwise would likely raise users' privacy concerns. Measuring VMs would, in addition, significantly increase the complexity of trust management. If an exploited VM runs on a computing node and, for example, performs malicious behavior against other components and applications, the properties of the computing node would change once the exploited VM starts to affect the TCB components. In such a case, the DC-C will leave the physical domain; i.e., the DC-C will stop operating and the VMs which are hosted at the infected computing node will be forced to migrate to another healthy computing node that is a member of the same physical domain.

Finally, the DC-C collects the integrity measurement logs as recorded by the IMA, and generates an IR following the specifications of the Platform Trust Service interface (PTS) [5]. The DC-C, as we discuss in the next subsection, sends the IR and the signed PCR values to the DC-S on request. In our prototype, this component is implemented by integrating the PTSC module from OpenPTS [7].

B. Trust Management by the DC-S

This section starts by summarizing the high level steps of the implemented part of the system workflow⁴. It then presents the prototyping details which are related to the DC-S. These are as follows.

1) Cloud security administrators either create a new physical domain or use an existing domain. The creation process involves deciding on the domain capabilities and location, and defining its collaborating domains.

⁴Further details about these steps are provided in [9].


As discussed in Section IV, we updated nova-api to enable administrators to manage this process.

2) Cloud security administrators then install the DC-C and nova-compute at all new physical computing nodes that are planned to join the domain.

3) The DC-C joins the cloud physical domain by communicating with the DC-S. The DC-S would first attest to the DC-C's trustworthiness and establish an offline chain of trust with the DC-C (using sealing and remote attestation concepts, as proposed by the TCG specifications). Next, the DC-C would calculate its host's RCoT, as described in the prototype, and pass the results to the DC-S.

4) Subsequently, the DC-S would store the RCoT in the compute_nodes table, and ensure that all devices in each domain have the same capabilities. The sealing mechanism, which is established in the previous steps, assures the DC-S that the DC-C can only operate with the same value of the reported RCoT. If this value changes (e.g., as in the case of the hosting device being hacked) the DC-C will not operate. This prevents VMs from starting at a hacked device.

5) Users, using nova-api commands, deploy their VMs and associate them with certain properties. Such properties include, for example, the required CDCoT(Physical), and the multitenancy restrictions which control the sharing of a computing node with other users.

6) The ACaaS scheduler allocates an appropriate physical domain to host a user VM. The properties of the physical domain and its member devices should satisfy the defined user requirements.
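For illustration, the user properties of step 5 could be expressed as a requirements document attached to a VM request. This is a hypothetical format of ours; the paper does not show the prototype's actual nova-api syntax, and every key and value below is an assumption:

```python
# Hypothetical user requirements attached to a VM deployment
# request (step 5); keys and values are illustrative only.
vm_request = {
    "image": "ubuntu-11.04",
    "flavor": "m1.small",
    "requirements": {
        # The expected physical CDCoT, identified via a white-list
        # entry rather than raw PCR values (see Section V-B2).
        "cdcot_physical": "whitelist:minimal-compute-v1",
        # Multitenancy restriction: do not co-locate this VM with
        # VMs that belong to other users.
        "multitenancy": "dedicated-host",
    },
}
```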

The remaining part of this section covers the implementation of the remote attestation process and the secure scheduling.

1) Remote Attestation: Our prototype implements the remote attestation process using OpenPTS [7], which is managed by the DC-S. OpenPTS sends an attestation request to each computing node to retrieve its IR and PCR values. When a computing node sends the requested values, OpenPTS examines the consistency of the received IR and PCR values [3]. Subsequently, it verifies the security properties of the computing node by matching the reported IR with the expected measurements from a white-list database [3]. The white-list database stores sets of measurements, where each set is calculated based on a carefully selected good platform configuration state. The calculation is performed on a good platform in the form of hash values for a selected set of preloaded software components.

For the purpose of our prototype we used two newly installed Ubuntu 11.04 servers. These servers have the minimal settings for a computing node to perform its planned functions. The hash values of the software stack of each computing node should exist within the white-list database; if they do not, we consider the computing node to be untrusted. The good configurations can be extended/changed by adding/updating their corresponding values in the white-list database.

The attestation protocol works as follows. Every computing node $C$ is identified by its AIK (Attestation Identity Key). The AIK is certified by the cloud controller VCC, as it covers the Privacy-CA role [1], [3]. When a new computing node is added to the cloud infrastructure it must first be registered at the VCC, which then certifies its AIK. Only registered computing nodes can connect to the VCC, as their certified AIKs cannot be forged and AIKs can only be used inside the genuine TPM that generated them. The registration steps of a computing node $C$ at the VCC are outlined in Protocol 1. Whenever a computing node sends a request to connect to the VCC, a trust establishment protocol is executed, which is outlined in Protocol 2.

Protocol 1 Computing Node Registration Protocol.

1) $C$ sends a registration request to the VCC as follows. First, $C$ sends a request to its TPM to create an AIK key pair using the command TPM_CreateAIK. The TPM would then generate an AIK key pair. The generated private part of the key pair never leaves the TPM, and the corresponding public part of the key pair is signed by the TPM Endorsement Key (EK) [3]. The EK is protected by the TPM, and never leaves it. $C$ then sends a registration request to the VCC. The request is associated with the EK certificate, the AIK public key, and other parameters.

$C \rightarrow \mathrm{VCC}:\ \mathrm{Cert}_{EK},\ \mathrm{AIK}_{pub},\ \mathit{parameters}$ (1.1)

2) The VCC certifies $C$ as follows. The VCC verifies $\mathrm{Cert}_{EK}$. If the verification succeeds, the VCC generates a specific AIK certificate for $C$ and a unique ID, $\mathrm{ID}_C$. It then sends the result to $C$.

$\mathrm{VCC} \rightarrow C:\ \mathrm{Cert}_{AIK},\ \mathrm{ID}_C$ (1.2)

Protocol 2 Trust Establishment Protocol.

1) The VCC sends an attestation request to $C$. The request includes a nonce $N$.

2) $C$ would then report an attestation ticket to the VCC as follows. $C$ sends its PCR values and the measurement log $\mathrm{ML}$ back to the VCC, together with $N$. These are signed using $C$'s AIK.

$C \rightarrow \mathrm{VCC}:\ \{\mathrm{PCR},\ \mathrm{ML},\ N\}_{\mathrm{AIK}}$ (2.1)

3) The VCC then verifies the message sent by $C$ as follows. It verifies the signature and matches the sent nonce. If the verification succeeds, the VCC examines the consistency of the PCR values and $\mathrm{ML}$, and then determines the properties of $C$ based on the value of $\mathrm{ML}$.

The configuration of a computing node could possibly be altered after an attestation session, e.g., by loading a new application. In such a case, the computing node's attested properties (as maintained by the VCC) would be violated. Addressing this requires establishing a trusted channel [19] to seal [3] the communication key to the verified PCR values. The sealing process provides the assurance that the DC-S can load the key only for a specific computing node configuration. Any change in the computing node's configuration would trigger a new attestation request from the VCC to the computing node.
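A compressed sketch of the verifier-side checks in Protocol 2 follows (our illustration only; OpenPTS performs the real IR analysis, and the AIK signature is stood in here by an HMAC, a plain substitution so the example stays self-contained and runnable):

```python
# Sketch of the VCC-side checks in Protocol 2 (step 3). The AIK
# signature is stood in by an HMAC; a real verifier checks a TPM
# quote signed by the node's AIK against its certificate.
import hmac, hashlib, os

AIK_STANDIN_KEY = b"per-node-secret"   # stands in for the AIK

def quote(pcrs, log, nonce, key=AIK_STANDIN_KEY):
    """Node side: 'sign' PCR values, measurement log digest, nonce."""
    msg = pcrs + hashlib.sha1(log).digest() + nonce
    return msg, hmac.new(key, msg, hashlib.sha1).digest()

def verify(msg, sig, nonce, log_whitelist, key=AIK_STANDIN_KEY):
    """VCC side: check signature, nonce freshness, and IR/PCR
    consistency against the white-list database."""
    expected = hmac.new(key, msg, hashlib.sha1).digest()
    if not hmac.compare_digest(sig, expected):
        return False                   # forged or corrupted ticket
    if not msg.endswith(nonce):
        return False                   # replayed attestation
    log_digest = msg[20:40]            # second 20-byte field
    return log_digest in log_whitelist

nonce = os.urandom(20)
log = b"bios,grub,kernel,nova-compute"
msg, sig = quote(b"\x01" * 20, log, nonce)
whitelist = {hashlib.sha1(log).digest()}
print(verify(msg, sig, nonce, whitelist))  # True
```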


The implementation of the trusted channel, where sealed keys are loaded into memory, requires a small TCB. The TCB should enforce strict access control on the memory area which stores the key. Having a large TCB, on the other hand, could result in leaking the key from memory without reflecting this on the platform trust status. Implementing a small TCB is a challenging problem, especially considering the complexity and scalability of the hosting cloud system (we leave this important subject as planned future research). As an attempt to lessen the impact of this threat, in our prototype we impose periodic attestations which keep the security properties of a computing node up to date. We implemented this by associating a timer with each computing node. Reattestation is enforced whenever the timer expires. Untrusted computing nodes found by the reattestation are immediately removed from the database and would need to re-enroll in the system for future use. In addition, VMs running on untrusted computing nodes will be forced to migrate to other computing nodes which are members of the same physical domain.

2) Secure Scheduling: As we discussed in previous sections, computing nodes are organized into physical domains. Such organization is based on the properties of each computing node (i.e., security, privacy, and other properties) which enable it to serve the needs of the domain. Users can specify the expected properties of computing nodes that could host their VMs. Some of the properties could be represented by a set of PCR values. However, PCR values are hard to precalculate and manage, as they represent aggregated hash values of software components when loaded in a specific order. In our proposed prototype users do not need to specify PCR values; rather, they identify their desired hosting environment using the provided sets of white-lists. A computing node's white-list is identified in accordance with its properties, which get attested whilst joining a domain and periodically thereafter. Genuine updates to the properties of a computing node (e.g., applying a security patch) require the adjustment of the corresponding record in the white-list database. In our prototype, part of the user's required properties can be represented by entries in the white-list database. The ACaaS scheduler deploys each VM on a computing node that has the same properties as those requested by the user of the VM. The ACaaS scheduler, in collaboration with the DC-S and the DC-C, periodically examines the consistency of such properties.

C. Preliminary Performance Evaluation

We assess the performance of our prototype by measuring the overheads which are introduced by the trusted computing infrastructure. Our assessment focuses on the following critical operations: the PCR quote instruction, the PCR verification instruction, and a full remote attestation process. We also assess the additional time which is required when extending the RCoT on a compute node during the bootstrapping process, as discussed earlier. Table II shows the performance metrics for these operations. Each row includes the average time and the standard deviation of 15 executions of every operation.

We found that a full attestation requires around 10 seconds on a computing node; the attestation includes quoting the PCR values and generating the IRs.

TABLE II. TRUSTED COMPUTING OPERATIONS OVERHEADS

On the VCC, the attestation includes verifying the PCR values and analyzing the IRs. The length of the attestation time (i.e., 10 seconds) is a major problem, as it would affect the overall cloud performance. We addressed this problem by implementing offline attestation rather than online attestation; hence, the impact of the attestation time becomes negligible. For example, the cloud scheduler, when making a scheduling decision, directly fetches previously stored attestation results from the database.

An important point that needs to be considered in the offline attestation case is the trade-off between security and performance. That is, the longer the gap between successive attestations, the better the performance and the worse the assurance on changes of device status. Analogously, the shorter the gap between attestations, the worse the performance and the better the assurance on changes of device status. The delay between successive attestations determines the required time for violation detection. Even if the delay is zero, there is still the full attestation interval (i.e., the 10 seconds) in which a violation could happen. Within this period, the state of a target computing node could change without such a change being reflected in its security properties as stored in the database.

After careful examination of a full remote attestation process, we found that the time consumed when quoting and verifying the values of a PCR is much less than the time consumed for generating and verifying the IRs. Verifying IRs is only necessary when the state of a computing node changes; i.e., as long as successive measurements of PCR values are identical, the security properties of a computing node can safely be assumed identical. That is, having identical PCR values eliminates the need for further examination of the IRs. As a result, attestations to a computing node can be optimized by first comparing the current PCR values with the previous ones. If they do not match, the computing node then has to generate an IR which gets verified by the VCC.
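This optimization reads naturally as a cache keyed on PCR values. The sketch below is our own illustration (the callback names are assumptions): the stored verdict is reused while quoted PCR values are unchanged, and only a mismatch triggers the expensive IR generation and verification.

```python
# Sketch of the PCR-first attestation optimization: reuse the
# stored verdict while quoted PCR values are unchanged, and only
# fall back to full IR generation/verification on a mismatch.
_cache = {}   # node_id -> (pcr_values, trusted_verdict)

def attest(node_id, quote_pcrs, request_ir, verify_ir):
    """quote_pcrs/request_ir/verify_ir are supplied callbacks."""
    pcrs = quote_pcrs(node_id)        # cheap (< 0.8 s per Table II)
    cached = _cache.get(node_id)
    if cached and cached[0] == pcrs:
        return cached[1]              # state unchanged: reuse verdict
    ir = request_ir(node_id)          # expensive (~10 s full pass)
    verdict = verify_ir(node_id, pcrs, ir)
    _cache[node_id] = (pcrs, verdict)
    return verdict
```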

For example, the configuration of a computing node changesonly when it loads a new privileged executable, e.g., loadingnew security modules, applying patches or launching maliciousattacks. Most of the time, the attestation delay can be reduced byonly generating and verifying the PCR values, which, as illus-trated in Table II, take less than 0.8 second. Active attestationwhich is proposed by Stumpf et al. [36] uses a time-stampedhash-chain attestation to generate attestation tickets. The attes-tation ticket is a data package which records the PCR values andis signed by the TPM. The used nonce in an attestation session isbound to a global time; hence, a previously generated attestationtickets could be reused. This scheme could be used in our archi-tecture for each node to actively generate its attestation ticket(i.e., quote its PCR values), instead of using an attestation ses-sion. In this case, only the time which is required for fetching the


Fig. 5. Attestation response time breakdown. (a) Regular remote attestation session. (b) Periodic and active PCR quoting. (c) Active attestation session. (d) Active attestation with unchanged PCR values.

As depicted in Fig. 5(a), the latency introduced by a remote attestation session consists of the generation and verification of two entities: the attestation ticket (mainly the TPM Quote instruction) and the SML (Stored Measurement Log, the TCG terminology for the IR). With active attestation, the tickets are generated in parallel with the attestation sessions (Fig. 5(b)); hence, for each attestation, only negligible overheads are introduced for fetching, sending, and verifying the latest ticket. The SML is generated and sent only when necessary; thus, the overhead is further reduced to the time taken to compare two hash values, which is also negligible.

Finally, an inevitable bootstrapping delay is incurred when extending the RCoT. This is a mandatory step because all software components must be measured and extended into the TPM before they are loaded. The process takes three times longer than booting a general system, as illustrated in Table II. Such a delay does not cause any problem in practice: in a production environment (which applies to the cloud environment), computing nodes are not rebooted frequently, so the bootstrapping delay is negligible.

VI. RELATED WORK

The issue of establishing trust in the cloud has been discussed by many authors (e.g., [8], [20], [21], [23], [35]). Much of the discussion has centered on reasons to "trust the cloud" or not to. Khan and Malluhi [23] discuss factors that affect consumers' trust in the cloud and some of the emerging technologies that could be used to establish such trust, including giving consumers more jurisdiction over their data through remote access control, transparency in the security capabilities of providers, independent certification of cloud services for security properties and capabilities, and the use of private enclaves. The issue of jurisdiction is echoed by Hay et al. [21], who further suggest technical mechanisms, including encrypted communication channels and computation on encrypted data, as ways of addressing some of the trust challenges. Schiffman et al. [35] propose the use of hardware-based attestation mechanisms to improve transparency into the enforcement of critical security properties. The work in [8], [20] focuses on identifying the properties for establishing trust in the cloud.

A few papers propose the use of a TPM in clouds for remote attestation (e.g., [32], [34], [35]). The work in [32] proposes a remote attestation mechanism based on reputation systems and TCG remote attestation concepts. It requires resources, when interacting with other resources, to attest to their trustworthiness, keep a copy of the measured trust values, and share them with other resources. The trust measurements are associated with a timestamp and must be revalidated after they expire. The dissemination of such measurements between resources forms a web of trust. We identify the following weaknesses in this scheme: i) relations between entities in clouds are identified mainly based on the entities' dynamic behavior (i.e., a relationship between entities A and B is established when A exchanges messages with B), and ii) the web of trust consists of a large and replicated database of trust measurements that expire and require frequent reevaluation. Establishing trust between entities based on their dynamic behavior (point i) is not accurate and might affect cloud availability and resilience. For example, entities at the physical layer forming a collaborating domain do not communicate frequently, and sometimes never communicate directly. If a physical domain fails, then all its hosted resources must start up immediately at another physical domain; establishing a chain of trust at this critical stage would affect the timing of service recovery. In addition, point ii) is time consuming and, considering the enormous number of resources in clouds, it is not practical to keep revalidating the trust measurements.

The work in [34], [35] provides remote attestation for either the entire cloud infrastructure or for the physical resources hosting a specific VM. However, we argue that it is not practical to attest to the entire cloud infrastructure, considering its huge and distributed resources, nor is it practical to attest to a specific set of physical resources, considering the dynamic nature of clouds. In addition, these papers require users to understand, to some extent, the cloud infrastructure; i.e., they do not provide a transparent cloud infrastructure.

Establishing trust and security in data centers using trusted computing technology has been explored by the European project OpenTC [6]. However, our paper proposes a uniform and more formal model to capture, describe, and use trust information in a cloud computing context. Specifically, we implemented the ACaaS scheduler, which uses the trust information to help provide trustworthy automated management of the cloud infrastructure.


Trusted Virtual Datacenter (TVDc) [16] incorporates trusted computing technologies into virtualization and system management software. It provides strong isolation between workloads by enforcing a Mandatory Access Control (MAC) policy throughout a datacenter. TVDc also provides integrity guarantees to each workload by leveraging a hardware root of trust in each platform to determine the identity and integrity of every piece of software running on the platform. Shamon [25] proposed a distributed systems architecture in which MAC policies can be enforced across physically separate systems; by bridging the reference monitor between these systems, logical compartments enforced by MAC policies can span physical machines. Trust in the MAC enforcement capabilities of a remote system is established using remote attestation. However, in these proposals each resource is treated independently: a different expected configuration (i.e., white-list) must be maintained for the attestation of each resource, and the attestation strategy for each is also defined separately. Our work is different in that homogeneity (within a domain/collaborating-domain context) dominates the cloud's internal structure, from which a compositional chain of trust can be constructed to significantly reduce attestation redundancy and improve efficiency.

Parno et al. [30] focus on bootstrapping trust in commodity systems, discussing various aspects of the TPM-based chain of trust and the corresponding secure storage and remote attestation schemes. As this work addresses building trust only on a single commodity system, particular concerns arise when implementing a trusted cloud architecture. As discussed in previous sections, the complexity and dynamics of clouds require compositional chains of trust, in which resources with various security properties can exist inside a cloud to satisfy different scheduling requirements, while similar nodes are organized together to significantly reduce management overheads.

Cloud schedulers have been proposed and implemented both by industry (e.g., VMware in their vCenter [38]) and by open-source tools (e.g., OpenStack and OpenNebula [27], [28]). Scheduling decisions can be based on various factors at the physical layer, such as memory, CPU, and load. Currently, the main implemented scheduling algorithms are: i) chance, in which a computing node is chosen randomly; ii) availability zone, similar to chance but with the computing node chosen randomly from within a specified availability zone; and iii) simple, in which the least loaded computing node is chosen to run an instance. Such algorithms are still basic; they consider neither the entire cloud infrastructure nor broad user requirements. The sketch below illustrates these three strategies.
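The following is a minimal sketch of the three strategies described above, assuming an illustrative ComputeNode record rather than the actual scheduler APIs of OpenStack or OpenNebula:

import random
from dataclasses import dataclass

@dataclass
class ComputeNode:
    name: str
    zone: str
    load: float   # e.g., fraction of CPU capacity in use

def schedule_chance(nodes):
    # i) chance: a computing node is chosen at random
    return random.choice(nodes)

def schedule_availability_zone(nodes, zone):
    # ii) availability zone: random choice within the requested zone
    return random.choice([n for n in nodes if n.zone == zone])

def schedule_simple(nodes):
    # iii) simple: the least loaded computing node runs the instance
    return min(nodes, key=lambda n: n.load)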

Our scheme differs from the above as it is based on a practical understanding of how clouds work. We explicitly identified the cloud infrastructure and user properties. We also considered the cloud taxonomy, the dynamic nature of clouds, and the practical relationships between cloud entities. Understanding these helps us provide a novel cloud scheduler that matches user properties with infrastructure properties, ensuring that user requirements are continuously met following a preagreed SLA. In addition, we developed software agents running on computing nodes to enforce the scheduler decisions and to provide a trustworthy report about the trust level of computing nodes. Moreover, we assess the trustworthiness of the infrastructure using the compositional chains of trust scheme, which considers both the dynamic nature of clouds and the way cloud infrastructures are managed.
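A simplified sketch of such property matching, layered on the basic strategies above, follows; the trust_level field and required_trust threshold are illustrative stand-ins for the trust status produced by our attestation framework, not the prototype's actual data model:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NodeProperties:
    name: str
    zone: str
    trust_level: int   # as reported by the node's software agent
    load: float

def schedule_trustworthy(nodes: List[NodeProperties],
                         required_trust: int,
                         zone: Optional[str] = None) -> NodeProperties:
    """Filter nodes by the user's requirements, then fall back to
    least-loaded placement among the remaining candidates."""
    candidates = [n for n in nodes
                  if n.trust_level >= required_trust
                  and (zone is None or n.zone == zone)]
    if not candidates:
        raise RuntimeError("no node satisfies the user's requirements")
    return min(candidates, key=lambda n: n.load)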

VII. CONCLUSION

Cloud infrastructure is expected to support Internet-scale critical applications (e.g., hospital systems and smart grid systems). Critical infrastructure services and organizations alike will not outsource their critical applications to a public cloud without strong assurances that their requirements will be enforced. This is a challenging problem to address, which we have been working on as part of the TClouds project. A key step towards addressing it is providing a trustworthy cloud scheduler supported by trustworthy data that enables the scheduler to make the right decisions. Such trustworthy data relates to both user requirements and infrastructure properties. User requirements and infrastructure properties are enormous, and assuring their trustworthiness is our long-term objective. This paper covers one of the most important properties: measuring the trust status of the cloud infrastructure and enabling users to define their minimal acceptable level of trust. We presented our prototype, which covers both the proposed scheduler and our previous work on cloud trust measurements. The key advantage of the prototype is that it considers critical factors that have not been considered in commercial schedulers, such as the overall cloud infrastructure and the trustworthiness of computing nodes. In addition, the prototype enables users to specify the trust level they expect of the physical resources hosting their virtual resources, without needing to get involved with the infrastructure's complexity. The paper also prototyped client agents that enforce the scheduler decisions across the wide-scale cloud infrastructure.

This paper presents a core component of our long-term objective of establishing trustworthy self-managed cloud services. Throughout the paper, we identified different areas of research for expanding this work, which we plan to pursue in the near future.

ACKNOWLEDGMENT

The authors would like to thank William Horne, John Lyle,and the anonymous reviewers for their valuable input.

REFERENCES

[1] Privacy CA [Online]. Available: http://www.privacyca.com
[2] TrouSerS—The Open-Source TCG Software Stack [Online]. Available: http://trousers.sourceforge.net/
[3] Trusted Computing Group [Online]. Available: http://www.trustedcomputinggroup.org
[4] Trusted Grub [Online]. Available: http://trousers.sourceforge.net/grub.html
[5] Infrastructure Work Group Platform Trust Services Interface Specification, Version 1.0, 2006 [Online]. Available: http://www.trustedcomputinggroup.org/resources/infrastructure_work_group_platform_trust_services_interface_specification_version_10
[6] OpenTC, 2009 [Online]. Available: http://www.opentc.net
[7] Open Platform Trusted Service User's Guide, 2011 [Online]. Available: http://iij.dl.sourceforge.jp/openpts/51879/userguide-0.2.4.pdf
[8] J. Abawajy, "Determining service trustworthiness in intercloud computing environments," in Proc. 10th Int. Symp. Pervasive Syst., Algorithms, and Networks (ISPAN), Dec. 2009, pp. 784–788.
[9] I. M. Abbadi, "Clouds infrastructure taxonomy, properties, and management services," in Advances in Computing and Communications, A. Abraham, J. L. Mauri, J. F. Buford, J. Suzuki, and S. M. Thampi, Eds. Berlin, Heidelberg, Germany: Springer, 2011, vol. 193, Communications in Computer and Information Science, pp. 406–420.
[10] I. M. Abbadi, "Middleware services at cloud virtual layer," in Proc. 2nd Int. Workshop on Dependable Service-Oriented and Cloud Comput. (DSOC 2011), Aug. 2011, IEEE Comput. Soc.
[11] I. M. Abbadi, "Toward trustworthy clouds' Internet scale critical infrastructure," in Proc. 7th Inform. Security Practice and Experience Conf. (ISPEC '11), LNCS, vol. 6672, Berlin, Germany: Springer-Verlag, Jun. 2011, pp. 73–84.
[12] I. M. Abbadi, "Clouds trust anchors," in Proc. 11th IEEE Int. Conf. Trust, Security and Privacy in Comput. and Commun. (IEEE TrustCom-11), Liverpool, U.K., Jun. 2012, pp. 127–136.
[13] I. M. Abbadi, M. Alawneh, and A. Martin, "Secure virtual layer management in clouds," in Proc. 10th IEEE Int. Conf. Trust, Security and Privacy in Comput. and Commun. (IEEE TrustCom-10), Nov. 2011, pp. 99–110.
[14] I. M. Abbadi and C. Namiluko, "Dynamics of trust in clouds—challenges and research agenda," in Proc. 6th Int. Conf. Internet Technol. and Secured Trans. (ICITST-2011), Dec. 2011, pp. 110–115.
[15] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, Above the Clouds: A Berkeley View of Cloud Computing, Tech. Rep. UCB/EECS-2009-28, Univ. of California, Berkeley, Feb. 2009.
[16] S. Berger, R. Cáceres, D. Pendarakis, R. Sailer, E. Valdez, R. Perez, W. Schildhauer, and D. Srinivasan, "TVDc: Managing security in the trusted virtual datacenter," SIGOPS Oper. Syst. Rev., vol. 42, Jan. 2008.
[17] S. Bleikertz, M. Schunter, C. W. Probst, D. Pendarakis, and K. Eriksson, "Security audits of multi-tier virtual infrastructures in public infrastructure clouds," in Proc. 2010 ACM Workshop on Cloud Comput. Security (CCSW '10), New York, NY, USA, 2010, pp. 93–102.
[18] R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina, "Controlling data in the cloud: Outsourcing computation without outsourcing control," in Proc. 2009 ACM Workshop on Cloud Comput. Security (CCSW '09), New York, NY, USA, 2009, pp. 85–90.
[19] Y. Gasmi, A.-R. Sadeghi, P. Stewin, M. Unger, and N. Asokan, "Beyond secure channels," in Proc. 2007 ACM Workshop on Scalable Trusted Comput. (STC '07), New York, NY, USA, 2007, pp. 30–40.
[20] S. M. Habib, S. Ries, and M. Muhlhauser, "Cloud computing landscape and research challenges regarding trust and reputation," in Proc. 2010 7th Int. Conf. Ubiquitous Intell. Comput. and 7th Int. Conf. Autonomic Trusted Comput. (UIC/ATC), Oct. 2010, pp. 410–415.
[21] B. Hay, K. L. Nance, and M. Bishop, "Storm clouds rising: Security challenges for IaaS cloud computing," in Proc. HICSS, IEEE Comput. Soc., 2011, pp. 1–7.
[22] K. Jeffery and B. Neidecker-Lutz, The Future of Cloud Computing—Opportunities for European Cloud Computing Beyond 2010.
[23] K. M. Khan and Q. M. Malluhi, "Establishing trust in cloud computing," IT Professional, vol. 12, no. 5, pp. 20–27, Sep. 2010.
[24] J. Lyle and A. Martin, "On the feasibility of remote attestation for web services," in Proc. IEEE Int. Symp. Secure Comput. (SecureCom09), 2009, pp. 283–288.
[25] J. M. McCune, T. Jaeger, S. Berger, R. Caceres, and R. Sailer, "Shamon: A system for distributed mandatory access control," in Proc. 22nd Annu. Comput. Security Applicat. Conf. (ACSAC '06), Washington, DC, USA, 2006, pp. 23–32.
[26] Take Your Business to a Higher Level, Sun Microsystems, 2009.
[27] OpenStack, 2010 [Online]. Available: http://www.openstack.org/
[28] OpenNebula, 2012 [Online]. Available: http://www.opennebula.org/
[29] OpenStack, OpenStack Compute—Administration Manual, 2011 [Online]. Available: http://docs.openstack.org
[30] B. Parno, J. M. McCune, and A. Perrig, "Bootstrapping trust in commodity computers," in Proc. 2010 IEEE Symp. Security and Privacy (SP '10), Washington, DC, USA, 2010, pp. 414–429.
[31] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds," in Proc. 16th ACM Conf. Comput. and Commun. Security (CCS '09), New York, NY, USA, 2009, pp. 199–212.
[32] A. Ruan and A. Martin, "RepCloud: Achieving fine-grained cloud TCB attestation with reputation systems," in Proc. 6th ACM Workshop on Scalable Trusted Comput. (STC '11), 2011, pp. 3–14.
[33] R. Sailer, X. Zhang, T. Jaeger, and L. van Doorn, "Design and implementation of a TCG-based integrity measurement architecture," in Proc. 13th USENIX Security Symp. (SSYM '04), Berkeley, CA, USA, 2004, p. 16.
[34] N. Santos, K. P. Gummadi, and R. Rodrigues, "Towards trusted cloud computing," in Proc. 2009 Conf. Hot Topics in Cloud Comput. (HotCloud '09), Berkeley, CA, USA, 2009.
[35] J. Schiffman, T. Moyer, H. Vijayakumar, T. Jaeger, and P. McDaniel, "Seeding clouds with trust anchors," in Proc. 2010 ACM Workshop on Cloud Comput. Security (CCSW '10), New York, NY, USA, 2010, pp. 43–46.
[36] F. Stumpf, A. Fuchs, S. Katzenbeisser, and C. Eckert, "Improving the scalability of platform attestation," in Proc. 3rd ACM Workshop on Scalable Trusted Comput. (STC '08), New York, NY, USA, 2008.
[37] Trusted Computing Group, TPM Main, Part 3: Commands, Specification Version 1.2, Revision 103, 2007.
[38] VMware, VMware vCenter Server, 2012 [Online]. Available: http://www.vmware.com/products/vcenter-server/

Imad M. Abbadi has substantial experience in IT, covering almost all of the technologies behind cloud computing. He currently leads the University of Oxford's role in the TClouds project. He designed, developed, and delivered a novel postgraduate course in cloud security, merging the industrial vision of clouds with solid scientific foundations; the course is now part of the University of Oxford M.Sc. in Software and Systems Security.

Anbang Ruan is a doctoral student at the University of Oxford. His research interest is in building a practical trusted cloud infrastructure.