Enhancing OpenStack clouds using P2P technologies

46
October

Transcript of Enhancing OpenStack clouds using P2P technologies

Master of Science in Telecommunication Systems October 2017

Enhancing OpenStack clouds using

P2P technologies

Robin Philip Joseph

Faculty of Computing

Blekinge Institute of Technology

SE�371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in

partial ful�llment of the requirements for the degree of Master of Science in Telecommunication

Systems. The thesis is equivalent to 20 weeks of full time studies.

Contact Information:Author:Robin Philip JosephE-mail: [email protected]

University advisor:Dr. Kurt TutschkuDepartment of Computer Science (DIDD)

Faculty of Computing Internet : www.bth.seBlekinge Institute of Technology Phone : +46 455 38 50 00SE�371 79 Karlskrona, Sweden Fax : +46 455 38 50 57

Abstract

It was known for a long time that OpenStack has issues with scal-ability. Peer-to-Peer systems, on the other hand, have proven toscale well without signi�cant reduction of performance.The objectives of this thesis are to study the challenges associatedwith P2P-enhanced clouds and present solutions for overcomingthem.As a case study, we take the architecture of the P2P-enhancedOpenStack implemented at Ericsson that uses the CYCLON P2Pprotocol. We study the OpenStack architecture and P2P tech-nologies and �nally propose solutions and provide possibilities inaddressing the challenges that are faced by P2P-enhanced Open-Stack clouds. We emphasize mainly on a decentralized identityservice and management of Virtual machine images.This work also investigates the characterization of P2P architec-tures for their use in P2P-enhanced OpenStack clouds. The resultssection shows that the proposed solution enables the existing P2Psystem to scale beyond what was originally possible. We also showthat the P2P-enhanced system performs better than the standardOpenStack.

Keywords: Cloud Computing, OpenStack, P2P, Scalability

i

Acknowledgements

Firstly, I would like to thank my family for bringing me this far in life.I am immensely grateful to my professor at BTH, Dr. Kurt Tutchku and mysupervisor at Ericsson, Dr. Fetahi Wuhib. This work would not be possiblewithout their guidance, insights and their patience.I am thankful to Dr. João Monteiro and Vinay Yadav from Ericsson CloudResearch, for providing support and boosting my morale. I am also grateful toEricsson for providing resources and experiences.I would also like to express my unending gratitude to my friends and colleaguesfor supporting me and encouraging me in my work.

ii

Contents

Abstract i

Acknowledgements ii

1 Introduction 11.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 OpenStack . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.2 Peer-to-Peer Technologies . . . . . . . . . . . . . . . . . . 21.1.3 P2P-enhanced OpenStack clouds . . . . . . . . . . . . . . 3

1.2 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . 31.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.4 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Conceptual Background 52.1 Summary of the existing P2P-enhanced OpenStack . . . . . . . . 52.2 Keystone and Federated Identity . . . . . . . . . . . . . . . . . . 7

2.2.1 Keystone . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2.2 Federated Identity . . . . . . . . . . . . . . . . . . . . . . 82.2.3 External Identity Providers (IDP) . . . . . . . . . . . . . . 92.2.4 Keystone as an IDP . . . . . . . . . . . . . . . . . . . . . 92.2.5 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3 Container Technology . . . . . . . . . . . . . . . . . . . . . . . . . 102.3.1 LXD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.3.2 Docker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Related Work 12

4 Method 144.1 Overview of the applied methods . . . . . . . . . . . . . . . . . . 144.2 Analysis of Basic Functional Requirements in OpenStack . . . . . 14

4.2.1 Identity service . . . . . . . . . . . . . . . . . . . . . . . . 154.2.2 Image management . . . . . . . . . . . . . . . . . . . . . . 164.2.3 Scheduling of Virtual Resources . . . . . . . . . . . . . . . 164.2.4 Networking Service . . . . . . . . . . . . . . . . . . . . . . 17

iii

4.3 Functional Analysis of capabilities of Peer-to-Peer technologies forOpenStack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.4 Analysis of Numerical Scalability and Performance of Features ofP2P Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.5 Inference Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224.5.1 Keystone solution decision . . . . . . . . . . . . . . . . . . 224.5.2 Image management solution decisions . . . . . . . . . . . . 23

4.6 Implementation and Veri�cation . . . . . . . . . . . . . . . . . . . 244.6.1 Image management solution . . . . . . . . . . . . . . . . . 244.6.2 Decentralized Identity Service solution . . . . . . . . . . . 24

5 Results 285.1 Decentralized Identity Solution . . . . . . . . . . . . . . . . . . . 285.2 Peer-to-Peer System . . . . . . . . . . . . . . . . . . . . . . . . . 29

6 Analysis and Discussion 326.1 How can Peer-to-Peer architectures be characterized for their suit-

ability in P2P-based OpenStack clouds? . . . . . . . . . . . . . . 326.2 How can P2P technology be used for decentralized identity service

in OpenStack clouds and which P2P-technology should be appliedfor high scalability of this service? . . . . . . . . . . . . . . . . . . 33

7 Conclusions and Future Work 34

References 35

iv

List of Figures

1.1 OpenStack Services Outline . . . . . . . . . . . . . . . . . . . . . 1

2.1 Architecture of the existing implementation of P2P-enhanced Open-Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Architecture of the P2P-OpenStack agent . . . . . . . . . . . . . . 62.3 Keystone Backends . . . . . . . . . . . . . . . . . . . . . . . . . . 82.4 Mapping of remote user to local user . . . . . . . . . . . . . . . . 11

4.1 Outline of phases and methods employed in the thesis . . . . . . . 154.2 Flow of image between OpenStack clouds . . . . . . . . . . . . . . 164.3 Functional design space of P2P service overlays . . . . . . . . . . 184.4 View of the Additional Dimension of the Functional Design space 194.5 Architecture of the Identity service solution . . . . . . . . . . . . 254.6 Flow of Requests between Agents for Tokens . . . . . . . . . . . . 26

5.1 Average Response times for 64 requests . . . . . . . . . . . . . . . 285.2 Average area under cputime curve for 64 requests . . . . . . . . . 295.3 Averages of Response times versus Number of requests and number

of SPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305.4 VM startup time and fail rate of the standard and P2P systems

with increasing system size and load . . . . . . . . . . . . . . . . 315.5 VM failure rate of the standard and P2P systems (size 32) with

increasing concurrent VM requests . . . . . . . . . . . . . . . . . 31

v

Chapter 1

Introduction

1.1 Concepts

Peer-to-Peer technologies o�er a highly scalable and e�ective alternative to theserver-client model. These technologies, when incorporated into clouds, can pro-vide enhanced versions of clouds that are e�cient, scalable and overcome thelimitations of the server-client paradigm.

1.1.1 OpenStack

OpenStack is an open-source Cloud operating system which provisions virtualresources and services. The core functionality of OpenStack is delivered throughthe services that it o�ers. The OpenStack services among others include thecompute service, identity management, networking and image management. Fig1.1.1 taken from the o�cial OpenStack documentation [10] presents the Open-Stack services and their primary functions.

Each service of OpenStack provides a separate functionality and is exposed tothe end-user via REST APIs. Below is a brief outline of the OpenStack services.

Figure 1.1: OpenStack Services Outline

1

Chapter 1. Introduction 2

Nova (Compute service)

This service provides the management of virtual machines through an abstractionlayer for compute drivers that interfaces with supported hypervisors. Multiplehypervisors are supported by OpenStack including Kernel-based Virtual Machine(KVM), Xen, Hyper-V and Linux Containers (LXC).

Neutron (Networking Service)

Previously named Quantum, this service provides various networking functionslike creating and deleting networks, subnets. It also provides additional functionsfor management of IP address, DNS, DHCP, load balancing etc.,

Keystone (Identity Service)

Authentication and authorization throughout the OpenStack infrastructure areprovided by Keystone. This service is pluggable and integral in the secure com-munication of the di�erent services of OpenStack.

Glance (Image Service)

Glance Image service manages the discovery, registration and the distribution ofvirtual machine images (VMI) to the OpenStack compute nodes.

Horizon (Dashboard Service)

This service provides a web-based interface for managing, monitoring and provi-sioning of the OpenStack resources.

1.1.2 Peer-to-Peer Technologies

Generally, P2P systems are highly scalable, highly distributed and self-organizing,which implement P2P services and functions to perform tasks in an altruisticsense. In other words, each peer in the system is involved in transactions that donot directly bene�t itself but bene�ts the system on the whole. Every peer in astrictly P2P system is equal in terms of functionality and must be able to takethe role of both the server as well as the client, simultaneously. A P2P protocolprovides the function to join and leave the P2P overlay as well as to insert,delete, discover and retrieve the resources and also service-speci�c functions thatare highly distributed.

The term Peer-to-Peer (P2P) is often perceived as "not client/server" and hasutility in systems, protocols, mechanisms, architectures, services and applications[34]. In modern applications, the border between P2P and client-server paradigmsare not very clear. While some architectures can be considered to fully comply

Chapter 1. Introduction 3

with the P2P paradigm, others can be in between client-server and P2P dependingon the de�nition of P2P being considered. Thus, we �rst de�ne what we meanby P2P in our paper. We borrow this de�nition from [7].

"A system is considered to be P2P if the elements that form thesystem share their resources in order to render the service that thesystem is designed to provide. The elements in the system both provideservices to other elements and request services from other elements."

In practical applications, a system can be considered Peer-to-Peer even if fewof its elements do not strictly follow this criterion. This de�nition is within thecontext of a single service and hence in complex systems which o�er multipleservices, some of the services can be P2P while others can be client-server. P2Pservices are designed to overcome the limitations of the client-server model interms of fault tolerance, scalability and self-organization. Thus, as the numberof nodes increases, the performance of the P2P systems will not degrade as muchas that of the client-server systems.

1.1.3 P2P-enhanced OpenStack clouds

Multiple OpenStack clouds can be associated together to form a P2P systemand to co-operate and function as a single cloud. The performance of this P2P-enhanced OpenStack cloud was shown to be better than the standard OpenStackcloud when we considered the scale of the compute nodes [12]. This abstractionof several clouds as one cloud overcomes the limitations of the underlying ele-ments of the individual compute nodes. Thus, it is of interest to further developP2P-enhanced OpenStack clouds and study the suitability of P2P technologies inenhancing the other services of OpenStack in terms of scalability.

In this paper, we take the de�nition of scalability from [23]. Thus, Scalabilitycan be de�ned as the adaptability to changes in the peer-to-peer system size, loadextent and nature of the load. [34] states that scalability describes the numericalrelationship between an in�uencing factor that changes numerically over a largerange of values and a performance metric of a P2P service overlay.

1.2 Problem description

It is known that OpenStack is not very good at scaling. Thus, there is an in-terest in the research community to develop this aspect of OpenStack clouds byemploying technologies that are known to be scalable such as Peer-to-Peer tech-nologies. There is also a gap in the research area when it comes to implementingP2P technologies in the OpenStack clouds, investigating the limitations of theseP2P-enhanced clouds and e�ectively overcoming them.

Chapter 1. Introduction 4

1.3 Motivation

This thesis aims to provide a solution for scaling OpenStack cloud instances on thecloud level. We aim to investigate the suitability of P2P technologies in enhancingthe OpenStack services and e�ectively improving the scalability and performanceof these clouds. We use the case study of the P2P-enhanced OpenStack cloud setup at Ericsson to investigate the limitations and bottlenecks of the architectureand to evaluate the use of P2P technologies and functions in overcoming theselimitations. The primary focus is on improving the identity and authorizationmanagement of the P2P system.This thesis paper is structured as follows. Chapter 2 outlines the important con-cepts helpful in understanding this thesis paper. Chapter 3 presents the relatedwork done in the �eld of Peer-to-Peer technologies and OpenStack scalability.Chapter 4 describes in detail, the method along with the implementation andevaluation of the solutions. Chapter 5 and 6 provide the results of the evaluationalong with analysis and discussions. Lastly, chapter 7 provides the conclusionand future work of the thesis work.

1.4 Research Questions

From descriptions of the motivation and the problem statement along with aninvestigation on the P2P-enhanced Cloud taken as our case study, we have tworesearch questions that demand an investigation.

1. How can Peer-to-Peer architectures be characterized for their suitability inP2P-based OpenStack clouds?

2. How can P2P technology be used for decentralized identity service in Open-Stack clouds and which P2P technology should be applied for high scala-bility of this service?

Chapter 2

Conceptual Background

This chapter presents a summary of a P2P-enhanced OpenStack cloud that weuse as a case study. It also brie�y describes the concepts and technologies thatwere used in the thesis.

2.1 Summary of the existing P2P-enhanced Open-

Stack

The paper by Xin Han [12] pointed out the limitations in scalability of the Open-Stack compute nodes per controller node. Instead of looking at the lower levelissues of the individual services, the author proposed a new architecture of Open-Stack, as a solution which abstracts several OpenStack clouds to behave as a singleP2P-enhanced cloud and thus improves the scalability of OpenStack clouds.

The solution involved the design of an OpenStack agent which acted as amediator between the OpenStack API and the user. The agents collectively forma Peer-to-Peer overlay network. The system is a hybrid P2P system, as illustratedin Figure 2.1 taken from the same paper, where all the clouds share a commonidentity service provided by one centralized Keystone node. The protocol chosenwas the CYCLON protocol [35] because of its low diameter and ability to dealwith massive node failures and high churn rates. Each peer keeps track of a subsetof the peers in its neighbor list and a shu�ing algorithm is used to keep updatingand exchanging the list of neighbors.

The architecture of the agent includes an agent Database, HTTP server, aCYCLON component, four proxies (for four services of OpenStack Nova, Neutron,Glance and Keystone) and a scheduler component. Figure 2.2, also taken fromthe paper, outlines the architecture of the agent which includes the proxies foreach OpenStack service. The proxies act as brokers and maps agents' endpoints(URLs) to OpenStack components' endpoints. The database runs MySQL andstores the users' allocation info. The HTTP server listens on a port and mediatesbetween the user and agent. The CYCLON component is the implementationof CYCLON API and maintains a list of a�liated neighbors and exchanges itslist of neighbors with other agents periodically. Lastly, the scheduler selects an

5

Chapter 2. Conceptual Background 6

Figure 2.1: Architecture of the existing implementation of P2P-enhanced Open-Stack

OpenStack cloud instance to create the requested resource based on �lters andweighers. Filters and weighers are used to select the suitable cloud from theoverlay network.

As a proof of concept, Create, Read and Delete (CRD) operations of Open-Stack are partially supported. The evaluation was done through two scenariosusing a benchmarking tool called Rally, and it involved comparing the bench-marks of a Standard OpenStack cloud with the Peer-to-Peer solution that wasimplemented. The �rst scenario tests the concurrency of the requests and howthe increase in concurrent requests a�ect the controller response times. The sec-ond scenario evaluates the e�ect of increasing the number of controller nodes onthe response times. The results show that without considering the e�ect of the

Figure 2.2: Architecture of the P2P-OpenStack agent

Chapter 2. Conceptual Background 7

centralized Keystone node, using Peer-to-Peer to abstract multiple clouds as onecloud makes it scale better compared to a conventional OpenStack cloud.

2.2 Keystone and Federated Identity

2.2.1 Keystone

The Keystone service provides the basic identity and authorization functions inOpenStack clouds. The concepts in Keystone focus on authorization, access man-agement, and discovery. Keystone manages access and authorization by creatingan RBAC (Role Based Access Control) policy which is enforced on each publicAPI endpoint. Figure 2.3 taken from the book by Martinelli et al. [20] gives an il-lustration of the most common Keystone back-end functions and the technologiesused to provide them. The components in green viz., Assignment, Resource andCatalog are usually SQL (Structured Query Language) based, the red identitycomponent indicate usually LDAP (Lightweight Directory Access Protocol) orSQL, the blue Token indicates SQL or Memcache and lastly, the gray componentis the Policy and served from a �le.

SQL backend

When Keystone acts as an identity provider, the credentials and informationsuch as name, password and descriptions can be stored in databases like MySQL,PostgreSQL and DB2. The bene�ts of using these databases are that they canbe easily set up and user management can be done through OpenStack APIs.However, Keystone provides weak password support and no password rotation orrecovery. Most enterprises have their own LDAP server that they want to use.

LDAP backend

The bene�t of using LDAP backend is that Keystone is no longer the identityprovider and the security of the identity service will depend on the enterpriseLDAP service. Keystone still needs to authorize the users and thus users need tobe created in Keystone as well, which may not be feasible in hybrid clouds withhundreds of hosts. Service accounts for each OpenStack service also need to bestored somewhere and enterprises may not want it to be in LDAP back-ends.

Federated Identity backend

In the case of Federation, OpenStack leverages the existing infrastructure to au-thenticate users and retrieve info about users. There is a further separationbetween Keystone and handling identity information along with possibilities forsingle sign-on service and hybrid clouds. The actual passwords don't go through

Chapter 2. Conceptual Background 8

Figure 2.3: Keystone Backends

Keystone and identity providers handle authentication completely. The short-coming of this approach is in the complexity of setting up identity sources. Thefollowing section will give a more detailed description of the federated identity inOpenStack.

2.2.2 Federated Identity

OpenStack provides an extension to its identity service, Keystone, which allows auser associated with one cloud to access resources of another trusted cloud. Thisextension is called 'Federated identity' and is supported in OpenStack versionsstarting from Kilo. Through this extension, authentication and authorizationfunctions of the identity service are separated and assigned to separate entities.These entities are the Identity Provider (IDP) which manages credentials and theservice provider (SP) which authorizes and grants access to resources.

Federation involves some common steps to implement, irrespective of the iden-tity provider used. They are generalized as follows.

Con�guring Apache to use federation capable modules.

This includes setting up the Apache modules for the corresponding Federationprotocol used (SAML or OpenID connect) and also con�guring the IDP serviceto be properly discovered by the SP.

Con�guring Keystone to support federation.

This includes creating the OpenStack projects, domains, groups, roles, mappingsetc., as well as con�guring the SP and IDP for use with their respective clouds.

There are two ways in which OpenStack consumes federated identity, usingexternal identity providers and using Keystone as the identity provider.

Chapter 2. Conceptual Background 9

2.2.3 External Identity Providers (IDP)

OpenStack supports external identity providers and broadly two kinds of identityproviders viz., Security Assertion Markup Language (SAML) based and OpenIDConnect (OIDC) based identity providers [8]. These external identity providerscurrently operate when Keystone is run under the Apache web server. The twoApache modules mod_shib and mod_auth_openidc are used as clients to theExternal IDPs. These modules act as an interface between Keystone service andthe IDP service.

SAML

The Security Assertion Markup Language (SAML) was built on x.509 by de�ningauthentication and attribute assertions in XML rather than in ASN.1. Shibbo-leth, an implementation of the SAML standard was developed as part of the "In-ternet2 Middleware Initiative" to address resource sharing among organizationsthat use di�erent authentication and authorization methods. The implementa-tion of Shibboleth in an Apache module mod_shib interfaces with Keystone andthe identity provider.

OpenID connect

OpenID Connect is an authentication layer on top of OAuth version 2.0 (OpenAuthorization), an open standard for token-based authorization and authenti-cation for internet users to authorize websites and access their information onother websites without exchanging passwords. OpenID Connect was speci�callydesigned for federated identity management and SSO (Single SignOn) [8]. Theusers' identity is encoded in a secure JSON Web Token (JWT), which is portableand supports a range of signature and encryption algorithms. The Apache mod-ule mod_auth_openidc interfaces with Keystone and OpenID connect identityproviders in OpenStack clouds.

2.2.4 Keystone as an IDP

OpenStack also supports Keystone service itself being an identity provider, in-stead of depending on external identity providers. Thus, the credentials of oneKeystone can be used to log into multiple OpenStack clouds that have their ownKeystone services for authorization. The users log in through the IDP Keystoneand receive a signed SAML assertion which they use to authenticate into the SPKeystones. This �ow uses the SAML protocol outlined in section 2.2.3.

Chapter 2. Conceptual Background 10

2.2.5 Mapping

Keystone Federated identity is made possible through the use of a concept knownas 'Mapping'. As the name suggests, there is a mapping of remote users with thatof local users or groups. The Authorising Keystone service can then make policydecisions based on Role Based Access Control (RBAC). For this purpose, Open-Stack creates temporary 'ethereal' users which have control over the resources,these users are removed once their tasks are complete. Projects can also be pro-visioned at the time of authentication. The Figure 2.4 gives an illustration ofthe mapping document which is given as an argument to the mapping functionof Keystone. In the �gure, the users described by the regular expression "*@stu-dent.com" are mapped to a local group given by the group id. Thus the remoteusers can have the same access permissions as the local group.

2.3 Container Technology

Compared to Virtual Machines, containers are faster to start and stop and also uselesser resources for the same tasks. They do not need to boot an entire machineand only require a small part of the Operating System to run applications. Thisenables users to run a higher number of applications per physical machine andalso multiple instances of the same application without dependency issues.

2.3.1 LXD

LXD is a container management system that handles creation, con�guration andnetworking in LXC containers. It is built on top of LXC containers which is anOS-level virtualization for running containers using two kernel features namelyCgroups and Namespaces. These features provide isolation of resources and man-age access to them. The bene�t of LXD compared to other container technologieslike Docker is its closeness in features to hypervisors like Xen and KVM. Thusit makes sense to use LXD instead of Docker containers to deploy software likeOpenStack which is an incorporation of multiple applications and processes.

2.3.2 Docker

Docker combines the isolation features of the kernel like Cgroups and Namespaceswith a union-capable �le system to start multiple containers that are streamlinedto run one process or app and scale up and down as required. Docker containersalso provide faster deployment and easier management of processes. Thus Dockeris best suited for applications like MySQL, Apache web servers etc.

Chapter 2. Conceptual Background 11

Figure 2.4: Mapping of remote user to local user

Chapter 3

Related Work

Distributed Cloud computing and federated clouds have been a topic ofresearch interest since the invention of virtualization technology. Khethavath etal. [17] combine the concept of distributed hash tables and an auction-basedgame model to address the issue of resource discovery and resource allocationin a distributed cloud. This paper deals only with a theoretical approach withsome simulation. Cloud computing and P2P technologies were used by Mayeret al. [21] to provide a Platform-as-a-Service that emphasize on self-adaptabilityand reliability. Buyya et al. [6], Rochwerger et al.[30] and Munoz et al. [22] onthe other hand, independently propose their architectures to manage federatedclouds. However, they do not cover the scaling of these collaborative clouds andthe architectures do not solve the complete decentralized management of the cloudservices.

Apart from the general approaches to federation in clouds, DecentralizedAuthentication mechanisms and algorithms are also investigated extensively inthe research community. Yu et al. [37] explore the concept of trust in P2P net-works and propose a distributed reputation mechanism to detect unwanted andmalicious peers. Although the concepts and techniques are novel and e�ectiveagainst malicious peers, the authentication and identity management itself arenot considered. Neuman et al. [24] described one of the earliest authenticationsystems and protocols, Kerberos, upon which the OpenStack Keystone servicewas based. Kerberos was implemented alongside Keystone by Ayed et al. [2]which also use access control lists along with authorization tickets to implementa generic access control system for clouds. The simulations are done in CloudSimsoftware and also implemented in OpenStack with Kerberos integration. Al-though Kerberos included the concept of authenticating multiple Kerberos serversin a decentralized manner, it could not account for the issues of discovery andinteroperability in hybrid clouds.

Pustchi et al. [28] give a thorough and detailed elaboration of federated au-thorization and also propose a multi-cloud trust model at cloud, domain andproject levels. Sitaram et al. [32] present a method of federating clouds thatinvolves extending the OpenStack API to allow users to be authenticated againstmultiple clouds, including AWS clouds. This method has been superseded by the

12

Chapter 3. Related Work 13

OpenStack federated identity which is built into the newer versions of OpenStack.Chadwick et al. [8] explain the implementation and function of the federated iden-tity extension of OpenStack that allows users of one cloud to get authenticatedand use resources of other trusted clouds. This technology has been incorporatedinto OpenStack since the Kilo version release.

Management and Scheduling of Virtual resources like VM and VMIsin large-scale virtual environments is an np-complete problem that is still a topicof interest and relevance for researchers of current times. Víctor Méndez et al.[22] describe the use of an image contextualization manager in their federatedcloud architecture Rafhyc. In the paper, they utilize a golden boot image andnecessary information for di�erent contextualization methods. This method maybe suitable for scienti�c applications and PaaS, but in the case of P2P-enhancedOpenStack clouds, it would not be feasible to include contextualization for everyuse case. Razhavi et al. [29] leverage the ability of hypervisors to start a VM withonly a tiny part of the image locally stored. The remaining parts are downloadedon request. Caches are used to store the chunks of the image that are requiredto boot a VM. Wan et al. [36], Peng et al.[27], Lin et al. [19] and Kochut et al.[18] propose independent but similar solutions that use deduplication technology.The image is split into chunks and duplicate data blocks are avoided by using�ngerprinting technology. This helps save storage space and reduces the load onthe network. However, deduplication of image data in decentralized clouds withindependent storage servers has not yet been studied.

The works by Chen et al. [9] and Huang et al. [16] utilize the multi-sourcedownload capabilities of the BitTorrent protocol to deliver VM images. The VMsthat boot from a single image will share the resources among themselves andreduce the network load and time in fetching images from the storage devices.However, these works do not deal with the complete decentralization of sourcingthe images and is only e�cient when several VMs start from the same images.The method of using P2P for provisioning images was also analyzed by [13] and analgorithm was proposed for server bandwidth provision for e�cient distributionof chunks in the P2P model.

In summary, the work presented in this paper builds on previous research toexplore how Peer-to-Peer and distributed technologies can be used to address theissues of OpenStack scalability. While the earlier works focus on distributed andhybrid clouds and few works relate to using some P2P technologies for certaincloud functionalities, we study the possibilities of combining the two and evaluatethe suitability of Peer-to-Peer technologies in OpenStack clouds.

Chapter 4

Method

4.1 Overview of the applied methods

The methods followed in this thesis work are listed below and will be describedin detail in the subsequent sections.

� Analysis of the Basic Functional Requirements in OpenStack clouds.

� Functional Analysis of the capabilities of available Peer-to-Peer technologiesfor network control applications and for OpenStack clouds.

� Analysis of Numerical Scalability and Performance of Features of Peer-to-Peer architectures.

� Inference Phase

� Veri�cation Phase: Implementation and Enhancement of P2P technologiesin OpenStack clouds

Figure 4.1 illustrates the methods and phases employed in the thesis work.The �rst three methods constituted the accumulation of data, processing it andanalyzing the information. Then we performed inference and decided on what toimplement. The �nal stage was the implementation of the solution and testing itagainst the predicted behavior in a realistic scenario.

4.2 Analysis of Basic Functional Requirements in

OpenStack

The thesis conducted an analysis of the services of OpenStack clouds and theirrequirements. The previous work on P2P-OpenStack clouds was studied and thetechnical scope and performance requirements were de�ned. The OpenStack ar-chitecture was analyzed by [12] and it was discovered that the scalability of thecompute service, Nova, in terms of the number of compute nodes was limited.

14

Chapter 4. Method 15

Figure 4.1: Outline of phases and methods employed in the thesis

This was because of the limited number of socket descriptors in RabbitMQ mes-sage broker service and also scalability issues with certain components, whichled to the inception of the P2P-enhanced OpenStack solution described by thepaper. The paper also mentions the lack of a decentralized identity service inthe solution presented. Thus after an analysis of the P2P-enhanced OpenStacksolution, a brief outline of the requirements in OpenStack clouds is presentedfor improving their scalability. The requirements, based on the analysis done onOpenStack clouds and the P2P-enhanced OpenStack clouds described in Chapter2 are given below.

4.2.1 Identity service

The current P2P-enhanced OpenStack, as illustrated by Figure 2.1, uses a cen-tralized Keystone node to manage authentication and authorization of all theclouds in the P2P network. This arrangement makes the single node vulnera-ble to security and performance issues since it can act as a bottleneck or singlepoint of failure. Additionally, since this system involves the use of multiple clouds(spread across devices) that act as one P2P system, the resources of one user mayalso be spread across multiple clouds. This centralized Keystone service becomesa bottleneck for a system size exceeding 16 clouds. The challenge then becomesto authenticate one user across all the clouds (which accommodate the user's re-sources). It has to be noted that a P2P system may not have a global view of thesystem and also that the system should be able to authenticate a user belongingto any cloud dynamically [8].

Chapter 4. Method 16

Figure 4.2: Flow of image between OpenStack clouds

4.2.2 Image management

The current implementation performed the distribution and management of Vir-tual machine images (VMI) among the di�erent OpenStack clouds in the P2Poverlay, by sending the entire binary image �le from the source cloud to the tar-get cloud. This implies that the image is �rst copied from the glance storageback-end to the target glance service, which copies it to its back-end. When arequest is sent to the target cloud to boot a VM using the image �le, the computenode again copies the �le from the glance back-end to its local �le system. Asit happens to be a P2P system, if such a request is received by a peer that isneither the source nor the target for the image �le, as illustrated by Figure 4.2,it may need to download the entire source image �le to its local storage and thenupload the �le to the target cloud. Since the image �les are usually in the order ofgigabytes, this process is ine�cient with respect to both, the network utilizationand the �le I/O.

4.2.3 Scheduling of Virtual Resources

According to many research works including [26] , �nding optimal solutions toallocate VMs to physical resources is an NP-complete problem. Scheduling ofVirtual Machines becomes an even greater hurdle when it comes to P2P systemsin which a scheduler does not have a global view of the system. The schedulingof VMs may depend on the policies that are employed in the organization aswell as the requirements of the users themselves. The policies of the organizationcould be to balance the load on the entire system and reduce the utilization ofthe physical devices to accommodate more users, whereas the users may preferto have their own VMs on the same network or in the vicinity. Hence there is arequirement to evaluate the scheduling algorithms and heuristics and choose themost optimal one for the right scenario and design a solution that dynamicallychooses the algorithm or heuristic and allocates the VMs to the physical resources.

Chapter 4. Method 17

4.2.4 Networking Service

In the P2P-enhanced cloud environment, the users may require their VMs tobe able to communicate with each other. Hence, if there are multiple VMs in-stantiated on di�erent resources, there will be the need to create virtual overlaynetworks. If the VMs are booted in the same physical network, then their la-tency will be minimum, compared to those created in clouds outside the physicalnetwork, thus there will be a requirement to create and manage overlay virtualnetworks and subnets such that the end user only sees the networks that he isassociated with and is not a�ected by the complexity of the underlying connec-tions. It is to be noted that creation and deletion of VMs are tasks that occurvery frequently in the cloud.

4.3 Functional Analysis of capabilities of Peer-to-

Peer technologies for OpenStack

The functional design space spans the possibilities space from which the archi-tecture of a speci�c P2P service overlay for a given control task can be chosen.In the case of OpenStack clouds, the functional design space would span thepossibilities of all the functions of P2P systems that a given OpenStack servicefunctionality can be extended with. In this paper, we have analyzed the func-tional components of di�erent P2P systems and the functional requirements ofOpenStack clouds with respect to scalability and performance and tried to classifythe P2P functions based on the possibility and bene�ts of their implementationin OpenStack clouds.

Figure 4.3 was taken from the work described in [34]. The paper reportsan initial work on investigating the functional possibilities of P2P architecturesand additionally, the emphasis was on the two basic P2P functions, "resourcemediation" and "resource access control and exchange". The virtualization of theP2P functions was also of interest, which had values ranging from 'centralized' to'decentralized'. Figure 4.3 illustrates the two-dimensional design space and alsoa classi�cation of the basic P2P architectures.

The architecture of the P2P-enhanced OpenStack can be taken from the func-tional design space by selecting the most suitable P2P architecture for the partic-ular application of OpenStack. An enhancement of individual services of Open-Stack may not require the incorporation of the entire architecture and thus anadditional dimension can be added to the functional design space to include theindividual functional components that can be taken from the P2P architecture.

Figure 4.4 illustrates the additional dimension added to the original functionaldesign space. For representational purposes, we show only a few P2P architec-tures and only two dimensions of the Functional design space. This design spaceconsists of the feature set that serves the purpose of each of the P2P architectures.

Chapter 4. Method 18

Figure 4.3: Functional design space of P2P service overlays

Each block in the �gure represents one feature of a Particular P2P architecture.Thus, depending on the use-case, a feature can be taken from the feature set andincorporated into the OpenStack Services.

A functional analysis was done on some of the popular P2P architectures andthe functionalities are presented in brief as follows.

Chord

Chord is a structured P2P system and uses Distributed Hash Tables in elegantlystructuring the overlay topology. The randomness of peer-ID assignment impliesthat the successor peers may be diverse in ownership, jurisdiction and geography.This may lead to high overhead in the underlying physical network.

Pastry

The Pastry protocol is a structured P2P overlay that takes into account the local-ity of the peers. Using a heuristic approach, it leverages two criteria, numericalcloseness and local proximity simultaneously. Pastry networks are vulnerable tointernal attacks like the Sybil attacks.

Kademlia

Kademlia makes use of the shortest unique pre�x of a peer's identi�er. Everymessage is used to update the bucket of a Kademlia peer and hence it does notprovide a periodic overhead to the underlying network. Kademlia also leverages

Chapter 4. Method 19

Figure 4.4: View of the Additional Dimension of the Functional Design space

parallel querying which e�ciently reduces the time taken to return a query re-sponse. The performance statistics of this protocol is well known for a largenumber of users.

CAN

Content Addressable Network (CAN) maps the resources to individual points in avirtual coordinate system. Multiple dimensions and realities can be chosen whichenables to add speci�c e�ciency requirements to the application. Its distinctfeature among others is the mechanism of taking over the zone of another peer thathad recently exited the system. The bene�ts of CAN include Data availability,fault tolerance along with an ability to increase routing e�ciency by increasingthe dimensions of the coordinate space. The complexity of the CAN architectureis one of its disadvantages.

Gnutella

The Gnutella protocol has the advantage of serverless operations along with sim-plicity. The peers also do not require any additional synchronization when a nodeleaves the overlay. The disadvantages of Gnutella is in its use of �ooding whichgenerates high network tra�c [34]. Thus it does not scale well for large networks.The limited search implies that a lookup may not be exhaustive. Additionally,the randomness of the overlay leads to ine�ciency in operations. [23]

Edonkey

Edonkey has a simple and centralized resource mediation architecture with a lownumber of server entities and thus each operation has a low number of hops.

Chapter 4. Method 20

There is more control over the availability of the resources but as there are alow number of index servers, it is more prone to shutdowns and server overloads.Edonkey assumes that every peer is equal and altruistic and thus is limited in itsadoption of peers that don't share resources ie., free riders.

BitTorrent

BitTorrent is a content distribution overlay which focuses on quick replication ofa single �le by leveraging a sophisticated incentive based Multi-Source Downloadmechanism and Piece selection strategies. The centralized tracker provides morecontrol over the resources and hence can help in maintaining control and to censorover the �ow of information. The tracker can also become a single point of failureand can make the overlay vulnerable against attacks. A distributed tracker-lessprotocol has also been proposed.

CYCLON

CYCLON overlays try to ful�ll the requirement of low diameter, are resilient tomassive node failure and deal with high churn rates, while retaining a randomgraph property [35]. Killing half of the nodes of the network does not threatenthe connectivity of the nodes.

Yappers

Yappers uses a combination of Gnutella and DHT and can group nodes accordingto some attribute they have in common [11]. It can perform a lookup on everynode as well as limit which nodes to search relative to the type of informationsearched.

4.4 Analysis of Numerical Scalability and Perfor-

mance of Features of P2P Architectures

The thesis work involved a comprehensive literature study and analysis on themethods for evaluating the scalability and performance of Peer-to-Peer architec-tures. The main research was done on the unique and e�ective ways of evaluatingthe scalability of popular P2P architectures.

The numerical scalability of P2P architectures has been evaluated by di�erentmethods and using di�erent measures based on the type of the P2P architectureand the function that it ful�lls in the system. [33] introduces the concept ofStochastic scalability. Traditionally scalability deals with the size of the systembut it would also be bene�cial to evaluate the mid-term and short-term stochasticbehavior of the system to get a better idea of how the system would scale and

Chapter 4. Method 21

perform. Many random variables like inter-arrival time, mean online time, queryrate and churn rates are some of the measures of stochastic scalability of a system.

In the case of Chord based P2P overlays which implement DHT algorithmsand form a ring topology, the connectivity of the peers with respect to otherpeers is considered as important as the functional scalability of the system. Twoimportant concepts have been introduced by [3] for evaluating the stability ofChord based platforms. Global disconnection i.e., the case where one speci�cpeer loses all its connections to the overlay and Local disconnection i.e., whereat least one peer in the overlay loses all its connections to the overlay. As thering becomes larger, the probability of local disconnection may reduce but theprobability of global disconnection will be higher. Thus evaluating these twoprobabilities can help evaluate the scalability of the system itself.

Physical link delays also a�ect the performance of searches in a P2P overlaynetwork. [4] evaluates the impact of network delay in the search times in DHT., toprove the scalability in very large chord rings and to guarantee QoS demands. Itwas found that the search delay in Chord based P2P rapidly increases in smallerring sizes but stays moderate for very large populations. This is because, oncethe size of the population grows by the next power of 2, the �nger table of thepeers grows by one entry and thus an additional peer can be reached by onlyone hop. Thus, mean search delays slightly decreases as the population increasesfrom 2i − 1 to 2i ∀i ∈ {0, 1, 2...}.

[5] evaluates the scalability of a Kademlia based distributed network man-agement tool by evaluating the search times for the peers with respect to theincreasing size of the overlay. The overlay size was not the only criteria for evalu-ating the scalability of the system. The correctness of the system does not dependon the size of the system and thus, the average number of known neighbors werecalculated with respect to the churn times of the P2P systems.

In the case of Pastry based overlay described in [15], the Cumulative Distri-bution Function (CDF) was calculated with respect to the search times of thedi�erent scenarios. The probability of successful search with respect to the hopswas also analyzed. Since the overlay network of Pastry does not consider thephysical network layout, longer routes are taken as often as shorter routes withrespect to the underlay and this could be a problem for some applications of P2Pand thus an evaluation of the search times with respect to the underlying scaleof the scenario was also considered. The paper also found that the leaf set ofsize 4 was a su�cient balance in the trade-o� between the improvements and theoverhead introduced into the system.

[14] and [25] evaluate the performance of an Edonkey based P2P overlay.Among others, they implemented a cache server which caches and serves themost popular �les to e�ciently reduce the network usage. The performance ofthe caching peer was evaluated by analyzing the Complementary CumulativeDistribution Function (CCDF) with respect to the download time of a 3 MB �lefor the di�erent applications of cache peers and also for the di�erent churn times

Chapter 4. Method 22

of peers.

4.5 Inference Phase

The thesis applied an inference phase where the functional and numerical scal-ability analysis of P2P systems were matched with the analysis of the basic re-quirements in OpenStack and P2P-enhanced OpenStack clouds, with the coreobjective of improving the scalability of P2P-enhanced OpenStack clouds.

The following subsections will outline the decisions involved in implementingthe solution.

4.5.1 Keystone solution decision

From section 4.2.1, we found two primary requirements in case of Keystone servicein P2P-enhanced OpenStack clouds, listed below.

1. Ability to authenticate a user against multiple clouds

2. Overcome the limitations of a centralized Keystone service

Thus, following the methodology, we matched these requirements with thecapabilities of P2P technologies and designed a solution that is scalable and im-proves on the previous implementation.

Peer-to-Peer based solution

The OpenStack identity service broadly provides two types of functions, Authenti-cation and Authorization. Keystone authenticates users by providing credentialswhen they register themselves to OpenStack services. Whenever a function callarises, the user authorization is veri�ed by lookup of databases. If the veri�ca-tion is successful, a token is provided which includes a list of projects and rolesassigned to the user and can be used for subsequent function calls, instead ofre-authenticating. These two functionalities correspond with the two functional-ities of P2P Architectures namely, resource mediation and resource-access-and-exchange [34]. Here, the resource is analogous to the user identity itself. Theproblems of identity management in the current P2P-enhanced OpenStack im-plementation viz., Storing Identities of all the clouds in the P2P overlay can beexternalized to another entity which can reduce the load on the Keystone service.In this way, every cloud can be provisioned with its own Keystone service whichcan authorize users generated by the external entity without having to store everyuser's data in its database. BitTorrent, which has centralized resource mediationand decentralized resource access and exchange, solves this issue. The BitTorrenttrackers provide the peers with information about other peers and resources, in

Chapter 4. Method 23

the same way, the external identity provider provides the identities and each peerin the OpenStack P2P network authorizes these identities and grants access to itsresources. The concept of Federated identity described in Chapter 2 enables theresource mediation function of Keystone to be externalized. From the possiblesolutions, Keystone-to-Keystone Federation was chosen as opposed to an exter-nal identity provider. Using the services of external IDPs introduces additionalcomplex components to maintain and also security issues which are best avoidedby making use of OpenStack components. Thus Keystone-to-Keystone was themost preferred option in designing the solution.

4.5.2 Image management solution decisions

P2P algorithms have been implemented by the research community to speed upthe provisioning of resources. [13] describes two models of P2P image manage-ment speci�cally using BitTorrent-based implementations, but on images that aresourced from a centralized data center. When the number of VMs instantiatedfrom a single image is not very large, these algorithms are ine�ective in reducingnetwork load as explained by [27], which also proposes a chunk level distributionalgorithm that leverages the ability of a VM to boot with only a small part ofthe image �le. Thus, a small portion can be fetched �rst and the rest can befetched on demand. [16] and [18] both emphasize the caching of similar chunksof the images and use multipoint collaboration to minimize latency in fetchingimages. The technique used is to divide the image �le into equal-sized or variablesized chunks and depending on the similarity of the VMIs in the host, locallyreconstruct these chunks and only retrieve the dissimilar chunks. This methodreduces the amount of information passing through the network apart from reduc-ing the time taken to provision a VM. [29] describes a similar approach of cachingthe popular chunks and discusses the factors for VM instance slow-down namely,placement of the image on the physical node and transfer of the image to thecompute node. The same issues occur in the case of P2P OpenStack clouds, thedi�erence being that the placement and transfer of images will be on a broaderlevel i.e., clouds as opposed to nodes. [31] concentrated on e�ective replication ofimages to make them highly available and looks into optimizing the number anduniformity of placement of VMIs.

What is chosen and why?

Even though there were several ways of improving the management of Virtualmachine images, the conceptualized solution was similar to the implementationof [29] and the caching peer of [25] but instead of caching the chunks of theimages, we leverage the caching of the whole binary image �le. We designed asimple HTTP server which makes the image available through a URL and wealso leverage the compute node of the OpenStack cloud as a cache server. In the

Chapter 4. Method 24

previous implementation (Chapter 2), when an agent sends a request to boot aVM on a cloud which does not have the required image �le, the whole image �leis uploaded to the target cloud as described by Figure 4.2. The current solutiondoes not upload the image �le but makes the source image �le available to thetarget cloud through a URL generated by the HTTP server itself. This URLis used to register an image in the Glance service of the target cloud and uponstarting a VM, the respective compute node of the target cloud downloads theimage from the URL and stores it in its local storage. Subsequent boots will causethe image to be directly used by the compute node and thus avoids unnecessaryload on the network, without installing additional components to the OpenStackcloud.

4.6 Implementation and Veri�cation

The implementation part of this thesis was carried out by extending the existingsolution described by Chapter 2. Two functional requirements of OpenStackclouds mentioned in Section 4.2 were addressed in the case of implementation andevaluation. The implementation and evaluation of each functional requirementwere done as follows.

4.6.1 Image management solution

The image management issue was addressed by caching the binary image �lesusing OpenStack-based components. In the P2P-enhanced OpenStack agent,an additional simple component was added. This component, not only createsan image entry in the glance registry of the target cloud but also serves thebinary images via a URL available through Glance API. The component createsa symbolic link to the image on the local directory and then using REST API, itmakes the image available to the OpenStack agent. Since this solution solved theissue that reduces the number of copies of the images made in the complete P2Psystem, actual tests were not considered to be necessary and the time spent inwriting tests and verifying the reduction in the number of copies of images wouldnot do justice to the decentralized Identity service solution and hence testing ofthe image management solution was not given priority.

4.6.2 Decentralized Identity Service solution

The issue with the identity service was addressed by using the Federated identityextension of OpenStack, speci�cally, we used the Keystone-to-keystone (K2K)Federation, described in detail in section 2.2.2, where one Keystone acts as anIdentity Provider and all other Keystones in the system become the service

Chapter 4. Method 25

providers. This solution e�ectively reduces the load on the centralized Key-stone service by separating the Authentication and authorization functions ofKeystone. Thus the Keystone which acts as the Identity provider can, in futureworks, become decentralized itself. But in this thesis work, Keystone-to-KeystoneFederation was used and implemented as follows based on [8]. The implementedKeystone should, in theory, be able to allow users of more number of clouds toget authorized to the system resources.

Figure 4.5: Architecture of the Identity service solution

Figure 4.5 represents the architecture implemented by this thesis work. Thecentralized component is basically a Keystone service that has been con�guredas the Identity Provider (IDP) and individual clouds of the entire system areprovisioned with their own Keystone services which are con�gured as ServiceProviders (SP). The con�guration is as described in section 2.2.2.

IDP

The IDP is where users and identities are stored and veri�ed upon request. Itmainly consists of a Keystone service which is con�gured with SSL and provi-sioned to manage SAML assertions using Apache and Python modules.

SP

The service provider is where the accesses to the cloud resources are de�ned andauthorization is performed. The �ow of requests between two agents with respectto authorization tokens is described in Figure 4.6. When an agent requests for

Chapter 4. Method 26

Figure 4.6: Flow of Requests between Agents for Tokens

an object like VM or image to be created on a particular cloud, the SP willverify with the IDP whether the user is valid and by the use of mapping andRole Based Access Control (RBAC), the user is authenticated. This processinvolves an exchange of SAML assertion which is basically an XML documentthat describes the roles and information about the user. Subsequently, a cookieis generated which is then used to retrieve the token from the target cloud. Oncea token is received, an ephemeral user is created at the target cloud and thestandard OpenStack �ow is followed as if the cloud is local to the requesting user.Additionally, The tokens that were retrieved by a cloud from every other cloudis cached to be reused until they expire.

The testing and evaluation of this work include the testing of the scalabilityand performance of the identity solution as a system and also the testing of thescalability of the clouds after incorporating this identity solution into the P2Psystem that is already existing, as described in chapter 2.

Experimental Setup for Identity management

The Experiment for evaluating the scalability of the Keystone-to-Keystone Fed-eration setup involved the use of 5 servers provided through the tool MAAS(Metal-as-a-service). The speci�cations of each server are:

� 24 Core processor

� 24 GB RAM

� Ubuntu 16.04 (Xenial) OS

� 224.6 GB Storage

One Keystone IDP (Identity Provider) was setup and con�gured in a Dockercontainer on one of the servers and the other four were used to run the SPs(Service Provider) along with one common OpenStack service, Glance. The testinvolved a maximum of 32 SPs and to avoid any resource de�ciency, only 8 SPswere setup in each server with each SP inside its own Docker container.

The metrics involved are the CPU-time, response time and the number ofSPs. CPU-time, provided by the Linux kernel, indicates the total time spentby the CPU in execution. The response time is the time taken to perform an

Chapter 4. Method 27

OpenStack service request, speci�cally, 'glance image-list'. Python scripts werewritten to send the requests to the target number of clouds and with a set numberof repetitions and concurrency. The tests were performed for 64 requests andwith a logarithmic increase in the number of SPs. The area under the CPU-timegraph was calculated (the cumulative time spent by the CPU in performing allthe requests). Thus the expected result for a scalable system would be that thiscumulative execution time does not increase for an increasing number of Serviceproviders. The tests were performed with 10 iterations and the values were plottedand presented in section 5.

Experimental Setup for P2P system

To evaluate the decentralized identity solution as a part of the whole P2P system,the components of the decentralized solution were incorporated into the P2P sys-tem and tests were performed to compare this P2P-enhanced OpenStack cloudwith a standard OpenStack cloud. Since the previous solution could only accom-modate 16 clouds and was limited by the centralized Keystone component, thecurrent implementation was evaluated for up to 32 clouds.

The performance of the P2P system is evaluated under di�erent scenarios,changing the size of the system as well as load on the system. The results of testson the previous version of this system was documented in [12] which involved theuse of VMs. In the current version we have set up the system in LXD containersand the results are documented.

The OpenStack services corresponding to the controller node, compute nodeand the controller with agent services were deployed in a separate container. Theresources were restricted to the containers to get an understanding of the systemperformance at its limit (2 CPU cores and 6 GB of RAM for the controller, 2 CPUcores and 4 GB of RAM for compute nodes). A separate container is assigned todeploy the Keystone IDP.

To perform the tests, scripts were written which send requests to the systems,similar to users of a real cloud. The script attempts to boot a number of VMinstances concurrently and keeps track of the boot time and success rate of theboots.

The performance of the system is compared with that of an equivalent Open-Stack deployed as a standard set up. The resources of this standard OpenStacksystem are also constrained (4 CPU cores and 8GB of RAM for the controller,2 CPU cores and 4GB of RAM for the compute). The P2P and the standardsystem are considered equivalent when they both have the same number of com-pute nodes. The test scripts in the case of the standard OpenStack sends all therequests concurrently to the controller node.

Chapter 5

Results

The results section contains two subsections, section 5.1 gives the evaluationresults of the identity solution set up as an independent system and section 5.2gives the evaluation results of the P2P system that has been enhanced with theidentity solution.

5.1 Decentralized Identity Solution

The following �gures 5.1 and 5.2 represent the change in response times for 64requests with respect to the change in the number of SPs in the identity solutionset up as a separate system. The requests in this scenario is that of listing images(image-list) using the OpenStack Glance service. The axes of the following graphsare logarithmic to give us a proper perception of the changing parameters withrespect to the logarithmic increase in the scale of the system.

Figure 5.1 represents the results of the tests performed for 64 requests. Theresults show that the curve for the response times reduces when the system scalesto 8 and then remains more or less constant for increasing scale of the system.

Figure 5.2 represents the results of the tests performed for 64 requests. Theresults show that the curve for the average area under the CPU-time graph reduceswhen the system scales to 8 and then remains more or less constant for increasingscale of the system.

To evaluate the stochastic scalability of the system, the tests were also runwhile changing number of requests (logarithmic increments from 1 to 10000 re-quests) along with a changing number of service providers and the results arerepresented in Figure 5.3. The results show that the response times are consis-tent and relatively lower for a higher number of SPs.

5.2 Peer-to-Peer System

We also ran experiments to understand how the P2P system incorporated withthe identity solution performs when it scales, and how its performance comparesto an equivalent standard OpenStack deployment. In this experiment, we start

28

Chapter 5. Results 29

Figure 5.1: Average Response times for 64 requests

with a deployment of OpenStack with one cloud instance and con�gure a Pythontest script with one user. This script simulates a user and attempts to start 4VMs concurrently and measures how long it takes, on average, to boot up a VMand also what fraction of the attempts failed. This is repeated 15 times. Wethen double the system size (and the number of users) and redo the experiment.This is continued until the system has 32 instances with 32 users generating loadconcurrently. The entire set of experiments are repeated with the equivalentstandard OpenStack setup whereby the number of compute nodes is increasedwith the number of users generating the load. The results are provided in Figure5.4.

The following are the graphs showing a comparison of the performance ofthe P2P system (which incorporates the Keystone solution) and the standardOpenStack. The test scripts are run that boot 4 VMs concurrently and measurehow long it takes to boot up a VM and also the percentage of VMs that failed.We then double the system size and redo the experiment until there are 32 usersperforming the boot requests concurrently. This test is similarly performed in theStandard OpenStack and the results are presented.

The Figure 5.4 describes the change in average VM boot time and percentageof failed VMs with the increasing size of the system. The curves for the P2Psystem improves from system sizes larger than 8. At the size of 32, the boottimes for the standard OpenStack is 70% more than that of the P2P system andthe failure percentage is considerably larger compared to the P2P system. We

Chapter 5. Results 30

Figure 5.2: Average area under cputime curve for 64 requests

also notice that the performance of the P2P system does not show deteriorationcompared to the Standard system even at a system size greater than 16.

We also performed additional tests by �xing the system size at 32 and in-creasing the concurrency of the requests. Figure 5.5 shows that the Standardsystem has much greater fail rate and boot time compared to the P2P system forincreasing concurrency of VM boot requests.

Chapter 5. Results 31

Figure 5.3: Averages of Response times versus Number of requests and numberof SPs

Figure 5.4: VM startup time and fail rate of the standard and P2P systems withincreasing system size and load

Chapter 5. Results 32

Figure 5.5: VM failure rate of the standard and P2P systems (size 32) withincreasing concurrent VM requests

Chapter 6

Analysis and Discussion

6.1 How can Peer-to-Peer architectures be char-

acterized for their suitability in P2P-based Open-

Stack clouds?

P2P architectures can be characterized by their unique methods and techniquesimplemented to support their scalability. From this thesis work and the existingsolution that this work tries to improve on, it is clear that P2P architectures canbe used in two ways in OpenStack based clouds.

� Creating the overlay network of the OpenStack clouds

� Enhancing services of the OpenStack clouds

OpenStack is a large suite of services that each provide a di�erent functionalitybut collectively provides the cloud computing platform to the end users. Thecontroller node is the backbone on which OpenStack functions and so an e�ectivemeans of leveraging the strengths of multiple OpenStack clouds is to incorporatethe P2P membership protocols into the controller nodes.

From this thesis work, including the results in section 5, we have shown thatthe separation of resource mediation and resource-access-and-exchange as donein the BitTorrent protocol was e�ective in overcoming the issue of the centralizedidentity in the P2P system. We also proposed a solution to the image managementissue by designing an HTTP server for images that also acts as a client to theGlance server. Thus the individual services of OpenStack can be enhanced byincorporating the unique strategies and functionalities that P2P architecturesuse to achieve high scalability. The failure rate in section 5.2 in the case ofthe Standard System is asserted to be because of the limitation of the singlecentralized controller in handling the larger requests and in the case of the P2Psystem, because of the limitations with the Python thread and multiprocessinglibraries.

33

Chapter 6. Analysis and Discussion 34

6.2 How can P2P technology be used for decen-

tralized identity service in OpenStack clouds

and which P2P-technology should be applied

for high scalability of this service?

Keystone as described in section 2.2.1 provides two functions, Authenticationand Authorization. We have shown that separating the two functionalities isadvantageous in enabling the authentication and authorization between clouds.This can be done by recon�guring the Keystone service of each cloud to utilizeonly the authorization functionality while the authentication functionality can beexternalized to another identity provider.

The centralized component will be the authentication component and thisprovides the identities to the clouds. The decentralized component is the Key-stone service providers at each cloud that authorizes users of the system againstthe authorized resources of the system.

The results of this thesis work show that the performance of this identity so-lution does not reduce for increasing scale of the system both as an independentservice and also as part of the overall P2P-enhanced OpenStack cloud. On thecontrary, it has been shown that for increasing scale of the system, the perfor-mance of the identity solution as an independent system improves as the scaleincreases. And more importantly, it has been shown that it has been possible toscale the system beyond 16 clouds and overcome the limitation of the centralizedKeystone in the P2P system of the previous work.

Chapter 7

Conclusions and Future Work

This thesis work analyzed the scalability issues in P2P-enhanced OpenStackclouds and also the functionalities and architectures of P2P systems. Solutionsto the issues of the centralized identity and image management were proposedand evaluated. It was shown that Federated identity management componentof OpenStack was e�ective in improving the identity service in P2P-enhancedOpenStack clouds.

As future work, further research is essential to �nd solutions to the otherlimitations of the P2P-enhanced OpenStack clouds, especially the networkingbetween VMs of di�erent clouds. Another direction to further this work is to �ndapproaches to completely decentralize the Identity Provider Keystone. It wouldalso be interesting to see how functionalities of more P2P architectures can beused to enhance the increasing number of services of OpenStack.

35

References

[1] Luc Onana Alima, Ali Ghodsi, and Seif Haridi. A Framework for Struc-tured Peer-to-Peer Overlay Networks. In Global Computing, pages 223�249.Springer, Berlin, Heidelberg, March 2004.

[2] Hella Ka�el-Ben Ayed and Bilel Zaghdoudi. A generic Kerberos-based accesscontrol system for the cloud. Annals of Telecommunications, 71(9-10):555�567, October 2016.

[3] A. Binzenhofer, D. Staehle, and R. Henjes. On the stability of chord-basedP2p systems. In GLOBECOM '05. IEEE Global Telecommunications Con-ference, 2005., volume 2, pages 5 pp.�, November 2005.

[4] Andreas Binzenhöfer, Phuoc Tran-gia, Andreas Binzenhöfer, and PhuocTran-gia. Delay Analysis of a Chord-based Peer-to-Peer File-Sharing System.In in ATNAC 2004, 2004.

[5] Andreas Binzenhöfer, Kurt Tutschku, Björn auf dem Graben, MarkusFiedler, and Patrik Arlos. A P2p-Based Framework for Distributed Net-work Management. In Wireless Systems and Network Architectures in NextGeneration Internet, pages 198�210. Springer, Berlin, Heidelberg, July 2005.

[6] Rajkumar Buyya, Rajiv Ranjan, and Rodrigo N. Calheiros. InterCloud:Utility-Oriented Federation of Cloud Computing Environments for Scalingof Application Services. In Algorithms and Architectures for Parallel Pro-cessing, pages 13�31. Springer, Berlin, Heidelberg, May 2010.

[7] Gonzalo Camarillo. Peer-to-Peer (P2P) Architecture: De�nition, Tax-onomies, Examples, and Applicability. RFC 5694, November 2009.

[8] David W. Chadwick, Kristy Siu, Craig Lee, Yann Fouillat, and Damien Ger-monville. Adding Federated Identity Management to OpenStack. Journal ofGrid Computing, 12(1):3�27, March 2014.

[9] Z. Chen, Y. Zhao, X. Miao, Y. Chen, and Q. Wang. Rapid Provisioningof Cloud Infrastructure Leveraging Peer-to-Peer Networks. In 2009 29thIEEE International Conference on Distributed Computing Systems Work-shops, pages 324�329, June 2009.

36

References 37

[10] OpenStack Foundation. Home � OpenStack Open Source Cloud ComputingSoftware. https://www.openstack.org. Accessed: 2017-03-01.

[11] Prasanna Ganesan, Q. Sun, and H. Garcia-Molina. YAPPERS: a peer-to-peer lookup service over arbitrary topology. In IEEE INFOCOM 2003.Twenty-second Annual Joint Conference of the IEEE Computer and Commu-nications Societies (IEEE Cat. No.03CH37428), volume 2, pages 1250�1260vol.2, March 2003.

[12] Xin Han. Scaling OpenStack Clouds Using Peer-to-peer Technologies. Mas-ter's thesis, Chalmers University of Technology, Gothenburg, Sweden, 2017.

[13] Jian He, Yi Liang, Yonggang Wen, Di Wu, and Yupeng Zeng. On P2pMechanisms for VM Image Distribution in Cloud Data Centers: Modeling,Analysis and Improvement. In Proceedings of the 2012 IEEE 4th Interna-tional Conference on Cloud Computing Technology and Science (CloudCom),CLOUDCOM '12, pages 50�57, Washington, DC, USA, 2012. IEEE Com-puter Society.

[14] T. Hopfeld, K. Tutschku, F. U. Andersen, H. de Meer, and J. O. Oberen-der. Simulative performance evaluation of a mobile peer-to-peer �le-sharingsystem. In Next Generation Internet Networks, 2005, pages 281�287, April2005.

[15] T. Hossfeld, S. Oechsner, and K. Tutschku. Evaluation of a pastry-based P2poverlay for supporting vertical handover. In IEEE Wireless Communicationsand Networking Conference, 2006. WCNC 2006., volume 4, pages 2291�2296,April 2006.

[16] B. Huang, R. Lin, K. Peng, H. Zou, and F. Yang. Minimizing Latency inFetching Virtual Machine Images Based on Multi-point Collaborative Ap-proach. In 2013 IEEE International Conference on Green Computing andCommunications and IEEE Internet of Things and IEEE Cyber, Physicaland Social Computing, pages 262�267, August 2013.

[17] P. Khethavath, J. Thomas, E. Chan-Tin, and H. Liu. Introducing a Dis-tributed Cloud Architecture with E�cient Resource Discovery and OptimalResource Allocation. In 2013 IEEE Ninth World Congress on Services, pages386�392, June 2013.

[18] A. Kochut and A. Karve. Leveraging local image redundancy for e�cientvirtual machine provisioning. In 2012 IEEE Network Operations and Man-agement Symposium, pages 179�187, April 2012.

[19] C. Lin, Q. Cao, H. Zhang, G. Huang, and C. Xie. SDVC: A Scalable Dedu-plication Cluster for Virtual Machine Images in Cloud. In 2014 9th IEEE

References 38

International Conference on Networking, Architecture, and Storage, pages88�92, August 2014.

[20] Steve Martinelli, Henry Nash, and Brad Topol. Identity, Authentication, andAccess Management in OpenStack: Implementing and Deploying Keystone.O'Reilly Media, Inc., 1st edition, 2015.

[21] P. Mayer, A. Klarl, R. Hennicker, M. Puviani, F. Tiezzi, R. Pugliese,J. Keznikl, and T. Bure. The Autonomic Cloud: A Vision of Voluntary,Peer-2-Peer Cloud Computing. In 2013 IEEE 7th International Conferenceon Self-Adaptation and Self-Organizing Systems Workshops, pages 89�94,September 2013.

[22] Víctor Méndez Muñoz, Adrian Casajús Ramo, Víctor Fernández Albor, Ri-cardo Graciani Diaz, and Gonzalo Merino Arévalo. Rafhyc: an Architecturefor Constructing Resilient Services on Federated Hybrid Clouds. Journal ofGrid Computing, 11(4):753�770, December 2013.

[23] Jari Mäntylä. Scalability of Peer-to-peer systems. Technical report, HelsinkiUniversity of Technology, 4 2005.

[24] B. C. Neuman and T. Ts'o. Kerberos: an authentication service for computernetworks. IEEE Communications Magazine, 32(9):33�38, September 1994.

[25] Jens O. Oberender, Frank-Uwe Andersen, Hermann de Meer, Ivan Dedinski,Tobias Hoÿfeld, Cornelia Kappler, Andreas Mäder, and Kurt Tutschku. En-abling Mobile Peer-to-Peer Networking. In Wireless Systems and Mobility inNext Generation Internet, pages 219�234. Springer, Berlin, Heidelberg, June2004.

[26] Elina Pacini, Cristian Mateos, and Carlos García Garino. Multi-objectiveSwarm Intelligence schedulers for online scienti�c Clouds. Computing,98(5):495�522, May 2016.

[27] C. Peng, M. Kim, Z. Zhang, and H. Lei. VDN: Virtual machine image distri-bution network for cloud data centers. In 2012 Proceedings IEEE INFOCOM,pages 181�189, March 2012.

[28] Navid Pustchi, Ram Krishnan, and Ravi Sandhu. Authorization Federationin IaaS Multi Cloud. In Proceedings of the 3rd International Workshop onSecurity in Cloud Computing, SCC '15, pages 63�71, New York, NY, USA,2015. ACM.

[29] K. Razavi and T. Kielmann. Scalable virtual machine deployment using VMimage caches. In 2013 SC - International Conference for High PerformanceComputing, Networking, Storage and Analysis (SC), pages 1�12, November2013.

References 39

[30] B. Rochwerger, D. Breitgand, A. Epstein, D. Hadas, I. Loy, K. Nagin,J. Tordsson, C. Ragusa, M. Villari, S. Clayman, E. Levy, A. Maraschini,P. Massonet, H. Muñoz, and G. Tofetti. Reservoir - When One Cloud Is NotEnough. Computer, 44(3):44�51, March 2011.

[31] D. Shen, F. Dong, J. Zhang, and J. Luo. Cost-E�ective Virtual MachineImage Replication Management for Cloud Data Centers. In 2014 IEEE IntlConf on High Performance Computing and Communications, 2014 IEEE 6thIntl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf onEmbedded Software and Syst (HPCC,CSS,ICESS), pages 229�236, August2014.

[32] D. Sitaram, H. L. Phalachandra, A. Vishwanath, P. Ramesh, M. Prashanth,A. G. Joshi, A. R. Desai, Harikrishna Prabhu C. R, Prafulla, Shwetha R,and Yashaswini A. Keystone federated security. In 8th International Con-ference for Internet Technology and Secured Transactions (ICITST-2013),pages 659�664, December 2013.

[33] Phuoc Tran-Gia and Andreas Binzenhöfer. On the stochastic scalability of in-formation sharing platforms. Technical Report 364, University of Wuerzburg,6 2005.

[34] Kurt Tutschku. Peer-to-Peer Service Overlays � Capitalizing on P2PTechnology for the Design of the Future Internet. other, University ofWürzburg,Institute for Computer Science, Department of Distributed Sys-tems (Informatik III), June 2008.

[35] Spyros Voulgaris, Daniela Gavidia, and Maarten van Steen. CYCLON: Inex-pensive Membership Management for Unstructured P2p Overlays. Journalof Network and Systems Management, 13(2):197�217, June 2005.

[36] J. Wan, S. Han, J. Zhang, B. Zhu, and L. Zhou. An Image ManagementSystem Implemented on Open-Source Cloud Platform. In 2013 IEEE Inter-national Symposium on Parallel Distributed Processing, Workshops and PhdForum, pages 2064�2070, May 2013.

[37] Bin Yu, M. P. Singh, and K. Sycara. Developing trust in large-scale peer-to-peer systems. In IEEE First Symposium onMulti-Agent Security and Sur-vivability, 2004, pages 1�10, August 2004.