Cost and risk modeling of virtualized computing labs
Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics
Department of Measurement and Information Systems
Madarász, Norbert
COST AND RISK MODELING OF
VIRTUAL COMPUTING LABS
ASSISTANT
Kocsis, Imre
BUDAPEST, 2014
STUDENT DECLARATION
I, the undersigned Madarász, Norbert, a final-year student, declare that I prepared this
master's thesis myself, without any unauthorized help, and that I used only the sources
given (literature, tools, etc.). Every part that I took from another source, either
verbatim or rephrased with the same meaning, I have clearly marked with a reference to
the source.
I consent to BME VIK publishing the basic data of this work (author(s), title, English
and Hungarian abstract, year of preparation, name(s) of advisor(s)) in a publicly
accessible electronic form, and the full text of the work through the university's
internal network (or for authenticated users). I declare that the submitted work and its
electronic version are identical. For theses classified with the Dean's permission, the
text of the thesis becomes accessible only after 3 years.
Dated: Budapest, 15 May 2014
...…………………………………………….
Madarász, Norbert
Summary
The task of this master's thesis was the cost and risk modeling of virtualized computer
labs (Apache VCL), so that the costs, risks and service levels of the Apache VCL solution
can be examined in different hybrid cloud configurations. I used the resulting model to
find the optimal private/public cloud configuration for a hypothetical, faculty-level
Apache VCL deployment project.
First, I familiarized myself with the open source Apache VCL solution and with the
Apache VCL installation used at the department. In doing so I came to understand the
reservation mechanisms used by Apache VCL, which were essential for the modeling task.
I identified and described three main processes. With their help I built a simplified
Apache VCL model containing only the relevant process steps.
After creating the simplified model, the next task was to find a cloud simulator with
which the operation of Apache VCL can be modeled and simulated realistically. For this
purpose I chose the CloudSim simulator, which is also open source. Out of the box
CloudSim was not fully suited to the task, so it had to be extended considerably in order
to model different Apache VCL hybrid cloud configurations and their costs, utilization,
future requests and service levels.
Finally, using the extended CloudSim simulator I carried out cost and risk optimization
on the example of a hypothetical Apache VCL deployment project. Through the case study
I showed that, assuming a given system-level load, the simulator is suitable for finding
a cloud of optimal size and configuration. To make the decision I used the average unit
cost of user requests and the service level value of the cloud configuration.
In conclusion, there is a hybrid cloud configuration, regarded as optimal, where the
unit cost of user requests is minimal while the service level reaches its maximum. A
lower unit cost can be achieved only at the expense of service quality.
Abstract
The scope of this master's thesis was to model the cost and risk factors of virtual
computing labs (Apache VCL) in order to support cost, risk and service performance
optimization of hybrid Apache VCL cloud setups. Using the created model, a
hypothetical Apache VCL implementation project at a hypothetical Hungarian
university had to be optimized based on predefined criteria for a specific load profile.
First I familiarized myself with the open source Apache VCL and with the
Apache VCL installation used at the Department of Measurement and Information
Systems (DMIS) of Budapest University of Technology and Economics (BME), in order
to understand the computer allocation processes of Apache VCL and to be able to
simulate them later on. Three main types of allocation in Apache VCL were identified
and described in detail. Using this knowledge I created a filtered, simplified model of
Apache VCL so that only the relevant parts are simulated.
With the appropriate information about Apache VCL and the simplified model at
hand, the next step was to find a cloud simulator that could faithfully model and
simulate Apache VCL, including the handling of reservation requests and slot
management. Eventually the open source CloudSim cloud simulator was chosen.
CloudSim cannot be used to simulate Apache VCL without changes to its source code.
The modifications added four essential conceptual features that were missing from
CloudSim: cost, system utilization, future request and service performance modeling
for hybrid cloud setups.
Finally, the relevance and usability of the extended CloudSim were demonstrated
through a hypothetical implementation project of Apache VCL at a Hungarian
university. In the case study I demonstrated that the created simulator can be used to
find cost-optimal cloud setups for assumed future image requests. Decisions were
based on the unit cost and the service performance indicator of cloud infrastructure
implementation projects.
The conclusion is that there is an optimal hybrid cloud configuration where the
unit cost of image requests is minimal while the service performance indicator is at its
maximum. A lower unit cost can be reached only if the quality of the service is
decreased.
Table of contents
1 Introduction
2 Apache VCL
2.1 Virtual Desktop Infrastructure
2.2 Apache VCL: an academic service
2.3 Architecture of Apache VCL
2.3.1 Self-service web portal
2.3.2 Data model
2.3.3 Management node
2.3.4 Network layout
2.3.5 Privileges
2.4 Apache VCL of BME DMIS
2.5 Modeling allocation policies of Apache VCL
2.5.1 Allocation by normal reservation
2.5.2 Block allocation
2.5.3 Predictive loading modules
3 Simulation model
3.1 Cloud simulation
3.2 CloudSim: the simulation framework
3.2.1 Architecture
3.2.2 Design and implementation
3.2.3 Simulation framework
3.2.4 Data center internal processing
3.2.5 Communication among entities
3.3 Simplified model of Apache VCL
3.3.1 Simplified allocation by normal reservation
3.3.2 Simplified block allocation
3.3.3 Simplified predictive loading modules
3.4 Comparing Apache VCL and CloudSim
3.4.1 User
3.4.2 Request
3.4.3 Image
3.4.4 VMSlot
3.4.5 UserGroup
3.4.6 VMSlotGroup
3.4.7 Reservation
3.4.8 VMHost
3.4.9 ManagementNode
3.4.10 Simplified allocation by normal reservation
3.4.11 Simplified block allocation
3.4.12 Simplified predictive loading modules
3.5 Additional conceptual features
3.5.1 Cost modeling
3.5.2 Data center utilization
3.5.3 Future request generation
3.5.4 Service performance
3.6 Improvements in CloudSim
3.6.1 Modified Java classes
3.6.2 New Java classes
3.6.3 Used Java libraries
3.7 How to use the simulator
3.7.1 Providing input parameters
3.7.2 Starting the simulation
3.7.3 Description of the output files
4 Case study: a hypothetical implementation project
4.1 Input parameters
4.1.1 Image types
4.1.2 Cloud computing infrastructures
4.1.3 CapEx and OpEx costs of private data center
4.1.4 Opportunity cost for NPV
4.1.5 SLA parameter
4.1.6 Future image reservations
4.2 Simulation results
4.3 Finding the optimal cloud setup
5 Summary
Bibliography
Abbreviation list
1 Introduction
Nowadays the cloud computing service delivery model plays a key role in the
information technology (IT) services of most companies, business-oriented or not.
Cloud computing has its disadvantages too; these obstacles may slow down the
transition from on-premise architectures but will probably not stop it. The cloud
services market is already worth €100bn and is still growing at 20% yearly [1]. It can
therefore be concluded that cloud computing is superseding older styles of IT in many
areas, as it has been doing for the past few years.
By using the cloud, universities could benefit from cost reduction, flexibility,
faster deployment, better retention, performance, and the like. The question is what
kind of cloud computing services or solutions universities should use to profit from
them, and whether the cloud is worth using at all. The answer is not simple, and there
may be several answers that are all right. There are several service models to take into
account, such as Infrastructure-as-a-Service (IaaS), Desktop-as-a-Service (DaaS),
Platform-as-a-Service (PaaS) or Software-as-a-Service (SaaS), and several solutions for
these service models, such as Virtual Data Center (VDC), Virtual Desktop Infrastructure
(VDI), Java platform or Customer Relationship Management (CRM) from the cloud.
Because of this wide range of cloud services, each university must consider which
parts of the cloud are necessary or even useful for it.
VDI offers many advantages: lecturers and students do not have to attend the
labs in person to access the physical resources (e.g. computers, software) of the
laboratories; instead, the laboratories are virtualized. VDI can remove the dependency
on lab premises and times and make the laboratories more flexible. VDI can also be
used in e-learning, as students can access the necessary resources of the educational
institution from anywhere to work out and submit assignments.
At a university it is very important that an appropriate number of resources is
available to students during and outside laboratory times. Capacity planning is
difficult under such circumstances. Sizing the private VDI cloud for peak demand does
not seem to be the best strategy for serving the students. The private VDI cloud should
serve only the average demand of students, and the public cloud should be used for the
demand above the average. To reduce costs as much as possible, an environment-specific,
optimized hybrid cloud setup (a mix of private and public clouds) should be defined
very prudently. For universities it is unacceptable to leave student demand unserved or
to have laboratory delays. These risk factors of the hybrid cloud VDI solution must be
handled in order to provide predefined high availability, Service Level Agreement
(SLA) and Quality of Service (QoS) parameters. All these requirements, at least, have
to be taken into account when a department or a faculty is thinking about building a
hybrid virtual computing lab infrastructure. The topic of optimizing cost and risk
factors is relevant these days, when all entities, including educational institutions,
need to cut their costs due to economic factors. The solution presented in this thesis
helps to utilize the power and advantages of the cloud computing service model,
namely virtualized computing labs.
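The sizing argument above can be illustrated with a small sketch. It assumes a fixed cost per private slot per period and a per-request cost in the public cloud; the function names, the demand series and both prices are hypothetical and are not taken from the thesis.

```python
def split_demand(demand, private_slots):
    """Split per-period demand into privately served requests and public overflow."""
    private = [min(d, private_slots) for d in demand]
    public = [max(d - private_slots, 0) for d in demand]
    return private, public


def total_cost(demand, private_slots, private_slot_cost, public_request_cost):
    """Cost of a hybrid setup: private slots are paid for in every period,
    public capacity is paid only for the overflow requests."""
    _, public = split_demand(demand, private_slots)
    private_part = private_slots * private_slot_cost * len(demand)
    public_part = public_request_cost * sum(public)
    return private_part + public_part


# Hypothetical demand (concurrent image requests per period) and prices.
demand = [2, 3, 10, 4, 3, 2]
costs = {n: total_cost(demand, n, 1.0, 3.0) for n in range(max(demand) + 1)}
best = min(costs, key=costs.get)  # cheapest private size (first one on ties)
print(best, costs[best])
```

With these made-up numbers, sizing the private cloud for the peak (10 slots) costs 60.0, while 3 private slots plus public overflow cost 42.0.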
In this study the open source Apache Virtual Computing Lab (VCL) is used
as the private virtual computing lab infrastructure. The key functions of VCL are
simulated to obtain statistical information about user behavior, resource utilization,
and so on. A cloud simulator is used to analyze different types of VCL cloud setups by
simulating various reservation arrival processes, cost models and possibly fault
models. Hybrid VCL setups, which mix an in-house data center with a public cloud, are
not yet a technical reality, because VCL does not support submitting image
reservations to public clouds. Even so, the question addressed in the thesis is which
VCL setup is optimal for a given number of requests from the point of view of
long-term cost optimization and risk mitigation.
The desired cloud simulator, which faithfully models Apache VCL, needs to be
able to analyze the costs assigned to different private/public configurations. The cost
modeling ought to process the real, historical usage patterns of the Apache VCL
environment at BME DMIS. Likewise, the model has to be able to report some QoS
parameters to provide information about the system's performance.
Finally, this study presents a hypothetical hybrid virtual computing lab
implementation project at a Hungarian university, where the created simulation model
and the simulator are used to optimize cost and risk factors based on several
assumptions, such as the use of Apache VCL, the number of students, the number of
lectures, resource demands and quality requirements.
2 Apache VCL
After this short introduction, which covered the main points and motivations of
the thesis, I give a brief overview of the open source cloud computing platform
Apache VCL. VCL was originally developed for universities to provide predefined work
environments for students at any time and from anywhere, turning originally offline
laboratories and workshops into online, on-demand ones. VCL is also well suited to
cases where a larger compute environment is needed from time to time for laboratories
or workshops, because it allows virtual compute environments to be prepared and
preconfigured for such occasions.
2.1 Virtual Desktop Infrastructure
The Apache VCL platform is one realization of the form of desktop virtualization
called Virtual Desktop Infrastructure. Desktop virtualization is a software technology
that enables remote access to a set of preinstalled and preconfigured virtual desktop
environments. VDI is a desktop-centric service that hosts users’ desktop environments
on remote servers and/or blade PCs. Users access the environments over a network
using a remote display protocol. [2] For the users, this architecture has the advantage
that they can access their desktop environment from anywhere. Furthermore, users get
the same applications and data at any location: because the servers and other resources
are centralized, there is no restriction on where users access their environments from.
VDI also makes it more efficient to maintain the client environments, as they are
centralized.
2.2 Apache VCL: an academic service
VCL became an Apache Software Foundation top-level project on June 20,
2012. [3] The Apache VCL platform is responsible for managing, controlling and
delivering desktop environments to users from centralized resources. VCL can
provision several kinds of environments, for example a virtual machine on different
hypervisors, a traditional bare-metal computer or a clustered physical server.
Users make reservations via VCL’s self-service web portal. Before making a
reservation, a user has to select one of the available environments. After the
reservation is made, the web portal’s scheduling component determines which computer
resources are assigned to which reservation to run the chosen environment. The
requested environments are then dynamically provisioned and configured to allow the
users remote access. Finally, the users connect remotely to their prepared
environments via remote desktop or Secure Shell (SSH) over the internet, a Campus
Area Network (CAN) or a Local Area Network (LAN). Figure 2.1 depicts an overview of
the Apache VCL platform.
[Figure 2.1 (diagram; recoverable labels): remote client; reservation; establishing connection; remote desktop or terminal access; Internet/CAN/LAN; Apache VCL; virtualized data center; virtual machines.]
Figure 2.1 VCL overview (Source: [4])
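The reservation flow described in this section can be summarized as a small state sequence. This is only an illustrative sketch: the stage names and the `advance` helper are hypothetical and do not correspond to the states stored in the VCL database.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages of one reservation, in the order described in the text."""
    SELECTED = 1      # user picked an environment in the web portal
    RESERVED = 2      # reservation submitted
    SCHEDULED = 3     # scheduler assigned a computer resource
    PROVISIONED = 4   # environment loaded and configured
    CONNECTED = 5     # user connected via remote desktop or SSH


def advance(stage):
    """Return the next stage; the last stage has no successor."""
    if stage is Stage.CONNECTED:
        raise ValueError("reservation is already in its final stage")
    return Stage(stage.value + 1)


# Walk one reservation through all stages.
stage = Stage.SELECTED
path = [stage]
while stage is not Stage.CONNECTED:
    stage = advance(stage)
    path.append(stage)
print([s.name for s in path])
```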
VCL supports three types of authentication to confirm the identity of users who
access the self-service web portal remotely. [5] The first is the built-in authentication
method: users log in with a username and password, after being added to the VCL
database by administrators. These users are called local VCL accounts. The second is
Lightweight Directory Access Protocol (LDAP): the VCL frontend can be configured to
use an existing LDAP server. The last authentication method is Shibboleth. [6]
An environment or image is a collection of software that is installed on an
operating system. Users select from the list of environments they have access to. In
VCL every image may have several revisions. VCL controls the revisions of images
as well as the creation of images and revisions. The creation of both images and
revisions depends on VCL’s privilege and authorization model. Users are granted
access to parts of the VCL web site and to resources through the privilege tree.
Altogether nine user permissions can be granted, and three resource attributes can be
assigned to a resource group in the privilege tree. [7]
VCL is thus used to dynamically provision and broker remote access to a
dedicated environment for users. In general, VCL-provisioned computers are housed in
data centers and may be bare-metal, blade or rack-mounted servers, standalone
computer lab machines, or virtual machines. For virtual machine provisioning VCL
supports VMware ESXi 4.x/5.x, ESX Standard server, Free Server, KVM and VirtualBox.
For physical, bare-metal provisioning VCL supports the Extreme Cloud Administration
Toolkit (xCAT). [8]
In order to determine how and when environments are used, VCL has a built-in
statistics page. The page presents, in a readable format, the number of reservations
(total and unique) by user, by OS and by environment. The page also contains some
graphs of the usage trends of VCL.
2.3 Architecture of Apache VCL
VCL’s architecture consists of three main components:
Self-service web portal (also called frontend)
Database
Management nodes (also called backend)
Figure 2.2 VCL architecture (Source: [9])
Figure 2.2 shows the relationships among the components and the main
responsibilities of the three parts.
2.3.1 Self-service web portal
The self-service web portal is the VCL access point for users. The User Interface
(UI) is responsible for authenticating users via the built-in, LDAP or Shibboleth
authentication method, and for authorizing them to access only the permitted parts of
the web portal. Users can then select from a list of images. Users do not see all the
images in VCL, only those they have rights to. The UI issues many requests to the
database, because all information about the current system state is stored there. For
instance, the frontend issues tens of select queries to determine which images meet
the privilege requirements, or to check whether the time period chosen by the user is
still free and a computer node is still available to fulfill the reservation. Besides
select queries, insert and update queries are executed in the database as well. The
frontend offers an Application Programming Interface (API) through which external
software can make requests and provision resources. [10] The VCL scheduler handles
the scheduling of user requests. The scheduler and the reservation procedure are
described in section 2.5. The frontend was developed using Apache Hypertext Transfer
Protocol (HTTP) Server with Secure Sockets Layer (SSL), PHP and the Dojo Toolkit.
2.3.2 Data model
As can be seen, VCL is a heavily database-driven technology, so in this section I
give an overview of the most important tables of VCL’s database schema, as these
tables reflect the main notions and abstractions used by VCL.
The VCL database server has to be a MySQL 5.0 or later Relational Database
Management System (RDBMS). The VCL database is a schema named vcl in the MySQL
RDBMS, and its tables are created during the Apache VCL installation. The database
stores data about the main entities in dedicated tables such as computer, image,
revision, log, managementnode, module, OS, request, reservation, resource, schedule,
state and user, along with other information such as resource mappings, user
privileges and image profiles. Table 2.1 explains the database tables that are needed
to understand the main VCL concepts later on.
Name of the table | Comment on the table
blockComputers | Tracks which computers have been allocated to individual block allocation time slots.
blockRequest | Contains all of the block allocations that have been requested and their current state.
blockTimes | Contains all of the time slots associated with a block allocation that are active or have not yet been reached.
computer | Contains all information about compute nodes and VMs that VCL controls. All bare metal computers, virtual hosts, and virtual machines must have an entry in this table. Images are deployed on to computers. VCL needs to know about all of the computers it will be managing. Entries for both physical computers and VMs (also called VM slots) need to be created in VCL for it to be able to manage them.
image | Contains all information about the images available through VCL. It comes with a single required special image, "No image", that is used to signify when a computer is not loaded with anything. An image (or environment) is a collection of software that is installed on an operating system. These images can be deployed, used, modified, and saved. Images can be designed to run directly on a computer (bare-metal) or under a hypervisor (virtualized images).
imagerevision | Contains an entry for every revision of each image.
managementnode | Contains information about each management node. Management nodes run the VCL backend code (Perl code) that is responsible for deploying images to computers when users make reservations for images. Each management node can manage a mix of physical and virtual computers.
OS | Contains information about OSs VCL knows about.
request | Contains information about every current or future reservation.
reservation | Contains information about every current or future reservation.
resource | Contains an entry for every resource VCL knows about. Every resource has a unique ID from this table, and a sub ID from a resource specific table (computer, image, management node).
resourcegroup | Contains all of the resource groups. Each resource group has a type associated with it which can be one of image, computer, management node, or schedule. The resource groups are used to grant users access to resources and also to allow VCL to know which resources can be used in relation to other resources.
resourcegroupmembers | Contains a list of which resources are in which resource groups.
resourcemap | Contains which resource groups map to other resource groups. Image groups are mapped to computer groups, and management node groups are mapped to computer groups. Any image in an image group can be run on any computer in a computer group to which it is mapped if a user has sufficient privileges to do so.
resourcetype | Contains a list of all the resource types.
schedule | Contains all of the schedules available. Each computer must have a schedule associated with it. Schedules provide a way to define what times during a week a computer is available through VCL.
state | Contains all of the states used in VCL. Not all states are used in every place where states are used.
user | Contains an entry for every user that has ever logged in to VCL.
usergroup | Contains all of the user groups. Each user group has certain attributes associated with it. There are various places within VCL that user groups can be used, with the primary place being granting access to resources in the privilege tree.
usergroupmembers | Tracks which users are members of which user groups.
vmhost | Contains an entry for each virtual host.
vmtype | Contains all of the virtual machine types.
Table 2.1 Main VCL concepts (as reflected by database tables) (Source: [11])
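The role of the resource group tables can be illustrated with plain dictionaries. The sketch below mirrors only the idea behind resourcegroupmembers and resourcemap; the group names, image names and the helper function are hypothetical, and user privileges are ignored.

```python
# Hypothetical group membership (cf. resourcegroupmembers).
image_groups = {"linux-images": {"centos-dev", "ubuntu-lab"}}
computer_groups = {"esxi-slots": {"vm-small-01", "vm-small-02"}}

# Hypothetical image group -> computer group mapping (cf. resourcemap).
image_to_computer_groups = {"linux-images": {"esxi-slots"}}


def computers_for_image(image):
    """Return every computer on which the image may run according to the group mapping."""
    allowed = set()
    for group, members in image_groups.items():
        if image in members:
            for computer_group in image_to_computer_groups.get(group, ()):
                allowed |= computer_groups.get(computer_group, set())
    return allowed


print(sorted(computers_for_image("centos-dev")))
```

An image that belongs to no mapped group yields an empty set, which corresponds to "no computer can host this image".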
2.3.3 Management node
The VCL management node, or backend, queries the database very often, for the
same reason as the frontend does. The main task of the backend is to manage the
computer nodes, which means loading, preloading, stopping, restarting and
configuring the nodes at specific times. The management node contains provisioning
engines for bare-metal servers and for virtual machines, which are used to
communicate with the physical or virtual nodes in order to deploy, start or stop the
users’ requested environments. The VCL daemon running on the management node polls
the database at short intervals to find out whether there are any new requests. For
example, if there is at least one new request and its start time is close enough to the
current time, the deployer deploys the user’s requested image. When the deployment is
finished, the user is notified via the web portal that the compute environment is ready
to use. [8]
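The polling behaviour described above can be sketched as follows. The request records, the 15-minute lead time and the function name are assumptions made for illustration; the actual VCL backend is Perl code and its thresholds are not given in the text.

```python
from datetime import datetime, timedelta


def due_requests(requests, now, lead_time=timedelta(minutes=15)):
    """Select new requests whose start time is close enough to 'now' to begin deployment."""
    return [
        r for r in requests
        if r["state"] == "new" and r["start"] - now <= lead_time
    ]


now = datetime(2014, 5, 15, 10, 0)
requests = [
    {"id": 1, "state": "new", "start": now + timedelta(minutes=5)},
    {"id": 2, "state": "new", "start": now + timedelta(hours=2)},
    {"id": 3, "state": "deployed", "start": now + timedelta(minutes=1)},
]

# One polling cycle: mark every due request as being deployed.
for request in due_requests(requests, now):
    request["state"] = "deploying"  # a real backend would call its provisioning engine here
print([r["id"] for r in requests if r["state"] == "deploying"])
```

A real daemon would repeat this cycle after a short sleep and read the requests from the database instead of an in-memory list.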
In a production VCL environment the image library is generally on shared
storage, either Network Attached Storage (NAS) or a Storage Area Network (SAN). The
image library holds the image files, image metadata and Linux install trees. The
backend uses the image library during image deployment and other image-related
operations.
2.3.4 Network layout
This section describes the typical network layout required for VCL. VCL was
originally architected using the classic approach of physically separating the
“workload” and the management communications. As a result, in most cases all hosts
that take part in VCL (e.g. management node, hypervisor hosts) need two physical
interfaces: 1. public network: users access the virtual machines remotely through this
network; 2. private network: used by the provisioning modules when a compute node
is reloaded (ESX, VMware, etc.). [12] Figure 2.3 shows the typical network layout.
Figure 2.3 VCL typical network layout (Source: [12])
2.3.5 Privileges
“Users are granted access to parts of the VCL web site and to resources through
the privilege tree. User permissions and resource attributes can both be cascaded down
from one node to all of its children. Additionally, cascaded user permissions and
resource attributes can be blocked at a node so that they do not cascade down to that
node or any of its children.
There are a number of user permissions that can be granted to users. They can be
granted to users directly or to user groups.” [13]
2.4 Apache VCL of BME DMIS
The university Apache VCL environment is used for laboratories, homework
assignments, and research purposes. The VCL management node and the MySQL
database are installed on separate servers. Nine physical hosts are responsible
for hosting the virtual machines/images. Each server is an IBM x3550 with 8 Central
Processing Unit (CPU) cores, 32 GB of memory, and 2 × 136 GB Serial Attached SCSI (SAS)
disks. One additional server belongs to the environment: the 1 TB Network File
System (NFS) shared storage. Figure 2.4 depicts the key elements of the VCL
environment of BME DMIS.
[Network diagram. Components shown: IBM x3550 hosts running ESXi with the cloud VMs (vm-tiny-xx, vm-small-xx, vm-medium-xx, vm-large-xx, vm-xlarge-xx, vm-2xlarge-xx, vm-4xlarge-xx, vm-test-xx); NFS storage; a XenServer 6.2 host for service VMs; management node with the VCL backend; MySQL database; gateway with Shibboleth and OpenVPN; DNS forwarder; Nagios; Collectd; PXE boot server; ESXi performance monitoring; ESXi administrative console; public, private and storage networks with bonded network interfaces; users and administrators connecting from the Internet through VPN clients.]
Figure 2.4 Network topology of VCL of BME DMIS (Source: [14])
2.5 Modeling allocation policies of Apache VCL
To achieve my goals it is essential to understand how computers are allocated. By
allocation process I mean the way a computer is assigned to a specific user request.
The second notable part is the preloading scheduling logic: a predictive module tries
to predict which images the users will want in the future and preloads the most
probable ones to reduce image deployment times for ad-hoc requests. As these policies
are not documented, investigating and understanding them took a considerable amount
of time.
There are three main types of allocation in Apache VCL version 2.3: normal
reservation by a user/student, block allocation by a user/teacher, and the predictive
loading modules. Each of these allocation types is described below.
2.5.1 Allocation by normal reservation
Figure 2.5, Figure 2.6, Figure 2.7, Figure 2.8 and Figure 2.9 show the main
function calls and other events relevant to this thesis that take place when a user
submits a request on the New Reservation page in VCL. The flow chart was created by
reading through the VCL Perl and PHP code. For more details, consult the VCL source
code.
To serve users’ requests, the VCL scheduler (part of the frontend and written
in PHP) executes the functions and tasks shown in these figures. When the user opens
the New Reservation page, function newReservation() in requests.php prints the form
for submitting a new request. Function getUserResources() in utils.php returns the
resources the user has access to.
When the user has chosen the image and the reservation time and submits the
request, the VCL scheduler checks whether the request fits in the schedule. If it fits,
the request is added; the user is notified of the outcome either way. Function
submitRequest() in requests.php is called when the user hits the submit button on the
New Reservation page. To check whether the requested environment is available,
submitRequest() calls function isAvailable(). If the return value is an integer > 0,
there is an available computer on which the environment can be provisioned, so the
user can be served. In the other cases (-3, -2, -1, 0) the user is shown an error
message that corresponds to the error case.
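The return-value convention described above can be sketched as follows. This is only an illustration in Python, not VCL's PHP code: the function name and the outcome strings are hypothetical, and the individual meanings of the non-positive codes are left out because the text does not define them.

```python
# Non-positive results named in the text; their individual meanings are not
# given there, so this sketch only reports the code itself.
ERROR_CODES = (-3, -2, -1, 0)


def interpret_availability(result: int) -> str:
    """Translate an isAvailable()-style result into an outcome (hypothetical helper)."""
    if result > 0:
        # A positive integer means a computer can host the environment.
        return "reservation accepted"
    if result in ERROR_CODES:
        # The user would be shown the message that belongs to this error case.
        return f"error case {result}"
    raise ValueError(f"unexpected result: {result}")
```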
[Flow chart content: the user clicks on the New Reservation page; newReservation() prints the form for submitting a new reservation; getUserResources() builds the list of resources the user has access to and returns it; getImages() returns an array of images; checkValidImage() decides whether a valid environment was selected (or the available reservation times are viewed). Outcomes: image and reservation time chosen, or no valid environment or reservation time.]
Figure 2.5 VCL normal reservation flow chart, part one
Figure 2.6 VCL normal reservation flow chart, part two
Figure 2.7 VCL normal reservation flow chart, part three
Figure 2.8 VCL normal reservation flow chart, part four
Figure 2.9 VCL normal reservation flow chart, part five
2.5.2 Block allocation
Block allocation is a way to have a set of compute nodes preloaded with a
particular environment at a specified time for a specific group of users. This is ideal for
such occasions when a group of students will need access to the same image for a
limited time (classrooms or workshops). It can be made available on a repeating
schedule such as when a course meets each week. The images are preloaded prior to the
start time of the workshop. When the workshop starts, only the members of the given
user group get access to the block-allocated environments. After the lab or workshop
is over, the resources become available again to users from other user groups.
Block allocation only allocates machines for the group of users – it does not
create the actual, end user reservations for the machines. All users still must log in to
the VCL web site and make their own reservations during the period a block allocation
is active. The forms on the Block Allocations page provide a way for the user to submit
a request for a block allocation. After reviewing the block allocation requests, a
system administrator approves or rejects them. A user who just needs to use a machine
through VCL has to use the New Reservation page instead of submitting a block
allocation.
Figure 2.10, Figure 2.11, Figure 2.12, Figure 2.13 and Figure 2.14 show the
main function calls and other events relevant to this thesis that take place when a
user submits a block allocation on the Block Allocations page in VCL. The flow chart
was created by reading through the VCL Perl and PHP code; for more details, consult
the VCL source code. The figures contain the substantial operations of a block
allocation. If the system administrator approves a block allocation, its details are
stored in the VCL database. The Perl function main() constitutes vcld, the VCL daemon.
This code runs continuously on the backend and plays a key role in processing block
allocations. It gets all the block requests assigned to this specific management node
and loops through them. For each block request it checks whether the request is
already being processed. If not, it calls function check_blockrequest_time() in
utils.pm, which checks the start, end and expire times of the block request. If the
expire time is in the past, it returns expire so that the block request is removed.
If the current time is 30 minutes to 6 hours ahead of the start time of the block
allocation, the request has to be started and the function returns start. If the end
time is less than 1 minute from the current time, it returns end so that the request
is ended. If there are block requests to be processed, function make_new_child() is
called to begin processing. It then calls Process() in blockrequest.pm, which acts
according to the mode (start, end or expire) that was set before make_new_child() was
called. Function Process() is used to start or end nodes physically.
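The time checks described above can be sketched in a few lines. This is a simplified Python illustration, not the Perl code in utils.pm: the argument names, the inclusive treatment of the 30-minute and 6-hour boundaries, and the reading of "less than 1 minute from the actual time" as end minus now are assumptions. The "0" result for a request that needs no processing follows the simplified flow chart of section 3.3.2.

```python
from datetime import datetime, timedelta


def check_blockrequest_time(now, start, end, expire):
    """Return 'expire', 'start', 'end' or '0' for one block request (sketch)."""
    if expire < now:
        # Expire time is in the past: the block request should be removed.
        return "expire"
    lead = start - now
    # Assumption: both ends of the 30 min .. 6 h window are inclusive.
    if timedelta(minutes=30) <= lead <= timedelta(hours=6):
        return "start"
    if end - now < timedelta(minutes=1):
        # The end time is less than one minute away (or already passed).
        return "end"
    # Nothing to do for this block request right now.
    return "0"
```

The checks are evaluated in the order in which the text lists them, so an expired request is reported as expire even if its end time has also passed.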
Figure 2.10 VCL block allocation flow chart, part one
Figure 2.11 VCL block allocation flow chart, part two
Figure 2.12 VCL block allocation flow chart, part three
Figure 2.13 VCL block allocation flow chart, part four
Figure 2.14 VCL block allocation flow chart, part five
2.5.3 Predictive loading modules
Figure 2.15, Figure 2.16, Figure 2.17 and Figure 2.18 present the main function
calls and other events relevant to this thesis that take place when the VCL backend
decides which image will be loaded next on a machine. VCL uses two types of predictive
algorithm: Level_0 and Level_1. The default Level_0 algorithm can be changed on the
Management Node Information page of the VCL UI. The flow charts were created by
reading through the source code of the VCL backend.
The Level_0 algorithm, visualized in Figure 2.15, is the simpler one. When a
reservation expires, function get_next_image() in Level_0.pm, which contains the
algorithm, is called. If the node is part of a block reservation, the next image is the
one reserved in the block reservation. If there is a reservation for the specific
computer that starts in less than 50 minutes and the request state is new, reload or
imageprep, the next image is the one that belongs to that request. Otherwise there are
no upcoming reservations on the computer, so the algorithm fetches the next image
information and reloads that image. The next image information can be edited by the
administrator of the computer on the Computers page.
[Flow chart content: for a request in the reload phase, check whether the node is part of a block reservation (get_block_request_image_info() checks the blockcomputers table for a matching image). Then select the image for the given computer where the request state is IN (new, reload, imageprep) and check whether there is a reservation for the computer with a start time less than 50 minutes from now. Yes: use the image that belongs to that reservation. No: there are no upcoming reservations on the computer, so fetch the next image information (next image policy set by the admin) and reload the next image of the computer. Result: next image chosen.]
Figure 2.15 VCL predictive loading module for Level_0 algorithm flow chart
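The Level_0 decision can be condensed into a short sketch. It is written in Python for illustration and is not the Perl code of Level_0.pm: the Reservation class, the function name and its parameters are hypothetical, and checking the earliest reservation first is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

UPCOMING_STATES = {"new", "reload", "imageprep"}  # request states named in the text
LOOKAHEAD = timedelta(minutes=50)                 # "start time less than 50 minutes"


@dataclass
class Reservation:
    image: str
    start: datetime
    state: str


def level0_next_image(now: datetime,
                      block_image: Optional[str],
                      reservations: Sequence[Reservation],
                      configured_next_image: str) -> str:
    """Pick the image to load next on one computer (illustrative sketch)."""
    # 1. A node that is part of a block reservation gets the block's image.
    if block_image is not None:
        return block_image
    # 2. An upcoming reservation on this computer wins next
    #    (assumption: the earliest one is checked first).
    for res in sorted(reservations, key=lambda r: r.start):
        if res.state in UPCOMING_STATES and res.start - now < LOOKAHEAD:
            return res.image
    # 3. Otherwise fall back to the next image configured by the administrator.
    return configured_next_image
```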
Level_1, shown in Figure 2.16, Figure 2.17 and Figure 2.18, is more
sophisticated than Level_0 because it takes past usage into account. First, the online
computers (whose state is available, reserved, reloading, inuse or timeout) are
counted, then the available computers (whose state is available). After that, two
variables are calculated: the number of computers that are not available (online minus
available) and the usage (the not-available computers as a percentage of the online
computers). Based on the usage the following time frame is determined, if the usage is
bigger than:
40%, timeframe = 1 day
35%, timeframe = 2 days
30%, timeframe = 3 days
25%, timeframe = 4 days
20%, timeframe = 5 days
15%, timeframe = 10 days
10%, timeframe = 20 days
5%, timeframe = 30 days
0%, timeframe = 2 months
Using this time frame, the algorithm determines the image that was most popular
within the time frame but is not currently loaded. Only images that can run on the
computer according to the resource mapping rules can be selected. If something goes
wrong and no image is selected, the algorithm falls back to the next image information
of the machine.
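The threshold list above amounts to a simple lookup from usage to time frame. The following Python sketch illustrates it; the function name is hypothetical, the comparison is strict ("bigger than"), and returning the longest time frame for a usage of exactly 0% is an assumption, since the list does not cover that case.

```python
# (usage threshold in percent, time frame) pairs from the list above,
# ordered from the highest threshold down.
TIMEFRAMES = [
    (40, "1 day"), (35, "2 days"), (30, "3 days"), (25, "4 days"),
    (20, "5 days"), (15, "10 days"), (10, "20 days"), (5, "30 days"),
    (0, "2 months"),
]


def level1_timeframe(usage_percent: float) -> str:
    """Return the look-back time frame for a given usage (illustrative sketch)."""
    for threshold, frame in TIMEFRAMES:
        if usage_percent > threshold:
            return frame
    # Assumption: a usage of 0% falls back to the longest time frame.
    return TIMEFRAMES[-1][1]
```

A heavily used lab therefore looks only at very recent reservations, while a lightly used one considers a much longer history.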
Figure 2.16 VCL predictive loading module for Level_1 algorithm flow chart, part one
Figure 2.17 VCL predictive loading module for Level_1 algorithm flow chart, part two
Figure 2.18 VCL predictive loading module for Level_1 algorithm flow chart, part three
3 Simulation model
Now that Apache VCL and its most important parts are understood, the next step
is to find an appropriate cloud simulator that can faithfully model and simulate the
operation of Apache VCL, including the handling of reservation requests and slot
management. The desired simulator must also be able to model hybrid virtual computing
labs, namely Apache VCL as a private and/or public cloud computing solution. This
chapter describes how the CloudSim simulation framework works and how the framework
was extended to meet these requirements and achieve my goals.
3.1 Cloud simulation
Simulation is a much faster way to obtain the needed statistical results, because
there is no need to wait for the real VCL system to run: the simulator simulates the
execution without physically carrying out the operations. For example, the simulator
does not need to physically deploy the requested images on VMs.
Finding a cloud computing simulator that can be used for my master thesis
without any modification of its code is not an easy task. Another problem is that most
of the well-known free and open source grid or cloud simulators, such as GangSim,
SimGrid, iCanCloud, GridSim or CloudSim, cannot be used directly to model hybrid
virtual computing lab environments. [15][16][17][18][19] The lack of documentation
and support also hampers the search for the ideal simulator. Developing my own
simulator would be a huge amount of work and is not even necessary, because among the
listed simulators CloudSim seemed good enough considering its logic, functionality,
support, documentation, code quality and usage. Of course, the CloudSim framework
needs to be extended to simulate VCL as well. Eventually CloudSim was chosen to model
the operation of Apache VCL.
3.2 CloudSim: the simulation framework
CloudSim was developed at the University of Melbourne, Australia, in the Cloud
Computing and Distributed Systems (CLOUDS) Laboratory of the Department of
Computer Science and Software Engineering. CloudSim is an extensible simulation
framework that allows modeling, simulation, and experimentation of cloud computing
infrastructures and application services. [19] CloudSim can be used for initial
performance testing: implementing Apache VCL’s image reservations and testing the
performance of VCL in hybrid cloud environments requires little programming and
deployment effort.
Thesis-relevant functionalities of CloudSim:
• support for modeling and simulation of (federated) cloud computing data centers
• support for modeling and simulation of virtualized server hosts, with customizable policies for provisioning host resources to virtual machines
• support for dynamic insertion of simulation elements, stop and resume of simulation
• support for user-defined policies for allocation of hosts to virtual machines and policies for allocation of host resources to virtual machines
3.2.1 Architecture
Figure 3.1 shows the multi-layered design of the CloudSim software framework
and its architectural components.
Figure 3.1 CloudSim simulation layers (Source: [20])
“The CloudSim simulation layer provides support for modeling and simulation
of virtualized cloud-based data center environments including dedicated management
interfaces for virtual machines (VMs), memory, storage, and bandwidth. The
fundamental issues such as provisioning of hosts to VMs, managing application
execution, and monitoring dynamic system state are handled by this layer. It is possible
to compare different policies in allocating hosts to VMs (VM provisioning). This layer
also supports different policies in provisioning hosts to VMs. A host can be
concurrently allocated to a set of VMs that execute applications/images. User Code
exposes basic entities for hosts (number of machines), applications (number of tasks and
their requirements), VMs, number of users and their application types, and broker
scheduling policies.” [20]
3.2.1.1 Modeling the cloud
“The Datacenter entity manages host entities. The hosts are assigned to one or
more VMs based on a VM allocation policy. Here, the VM policy stands for the
operations control policies related to VM life cycle such as: provisioning of a host to a
VM, creation, destruction, and migration of a VM. Similarly, one or more application
services can be provisioned within a single VM instance.
A Datacenter entity can manage several hosts that in turn manage VMs during
their life cycles. Host is a CloudSim component that represents a physical computing
server in the cloud: it is assigned to pre-configured processing capability (expressed in
millions of instructions per second (MIPS)), memory, storage, and a provisioning policy
for allocating processing cores. VM allocation (provisioning) is the process of creating
VM instances on hosts that match the critical characteristics (storage, memory),
configurations (software environment), and requirements (availability zone) of the
provider.
An application service is assigned to one or more pre-instantiated VMs through
a service specific allocation policy. Allocation of application-specific VMs to Hosts in a
cloud-based data center is the responsibility of a Virtual Machine Allocation controller
component (called VmAllocationPolicy). By default, VmAllocationPolicy implements a
straightforward policy that allocates VMs to the host in First-Come-First-Serve (FCFS)
basis. Hardware requirements such as the number of processing cores, memory and
storage form the basis for the provisioning.
For each Host component, the allocation of processing cores to VMs is done
based on a host allocation policy. This policy takes into account several hardware
characteristics such as number of CPU cores, CPU share, and amount of memory that
are allocated to a given VM instance. CloudSim supports several simulation scenarios
that assign specific CPU cores to specific VMs (a space-shared policy) or dynamically
distribute the capacity of a core among VMs (time-shared policy), and assign cores to
VMs on demand. Each host component also instantiates a VM scheduler component,
which can either implement the space-shared or the time-shared policy for allocating
cores to VMs.” [20]
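The first-come-first-serve allocation of VMs to hosts mentioned in the quotation can be illustrated with a small sketch. It is written in Python and does not use the CloudSim Java API: the Host and Vm classes, the function name, the choice of the first host with enough free resources, and the dedicated (space-shared) use of cores are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vm:
    name: str
    cores: int
    ram_mb: int
    storage_gb: int


@dataclass
class Host:
    name: str
    cores: int       # free cores (assumption: cores are dedicated to one VM)
    ram_mb: int      # free memory
    storage_gb: int  # free storage

    def fits(self, vm: Vm) -> bool:
        return (vm.cores <= self.cores and vm.ram_mb <= self.ram_mb
                and vm.storage_gb <= self.storage_gb)

    def place(self, vm: Vm) -> None:
        # Reserve the hardware the VM asked for.
        self.cores -= vm.cores
        self.ram_mb -= vm.ram_mb
        self.storage_gb -= vm.storage_gb


def allocate_fcfs(vms: List[Vm], hosts: List[Host]) -> Dict[str, Optional[str]]:
    """Serve VM requests in arrival order; None means the request was rejected."""
    placement: Dict[str, Optional[str]] = {}
    for vm in vms:                      # first come, first served
        placement[vm.name] = None
        for host in hosts:
            if host.fits(vm):           # cores, memory and storage form the basis
                host.place(vm)
                placement[vm.name] = host.name
                break
    return placement
```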
3.2.1.2 Modeling the cloud market
“Modeling of costs and economic policies are important aspects to be considered
when a cloud simulator is designed. The cloud market is modeled based on a two-layer
design. The first layer contains the economic features related to the IaaS model such
as cost per unit of memory, cost per unit of storage, and cost per unit of used
bandwidth. Cloud
customers (SaaS providers) have to pay for the costs of memory and storage when they
create and instantiate VMs, whereas the costs for network usage are only incurred in
event of data transfer. The second layer models the cost metrics related to SaaS model.
Costs at this layer are directly applicable to the task units (application service requests)
that are served by the application services. Hence, if a cloud customer provisions a VM
without an application service (task unit), then they would only be charged for layer 1
resources (i.e. the costs of memory and storage).” [20]
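The two-layer cost structure can be made concrete with a short calculation. The following Python sketch is an illustration only: the function, the price dictionary and its keys are hypothetical, and charging a flat price per task unit in layer 2 is a simplifying assumption.

```python
def vm_cost(ram_mb, storage_mb, transferred_mb, task_units, prices):
    """Total cost of one VM under a two-layer cost model (illustrative sketch)."""
    # Layer 1 (IaaS): memory and storage are paid when the VM is created.
    layer1 = ram_mb * prices["ram_mb"] + storage_mb * prices["storage_mb"]
    # Network costs are incurred only when data is actually transferred.
    layer1 += transferred_mb * prices["bandwidth_mb"]
    # Layer 2 (SaaS): costs apply to the task units served
    # (assumption: a flat price per task unit).
    layer2 = task_units * prices["task_unit"]
    return layer1 + layer2


# Example prices in arbitrary currency units (made up for the example).
example_prices = {"ram_mb": 2, "storage_mb": 1, "bandwidth_mb": 3, "task_unit": 5}
idle_vm = vm_cost(512, 1000, 0, 0, example_prices)   # layer 1 only
busy_vm = vm_cost(512, 1000, 10, 4, example_prices)  # both layers
```

A VM that is provisioned without any task unit is charged only for memory and storage, which matches the last sentence of the quotation.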
3.2.1.3 Modeling a federation of clouds
“In order to federate multiple clouds, there is a requirement for modeling a cloud
coordinator entity. This entity is responsible not only for communicating with other data
centers and end-users in the simulation environment, but also for monitoring and
managing the internal state of a data center entity. The information received as part of
the monitoring process, that is active throughout the simulation period, is utilized for
making decisions related to inter-cloud provisioning.” [20]
3.2.1.4 Modeling dynamic entities creation
“CloudSim supports dynamic creation of different kinds of entities. Apart from
the dynamic creation of user and broker entities, it is also possible to add and remove
data center entities at run time. After creation, new entities automatically register
themselves in Cloud Information Service (CIS) to enable dynamic resource discovery.”
[20]
3.2.2 Design and implementation
This section gives details of the fundamental classes of CloudSim, which are the
building blocks of the simulator. The overall class design diagram of CloudSim is
shown in Figure 3.2.
Figure 3.2 CloudSim class design diagram (Source: [20])
BwProvisioner: This class models the policy for provisioning of bandwidth to
VMs. The main role of this component is to undertake the allocation of network
bandwidths to a set of competing VMs that are deployed across the data center.
BwProvisioningSimple allows a VM to reserve as much bandwidth as required,
however this is constrained by the total available bandwidth of the host.
CloudCoordinator: This abstract class extends a cloud-based data center to the
federation. It is responsible for periodically monitoring the internal state of data center
resources. A concrete implementation of this component includes the specific sensors and
the policy that should be followed during load-shedding. This component can also be
extended for simulating cloud-based services such as the Amazon Elastic Compute
Cloud (EC2) Load-Balancer.
Cloudlet: This class models the cloud-based application. CloudSim orchestrates
the complexity of an application in terms of its computational requirements. Every
application service has a pre-assigned instruction length and data transfer overhead that
it needs to undertake during its life-cycle.
CloudletScheduler: This abstract class is extended by implementation of
different policies that determine the share of processing power among cloudlets in a
virtual machine. Two types of provisioning policies are offered: space-shared
(CloudletSchedulerSpaceShared) and time-shared (CloudletSchedulerTimeShared).
Datacenter: This class models the core infrastructure level services that are
offered by cloud providers (for example Amazon EC2, Microsoft Azure or Google App
Engine). It encapsulates a set of compute hosts that can either be homogeneous or
heterogeneous with respect to their hardware configurations (memory, cores, capacity,
and storage). Furthermore, every datacenter component instantiates a generalized
application provisioning component that implements a set of policies for allocating
bandwidth, memory, and storage devices to hosts and VMs.
DatacenterBroker: This class models a broker, which is responsible for
mediating negotiations between SaaS and cloud providers. The broker acts on behalf of
SaaS providers. It discovers suitable Cloud service providers by querying the CIS and
undertakes on-line negotiations for allocation of resources/services. The difference
between the broker and the CloudCoordinator is that the former represents the customer,
while the latter acts on behalf of the data center.
DatacenterCharacteristics: This class contains configuration information of data
center resources.
Host: This class models a physical resource such as a compute or storage server.
It encapsulates important information such as the amount of memory and storage, a list
and type of processing cores, an allocation policy for sharing the processing power
among virtual machines, and policies for provisioning memory and bandwidth to the
virtual machines.
NetworkTopology: This class contains the information for inducing network
behavior (latencies) in the simulation. It stores the topology information, which is
generated using the Boston University Representative Internet Topology Generator
(BRITE).
RamProvisioner: This is an abstract class that represents the provisioning policy
for allocating RAM to the VMs. The execution and deployment of VM on a host is
feasible only if the RamProvisioner component approves that the host has the required
amount of free memory. The RamProvisionerSimple does not enforce any limitation on
the amount of memory a VM may request. However, if the request is beyond available
memory capacity then it is simply rejected.
SanStorage: This class models a storage area network. SanStorage implements a
simple interface that can be used to simulate storage and retrieval of any amount of data,
subject to the availability of network bandwidth.
Sensor: This interface must be implemented to instantiate a sensor component
that can be used by a CloudCoordinator for monitoring specific performance
parameters.
VM: This class models a virtual machine, which is managed and hosted by a
cloud host component. Every VM component has access to a component that stores the
following characteristics related to a VM: accessible memory, processor, storage size,
and the VM’s internal provisioning policy that is extended from an abstract component
called the CloudletScheduler.
VmAllocationPolicy: This abstract class represents a provisioning policy that a
VM Monitor utilizes for allocating VMs to Hosts. The chief functionality of the
VmAllocationPolicy is to select an available host in a data center that meets the
memory, storage, and availability requirements for a VM deployment.
VmScheduler: This is an abstract class implemented by a host component that
models the policies (space-shared, time-shared) required for allocating processor cores
to VMs.
3.2.3 Simulation framework
The CLOUDS Lab developed its own discrete event management framework, which
is used in the latest version of CloudSim. The class diagram of the framework is
presented in Figure 3.3.
Figure 3.3 CloudSim simulation framework (Source: [20])
“CloudSim: This is the main class, which is responsible for managing event
queues and controlling step by step (sequential) execution of simulation events. Every
event that is generated by the CloudSim entity at run-time is stored in the queue called
future events. These events are sorted by their time parameter and inserted into the
queue. Next, the events that are scheduled on each step of the simulation are removed
from the future events queue and transferred to the deferred event queue. Following
this, an event processing method is invoked for each entity, which chooses events from
the deferred event queue and performs appropriate actions.
DeferredQueue: This class implements the deferred event queue used by
CloudSim.
FutureQueue: This class implements the future event queue accessed by
CloudSim.
CloudInformationService: This entity provides resource registration, indexing,
and discovering capabilities.
SimEntity: This is an abstract class, which represents a simulation entity that is
able to send messages to other entities and process received messages as well as fire and
handle events. SimEntity class provides the ability to schedule new events and send
messages to other entities, where network delay is calculated according to the BRITE
model. Once created, entities automatically register with CIS.
CloudSimTags: This class contains various static event/command tags that
indicate the type of action that needs to be undertaken by CloudSim entities when they
receive or send events.
SimEvent: This entity represents a simulation event that is passed between two
or more entities. SimEvent stores information about an event that has to be passed to
the destination entity.
CloudSimShutdown: This is an entity that waits for the termination of all end-
user and broker entities, and then signals the end of simulation to CIS.
Predicate: Predicates are used for selecting events from the deferred queue.” [20]
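The interplay of the future and deferred event queues can be illustrated with a minimal discrete event loop. This Python sketch is not CloudSim code: the class MiniSim and its methods are hypothetical, entities are reduced to plain handler functions, and network delays as well as the CIS are left out.

```python
import heapq
from collections import deque


class MiniSim:
    """Tiny discrete event engine with a future and a deferred queue (sketch)."""

    def __init__(self):
        self.clock = 0.0
        self._future = []          # heap ordered by (time, sequence number)
        self._deferred = deque()   # events due at the current simulation step
        self._seq = 0              # tie-breaker that keeps insertion order

    def schedule(self, delay, handler, data):
        """Insert an event into the future queue, sorted by its time."""
        heapq.heappush(self._future, (self.clock + delay, self._seq, handler, data))
        self._seq += 1

    def step(self):
        """Run one simulation step; return False when no events are left."""
        if not self._future:
            return False
        self.clock = self._future[0][0]
        # Move every event scheduled for this step to the deferred queue.
        while self._future and self._future[0][0] == self.clock:
            self._deferred.append(heapq.heappop(self._future))
        # Let the destination handlers process the deferred events.
        while self._deferred:
            time, _, handler, data = self._deferred.popleft()
            handler(self, time, data)
        return True

    def run(self):
        while self.step():
            pass
```

A handler has the signature handler(sim, time, data) and may call sim.schedule() itself, which is how one event can trigger follow-up events later in the simulation.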
3.2.4 Data center internal processing
“Processing of task units is handled by VMs; therefore their progress must be
continuously updated and monitored at every simulation step. For handling this, an
internal event is generated to inform the DataCenter entity that a task unit completion is
expected in the near future. Thus, at each simulation step, each DataCenter entity
invokes a method called updateVMsProcessing() for every host that it manages.
Following this, the contacted VMs update processing of currently active tasks with the
host. The input parameter type for this method is the current simulation time and the
return parameter type is the next expected completion time of a task currently running in
one of the VMs on that host. The next internal event time is the least time among all the
finish times, which are returned by the hosts.
At the host level, invocation of updateVMsProcessing() triggers an
updateCloudletsProcessing() method that directs every VM to update its tasks unit
status (finish, suspended, executing) with the Datacenter entity. This method
implements a similar logic as described previously for updateVMsProcessing() but at
the VM level. Once this method is called, VMs return the next expected completion
time of the task units currently managed by them. The least completion time among all
the computed values is sent to the Datacenter entity. As a result, completion times are
kept in a queue that is queried by Datacenter after each event processing step. The
completed tasks waiting in the finish queue that are directly returned concern
CloudBroker or CloudCoordinator. This process is depicted on Figure 3.4 in the form of
a sequence diagram.” [20]
Figure 3.4 CloudSim data center internal processing (Source: [20])
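The way the next internal event time is derived can be sketched as a nested minimum over hosts and VMs. The Python functions below borrow their names from the methods mentioned in the quotation but are hypothetical simplifications: a VM is reduced to a list of task finish times and a host to a dictionary of VMs.

```python
from typing import Dict, List, Optional

Vm = List[float]      # expected finish times of the tasks running in one VM
Host = Dict[str, Vm]  # VM name -> VM


def update_cloudlets_processing(vm: Vm, now: float) -> Optional[float]:
    """VM level: next expected completion time of a task still running."""
    pending = [t for t in vm if t > now]
    return min(pending) if pending else None


def update_vms_processing(host: Host, now: float) -> Optional[float]:
    """Host level: least completion time among all VMs on the host."""
    times = [update_cloudlets_processing(vm, now) for vm in host.values()]
    times = [t for t in times if t is not None]
    return min(times) if times else None


def next_internal_event(hosts: List[Host], now: float) -> Optional[float]:
    """Datacenter level: least time among the values returned by the hosts."""
    times = [update_vms_processing(host, now) for host in hosts]
    times = [t for t in times if t is not None]
    return min(times) if times else None
```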
3.2.5 Communication among entities
“Figure 3.5 depicts the flow of communication among core CloudSim entities.
At the beginning of a simulation, each Datacenter entity registers with the CIS Registry.
Next, the DataCenterBrokers acting on behalf of users consult the CIS service to obtain
the list of cloud providers who can offer infrastructure services that match application’s
hardware, and software requirements. In the event of a match, the DataCenterBroker
deploys the application with the CIS suggested cloud. The communication flow
described relates to the basic flow in a simulated experiment. Some variations in this
flow are possible depending on policies. For example, messages from
DataCenterBrokers to Datacenters may require a confirmation from other parts of the
Datacenter, about the execution of an action, or about the maximum number of VMs
that a user can create.” [20]
Figure 3.5 CloudSim communication among entities (Source: [20])
3.3 Simplified model of Apache VCL
To model virtual computing labs it is highly advisable to create a reduced
model of VCL, because VCL contains a lot of information that is out of scope. To
obtain this model I had two options: the brute-force solution is to model everything
VCL does, while the second option is to collect the necessary information from VCL,
which requires considerable work to understand its logic, and to use the gathered
information to create my own model based on the filtered VCL knowledge. The second
option was clearly the better choice, since large parts of VCL were not relevant.
Hereafter I call my own model the “simplified model” because it does not simulate all
the functionality of Apache VCL, only the relevant parts. The configuration diagram of
the simplified model is shown in Figure 3.6.
The simplified model considers only virtual machines (or vmslots), so I do not
deal with non-virtual computers. The entities of the simplified model correspond to
their respective tables in the VCL schema described in Table 2.1.
[Class diagram. Entities and attributes:
Image: ImageID : int, ImageName : string, ProcCoreNumber : int, ProcSpeedMhz : int, MemorySizeMB : int, DiskSizeGB : int, NetworkSpeedMbps : int, MaxConcurrentUsage : int, EstimatedReloadTimeMin : int
User: UserID : int, UserName : string
UserGroup: UserGroupID : int, MaxOverlappingReservation : int
VMSlot: VMSlotID : int, VMSlotName : string, State, ProcCoreNumber : int, ProcSpeedMhz : int, MemorySizeMB : int, DiskSizeGB : int, NetworkSpeedMbps : int, InBlockAlloc : bool
VMSlotGroup: VMSlotGroupID : int, VMSlotGroupName : string
VMHost: VMHostID : int, ProcCoreNumber : int, ProcSpeedMhz : int, MemorySizeMB : int, DiskSizeGB : int, NetworkSpeedMbps : int, MaxVMSlots : int
Reservation: ReservationID : int, StartTime : char, EndTime : char, BlockReservation : bool
Request: RequestID : int, State, StartTime : char, EndTime : char, HasProcessed, BeingProcessed, BlockRequest : bool, NumberVMSlots : int
ManagementNode: ManagementNodeID : int, ManagementNodeName : string
Association labels: belongs to, has, owns, assigned to, mapped to, configured next image, handled by, processed to, reservated, selected (BlockAllocation).]
Figure 3.6 Simplified model of VCL
3.3.1 Simplified allocation by normal reservation
Following the same considerations as described in chapter 3.3, I recreated the
flow chart of the normal reservation presented in chapter 2.5.1 (see Figure 3.7). Apart
from that, I did not make any fundamental change to the allocation process of the
normal reservation.
[Flow chart content:
1. The user submits a new reservation.
2. Get the request attributes from Request; create new Request().
3. Check for max concurrent usage of the image (Image.MaxConcurrent).
4. Get the vmslots that can run the specific image (#Mapping Images-VMSlots).
5. Get the list of available, not scheduled vmslots for that specific time (#Reservation).
6. Get the list of vmslots the image can be provisioned to (#VMSlot): the resource requirements are fulfilled by the vmslots, and the vmslots are ranked DESC by procspeed * procnumber, RAM, network. Return: A: vmslots that are not preloaded, provisionable; B: vmslots that are preloaded, provisionable; C: all vmslots that are part of a block allocation the logged-in user is a part of and that are available between start and end time.
7. Find the vmslots whose hosts can handle the required RAM (#VMHost). This is not needed if VMs with the requested image are already available, because they would already fit within the host's available RAM.
8. Determine a vmslot to use from A, B, C, looking at the arrays in that order, and try to allocate a management node for it. Return: the first vmslot that passed.
9. Empty? No: an available vmslot was found, end of the reservation procedure and begin of provisioning. Yes: no vmslot is available for the specific time.
The chart also contains the step: determine how many overlapping reservations the User can have based on the groups the User is a member of (UserGroup.MaxNumberImage).]
Figure 3.7 Simplified normal reservation flow chart
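The ranking step of Figure 3.7 (vmslots ranked DESC by procspeed * procnumber, RAM,
network) can be illustrated with a short Java sketch. It is not the VCL code: the
classes VmSlotSketch and VmSlotRanking and the method rank() are hypothetical names
introduced for illustration, and only the CPU core and memory requirements are
checked, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class; the fields follow the VMSlot entity of Figure 3.6.
final class VmSlotSketch {
    final int id;
    final int procCoreNumber;
    final int procSpeedMhz;
    final int memorySizeMB;
    final int networkSpeedMbps;

    VmSlotSketch(int id, int procCoreNumber, int procSpeedMhz,
                 int memorySizeMB, int networkSpeedMbps) {
        this.id = id;
        this.procCoreNumber = procCoreNumber;
        this.procSpeedMhz = procSpeedMhz;
        this.memorySizeMB = memorySizeMB;
        this.networkSpeedMbps = networkSpeedMbps;
    }

    // procspeed * procnumber, the first ranking key of the flow chart.
    long cpuCapacity() {
        return (long) procCoreNumber * procSpeedMhz;
    }
}

final class VmSlotRanking {
    // Descending order by CPU capacity, then RAM, then network speed.
    static final Comparator<VmSlotSketch> DESCENDING =
            Comparator.comparingLong((VmSlotSketch s) -> s.cpuCapacity())
                    .thenComparingInt(s -> s.memorySizeMB)
                    .thenComparingInt(s -> s.networkSpeedMbps)
                    .reversed();

    // Keeps the vmslots that fulfil the (assumed) requirements and ranks them.
    static List<VmSlotSketch> rank(List<VmSlotSketch> candidates,
                                   int minCores, int minMemoryMB) {
        List<VmSlotSketch> fitting = new ArrayList<>();
        for (VmSlotSketch s : candidates) {
            if (s.procCoreNumber >= minCores && s.memorySizeMB >= minMemoryMB) {
                fitting.add(s);
            }
        }
        fitting.sort(DESCENDING);
        return fitting;
    }
}
```

Calling VmSlotRanking.rank(candidates, 1, 1024) returns the vmslots with at least
1 core and 1024 MB RAM, the strongest one first.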
3.3.2 Simplified block allocation
Following the same considerations as in chapter 3.3, I recreated the flow chart of
the block allocation presented in chapter 2.5.2 (see Figure 3.8). I did not make any
fundamental change to the allocation process of the block reservation.
[Figure 3.8: flow chart with the following steps.]
1. vcld.main() [in vcld]: get all the block requests assigned to this management
node (#Request) and loop through them.
2. Check if the block request is already being processed. Yes: next block request.
No: continue.
3. Determine the mode of the block request:
Expire time is in the past? Yes: remove the block request. Return: "expire".
30 min to 6 hrs in advance of the request start time? Yes: start assigning
resources. Return: "start".
End time is less than 1 minute from now? Yes: end the block request. Return: "end".
No: the block request does not need to be processed now. Return: "0".
4. Return value? "0": next block request. "start" and the block request has already
been processed: next block request. Else: continue.
5. Blockrequest_mode equal "start"? Only START mode is handled ("end" and "expire"
are not in the scope of the master thesis). Update the BeingProcessed flag to 1
(#Request).
6. Calculate the allocated vmslots and the requests to allocate. Determine the start
time of the block vmslots (#Image.Loadtime + 10 min).
7. Add any vmslots from future reservations made by Users in the UserGroup if the
vmslots are available for the whole block time (#Reservation).
8. Get the available vmslots (call isAvailable()). Reserve the returned vmslots and
insert them into #Reservation.
9. All blockrequest_vmslots allocated? Yes: update the HasProcessed flag to 1
(#Requests). No: print how many images could be allocated.
10. Return to vcld.main(). Timer = 12 sec.
Figure 3.8 Simplified block allocation flow chart
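The decision that assigns a return value ("expire", "start", "end" or "0") to a
block request in Figure 3.8 can be sketched in Java as follows. The class and method
names are hypothetical, times are plain millisecond values, and treating the
boundaries of the 30 min to 6 hrs window as inclusive is an assumption, not
something taken from the VCL source.

```java
final class BlockRequestModeSketch {
    static final long MINUTE_MS = 60_000L;

    // Checks the conditions in the order in which the flow chart lists them.
    static String mode(long nowMs, long startMs, long endMs, long expireMs) {
        if (expireMs < nowMs) {
            return "expire";              // expire time is in the past
        }
        long untilStart = startMs - nowMs;
        if (untilStart >= 30 * MINUTE_MS && untilStart <= 6 * 60 * MINUTE_MS) {
            return "start";               // 30 min to 6 hrs before the start time
        }
        if (endMs - nowMs < MINUTE_MS) {
            return "end";                 // end time is less than 1 minute away
        }
        return "0";                       // nothing to do for this request now
    }
}
```

In the simplified model only the "start" result leads to further processing, so a
caller would skip every block request for which mode() returns something else.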
3.3.3 Simplified predictive loading modules
Following the same considerations as in chapter 3.3, I recreated the flow charts of
the predictive loading modules presented in chapter 2.5.3 (see Figure 3.9 and
Figure 3.10). I did not make any fundamental change to the predictive loading
modules.
[Figure 3.9: flow chart with the following steps.]
1. The request is in the reload phase after the inuse state.
2. Is the vmslot part of a block reservation? Yes: get the block request image info
(#Reservation); next image chosen. No: continue.
3. Select the image for the given vmslot where the request state is IN (new, reload,
imageprep). Is there a reservation for the specific vmslot with a start time less
than 50 minutes from now? Yes: take the image that belongs to that reservation; next
image chosen.
4. No upcoming reservations for the vmslot: fetch the next image information and
reload the next image of the vmslot (#VMSlot.NextImageID); next image chosen.
Figure 3.9 Simplified predictive loading module for Level_0 algorithm flow chart
[Figure 3.10: flow chart with the following steps.]
1. The request is in the reload phase after the inuse state.
2. Is the vmslot part of a block reservation? Yes: get the block request image info
(#Reservation); next image chosen. No: continue.
3. Select the image for the given vmslot where the request state is IN (new, reload,
imageprep). Is there a reservation for the specific vmslot with a start time less
than 50 minutes from now? Yes: take the image that belongs to that reservation; next
image chosen.
4. No upcoming reservations for the vmslot: count the online vmslots (VMSlot.State
IN (available, reserved, reloading, inuse, timeout)) and the available vmslots
(VMSlot.State IN (available)). notavail = online - available; usage = notavail /
online.
5. If usage > X%, look at the past Y days, otherwise look at the past 2 months:
usage > 40%: timeframe = 1 day
usage > 35%: timeframe = 2 days
usage > 30%: timeframe = 3 days
usage > 25%: timeframe = 4 days
usage > 20%: timeframe = 5 days
usage > 15%: timeframe = 10 days
usage > 10%: timeframe = 20 days
usage > 5%: timeframe = 30 days
else: timeframe = 2 months
6. Fetch the preferred, possible images for the vmslot (preferred means available
images that can go on the vmslot; check #Mapping Images-VMSlots). Fetch which of
those images are already loaded (#VMSlot) and which are not loaded (difference of
preferred and loaded images).
7. Get the most popular, not loaded image in the timeframe.
8. Something went wrong, no next image selected? Yes: fetch the next image
information and reload the next image of the vmslot (#VMSlot.NextImageID). No: next
image chosen.
Figure 3.10 Simplified predictive loading module for Level_1 algorithm flow chart
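The usage calculation and the usage-to-timeframe thresholds of the Level_1 flow
chart can be written down as a small Java sketch. The class and method names are
hypothetical; representing "2 months" as 60 days and returning a usage of 0 when no
vmslot is online are assumptions made only for this illustration.

```java
final class Level1TimeframeSketch {
    // usage = (online - available) / online, returned as a fraction in [0, 1].
    static double usage(int online, int available) {
        if (online <= 0) {
            return 0.0;   // assumption: no online vmslot means no usage
        }
        return (double) (online - available) / online;
    }

    // Maps the usage fraction to the look-back window in days.
    static int timeframeDays(double usage) {
        if (usage > 0.40) return 1;
        if (usage > 0.35) return 2;
        if (usage > 0.30) return 3;
        if (usage > 0.25) return 4;
        if (usage > 0.20) return 5;
        if (usage > 0.15) return 10;
        if (usage > 0.10) return 20;
        if (usage > 0.05) return 30;
        return 60;        // "2 months", approximated as 60 days (assumption)
    }
}
```

The higher the current usage, the shorter the history that is used to pick the most
popular image that is not loaded yet.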
3.4 Comparing Apache VCL and CloudSim
The aim of extending CloudSim is to support the simplified model of Apache VCL
discussed in chapter 3.3. The model represents the key functionalities of VCL that
the simulator has to model. CloudSim is capable of modeling them after some
modification of its framework.
3.4.1 User
In CloudSim the DatacenterBroker entity acts on behalf of a user and can
submit requests. Brokers are created by calling createNewBroker_MN(), described in
chapter 3.6.2.1.
3.4.2 Request
The VCL request entity has two important parameters: the start time and the end time
of the image usage. It is also important that the user requests an image, so the
requested image belongs to the user entity. In CloudSim there is no option to set a
start time for a requested cloudlet (VCL image = CloudSim cloudlet + CloudSim VM);
only the length of the cloudlet can be set, in million instructions (MI). However,
the creation of a CloudSim broker can be postponed in CloudSim simulation clock
time, so the problem can be solved by an algorithm that converts the start time into
clock time and the requested running time into MI to set the cloudlet length. These
conversions are done by the functions calcDelay_calendarToMs_MN() and
calcCloudletLength_MN() described in chapter 3.6.2.1. More information about the
CloudSim cloudlet and the CloudSim VM can be found in chapters 3.6.1.1 and 3.6.1.5.
One more missing feature in CloudSim is that cloudlet requests are terminated if
they are not served by a VM in any data center at the first try. This was also
modified: recreateVmsInDatacenter_MN(), described in chapter 3.6.1.3, is called to
retry the VM creation.
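The two conversions described in this section (start time into a broker delay,
requested running time into a cloudlet length in MI) can be sketched as below. This
is not the code of calcDelay_calendarToMs_MN() or calcCloudletLength_MN(): the class
and method names are hypothetical, java.time is used instead of calendar objects for
brevity, and the length formula assumes that the cloudlet fully uses cores with a
fixed MIPS rating.

```java
import java.time.Duration;
import java.time.LocalDateTime;

final class RequestTimingSketch {
    // Delay of the broker: requested start minus simulation start, never negative.
    static long delayMs(LocalDateTime simulationStart, LocalDateTime requestStart) {
        long delay = Duration.between(simulationStart, requestStart).toMillis();
        return Math.max(0L, delay);
    }

    // Cloudlet length in MI: running time in seconds times the MIPS of one core
    // (assumption: the cloudlet runs at full speed for the whole reservation).
    static long cloudletLengthMi(LocalDateTime start, LocalDateTime end,
                                 int mipsPerCore) {
        long seconds = Duration.between(start, end).getSeconds();
        return seconds * mipsPerCore;
    }
}
```

For example, a reservation from 08:30 to 10:30 on a 1000 MIPS core corresponds to
7200 s * 1000 MIPS = 7 200 000 MI under these assumptions.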
3.4.3 Image
The image entity has the following parameters: name, virtual CPU, memory size, disk
size, network speed, maximum concurrent usage and estimated reload time. Users own
the requests. Images are actually virtual machines which have to be started on one
of the hosts when the user submits a request for an image reservation. In VCL an
image cannot be launched directly on a host, because an appropriate virtual machine
slot has to be found first. In CloudSim there is no entity called image, only
virtual machine. However, virtual machines cannot be requested by the users: users
can only request cloudlets in CloudSim. By default cloudlets have a length in MI, a
file size and an output size, so the resource and request time parameters of images
are not defined. Therefore, the image entity is modeled by a mixed entity of
cloudlet and virtual machine. This modeling is done in the function
createNewBroker_MN() described in chapter 3.6.2.1. More information about the
CloudSim cloudlet and the CloudSim VM can be found in chapters 3.6.1.1 and 3.6.1.5.
3.4.4 VMSlot
VMSlots are used in VCL to divide physical hosts into small logical resource units.
These resource units are used to run users' image requests. For example, a VMSlot
with 1 virtual CPU, 1024 MB RAM, 10 GB disk space and 100 Mbps bandwidth can be used
to execute an image with a lower resource need than the VMSlot has. VCL creates the
images in VMSlots by minimizing the unused resources of the VMSlots. In CloudSim
there are no virtual machine slots by default; images can be created on any host.
3.4.5 UserGroup
In CloudSim there are no user groups, and CloudSim was not extended to support
them.
3.4.6 VMSlotGroup
Like user groups, virtual machine slot groups are not supported by CloudSim and
were not implemented in it.
3.4.7 Reservation
The VCL reservation entity is used to store the accepted requests submitted by the
users. In CloudSim all of the users' requests are treated as accepted, as I assume
that none of the users breaks the rules of requesting images.
3.4.8 VMHost
In CloudSim the host entity is the same as the virtual machine host in VCL. Hosts
are created in the function createHost_MN() described in chapter 3.6.2.1. The host
entity has an ID, CPU core count, CPU speed in MIPS, memory size, disk size, network
speed, server room cost per month, power cost per month, operational cost per month
and investment cost.
3.4.9 ManagementNode
The VCL management node is an entity that handles virtual machine hosts. This
functionality is represented in CloudSim by data centers. Data centers are created
in the function createDatacenter_MN() described in chapter 3.6.2.1 and have the
following parameters: name, architecture, operating system, virtual machine monitor,
time zone, cost of an image per hour and cost per network usage.
3.4.10 Simplified allocation by normal reservation
This reservation process is shown in Figure 3.7. In CloudSim normal reservations
are done almost in the same way as in VCL; only the checking steps (checking of
overlapping reservations, maximum concurrent usage of the image, vmslots that can
run the specific image) are skipped. A CloudSim broker corresponds to a VCL user.
The reservation is placed on a host that can handle the resource need of the image.
It is possible to create more than one data center; in this case the simulator tries
to create the virtual machine of the reservation in the data centers in the order of
their creation at the beginning of the simulation (the same order in which they were
added to the input Excel file).
3.4.11 Simplified block allocation
This allocation process is shown in Figure 3.8. In CloudSim block allocations can
be modeled in the same way as normal reservations, except that the input parameter
"Number of images requested (block reservation)" has to be set for the requested
reservation. If an image request cannot be fulfilled because no host has enough
resource capacity, the simulator retries the creation of the image later, until one
of the hosts has enough resource capacity to handle the reservation (to create the
virtual machine).
3.4.12 Simplified predictive loading modules
These modules are described in chapter 3.3.3. In the Level_0 predictive loading
module the image that was set by the administrator on the VCL admin website is
pre-started in the virtual machine slot. The Level_1 algorithm preloads the most
popular, not loaded image of a specific time frame (this time frame varies from 1
day to 30 days) in the vmslot. As no vmslot entity exists in CloudSim, modeling the
loading modules is not straightforward. To solve this issue the virtual machine
creation time can be delayed to get a similar behavior. By default, the simulator
assumes that there is no delay for any VM creation. In VCL terminology this means
that each vmslot is preloaded with the right image. The advanced solution is that
each image has a specific delay value, and every time that image is created the
creation is delayed by that value.
3.5 Additional conceptual features
After the comparison of Apache VCL and CloudSim, some system-related features are
still missing from CloudSim, and they are also missing from Apache VCL: cost, system
utilization, future request and service performance modeling related to hybrid cloud
setups. All of these features provide important information about the simulation of
cloud infrastructures.
3.5.1 Cost modeling
In my master thesis I treat all costs (total cost of ownership, TCO) of a cloud
infrastructure as outgoing cash flows, and I follow the basics of the time value of
money theory to appraise a private cloud as an investment project. Therefore, net
present value (NPV) was used to calculate the value of the investment projects. It
compares the value of money today to the present value of money in the future,
taking returns into account. Because the project has no incoming cash flows, only
outgoing ones, I use the outgoing values as positive values to make the
visualizations easier.
[21] NPV is given by the period t, the cash flow at a given time C_t, the total
number of periods N and the opportunity cost of the money r:
$$NPV = \sum_{t=0}^{N} \frac{C_t}{(1+r)^{t}}$$
This NPV calculation is done for the hosts of private data centers on a monthly
basis by the method getHostRunningCostMonthBasedNPV_MN() described in chapter
3.6.1.4. For the public cloud environment the pricing policy follows Amazon's
business model, so the cost of an instance is charged hourly as described in
chapter 3.6.1.1.
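A minimal Java sketch of a monthly NPV calculation for one host follows. It is not
the body of getHostRunningCostMonthBasedNPV_MN(): the names are hypothetical, the
investment is assumed to be paid at t = 0, the same operational cost is assumed for
every month, and the opportunity cost is assumed to be already given as a rate per
month.

```java
final class NpvSketch {
    // NPV = sum over t = 0..N of C_t / (1 + r)^t; cash flows are outgoing costs
    // written as positive values, as in the thesis.
    static double npv(double[] cashFlows, double ratePerPeriod) {
        double sum = 0.0;
        for (int t = 0; t < cashFlows.length; t++) {
            sum += cashFlows[t] / Math.pow(1.0 + ratePerPeriod, t);
        }
        return sum;
    }

    // Investment at t = 0, then a constant monthly cost for the given months.
    static double hostCostNpv(double investmentCost, double monthlyCost,
                              int months, double monthlyRate) {
        double[] flows = new double[months + 1];
        flows[0] = investmentCost;
        for (int t = 1; t <= months; t++) {
            flows[t] = monthlyCost;
        }
        return npv(flows, monthlyRate);
    }
}
```

With a rate of 0 the result is simply the investment plus the sum of the monthly
costs; a positive rate makes later months count less.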
3.5.2 Data center utilization
After the simulation terminates, the utilization of the private data center is an
important parameter to see whether the capacity of the private infrastructure is
enough to serve user requests securely in the future. The simulator was extended to
support this requirement: data center utilization is calculated as the average CPU
and RAM utilization of the data center's hosts. This process is described in more
detail in chapter 3.6.2.1.
3.5.3 Future request generation
To generate thousands of future requests, I developed my own algorithm. It takes
input parameters from the user such as lecture names, peak dates of the lectures,
and the mean and standard deviation of average reservation lengths and peaks. For
each peak different peak dates and images can be added. The algorithm uses a normal
distribution to generate requests randomly in time, with the peak date as mean and
a standard deviation defined in hours. The reservation lengths are also drawn from
a normal distribution, with the average reservation length as mean and its standard
deviation in hours. The start dates of similar lectures are shifted randomly (based
on a uniform distribution) relative to each other. The algorithm does the shifting
carefully to avoid starting and ending dates outside the time window of the current
semester (autumn or spring). The algorithm is part of the method
createNewWorkloads_MN(), which is described in chapter 3.6.2.1.
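The sampling idea (request start times normally distributed around a peak,
reservation lengths normally distributed around a mean) can be sketched in Java as
below. This is an illustration and not the code of createNewWorkloads_MN(): all
names are hypothetical, times are hours measured from the start of the semester,
and clamping the values into the semester window with a minimum length of 0.25
hours is an assumption that stands in for the careful shifting described above.

```java
import java.util.Random;

final class WorkloadGenSketch {
    // Returns count rows of {startHour, lengthHours}.
    static double[][] generate(int count, double peakHour, double peakStdHours,
                               double meanLengthHours, double lengthStdHours,
                               double semesterHours, long seed) {
        Random rng = new Random(seed);
        double minLengthHours = 0.25;   // hypothetical lower bound on a reservation
        double[][] requests = new double[count][2];
        for (int i = 0; i < count; i++) {
            // Reservation length ~ N(meanLengthHours, lengthStdHours)
            double length = meanLengthHours + lengthStdHours * rng.nextGaussian();
            length = Math.min(Math.max(minLengthHours, length), semesterHours);
            // Start time ~ N(peakHour, peakStdHours), kept inside the semester
            double start = peakHour + peakStdHours * rng.nextGaussian();
            start = Math.min(Math.max(0.0, start), semesterHours - length);
            requests[i][0] = start;
            requests[i][1] = length;
        }
        return requests;
    }
}
```

Using a fixed seed makes a generated workload reproducible, which helps when two
data center configurations are compared on the same request load.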
3.5.4 Service performance
The performance of a cloud computing system is an important part of the result.
CloudSim does not provide rich output after a simulation, so no service performance
figures are printed. That is the reason why I have extended the simulator with such
a feature.
This is done by the method calcSzolgaltatasbiztonsag_MN() described in chapter
3.6.2.1. The output file called "ServicePerformance*.txt" summarizes all important
information about the simulated requests: request IDs; user names; user IDs; image
names; data center IDs; data center names; host IDs; VM IDs; requested start, end and
running times; simulated actual start, end and running times; and wait times of the
requests.
3.6 Improvements in CloudSim
A couple of built-in CloudSim core Java classes were modified, some new Java classes
were created and three libraries were added to map Apache VCL in CloudSim better
than CloudSim could do without modification.
3.6.1 Modified Java classes
While mapping the key VCL functions onto CloudSim, the following CloudSim Java
classes were modified:
Cloudlet.java
Datacenter.java
DatacenterBroker.java
Host.java
VM.java
According to its code license, the General Public License (GPL), the modified source
code of CloudSim has to be made available, so it was uploaded and made public. [22]
3.6.1.1 Cloudlet.java
The Cloudlet Java class is an extension to the cloudlet. By default CloudSim
provides the function getCostPerSec() to get the running cost of a cloudlet in the
data center. This function returns the cost associated with running the cloudlet.
The cost value is set when the data center's characteristics are created. This means
that the cost of all cloudlets in a specific data center is calculated with the same
cost per second value. This built-in function does not fully fit my needs, because I
need hour-based costs and different running costs for different cloudlet types
(images). To overcome these problems I created a new function called
getCloudletCostAmazonHourBased_MN(). This function has an input parameter
costPerHour, which solves both problems. The pricing policy follows Amazon's
business model, so the cost of an instance is charged hourly.
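An hour-based cost calculation of this kind can be sketched as follows. The class
and method names are hypothetical and this is not the body of
getCloudletCostAmazonHourBased_MN(); in particular, billing every started hour in
full is an assumption about the hourly pricing policy.

```java
final class HourlyCostSketch {
    // Cost of one image run: every started hour is billed in full (assumption).
    static double cost(double runningSeconds, double costPerHour) {
        long billedHours = (long) Math.ceil(runningSeconds / 3600.0);
        return billedHours * costPerHour;
    }
}
```

Under this assumption a run of 3601 seconds at a price of 0.5 per hour costs 1.0,
because two hours are billed.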
3.6.1.2 Datacenter.java
The Datacenter Java class deals with the processing of VM queries (e.g. handling
VMs) instead of the processing of cloudlet-related queries. The constructor had to
be modified to set the creation time of the hosts. This is done by calling the
host's setTimeCreated_MN() function. Similarly, the destroy time of the host is set
in the shutdownEntity() function by calling the host's setTimeDestroyed_MN()
function.
3.6.1.3 DatacenterBroker.java
The DatacenterBroker Java class represents a broker acting on behalf of a user. It
hides VM management, such as VM creation, submission of cloudlets to these VMs and
destruction of VMs. The authors of CloudSim encourage developers to develop their
own broker policies to submit VMs and cloudlets according to the specific rules of
the simulated scenario. I took their advice and modified the original broker in
several places.
I have created three new attributes:
scheduleInMs: Brokers are normally created at the start of the simulation,
but I needed brokers that start at different times. To make this possible I
used my own calculation method to postpone the creation of the broker. This
delay time in seconds is stored in the scheduleInMs variable.
startTime: The requested start time of a reservation given by the user.
endTime: The requested end time of a reservation given by the user.
To be able to create new broker entities with the variables listed above, a new
constructor was necessary. It extends the input parameters with the three new
attributes.
The original broker entity finishes its execution if it is not able to create a
requested VM in any data center. This is not suitable for my purpose, so the method
processVmCreate() was modified to retry the VM creation later. This is done by
calling recreateVmsInDatacenter_MN(), which sends a new VM creation message to the
data center; the data center then tries to create the VM again in 4 minutes. The VM
creation cycle goes on until the VM is created in one of the data centers.
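The effect of the 4-minute retry cycle on the creation time of a VM can be
illustrated with a short Java sketch. It does not use the CloudSim event system:
the names are hypothetical, times are in seconds, and the moment at which capacity
becomes free is passed in as a known value, which is a simplification.

```java
final class RetrySketch {
    static final double RETRY_INTERVAL_SEC = 240.0;   // 4 minutes

    // Returns the time of the first attempt that finds free capacity.
    static double creationTime(double firstAttemptSec, double capacityFreeAtSec) {
        double t = firstAttemptSec;
        while (t < capacityFreeAtSec) {
            t += RETRY_INTERVAL_SEC;    // failed attempt, try again in 4 minutes
        }
        return t;
    }
}
```

Because attempts happen only every 240 seconds, a request whose capacity frees up
shortly after a failed attempt still waits until the next attempt.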
CloudSim's basic broker does not deal with reload times. The VMs are created
immediately (namely, it takes 1 second) in one of the data centers if there is
enough resource in a data center. To handle this issue I had to modify the
createVmsInDatacenter_MN() method, which is used to create the virtual machines in a
data center, so that it uses a postponed VM creation message. The postpone time is
the reload time of the specific image that was set by the user at the beginning of
the simulation. Each image has its own average reload time value and that value is
used for the VM creation.
DatacenterBroker extends the CloudSim SimEntity class, and some methods have to be
overridden. One of them is startEntity(), which is responsible for starting the
entity up. To schedule the broker entity to start later, as happens in VCL, the
scheduleInMs value is used.
3.6.1.4 Host.java
Host Java class executes actions related to the management of virtual machines.
A host has a defined policy for provisioning memory and bandwidth, as well as an
allocation policy for processing elements to virtual machines. A host is associated to a
datacenter. It can host virtual machines.
In order to model the TCO of private clouds, some modifications had to be applied
in the code. CloudSim does not provide any solution for that, so the modification of
this class was necessary.
New attributes and their roles:
timeCreated: Time value when the host was created, expressed in
CloudSim simulation time in seconds.
timeDestroyed: Time value when the host was destroyed, expressed in
CloudSim simulation time in seconds.
operationalCostMonth: Host’s average monthly operational room cost,
used to calculate NPV.
investmentCost: Host’s initial, one time investment cost, used to
calculate NPV.
backup: Boolean value to identify whether the host is in normal use or
for backing the system up.
runningTimeSec: Running time of the host during simulation in seconds.
datacenterName: Name of host’s data center.
The NPV calculation is already mentioned in chapter 3.5.1 and is done for every
host in the method getHostRunningCostMonthBasedNPV_MN(). If the host is set to
backup, the NPV is equal to the investment cost of the host; otherwise the NPV is
calculated with the month as period, the simulation interval in months as total
number of periods, the operational costs as monthly cash flow, and the opportunity
cost given by the user.
Setter and getter methods were also created for my own attributes such as
timeCreated, timeDestroyed, operationalCostMonth, investmentCost and backup.
The setDatacenter() function is extended with setting the name of the data center.
3.6.1.5 VM.java
The VM Java class represents a VM: it runs inside a host and processes cloudlets.
Only one change was applied in this class: the new reloadTimeMin attribute, which is
used in the data center broker to simulate the reload time of an image as in VCL.
3.6.2 New Java classes
I created the following classes in order to map VCL functionalities:
CloudSim_jar.java
DcHoUtil.java
DCinput.java
Image.java
Szolgbiz.java
Workload.java
3.6.2.1 CloudSim_jar.java
The CloudSim_jar Java class contains the static main() function. In addition, I
have created some other static methods to support the VCL mapping.
The Java application is started by calling main(), which is responsible for using
the appropriate Java classes and calling their methods during the simulation. First,
the input Excel file is read by this method. After that, the brokers are created by
calling createNewBroker_MN() with the input file parameters. Hosts are created by
the createHost_MN() method as the input file defines their details. The data centers
are created in the same way, by calling createDatacenter_MN(). After reading the
input file the workbook is closed and the simulation can be started by calling
startSimulation(). The running time of the simulation depends on the time span to be
simulated and on the total number of requests, so it can vary from minutes to hours.
Before the output files are generated, the service performance metrics are
calculated in the method calcSzolgaltatasbiztonsag_MN() and the host utilization
percentages in calcHostsUtil_MN(). Once the result of the simulation is available,
the output files are written in the methods printDatacenterCostsAndUtils_MN() and
printSzolgbizList_MN().
Short summary about the output files:
Request*.txt: Stores the submitted requests.
Summary*.xls: Summarizes the costs of the data centers and some other
useful information about wait times and served requests.
ServicePerformance*.txt: Contains the list of submitted requests and the
simulation parameters such as the simulated start time of the request or
the simulated wait time.
The Java method createNewWorkloads_MN() is used for multiplying the loads of a
specific lecture. The user gives different metrics for a lecture, such as the mean
and standard deviation of the average reservation length, and based on these input
parameters the given number of similar lectures is created. Only the starting time
parameters are modified, by shifting them within the time frame of the semester
(autumn or spring semester) using a uniform distribution. With this solution the
user does not have to type thousands of future reservations to simulate the requests
of similar lectures.
The method calcHostsUtil_MN() is used to calculate the CPU and RAM utilization of
the simulated hosts during the simulated time frame. It goes through the simulated
cloudlets and counts each cloudlet's running time for its host. After that the
running times are summed for each host and the total running time of the cloudlets
is divided by the total running time of the host. In this way every host has its own
utilization value. Because I was interested in data center utilization, I calculated
the average of the hosts' utilization values for a specific data center, so in the
end I use data center CPU and RAM utilization.
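The calculation described above can be sketched in Java as follows. The names are
hypothetical and this is not the code of calcHostsUtil_MN(); the sketch follows the
description literally (sum of cloudlet running times divided by the running time of
the host, then a plain average over the hosts) and ignores how many cores or how
much RAM each cloudlet occupies, which is a simplification.

```java
final class UtilizationSketch {
    // Utilization of one host: total cloudlet running time / host running time.
    static double hostUtilization(double[] cloudletRunningSec, double hostRunningSec) {
        if (hostRunningSec <= 0.0) {
            return 0.0;   // assumption: a host that never ran has zero utilization
        }
        double total = 0.0;
        for (double sec : cloudletRunningSec) {
            total += sec;
        }
        return total / hostRunningSec;
    }

    // Utilization of a data center: average of its hosts' utilization values.
    static double datacenterUtilization(double[] hostUtilizations) {
        if (hostUtilizations.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double u : hostUtilizations) {
            sum += u;
        }
        return sum / hostUtilizations.length;
    }
}
```

For example, two cloudlets running 1800 s and 900 s on a host that ran for 3600 s
give a host utilization of 0.75.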
The method printDatacenterCostsAndUtils_MN() was created to output the end results
of the simulation. With this method the "Summary*.xls" document is written to the
path from which the user starts the simulation JAR file. For further analysis of the
system's performance the "ServicePerformance*.txt" file is created after the
simulation is finished. This file contains detailed information about each image
request. This output is printed in the method printSzolgbizList_MN(). The content of
the output files will be explained later.
The method createHost_MN() was developed to make host generation easier and more
comfortable. The number of hosts and the other parameters are given by the user via
the input Excel file. For a correct simulation of the VCL environment the built-in
simple provisioning policy was used for RAM and bandwidth provisioning, and the
space-shared scheduling algorithm was used for VM scheduling. More information
about the space-shared algorithm can be found in chapter 3.2.1.1. Additionally, the
host's investment cost and monthly operational cost are set here. The method
createDatacenter_MN() is used to create a CloudSim data center entity with the data
center characteristics given by the user and the built-in simple VM allocation
policy.
By default CloudSim does not deal with scheduled cloudlet processing. There is
only the zero time, and all cloudlet requests are processed at the same time. If any
of the cloudlet requests cannot be served because of a lack of compute resources,
then those requests are destroyed and flagged with an unsuccessful state. In order
to support future reservations as VCL does, I modified the broker creation process
in CloudSim. This modification is implemented in the method createNewBroker_MN().
The user specifies the start time of the simulation and the start time of the image
requests in the input file. The difference between the two dates is used as the
delay time to start the broker later. The delay time calculation is done by the
method calcDelay_calendarToMs_MN(). Another missing feature of CloudSim is that it
does not deal with the start and end time of a cloudlet request. It defines only the
length of a cloudlet, given in MI. The cloudlet length can be calculated from the
user's input parameters for the image requests. This happens in the method
calcCloudletLength_MN(). Calculating the cloudlet length in this way was necessary
to avoid fundamental changes in the CloudSim core source code.
Before the costs of the hosts of a data center can be added up, the cost of each
host has to be defined. This is done by calling the method
getHostRunningCostMonthBasedNPV_MN() of the hosts. At the end of the simulation the
user needs only one cost value for the data center. The method
getDatacenterHostsCost_MN() sums the costs of the hosts of a given data center using
the NPV calculation with the opportunity cost given by the user. This part is very
important for the thesis, because it supports the decision whether building an own
private cloud data center or using a public cloud service is more worthwhile.
Similarly to the data center cost, I have created the method
getDatacenterCloudletsCost_MN() to calculate the image running costs of a data
center based on the hourly cost of the images. This cost value means that if the
images were executed in a public, pay-as-you-go cloud service, the user would have
to pay this amount for those images. The usage fees of the images are accounted
monthly, and the monthly image usage costs are discounted to zero time with the
opportunity cost given by the user. If the user would like to decide between the two
projects (building an own private cloud or using a public cloud), the project with
the lower NPV should be chosen, since the NPV here represents costs.
The performance of the cloud computing system is also an important part of the
result. The method calcSzolgaltatasbiztonsag_MN() makes the report of the simulation
for each request. It calculates the wait time of the requests and collects every
meaningful piece of information about a request: request ID, user name, user ID,
image name, data center ID, data center name, host ID, VM ID, requested start, end
and running time, simulated actual start, end and running time, and wait time of a
given request. This information is stored in the "ServicePerformance*.txt" file
after the simulation is finished.
The virtual machines are created in the method createVM_MN(). Every virtual
machine uses the time-shared cloudlet scheduler built into CloudSim. Each virtual
machine has its own reload time, which is given by the user for each image.
CloudSim uses cloudlets to simulate requests. Cloudlets have an ID, a length in MI,
processing elements (PEs), a file size, an output size and three utilization
parameters. The IDs are the same as the IDs of the VMs. The length is calculated by
calcCloudletLength_MN(). The number of PEs is equal to the number of virtual CPUs of
the image. File size and output size are the sizes in MB moved in and out over the
network. In VCL the Remote Desktop Protocol (RDP) or SSH is used, therefore only
little data is transferred over the network. Because this traffic is below 1 GB per
month and Amazon charges only for traffic above 1 GB/month, I did not take these
costs into account. For the utilization model the built-in UtilizationModelFull()
was used. All of this is done in createCloudlet_MN().
3.6.2.2 DcHoUtil.java
The DcHoUtil Java class was created to store all relevant information about a host,
such as its ID, data center name, and CPU and RAM utilization during the lifetime of
the host. The utilization of the host is calculated by calling setUtilization().
3.6.2.3 DCinput.java
DCinput Java class stores information (name, type (private or public), number of
hosts, monthly cost and investment cost) about a data center. These values are given by
the user at starting the simulation.
3.6.2.4 Image.java
Image Java class represents an Apache VCL image entity in the simulation
environment. Attributes are the following: name, input and output size, storage and
RAM size, speed of the CPU cores, network speed, number of virtual CPUs, virtual
machine monitor name, hourly running cost of the image in public cloud and average
reload time of the image.
3.6.2.5 Szolgbiz.java
The Java class called Szolgbiz is used to encapsulate the most important data of a
simulated reservation. The entity contains the following values related to a
simulated request: identifier of the cloudlet/request, user identifier of the
broker/user, user name of the broker/user, name of the image requested, identifier
and name of the data center that executed the image, identifier of the host where
the image was created, identifier of the virtual machine/request, requested start
time of the reservation, requested end time of the reservation, requested
reservation length, simulated start time of the reservation, simulated end time of
the reservation, simulated reservation length, and wait time of the user to get
access to the requested image. To sum up the results of the simulation, these values
are written into the output text file.
3.6.2.6 Workload.java
The Workload Java class models the basic information of a reservation: name of the
broker, start and end date of the request, image name and number of requests (for block
allocation).
3.6.3 Used Java libraries
Besides creating or modifying Java classes three libraries were added to the Java
project:
flanagan.jar
commons-math3-3.2.jar
jxl.jar
3.6.3.1 Flanagan.jar
The Flanagan library is Michael Thomas Flanagan's Java Scientific Library, which the
CloudSim framework uses in several places. [23]
3.6.3.2 Commons-math3-3.2.jar
Commons Math is a library of mathematics and statistics components addressing
the most common problems not available in the Java programming language or
Commons Lang. This library has to be added to the simulation project because CloudSim
uses some of its components. [24]
3.6.3.3 Jxl.jar
The Java Excel API library is an open source Java API which was used to read the
Excel spreadsheet containing the input parameters and to write the simulation results
dynamically into an Excel spreadsheet. The Java Excel API allows Java applications to
read in a spreadsheet, modify some cells and write out the new spreadsheet. [25]
3.7 How to use the simulator
This part of the documentation explains how to use the simulation environment I
created for this master thesis. First, I go through the input files so that users understand
the basic inputs of the simulator. After that, I describe how to start the Java simulation
application. Finally, the output files are covered.
3.7.1 Providing input parameters
To start the simulator an input Excel file is required. This Excel file must be in
Microsoft Excel Binary File Format (.XLS). The newer .XLSX format is not
supported by the Jxl.jar library. The layout of the template Excel file must not be changed
because the simulator reads the cells in a predefined order. To protect the structure of the
template, the input file is protected and the user may modify only the cells with green or
yellow background color. Yellow means that data validation is set for those cells. The red
cells cannot be modified because they hold computed values, and the simulator will not
start if they are not calculated correctly. The worksheets' names cannot be protected, but
they should not be changed because the simulator does not work with other sheet names.
First of all, the input worksheet has to be filled in. The simulation start date has
to be defined in the following format: yyyy.mm.dd HH:mm:ss, as shown in Figure 3.11.
Figure 3.11 Simulator input parameter for simulation start date
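As a small illustration of this date format, the sketch below parses such a value with the java.time API. Note that in Java pattern syntax the month is written MM (mm means minutes), so the pattern used here is yyyy.MM.dd HH:mm:ss. The class and method names are hypothetical and are not taken from the simulator's source code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of the simulator.
final class StartDateParser {
    // "MM" is month-of-year, "mm" is minute-of-hour.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy.MM.dd HH:mm:ss");

    /** Parses a cell value such as "2013.09.01 00:00:00". */
    static LocalDateTime parse(String cellText) {
        return LocalDateTime.parse(cellText.trim(), FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime start = parse("2013.09.01 00:00:00");
        System.out.println("Simulation starts at " + start);
    }
}
```

A malformed value makes LocalDateTime.parse throw a DateTimeParseException, which is one possible way to reject a wrongly filled cell early.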
Different images can be defined as in Apache VCL. An image is described by
several parameters, as shown in Figure 3.12.
Then the data centers must be defined as shown in Figure 3.13. The investment cost
of a data center is its total capital expenditure (CapEx), while the operational cost of a
data center is its total operating expenditure (OpEx) per month.
Contents of Figure 3.11 (input worksheet, row 1):
Simulation start date: | 2013.09.01 00:00:00
Figure 3.12 Simulator input parameters for images
Figure 3.13 Simulator input parameters for data centers
After the data centers are defined, the user has to specify the physical hosts
as shown in Figure 3.14. The backup host flag means that only the CapEx of the
host is added to the data center's result and discounted. More than one host can be
added to the same data center. To simulate a data center with unlimited capacity, its
physical hosts have to be created with very large values (for example the maximum
value of an integer) for the CPU cores and RAM parameters.
Figure 3.14 Simulator input parameters for physical host servers
To calculate the opportunity cost of capital used in the NPV evaluation of a private or
public cloud strategy, the risk-free rate, the beta and the expected market return have to
be given, as shown in Figure 3.15.
Figure 3.15 Simulator input parameters for opportunity cost calculation
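These three inputs are the inputs of the capital asset pricing model (CAPM). Assuming the standard CAPM formula is applied, the example values of Figure 3.15 (2% risk-free rate, beta of 1.5, 8% expected market return) give 2% + 1.5 × (8% − 2%) = 11%. The sketch below shows this calculation; the class and method names are hypothetical, and the simple discounting helper is a generic illustration rather than the simulator's own NPV routine.

```java
// Hypothetical helper; assumes the standard CAPM formula is used.
final class OpportunityCost {
    /** Returns r = rf + beta * (rm - rf); rates are fractions (0.02 = 2%). */
    static double capm(double riskFreeRate, double beta, double marketReturn) {
        return riskFreeRate + beta * (marketReturn - riskFreeRate);
    }

    /** Present value of a cost that is paid after the given number of years. */
    static double presentValue(double cost, double annualRate, double years) {
        return cost / Math.pow(1.0 + annualRate, years);
    }

    public static void main(String[] args) {
        double r = capm(0.02, 1.5, 0.08);
        System.out.printf("Opportunity cost of capital: %.2f%%%n", r * 100.0);
        System.out.printf("Present value of $1000 due in one year: %.2f%n",
                presentValue(1000.0, r, 1.0));
    }
}
```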
The service performance (SLA) parameter of the simulation is the accepted
maximum response time for a request. This value is defined in minutes, as shown
in Figure 3.16.
Contents of Figure 3.12 (input worksheet, rows 12-18):
Name of the image: | BSc AUTUMN | MSc 1 AUTUMN
Size of input files to be moved via network, MB: | 50 | 50
Size of output files to be moved via network, MB: | 50 | 50
RAM size, MB: | 3072 | 1024
Number of vCPUs: | 2 | 1
Hourly cost executing this image in public datacenter, $: | 0,35 | 0,12
Image reload time, min: | 8 | 4
Contents of Figure 3.13 (input worksheet, rows 21-25):
Name of datacenter: | Private | Amazon
Type: | PRIVATE | PUBLIC
Number of hosts: | 9 | 1
Investment cost of datacenter, $: | 34666,67 | 0,00
Monthly cost of datacenter, $: | 1580,00 | 0,00
Contents of Figure 3.14 (input worksheet, rows 28-31):
Name of datacenter: | Private | Private | Amazon
Number of CPU cores: | 8 | 8 | 5000
Size of RAM, MB: | 32768 | 32768 | 2147483647
Backup host: | FALSE | FALSE | FALSE
Contents of Figure 3.15 (input worksheet, rows 34-36):
Risk free rate of interest, %: | 2,00%
β - beta of the investment: | 1,50
Expected return of the market, %: | 8,00%
Figure 3.16 Simulator input parameter for service performance parameter
Generating a large number of requests for lectures is easy with the request
generation algorithm described in chapter 3.5.3. The algorithm requires some input
parameters, which can be filled in on the input worksheet of the Excel input file as
depicted in Figure 3.18. First, the name of the lecture, the number of students, the
semester and the number of similar lectures have to be defined. After that, up to 5
peaks can be used to model the load of a lecture during a semester. For each peak a
different peak date and image can be given. The load generator uses a normal
distribution to generate future requests, with the peak date as mean and a standard
deviation given in hours. The reservation lengths are also drawn from a normal
distribution, with the average reservation length as mean and a standard deviation
given in hours. The rate between students and requested images of a peak is a
multiplier which defines how many images should be created for that peak if a
given number of students take the lecture.
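The generation of one peak can be sketched as follows. This is not the thesis code: the class name, the fixed random seed and the clamping of very short lengths are assumptions made for the illustration, and the number of requests per peak is assumed to be the number of students multiplied by the rate. With that interpretation the parameters of Figure 3.18 give 200 × 30.5 × 15 + 50 × 49.5 × 25 = 153 375 requests, which is close to the 153 373 requests reported in the case study.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of one peak of the load generator; not the thesis code.
final class PeakLoadSketch {
    /** A generated reservation; the start is an offset in hours from the peak date. */
    static final class Request {
        final double startOffsetHours;
        final double lengthHours;

        Request(double startOffsetHours, double lengthHours) {
            this.startOffsetHours = startOffsetHours;
            this.lengthHours = lengthHours;
        }
    }

    static List<Request> generate(int students, double ratePerStudent,
                                  double startSdHours, double meanLengthHours,
                                  double lengthSdHours, long seed) {
        Random rnd = new Random(seed);
        // Assumption: requests per peak = students * rate.
        int count = (int) Math.round(students * ratePerStudent);
        List<Request> requests = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Start time: normal distribution around the peak date.
            double start = rnd.nextGaussian() * startSdHours;
            // Length: normal distribution around the average reservation length.
            double length = meanLengthHours + rnd.nextGaussian() * lengthSdHours;
            // Assumption: non-positive or tiny lengths are clamped to one minute.
            length = Math.max(length, 1.0 / 60.0);
            requests.add(new Request(start, length));
        }
        return requests;
    }

    public static void main(String[] args) {
        // Peak 1 of BSc_autumn_2013: 200 students, rate 18, sd 120 h, length 2 h (sd 1 h).
        List<Request> peak1 = generate(200, 18.0, 120.0, 2.0, 1.0, 42L);
        System.out.println(peak1.size() + " requests generated for one lecture");
    }
}
```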
If the user does not want the simulator to generate requests randomly, requests
can be added one by one on the workload worksheet. The identifier of the request has to
contain the name of the image with three or more “_” delimiters. Requested start dates
must be later than the simulation start date, and the images must already be defined on
the input worksheet, as shown in Figure 3.17. For VCL block allocation the values in the
last column should be modified.
Figure 3.17 Simulator input parameters for one by one request defining
Requests can be provided not only in the Excel file but also in a text file.
This feature is useful when the user would like to simulate more requests than the
number of rows (65 536) that the Excel .XLS format can handle. In this case, the text
file must have the same structure as the workload Excel sheet, but with tab delimiters
and without the first (label) row.
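A line of such a text file could be read as sketched below. The parser is hypothetical (the simulator's own reader is not shown in this document); it assumes the five columns of the workload worksheet in the same order and the date format used for the simulation start date.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical parser for one tab-delimited request line; not the thesis code.
final class WorkloadLine {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy.MM.dd HH:mm:ss");

    final String requestId;
    final LocalDateTime start;
    final LocalDateTime end;
    final String imageName;
    final int imageCount;

    private WorkloadLine(String requestId, LocalDateTime start, LocalDateTime end,
                         String imageName, int imageCount) {
        this.requestId = requestId;
        this.start = start;
        this.end = end;
        this.imageName = imageName;
        this.imageCount = imageCount;
    }

    static WorkloadLine parse(String line) {
        String[] f = line.split("\t");
        if (f.length != 5) {
            throw new IllegalArgumentException("Expected 5 tab-separated fields: " + line);
        }
        String id = f[0].trim();
        // The identifier must contain three or more "_" delimiters.
        long underscores = id.chars().filter(c -> c == '_').count();
        if (underscores < 3) {
            throw new IllegalArgumentException("Request ID needs at least three '_': " + id);
        }
        LocalDateTime start = LocalDateTime.parse(f[1].trim(), FORMAT);
        LocalDateTime end = LocalDateTime.parse(f[2].trim(), FORMAT);
        if (!end.isAfter(start)) {
            throw new IllegalArgumentException("End date must be after start date: " + line);
        }
        return new WorkloadLine(id, start, end, f[3].trim(), Integer.parseInt(f[4].trim()));
    }

    public static void main(String[] args) {
        WorkloadLine w = parse(
                "BSc_autumn_2013_1\t2013.09.18 15:54:23\t2013.09.18 16:01:59\tBSc AUTUMN\t1");
        System.out.println(w.requestId + " requests image " + w.imageName);
    }
}
```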
Contents of Figure 3.16 (input worksheet, row 39):
Accepted maximum response time, min: | 10
Contents of Figure 3.17 (workload worksheet, rows 1-6):
Request ID/Broker name | Requested start date | Requested end date | Name of the image | Number of images requested (for block allocation)
BSc_autumn_2013_1 | 2013.09.18 15:54:23 | 2013.09.18 16:01:59 | BSc AUTUMN | 1
MSc_autumn_2013_1 | 2013.09.18 19:03:56 | 2013.09.18 19:09:40 | MSc 1 AUTUMN | 1
MSc_autumn_2013_2 | 2013.09.18 19:25:29 | 2013.09.19 03:29:57 | MSc 2 AUTUMN | 1
MSc_autumn_2013_3 | 2013.09.19 17:04:07 | 2013.09.19 18:14:45 | MSc 1 AUTUMN | 1
MSc_autumn_2013_4 | 2013.09.19 17:04:07 | 2013.09.19 18:14:45 | MSc 2 AUTUMN | 1
Figure 3.18 Simulator input parameters for random request generator
3.7.2 Starting the simulation
After the input Excel file has been filled in, the simulation can be started. CloudSim is
written in Java, and my own code was also written in Java. The Java project was built with
NetBeans IDE 7.4 and Java version 1.7.0_51.
The dist folder contains the external libraries (commons-math3-3.2.jar,
flanagan.jar and jxl.jar) and the cloudSim_jar.jar Java application archive. The
application takes two mandatory arguments and one optional argument: 1. the name of
the input file; 2. the suffix of the output files, which distinguishes the output files of
different simulations; 3. optionally, the name of the text file containing the requests.
The application looks for the input file in the directory from which it was executed.
Example starting code on Windows platform via Command Prompt (CMD):
cd C:\Users\exampleUser\inputFileFolder
java -jar C:\Users\exampleUser\dist\cloudSim_jar.jar inputFile1.xls suffix1 [inputRequests1.txt]
Contents of Figure 3.18 (input worksheet, rows 42-69):
Name of the lecture: | BSc_autumn_2013 | MSc_autumn_2013
Number of students: | 200 | 50
Semester: | AUTUMN | AUTUMN
Number of same lectures to be simulated: | 15 | 25
Image used at peak 1: | BSc AUTUMN | MSc 1 AUTUMN
Rate between students and requested images by peak 1: | 18,00 | 9,00
Mean of the requests' length in hour by peak 1: | 2,00 | 2,50
Standard deviation of the requests' length in hour by peak 1: | 1,00 | 1,50
Mean of peak 1, date: | 2013.10.27 15:00:00 | 2013.09.27 19:00:00
Standard deviation of the requests' time in hour by peak 1: | 120,00 | 96,00
Image used at peak 2: | BSc AUTUMN | MSc 1 AUTUMN
Rate between students and requested images by peak 2: | 11,00 | 36,00
Mean of the requests' length in hour by peak 2: | 2,00 | 2,50
Standard deviation of the requests' length in hour by peak 2: | 1,00 | 1,50
Mean of peak 2, date: | 2013.11.17 17:00:00 | 2013.10.17 19:00:00
Standard deviation of the requests' time in hour by peak 2: | 120,00 | 108,00
Image used at peak 3: | BSc AUTUMN | MSc 2 AUTUMN
Rate between students and requested images by peak 3: | 1,50 | 2,50
Mean of the requests' length in hour by peak 3: | 2,00 | 2,80
Standard deviation of the requests' length in hour by peak 3: | 1,00 | 1,00
Mean of peak 3, date: | 2013.12.16 17:00:00 | 2013.11.12 19:00:00
Standard deviation of the requests' time in hour by peak 3: | 24,00 | 48,00
Image used at peak 4: | | MSc 2 AUTUMN
Rate between students and requested images by peak 4: | | 2,00
Mean of the requests' length in hour by peak 4: | | 2,80
Standard deviation of the requests' length in hour by peak 4: | | 1,00
Mean of peak 4, date: | | 2013.11.27 19:00:00
Standard deviation of the requests' time in hour by peak 4: | | 36,00
Example starting code on Linux platform via Terminal:
cd /home/exampleUser
java -jar /home/exampleUser/dist/cloudSim_jar.jar inputFile1.xls suffix1 [inputRequests1.txt]
The simulator logs all important steps to the terminal during the simulation.
3.7.3 Description of the output files
After the simulation terminates, the application creates three output files in the
directory from which it was executed: the requests and service performance text files
and the summary Excel file. The files are named as follows, where [suffix] is given by
the user and the [YYYYMMDD] date and [HHMMSS] time are taken by the application
at the start of the simulation:
Requests_[suffix]-[YYYYMMDD]_[HHMMSS].txt
ServicePerformance_[suffix]-[YYYYMMDD]_[HHMMSS].txt
Summary_[suffix]-[YYYYMMDD]_[HHMMSS].xls
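The naming scheme can be reproduced with a few lines of Java. The helper below is a hypothetical sketch that only follows the three patterns listed above; it is not the simulator's own code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that builds the three output file names.
final class OutputNames {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    /** Returns the requests, service performance and summary file names. */
    static String[] build(String suffix, LocalDateTime simulationStart) {
        String tail = suffix + "-" + simulationStart.format(STAMP);
        return new String[] {
            "Requests_" + tail + ".txt",
            "ServicePerformance_" + tail + ".txt",
            "Summary_" + tail + ".xls"
        };
    }

    public static void main(String[] args) {
        // The timestamp is taken when the simulation starts.
        for (String name : build("suffix1", LocalDateTime.now())) {
            System.out.println(name);
        }
    }
}
```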
Requests file contains the inputted requests in table form as it is shown in Table
3.1.
Request ID | Request name | Image name | Requested start time | Requested end time
5 | BSc_autumn_2013_0_0#0 | BSc autumn 2013 | 2013.11.02 08:28:14 | 2013.11.02 11:22:55
Table 3.1 Output file for input requests
The service performance file has a large number of fields, which are also written in
table form. The file is tab-delimited, so it can easily be read for further analysis or
reporting. An example output is shown in Table 3.2.
Column | Example value
Cloudlet ID | 500
Broker ID | 5
Broker name | BSc_autumn_2013_0_0#0
Image name | BSc AUTUMN
Datacenter ID | 3
Datacenter name | Private
Host ID | 301
VM ID | 500
Requested start time | 2013.11.02 08:28:14
Requested end time | 2013.11.02 11:22:55
Requested running time (hour) | 2.91
Actual start time | 2013.11.02 08:36:14
Actual end time | 2013.11.02 11:30:54
Actual running time (hour) | 2.91
Waiting time (min) | 8
Table 3.2 Output file for simulated requests
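The derived columns of Table 3.2 can be recomputed from the timestamps. In the example row the waiting time is the difference between the actual and the requested start time (8 minutes), and the running time is the difference between end and start time (about 2.91 hours). The sketch below uses hypothetical names and is only meant to show how such values can be checked during further analysis.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for post-processing the service performance file.
final class WaitingTime {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy.MM.dd HH:mm:ss");

    private static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text.trim(), FORMAT);
    }

    /** Whole minutes between the requested and the actual start time. */
    static long waitingMinutes(String requestedStart, String actualStart) {
        return Duration.between(parse(requestedStart), parse(actualStart)).toMinutes();
    }

    /** Running time in hours between a start and an end time. */
    static double runningHours(String start, String end) {
        return Duration.between(parse(start), parse(end)).getSeconds() / 3600.0;
    }

    public static void main(String[] args) {
        System.out.println(waitingMinutes("2013.11.02 08:28:14", "2013.11.02 08:36:14"));
        System.out.printf("%.2f%n", runningHours("2013.11.02 08:28:14", "2013.11.02 11:22:55"));
    }
}
```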
The summary Excel output file contains business-related information and a short overview of the simulation. Example output content is
shown in Table 3.3.
Column | Private | Amazon
Cost of datacenter with NPV calculation, $ | 102 735.33 | 0.00
Cost of executing images in datacenter with NPV calculation, $ | 90 442.23 | 12 615.65
Average CPU utilization of datacenter, % | 56.42 | -
Average memory utilization of datacenter, % | 19.03 | -
Number of requests served in datacenter | 138 519 | 14 819
Number of requests | 153 338 | -
Number of requests served within the normal response time | 153 338 | -
Number of requests not served within the normal response time | 0 | -
Total average waiting time (min) | 6 | -
Average waiting time of normal served requests (min) | 6 | -
Average waiting time of not normal served requests (min) | 0 | -
Rate of not normally served requests (%) | 0 | -
Table 3.3 Output file for simulation overview
4 Case study: a hypothetical implementation project
The case study demonstrates the usage and capabilities of the extended and improved
CloudSim simulator. The task is to find an optimally sized Apache VCL cloud computing
environment for a hypothetical Hungarian university that serves an estimated system
load, assuming that all types of cloud computing can be used: private, public and hybrid.
For decision making, each cloud infrastructure setup is regarded as a business project.
During the simulation the NPV is calculated for each project. The easiest decision would
be to choose the project with the smallest NPV. However, this is not sufficient, because
other parameters should also be taken into account, such as system utilization, average
user wait time and the service performance indicator. In short, the decision is more
complicated than comparing only the NPV values of the projects.
4.1 Input parameters
To run the simulator, the first step is to fill in the input Excel document. In this
section I summarize the input parameters used for this case study.
The simulation start time was set to 2013.09.01 15:00:00 (yyyy.mm.dd HH:mm:ss),
as I assumed that the autumn semester starts on 1 September.
4.1.1 Image types
Three different images were created for the case study:
BSc AUTUMN: 2 virtual CPU cores, 3072 MB RAM, Windows, 8
minutes reload time, autumn semester, $0.35 hourly cost of running this
image in the Amazon public cloud. [26] This image has a similar size to the
image of the lecture Systems Modeling, autumn version.
MSc 1 AUTUMN: 1 virtual CPU core, 1024 MB RAM, Linux, 4
minutes reload time, autumn semester, $0.12 hourly cost of running this
image in the Amazon public cloud. This image has a similar size to the image
of the first DMIS lab of the lecture IT Engineering Laboratory 2.
MSc 2 AUTUMN: 2 virtual CPU cores, 3072 MB RAM, Linux, 4
minutes reload time, autumn semester, $0.12 hourly cost of running this
image in the Amazon public cloud. This image has a similar size to the image
of the second DMIS lab of the lecture IT Engineering Laboratory 2.
4.1.2 Cloud computing infrastructures
Two types of clouds were defined to cover private, public and hybrid cloud
configurations:
Private: simulates the university's own cloud environment; the investment
and monthly operational costs are calculated automatically based on the
cost estimation described in chapter 4.1.3.
Amazon: simulates the public cloud environment; zero values were used
for the investment and operational costs, so the image-specific instance
prices cover the usage costs of the public data center.
Depending on the specific simulation, a given number of physical servers were
created in the private data center. Each server had 8 CPU cores, 32 768 MB RAM
and the backup flag set to false. For Amazon's data center only one server was created,
with 5 000 CPU cores, 2 147 483 647 MB RAM and the backup flag set to false, in order
to simulate unlimited public resource capacity.
4.1.3 CapEx and OpEx costs of private data center
Before starting the simulation, the input Excel file has to be filled in to provide the
basic input parameters for the Java cloud simulator. For this hypothetical implementation
project I had to determine two input values related to the university's hypothetical
private cloud infrastructure: the one-time investment cost and the monthly operational
cost. Determining these values is not easy because there are numerous cost drivers. In
the end the costs were calculated from expert estimates and empirical information.
The empirical information is based on the costs of the current private data center,
which consists of far fewer physical servers than the private data center required in
this hypothetical project.
Table 4.1 lists the key drivers of the capital expenditure. The table also shows
how many units of each item/device are needed per physical server for a working private
data center, followed by the market price of the item, which can vary between countries
and between types of items/devices. The last column gives the full investment cost of
the item for the private cloud infrastructure. For instance, the first row contains the
investment cost of the air conditioner. Each item is calculated for a given number of
physical servers; in the table the devices are calculated for a private cloud
configuration with 45 physical servers.
Name of the item | Number of the item for one physical server | Price of the item, USD | One-time cost of the item for 45 physical servers, USD
Air conditioner | 0.03 | 3 555.56 | 7 111.11
Construction | 0.01 | 1 777.78 | 1 777.78
IBM x3550 server | 1.00 | 2 666.67 | 120 000.00
Network switch | 0.05 | 444.44 | 1 333.33
Storage | 0.1 | 2 666.67 | 13 333.33
UPS | 0.1 | 1 777.78 | 8 888.89
Others | 0.01 | 444.44 | 444.44
Sum | | | 152 888.89
Table 4.1 Sample capital expenditure estimation for private data center with 45 servers
The operational expenditure is estimated in the same way as the capital expenditure,
except that the monthly upkeep of the working private cloud environment is considered.
For each item, its number of units per server is used to calculate the required costs for
the private data center. In this example the data center has 45 identical IBM x3550
servers. Table 4.2 shows the estimation of the OpEx.
Name of the item | Number of the item for one physical server | Cost per month, USD | Unit price, USD | Monthly cost of the item for 45 physical servers, USD
Power for air conditioning | 0.03 | 289 | 8.75 | 577.78
Human resource | 0.01 | 889 | 8.89 | 888.89
Data center | 0.01 | 222 | 2.22 | 222.22
Power for physical servers | 1.00 | 20 | 20.00 | 900.00
Sum | | | | 2 588.89
Table 4.2 Sample operational expenditure estimation for private data center with 45 servers
In this example both CapEx and OpEx were estimated for a private data center
with 45 physical servers: the CapEx of the 45-server implementation project is
$152 888.89 and the monthly OpEx is $2 588.89. These values were used in the input file
when I ran the simulation with a 45-server private data center. For private data centers
of other sizes the last columns have to be recalculated from the other
parameters.
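The recalculation for another number of servers can be sketched in Java. The per-item totals in Table 4.1 and Table 4.2 are consistent with rounding the number of devices up to the next whole unit (for example, 0.03 × 45 = 1.35 air conditioners become 2); this rounding rule is inferred from the table values and is not stated explicitly. The names below are hypothetical, and the prices are the rounded values printed in the tables, so the totals match the tables only to within rounding.

```java
// Hypothetical sketch of the CapEx/OpEx estimation; the round-up rule is an
// inference from Table 4.1 and Table 4.2, not a documented algorithm.
final class PrivateCloudCost {
    /** Cost of one item type; the number of units is rounded up to whole devices. */
    static double itemCost(double unitsPerServer, double unitPrice, int servers) {
        // The small epsilon guards against floating-point noise before rounding up.
        long units = (long) Math.ceil(unitsPerServer * servers - 1e-9);
        return units * unitPrice;
    }

    /** Sums the item costs; each row is {units per server, price per unit}. */
    static double total(double[][] items, int servers) {
        double sum = 0.0;
        for (double[] item : items) {
            sum += itemCost(item[0], item[1], servers);
        }
        return sum;
    }

    // Rows of Table 4.1 (one-time prices in USD).
    static final double[][] CAPEX = {
        {0.03, 3555.56}, {0.01, 1777.78}, {1.00, 2666.67}, {0.05, 444.44},
        {0.10, 2666.67}, {0.10, 1777.78}, {0.01, 444.44}
    };

    // Rows of Table 4.2 (monthly cost per unit in USD, as printed).
    static final double[][] OPEX = {
        {0.03, 289.0}, {0.01, 889.0}, {0.01, 222.0}, {1.00, 20.0}
    };

    public static void main(String[] args) {
        int servers = 45;
        System.out.printf("CapEx for %d servers: %.2f USD%n", servers, total(CAPEX, servers));
        System.out.printf("Monthly OpEx for %d servers: %.2f USD%n", servers, total(OPEX, servers));
    }
}
```

Calling total(CAPEX, n) and total(OPEX, n) with another server count n gives the two values that would be entered in the input file for that configuration, under the same assumptions.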
4.1.4 Opportunity cost for NPV
In the case study I used 11% as the opportunity cost of capital. That comes from
2% risk free rate of interest, 2.38 beta (Computers [CPR]) and 8% expected return of
the market. [27]
4.1.5 SLA parameter
The accepted maximum response time for serving users’ image requests was set to 17
minutes. If a request is served later than 17 minutes, the service performance
indicator of the system decreases.
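The service performance indicator can be understood as the share of requests whose waiting time stays within this limit. The following sketch is an illustration with hypothetical names; treating a waiting time of exactly 17 minutes as acceptable and returning 100% for an empty input are assumptions.

```java
// Hypothetical sketch of the service performance (SLA) indicator.
final class ServicePerformance {
    /** Percentage of requests whose waiting time does not exceed the SLA limit. */
    static double indicatorPercent(double[] waitingMinutes, double maxResponseMinutes) {
        if (waitingMinutes.length == 0) {
            return 100.0; // assumption: no requests means no violations
        }
        int withinSla = 0;
        for (double w : waitingMinutes) {
            if (w <= maxResponseMinutes) {
                withinSla++;
            }
        }
        return 100.0 * withinSla / waitingMinutes.length;
    }

    public static void main(String[] args) {
        double[] waits = {8.0, 4.0, 17.0, 20.0};
        System.out.println(indicatorPercent(waits, 17.0) + "% served within the SLA");
    }
}
```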
4.1.6 Future image reservations
At BME DMIS the Apache VCL system was launched in 2013, so historical statistics
are available about the usage of the laboratory images of the lectures Systems
Modeling and IT Engineering Laboratory 2. Figure 4.1 shows the histogram of
reservations for the lecture Systems Modeling.
The histogram of past reservations for the lecture IT Engineering Laboratory 2 is
shown in Figure 4.2.
To generate future requests, the characteristics of the past reservations
were used. The random load generation algorithm was set up with the parameters
listed in Figure 4.3. In total, 15 different bachelor lectures were
created with 200 students per lecture. To simulate master lectures, 25 different
lectures with 50 students per lecture were defined.
The image reservations modeled and simulated in this way are depicted in
Figure 4.4. The total number of generated image requests for one autumn semester is
153 373.
Figure 4.1 Past reservations of lecture Systems Modeling [28]
Figure 4.2 Past reservations of lecture IT Engineering Laboratory 2 [28]
Figure 4.3 Parameters for modeling past reservations
Figure 4.4 Generated and simulated image reservations [28]
Contents of Figure 4.3:
Name of the lecture: | BSc_autumn_2013 | MSc_autumn_2013
Number of students: | 200 | 50
Semester (SPRING/AUTUMN) | AUTUMN | AUTUMN
Number of same lectures to be simulated: | 15 | 25
Image used at peak 1: | BSc AUTUMN | MSc 1 AUTUMN
Rate between students and requested images by peak 1: | 18,00 | 9,00
Mean of the requests' length in hour by peak 1: | 2,00 | 2,50
Standard deviation of the requests' length in hour by peak 1: | 1,00 | 1,50
Mean of peak 1, date: | 2013.10.27 15:00:00 | 2013.09.27 19:00:00
Standard deviation of the requests' time in hour by peak 1: | 120,00 | 96,00
Image used at peak 2: | BSc AUTUMN | MSc 1 AUTUMN
Rate between students and requested images by peak 2: | 11,00 | 36,00
Mean of the requests' length in hour by peak 2: | 2,00 | 2,50
Standard deviation of the requests' length in hour by peak 2: | 1,00 | 1,50
Mean of peak 2, date: | 2013.11.17 17:00:00 | 2013.10.17 19:00:00
Standard deviation of the requests' time in hour by peak 2: | 120,00 | 108,00
Image used at peak 3: | BSc AUTUMN | MSc 2 AUTUMN
Rate between students and requested images by peak 3: | 1,50 | 2,50
Mean of the requests' length in hour by peak 3: | 2,00 | 2,80
Standard deviation of the requests' length in hour by peak 3: | 1,00 | 1,00
Mean of peak 3, date: | 2013.12.16 17:00:00 | 2013.11.12 19:00:00
Standard deviation of the requests' time in hour by peak 3: | 24,00 | 48,00
Image used at peak 4: | | MSc 2 AUTUMN
Rate between students and requested images by peak 4: | | 2,00
Mean of the requests' length in hour by peak 4: | | 2,80
Standard deviation of the requests' length in hour by peak 4: | | 1,00
Mean of peak 4, date: | | 2013.11.27 19:00:00
Standard deviation of the requests' time in hour by peak 4: | | 36,00
4.2 Simulation results
With the future image reservation requests available, the question is which cloud setup
best fits the requirements for hosting Apache VCL. To answer this question, a series
of simulations had to be run with two varying parameters: the number of physical
servers in the private data center and whether the Amazon EC2 public cloud may be used.
This step is important because it uses the part of the CloudSim simulator that I created
to evaluate cloud infrastructure setups, so it serves as a proof of concept of my
work.
The simulations are divided into two parts by whether the public cloud may be used.
In the first part only the private data center can be used, so all of the
image requests are served by the physical hosts at the university. In this case only the
upkeep cost of the private data center and the one-time investment cost of the private
infrastructure are taken into account. In the second part both private and public data
centers (hybrid cloud setup) can run image requests. In this case the bill contains the
investment and operational costs of the private cloud and, additionally, the price of the
instances in Amazon EC2, based on how many images were requested and how long they
were used. There is one more case, in which all of the image requests are handled
in the public cloud, but this case is covered by the first simulation scenario (private
data center only), as the summary output file contains this cost by
default.
One more remark on the simulation: only one semester was simulated. To evaluate
an implementation project properly, a project length of at least 3 years
should be considered. Hence, the results for the hypothetical 3-year project
were estimated from the results of the half-year (one-semester) simulation. The results
of the actual simulations are shown in Figure 4.5 and Figure 4.6, and the
corresponding approximations for the 3-year projects in Figure 4.7 and Figure
4.8.
The horizontal axis shows the number of hosts used in the private cloud. On the left
vertical axis the following values are shown with colored lines:
The aqua blue line shows the current average unit cost of the image requests.
This value is calculated by dividing the total NPV cost by the total number
of requests – smaller is better.
The orange line shows the average unit cost of the image requests if
all of the requests were executed in the Amazon EC2 public cloud – smaller
is better.
The green line depicts the service performance indicator, namely what
percentage of all requests were served within the user-defined SLA time
parameter – bigger is better.
The dark blue line visualizes the NPV of the projects for a specific number of
physical servers in the private cloud – smaller is better.
The pink line shows the percentage of the requests that were executed in the
private cloud – depends.
The ginger yellow line is for the CPU utilization of the private cloud –
depends.
The brown line is for the RAM utilization of the private cloud – depends.
Figure 4.5 Simulated result of half year long hypothetical private cloud only implementation project [28]
Figure 4.6 Simulated result of half year long hypothetical hybrid cloud implementation project [28]
Figure 4.7 Estimated results of 3-year long hypothetical private cloud only implementation project [28]
Figure 4.8 Estimated results of 3-year long hypothetical hybrid cloud implementation project [28]
4.3 Finding the optimal cloud setup
After numerous simulations of different cloud setups, the next step in the
decision-making process is to find the cost- and risk-optimal cloud computing
configuration for the predefined workload (153 373 image requests). The rational choice
would be the cloud setup with the lowest NPV. This is almost true, except that not only the
project NPV must be taken into account but also the service performance indicator and the
utilization of the private cloud infrastructure. The service performance indicator defines
how many requests could be served within the SLA time parameter, which is important
for system availability and quality of service. The utilization should be under
60%. If the system utilization is permanently above 60%, the risk of an IT system
breakdown increases. Accordingly, Figure 4.7 and Figure 4.8
need to be considered thoroughly to find the optimal solution based on one's own decision
criteria.
Figure 4.7 depicts the private-only and (implicitly) the public-only cloud setups.
At first glance it is clear that using only the public cloud is not sustainable for a
3-year project, as the unit cost of Amazon EC2 (orange line) is $0.67, which is
higher than the unit cost of using only the private cloud (aqua blue line). As the
number of physical servers decreases, the service performance (green line) also
decreases, as do the average cost of a request and the project NPV. The system
utilization (CPU and RAM) increases as the number of servers decreases.
Figure 4.8 shows different hybrid cloud setups in which the size of the private
cloud varies. The optimal hybrid cloud configuration is the one with the lowest unit cost.
This optimum is reached with 38 physical servers in the private cloud. In this
case the unit cost is $0.2357 and the total cost of ownership for 3 years is $216 935. CPU
and RAM utilization are 49.91% and 17.02%, and 98.18% of the requests are executed
in the private cloud. The service performance indicator is 100%, which means that
all of the requests are served within the predefined 17-minute SLA parameter.
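The unit cost is the total cost divided by the total number of requests. The figures reported for the optimal hybrid setup are consistent with a 3-year workload of six semesters with 153 373 requests each; this six-semester scaling is an assumption inferred from the numbers. The names in the sketch are hypothetical.

```java
// Hypothetical check of the unit cost of the optimal hybrid setup.
final class UnitCost {
    /** Total cost divided by the number of requests over all semesters. */
    static double unitCost(double totalCost, long requestsPerSemester, int semesters) {
        return totalCost / (requestsPerSemester * (double) semesters);
    }

    public static void main(String[] args) {
        // Assumption: 3 years are modeled as 6 identical semesters.
        double cost = unitCost(216935.0, 153373L, 6);
        System.out.printf("Unit cost: $%.4f per request%n", cost);
    }
}
```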
Choosing between the private-only and hybrid cloud setups is up to the
decision makers. Private clouds with 40 physical hosts or fewer have lower unit costs
($0.2275 or less) than the optimal hybrid cloud setup with 38 private hosts ($0.2357),
but their service performance is worse (96.02%) than that of the optimal
hybrid cloud solution (100%). The system utilization also needs to be checked case by
case because it increases as the number of servers decreases.
5 Summary
The cloud computing service delivery model plays a key role in the IT services of
most business and non-business oriented companies around the world. By using the cloud,
universities can also benefit from cost reduction, flexibility, performance, and
the like. VDI offers many advantages in a university environment. Lecturers and students
do not have to attend labs in person just to get access to the
physical resources, so virtualized laboratories appear to be a better solution. VDI
removes the dependency on lab premises and schedules, and it makes the laboratories
more flexible as well. For this purpose BME DMIS launched its own Apache VCL
infrastructure in 2013.
In this master thesis the main task was to model the cost and risk factors of
Apache VCL in order to study cost, risk and service performance optimization
of hybrid VCL cloud setups. Using the created model, a hypothetical Apache VCL
implementation project at a hypothetical Hungarian university had to be optimized
for a specific load profile based on predefined criteria.
At the beginning I gave a brief overview of the open source Apache VCL
and of the Apache VCL installation used by BME DMIS. To become familiar with Apache
VCL, I examined its computer allocation processes. There were three main
types of allocation: the normal reservation by a user/student, the block allocation by a
user/teacher, and the predictive loading modules. These processes contained a lot of
information that was out of scope; therefore I created a filtered, simplified model
of Apache VCL to simulate only the relevant parts.
With the appropriate information about Apache VCL and the simplified
model in hand, a cloud simulator had to be chosen that could faithfully model and
simulate Apache VCL, including the handling of reservation requests and the slot
management. The simulator also had to be able to simulate hybrid cloud computing
infrastructures. Using a simulator was necessary because simulation is a very fast way
to obtain the required results. Eventually the open source CloudSim cloud simulator was
chosen. CloudSim could not be used to simulate Apache VCL without changes to its
source code. Modifying the CloudSim source code was the biggest task, bigger than
understanding how Apache VCL works. The modification added four essential conceptual
features which were missing from CloudSim: cost, system utilization, future request
and service performance modeling.
Finally, the relevance and usability of the extended CloudSim were
demonstrated through a hypothetical implementation project of Apache VCL at a
Hungarian university. In the case study I showed that the created simulator can
be used to find cost-optimal cloud setups for assumed future image requests. Decisions
were based on the unit cost and the service performance indicator of the cloud
infrastructure implementation projects.
The conclusion of the simulations using different private-only, public-only and
hybrid cloud setups for 3-year projects is that there is an optimal hybrid cloud
configuration in which the unit cost is minimal while the service performance indicator
is at its maximum, as can be seen in Figure 5.1. A smaller unit cost can only be reached
if the quality of the service is decreased.
Figure 5.1 Results of different hybrid cloud configurations [28]
Bibliography
[1] Pierre Audoin Consultants: Growing Cloud Market Presents Huge Opportunities
for SITS Vendors;
https://www.pac-online.com/pac/pac/live/pac_world/global/press_corner/press_releases/index.html?lenya.usecase=show-rapport&document=pac_sitsi_reports/press_release/PR_Cloud_Feb13&xsl=press_release
(May 2013)
[2] VMware: Virtual Desktop Infrastructure, page 1;
http://www.vmware.com/pdf/virtual_desktop_infrastructure_wp.pdf (May 2013)
[3] Apache VCL: Homepage of Apache VCL; http://vcl.apache.org (March 2013)
[4] Imre Kocsis: Oktatás felhőben, slide 15;
http://www.slideshare.net/ImreKocsis1/oktats-apache-vcl-felhvel-tempus-felsoktatsi-mhely
(May 2014)
[5] Apache VCL: Authentication of Apache VCL;
https://cwiki.apache.org/confluence/display/VCL/VCL+2.3+Configure+Frontend
+Authentication (March 2013)
[6] Shibboleth: Homepage of Shibboleth; http://shibboleth.net/ (May 2013)
[7] Apache VCL: User privileges of Apache VCL;
https://cwiki.apache.org/VCL/for-vcl-users.html#ForVCLUsers-Privileges (March 2013)
[8] Apache VCL: Apache VCL documentation;
https://cwiki.apache.org/confluence/display/VCL/Apache+VCL (March 2013)
[9] Apache VCL: Apache VCL architecture;
https://cwiki.apache.org/VCL/vcl-architecture.html (April 2013)
[10] Apache VCL: Apache VCL XMLRPC API;
http://people.apache.org/~jfthomps/vcl_xmlrpc_api.html (May 2013)
[11] Apache VCL: Database schema of Apache VCL;
https://cwiki.apache.org/confluence/display/VCL/Database+Schema (April 2013)
[12] Apache VCL: Network layouts of Apache VCL;
https://cwiki.apache.org/VCL/network-layout.html (April 2013)
[13] Apache VCL: Resources, Groups & Privileges of Apache VCL;
https://cwiki.apache.org/VCL/resources-groups-privileges.html (April 2013)
[14] Imre Kocsis and Áron Tóth: Oktatás felhőben, slide 21;
http://www.slideshare.net/ImreKocsis1/oktats-apache-vcl-felhvel-tempus-felsoktatsi-mhely
(March 2014)
[15] Dumitrescu, Catalin L., and Ian Foster. "GangSim: a simulator for grid scheduling
studies." Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International
Symposium on. Vol. 2. IEEE, 2005.
[16] Casanova, Henri. "Simgrid: A toolkit for the simulation of application
scheduling." Cluster Computing and the Grid, 2001. Proceedings. First
IEEE/ACM International Symposium on. IEEE, 2001.
[17] Núñez, Alberto, et al. "iCanCloud: A flexible and scalable cloud infrastructure
simulator." Journal of Grid Computing 10.1 (2012): 185-209.
[18] Buyya, Rajkumar, and Manzur Murshed. "Gridsim: A toolkit for the modeling and
simulation of distributed resource management and scheduling for grid
computing." Concurrency and computation: practice and experience 14.13‐15
(2002): 1175-1220.
[19] CloudSim: Web site of CloudSim; http://www.cloudbus.org/cloudsim (March
2014)
[20] Calheiros, Rodrigo N., et al. "CloudSim: a toolkit for modeling and simulation of
cloud computing environments and evaluation of resource provisioning
algorithms." Software: Practice and Experience 41.1 (2011): 23-50.
[21] Investopedia: Net Present Value – NPV;
http://www.investopedia.com/terms/n/npv.asp (May 2014)
[22] Norbert Madarász: Modified source code of CloudSim;
https://www.dropbox.com/sh/qbmckbhpk69015n/AADsNB1GEl7OvVRBuNf-hXRba (May 2014)
[23] Michael Thomas: Flanagan's Java Scientific Library;
http://www.ee.ucl.ac.uk/~mflanaga/java (April 2014)
[24] Apache Commons Math: Project web site;
http://commons.apache.org/proper/commons-math/index.html (April 2014)
[25] Java Excel API: Homepage; http://www.andykhan.com/jexcelapi/index.html
(April 2014)
[26] Amazon: Amazon EC2 Pricing; https://aws.amazon.com/ec2/pricing (April 2014)
[27] Dr. Andor, György and Dr. Tóth, Tamás (2010). Vállalati pénzügyek I. oktatási
segédanyag, page 148.
[28] Madarász, Norbert: Tableau figures online;
http://public.tableausoftware.com/profile/fulmi (May 2014)
Abbreviation list
API Application Programming Interface
BME Budapest University of Technology and Economics
BRITE Boston University Representative Internet Topology Generator
CAN Campus Area Network
CapEx Capital Expenditure
CIS Cloud Information Service
CLOUDS Cloud Computing and Distributed Systems Laboratory
CMD Command Prompt
CPU Central Processing Unit
CRM Customer Relationship Management
DaaS Desktop-as-a-Service
DMIS Department of Measurement and Information Systems
EC2 Amazon Elastic Compute Cloud
FCFS First-Come-First-Serve
GB Gigabyte
GPL General Public License
HTTP Hypertext Transfer Protocol
IaaS Infrastructure-as-a-Service
IT Information Technology
LAN Local Area Network
LDAP Lightweight Directory Access Protocol
MB Megabyte
Mbps Megabits per second
MI Million Instructions
MIPS Million Instructions Per Second
NAS Network Attached Storage
NFS Network File System
NPV Net Present Value
OpEx Operational Expenditure
PaaS Platform-as-a-Service
PE Processing Element
QoS Quality of Service
RAM Random Access Memory
RDBMS Relational Database Management System
RDP Remote Desktop Protocol
SaaS Software-as-a-Service
SAN Storage Area Network
SAS Serial Attached SCSI
SCSI Small Computer System Interface
SLA Service Level Agreement
SSH Secure Shell
SSL Secure Sockets Layer
TB Terabyte
TCO Total Cost of Ownership
UI User Interface
VCL Apache Virtual Computing Lab
VDC Virtual Data Centre
VDI Virtual Desktop Infrastructure
VM Virtual Machine
xCAT Extreme Cloud Administration Toolkit
XLS Microsoft Excel Binary File Format