ARCHITECTING TENANT BASED QOS IN MULTI-TENANT CLOUD PLATFORMS
- Arun prasath S
Table of Contents
Problem statement
Solution overview
Typical implementation in production environment
Linux Containers
Components of LXC
Control group (cgroup)
blkio Subsystem
CPU Subsystem
Memory Subsystem
Python Controller
Other options
Achieving QOS in a multi-tenant cloud platform is still a difficult task, and many
companies follow different approaches to solve this problem. In this document I try to
architect a simple solution, based on my experiments, for achieving different QOS levels for
different tenants in a multi-tenant cloud environment.
Openstack steps into Platform as a Service by introducing a new component, ‘Trove’, a
Database as a Service (DBaaS) offering, in its upcoming Icehouse release. But Openstack announced
that Trove will operate as a single-tenant service (which means that a new VM is created for
each database instance). This is costly for cloud service providers, and resources
may not be used efficiently in this scenario.
Many big cloud service providers like Google and Amazon provide options for the same
DBaaS as a multi-tenant service. In this case, many instances of the DB run in a single virtual
machine. This reduces the cost of running extra virtual machines.
But it also has a few problems like QOS, security and isolation. The QOS factors are CPU,
memory, IOPS (input/output operations per second) etc.
More than one DB instance will be running in a single machine. In the worst case,
one DB instance may end up consuming a large amount of resources, which greatly affects the other
DB instances. We need to guarantee the QOS as specified in the SLA.
Since more than one DB instance will be running in a single machine, we have some
security considerations. When one customer’s database gets affected, it must not affect other
customers.
Also, consumers in a single-tenant setup are charged based on their usage, such as number of
I/O operations, total space, CPU, memory etc. But when it comes to multi-tenancy, it is hard to
estimate the usage, as more than one DB instance will be running in a single virtual machine.
From another perspective, the existing solution of creating a VM for each customer has the
drawback of running a separate operating system for each customer. This separate operating
system is an extra load for the service provider, as it needs a lot of disk space and memory.
In my proposed solution, I use Linux containers running inside virtual machines to
isolate DB instances. Each database instance runs inside a container. Therefore we
can achieve true isolation, resources can be controlled, and metering is also quite easy. With
the cgroup feature we can control the IOPS of the container and thereby offer different
service levels (IOPS) for different tenants.
[Figure: existing offering (in Openstack Trove) compared with the proposed solution]
Goals
Tenant based QOS
True resource isolation for DB Instances
Typical implementation in production environment
This is a typical implementation in the production environment. The user requests a
database instance using the dashboard. Once the request is initiated, the Python controller
gets the resources specified for a particular flavor in Nova and then consolidates the existing
If there is space available in any of the virtual machines, the container is created there.
Otherwise, a new nova virtual machine is created and the container is then created in that virtual
machine with the user-specified parameters (CPU, RAM and IOPS).
Each time a virtual machine is created, it is discovered by Puppet and the container
software (LXC or Docker) is installed. Each time a container is created, MySQL is installed.
After the containers are created and provisioned, the users are given access to
the database (IP address, MySQL username and password).
The above is a modular approach to provisioning servers. However, for smaller companies
the architecture can be simplified by using pre-built Vagrant boxes or golden images.
The following is a brief description of all the components mentioned above.
Linux Containers
Linux containers provide lightweight operating-system-level virtualization which isolates
processes and resources in a simpler way compared to full-scale virtual machines. LXC works in
a way similar to virtualization, with the difference that it does not need a separate kernel
instance. It allows us to create any number of sandbox environments, each completely
isolated from the host and from other containers.
Components of LXC
Namespaces – Used to provide process isolation
cgroups – Used for system management and resource control
SELinux – Ensures isolation between the host and the containers, and also between individual containers
Libvirt – Toolbox to manage containers
Since QOS is our primary objective, we are going to focus more on control groups.
Control group (cgroup)
Control group is a kernel feature to limit and allocate resources like CPU, system memory and
network bandwidth among user-defined groups of tasks.
For example, we can prevent a MySQL instance from using all the memory. In the same way, we
can guarantee that the MySQL instance gets the specified resources.
In this architecture, I am using the cgroup feature on Linux containers to isolate DB instances
and guarantee the minimum QOS for the customer.
Limits for a particular container are defined in the container's configuration file. Hence we
can allocate different resources to different containers based on customer requirements.
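As a minimal sketch of how per-tenant limits could end up in a container's configuration file, the
following Python snippet renders cgroup settings for one container. It assumes the legacy
`lxc.cgroup.*` configuration keys used by LXC with cgroup v1; the tier names, the numeric values
and the device number `8:0` are hypothetical and only serve as an illustration.

```python
# Hypothetical service tiers; the names and numbers are assumptions for illustration.
TIERS = {
    "gold":   {"memory_bytes": 1024 * 1024 * 1024, "cpu_shares": 2048, "iops": 1000},
    "silver": {"memory_bytes": 512 * 1024 * 1024,  "cpu_shares": 1024, "iops": 500},
}


def render_cgroup_limits(tier, device="8:0"):
    """Return LXC config lines (legacy lxc.cgroup.* keys) for the given tier.

    `device` is the "major:minor" number of the block device to throttle.
    """
    limits = TIERS[tier]
    return [
        f"lxc.cgroup.memory.limit_in_bytes = {limits['memory_bytes']}",
        f"lxc.cgroup.cpu.shares = {limits['cpu_shares']}",
        f"lxc.cgroup.blkio.throttle.read_iops_device = {device} {limits['iops']}",
        f"lxc.cgroup.blkio.throttle.write_iops_device = {device} {limits['iops']}",
    ]
```

The returned lines could be appended to the container's configuration file before the container
is started, so that each tenant's container starts with the limits of the tier it paid for.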
In our scenario the containers will be running as processes and the processes inside the
containers will be running as sub-processes.
Subsystems
Subsystems are kernel modules that are aware of cgroups. They are resource controllers
that allocate varying levels of system resources to different cgroups. The following are some of
the cgroup subsystems.
blkio Subsystem
The Block I/O subsystem controls and monitors access to I/O on block devices by tasks in
cgroups. It offers features like proportional weight division and I/O throttling (upper limit).
blkio.throttle.read_iops_device - specifies the upper limit on the number of read operations per
second a device can perform
blkio.throttle.read_bps_device - specifies the upper limit on the read rate of a device, in bytes
per second
blkio.throttle.write_bps_device - specifies the upper limit on the write rate of a device, in bytes
per second
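The throttle parameters above take entries of the form `<major>:<minor> <limit>`, where
`major:minor` identifies the block device. A small sketch of setting such a limit from Python is
shown below. The function name is hypothetical, and the cgroup directory is passed in by the
caller because its location depends on how the cgroup hierarchy is mounted on the host.

```python
import os

# Throttle files of the cgroup v1 blkio subsystem that accept "<major>:<minor> <limit>".
THROTTLE_PARAMETERS = {
    "blkio.throttle.read_iops_device",
    "blkio.throttle.write_iops_device",
    "blkio.throttle.read_bps_device",
    "blkio.throttle.write_bps_device",
}


def set_throttle(cgroup_dir, parameter, major, minor, limit):
    """Write one throttle rule into the given cgroup directory and return the file path."""
    if parameter not in THROTTLE_PARAMETERS:
        raise ValueError(f"unknown blkio throttle parameter: {parameter}")
    path = os.path.join(cgroup_dir, parameter)
    with open(path, "w") as handle:
        handle.write(f"{major}:{minor} {limit}\n")
    return path
```

For example, calling `set_throttle(container_cgroup_dir, "blkio.throttle.read_iops_device", 8, 0, 500)`
would cap reads on device 8:0 at 500 operations per second for that container, assuming
`container_cgroup_dir` points at the container's blkio cgroup and the caller has the required
privileges.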
CPU Subsystem
The cpu subsystem schedules CPU access to cgroups.
cpu.shares - contains an integer value that specifies a relative share of CPU time available to the
tasks in a cgroup
cpu.rt_period_us - applies to real-time scheduling tasks only; specifies a period of time in
microseconds (µs, represented here as "us") for how regularly a cgroup's access to CPU resources
should be reallocated
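Because cpu.shares is a relative value, the CPU time a container receives under contention
depends on the shares of its sibling cgroups. The following sketch works through that arithmetic;
the tenant names and share values are made up for illustration.

```python
def cpu_fractions(shares):
    """Map each cgroup name to its fraction of CPU time when all cgroups are busy."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Hypothetical tenants: one container with 2048 shares and two with 1024 shares each.
example = cpu_fractions({"tenant_a": 2048, "tenant_b": 1024, "tenant_c": 1024})
# tenant_a gets half of the CPU time, tenant_b and tenant_c a quarter each.
```

These fractions only apply when all the cgroups compete for CPU at the same time. An idle
container's share is available to the others, so cpu.shares acts as a guaranteed minimum rather
than a hard cap.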
Memory Subsystem
The memory subsystem generates automatic reports on memory resources used by the tasks in
a cgroup, and sets limits on memory use by those tasks.
memory.usage_in_bytes - reports the total current memory usage by processes in the cgroup (in
bytes)
memory.max_usage_in_bytes - reports the maximum memory used by processes in the cgroup
memory.limit_in_bytes - sets the maximum amount of user memory
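These report files are also what makes per-tenant metering straightforward. As a rough sketch,
the snippet below reads the three values for one container and derives its current utilisation.
The function names are hypothetical, and the cgroup directory is supplied by the caller since its
path depends on the host's cgroup mount.

```python
import os

MEMORY_FILES = (
    "memory.usage_in_bytes",
    "memory.max_usage_in_bytes",
    "memory.limit_in_bytes",
)


def read_memory_stats(cgroup_dir):
    """Return the memory counters of one cgroup as a dict of integers (bytes)."""
    stats = {}
    for name in MEMORY_FILES:
        with open(os.path.join(cgroup_dir, name)) as handle:
            stats[name] = int(handle.read().strip())
    return stats


def memory_utilisation(stats):
    """Fraction of the configured limit that is currently in use."""
    return stats["memory.usage_in_bytes"] / stats["memory.limit_in_bytes"]
```

Sampling these values periodically and storing them per container would give the usage data
needed to bill tenants in a multi-tenant setup.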
There are also various other subsystems like cpuacct, cpuset, devices, freezer etc. Those can be
used in our scenario for enhanced configurations.
Python Controller
In a fresh Openstack environment when a user requests an instance, a new VM is created.
But in our case we need to provision containers. Hence we need to modify the normal Openstack
One popular way to do this is via the REST-based API. Since I am a Python guy, I am doing this
via the Python APIs provided by Openstack.
Details of all the containers created by the users are saved in a local MySQL database. In
this scenario, the user is shown a dashboard or a form for database provisioning. When the user
requests an instance, the Python controller takes control. It gets the details of the flavor used
to build the nova VMs by querying Openstack. Then it consolidates the provisioned containers using
the local database. If it cannot find any space, a new nova VM is created using API calls
and the process continues. If an existing VM has the necessary resources to provision a container,
the container is created in that existing VM.
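The placement decision described above can be sketched as a simple first-fit search. This is an
illustration only, not the controller itself: the class names are hypothetical, capacity is
tracked in memory instead of the local MySQL database, and `create_vm` stands in for the call
that would boot a new nova VM through the Openstack API.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    ram_mb: int
    iops: int


@dataclass
class VM:
    name: str
    ram_mb: int   # capacity, as defined by the nova flavor
    iops: int     # IOPS budget assumed for the VM
    containers: list = field(default_factory=list)

    def fits(self, req):
        used_ram = sum(c.ram_mb for c in self.containers)
        used_iops = sum(c.iops for c in self.containers)
        return req.ram_mb <= self.ram_mb - used_ram and req.iops <= self.iops - used_iops


def place(vms, req, create_vm):
    """Put the container on the first VM with room, or on a freshly created VM."""
    for vm in vms:
        if vm.fits(req):
            vm.containers.append(req)
            return vm
    vm = create_vm()  # placeholder for booting a new nova VM
    if not vm.fits(req):
        raise ValueError("request exceeds the capacity of a single VM")
    vms.append(vm)
    vm.containers.append(req)
    return vm
```

First-fit is only one possible policy. A best-fit or a tier-aware policy would plug into the same
`place` function by changing the order in which the existing VMs are examined.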
The following is sample Python code for creating an instance.
Initially we can set the resource level options for any particular flavor.
nova-manage flavor set_key --name m1.small --key quota:disk_read_bytes_sec
nova-manage flavor set_key --name m1.small --key quota:disk_write_bytes_sec --value 10240000
Openstack can provide any number of machines based on demand. But to get all those
machines into production (installing required software like LXC or Docker in our scenario), we
need some automation. There are various automation tools for change and configuration
management. In this scenario I used Puppet.
Puppet can manage our servers. In a Puppet environment, we describe the desired
machine state in declarative code. Puppet clients connect to the server and ensure that they
are in the state described by the manifest file on the server.
In our scenario we will be defining manifests for installing LXC or Docker. Once the
necessary container is installed, we bring the container under the control of Puppet for software
A Puppet manifest for MySQL is available on GitHub.
Docker is an open source, developer-friendly abstraction layer on top of Linux
containers (LXC). Docker gives a simple and meaningful layer to work with containers in a cloud
environment. Using Docker we can build containers, use them and make changes based
on our needs, push our containers to the Docker repository, and pull them any time and any
number of times for further use. This means a lot in the PaaS market.
At a high level, Docker can automate the deployment of applications as highly
portable, self-sufficient containers which are independent of hardware, language, framework,
packaging system and hosting provider.
Docker also provides a driver for Openstack which integrates with Nova and provides the ability
to work with containers alongside nova virtual machines. Since most Openstack production
environments need to instantiate various different operating systems, we can use a workaround
to meet our needs. In this scenario we are going to run Docker or LXC on top of a virtual
machine.
Docker, along with Puppet or Chef, can be very useful for Platform as a Service
providers. They are very useful for automated provisioning of the platforms required by developers
in a convenient way, thus making the operations team's work much easier.
Other options
The above method is one way of creating a multi-tenant cloud environment. But there are
many other ways to achieve it using various other options.
Rackspace uses OpenVZ to build their cloud platforms. They use OpenVZ to contain
their customers' databases and for resource isolation. OpenVZ has many advantages over LXC.
Resource allocation is made simple in OpenVZ (i.e. guaranteed RAM and burstable RAM are
specified using simple commands). Live migrations are quite easy in OpenVZ when compared
to LXC.
Oracle follows an interesting architecture in its DBaaS offering. They created a
customized ‘Container Database’. All the customer databases are in Pluggable Database (PDB)
format, and they can be plugged into the container database and run there.
Thus the tenant based QOS feature is achieved in a multi-tenant cloud platform. I have not
covered some other features and drawbacks such as migration, scale up and high availability.
Those drawbacks can be addressed with workarounds in the architecture.
1) Linux Plumbers Conference 2013, Rackspace session