University of Ljubljana
Faculty of Computer and Information Science
Naum Gjorgjeski
Complex event processing for
integration of Internet of Things
devices
BACHELOR’S THESIS
UNDERGRADUATE UNIVERSITY STUDY PROGRAMME
COMPUTER AND INFORMATION SCIENCE
Mentor: prof. dr. Matjaz B. Juric
Ljubljana, 2016
Univerza v Ljubljani
Fakulteta za racunalnistvo in informatiko
Naum Gjorgjeski
Kompleksna obdelava dogodkov za
integracijo naprav v Internetu stvari
DIPLOMSKO DELO
UNIVERZITETNI STUDIJSKI PROGRAM
PRVE STOPNJE
RACUNALNISTVO IN INFORMATIKA
Mentor: prof. dr. Matjaz B. Juric
Ljubljana, 2016
The text is formatted with the LATEXeditor.
Besedilo je oblikovano z urejevalnikom besedil LATEX.
Faculty of Computer and Information Science issues the following thesis:
Complex event processing for integration of Internet of Things devices
Subject of the thesis:
Analyze the Internet of Things field, identify applications and characteristics
of the things and challenges in IoT network architecture. Describe the inte-
gration of the Internet of Things and cloud computing, identify the effects of
integration and analyze scalability and elasticity implications. Describe the
microservice architecture and identify the impact of microservices on IoT
solutions with special emphasis on scaling. Identify the need for complex
event processing, analyze and describe key characteristics of complex event
processing. Develop an application, which follows the principles of microser-
vices architecture and demonstrates the integration of the Internet of Things
and cloud computing. Demonstrate how to achieve scalability and elasticity
of the application and implement real-time processing of IoT events.
Fakulteta za racunalnistvo in informatiko izdaja naslednjo nalogo:
Kompleksna obdelava dogodkov za integracijo naprav v Internetu stvari
Tematika naloge:
Analizirajte podrocje Interneta stvari, opredelite aplikacije, lastnosti stvari
in izzive v arhitekturi IoT omrezij. Opisite integracijo Interneta stvari in
racunalnistva v oblaku, opredelite posledice integracije in analizirajte imp-
likacije skalabilnosti in elasticnosti. Opisite mikrostoritveno arhitekturo in
identificirajte vpliv mikrostoritev na IoT resitvah s posebnim poudarkom
na skaliranje. Opredelite potrebo po kompleksni obdelavi dogodkov, anal-
izirajte in opisite kljucne lastnosti kompleksne obdelave dogodkov. Razvijte
aplikacijo, ki sledi nacelom mikrostoritvene arhitekture in demonstrira in-
tegracijo Interneta stvari in racunalnistva v oblaku. Demonstrirajte kako
doseci skalabilnost in elasticnost aplikacije in implementirajte procesiranje
IoT dogodkov v realnem casu.
I would like to thank my mentor prof. dr. Matjaz B. Juric for the guidance
and assistance I have received during the preparation of my Bachelor’s thesis.
I would like to express my deepest and biggest gratitude to my family for
being patient and supportive throughout my studies.
За моето семеjство.
Contents
Abstract
Povzetek
1 Introduction 1
2 Overview of the Internet of Things 5
2.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Characteristics of the ”things” . . . . . . . . . . . . . . . . . . 6
2.3 Challenges in IoT networks . . . . . . . . . . . . . . . . . . . . 7
3 Integration of the Internet of Things and cloud computing 11
3.1 Cloud requirements for IoT solutions . . . . . . . . . . . . . . 11
3.2 Types of clouds . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Cloud services . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4 Effects of the integration . . . . . . . . . . . . . . . . . . . . . 14
3.5 Scalability and elasticity . . . . . . . . . . . . . . . . . . . . . 15
4 Microservices 21
4.1 Definition of microservices . . . . . . . . . . . . . . . . . . . . 21
4.2 Comparison with monolithic architecture . . . . . . . . . . . . 22
4.3 Impact of microservices architecture on IoT solutions . . . . . 25
4.4 Scaling of microservices . . . . . . . . . . . . . . . . . . . . . . 29
5 Complex event processing 33
5.1 The need for CEP . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 Evolution of information flow processing systems . . . . . . . . 34
5.3 Constructs and operators of CEP
languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6 A practical example 43
6.1 Goals of the practical example . . . . . . . . . . . . . . . . . . 43
6.2 Structure of the practical example . . . . . . . . . . . . . . . . 43
6.3 Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.4 Apache Kafka . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.5 Esper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.6 Deployment and management of the microservices using Ku-
bernetes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7 Conclusion 79
Bibliography 82
List of abbreviations
Abbreviation Meaning
IoT Internet of Things
CEP Complex event processing
IIoT Industrial Internet of Things
IaaS Infrastructure as a Service
PaaS Platform as a Service
SaaS Software as a Service
REST Representational state transfer
SOAP Simple Object Access Protocol
RPC Remote procedure call
SLA Service-level agreement
DBMS Database management system
DSMS Data stream management system
IFP Information flow processing
JAX-RS Java API for RESTful Web Services
EPL Event Processing Language
CDI Contexts and Dependency Injection
CORS Cross-origin resource sharing
GUI Graphical user interface
QoS Quality of service
Abstract
Title: Complex event processing for integration of Internet of Things devices
Author: Naum Gjorgjeski
As a relatively new technology, the Internet of Things (IoT) faces many chal-
lenges. IoT networks are characterized by a big number of devices. Each of
the devices produces huge amount of events. Therefore, scalability is one of
the key requirements of IoT applications. Cloud computing could help us
achieve scalability by providing virtually unlimited resources. The microser-
vices architecture is becoming increasingly popular for cloud deployments
of applications. We often want to extract real-time information from the
events that are coming from IoT devices. It would be harder to infer useful
information from the enormous amount of raw events, if we store them in
a database. Complex event processing enables us to analyze the events as
the stream of events flows and to infer meaningful information from them
in real time. To demonstrate all of this in practice, we developed an IoT
application, which follows the principles of microservices architecture. It is
able to simulate events, consume them, do complex event processing and
display visualizations. In order to balance the load between the instances
and achieve scalability and elasticity, the microservice which is consuming
the events can be scaled up and scaled down.
Keywords: Internet of Things, cloud computing, microservices, complex
event processing, Esper, Kubernetes.
Povzetek
Naslov: Kompleksna obdelava dogodkov za integracijo naprav v Internetu
stvari
Avtor: Naum Gjorgjeski
Internet stvari (IoT) se kot relativno nova tehnologija sooca s stevilnimi iz-
zivi. Za IoT omrezja je znacilno, da jih sestavlja veliko stevilo naprav. Vsaka
od teh naprav generira ogromno kolicino dogodkov. Zato je skalabilnost ena
od kljucnih zahtev IoT aplikacij. Racunalnistvo v oblaku nam lahko pomaga
doseci skalabilnost tako, da nam zagotavlja virtualno neomejene kolicine vi-
rov. Arhitektura mikrostoritev postaja vse bolj popularna za namestitev
aplikacij v oblaku. Pogosto hocemo iz dogodkov, ki prihajajo iz IoT na-
prav, pridobiti informacije v realnem casu. Tezje bi bilo pridobiti uporabne
informacije iz enormne kolicine neobdelanih dogodkov, ce bi dogodke shra-
njevali v podatkovno bazo. Kompleksna obdelava dogodkov nam omogoca,
da analiziramo dogodke in iz njih pridobivamo uporabne informacije v re-
alnem casu. Da bi vse to demonstrirali, smo razvili IoT aplikacijo, ki sledi
nacelom mikrostoritvene arhitekture. Aplikacija lahko simulira dogodke, jih
sprejema, izvaja kompleksno obdelavo dogodkov in prikazuje vizualizacije.
Mikrostoritev, ki sprejema dogodke, lahko skaliramo navzgor in navzdol s ci-
ljem, da uravnotezimo obremenitev med instancami in dosezemo skalabilnost
in elasticnost.
Keywords: Internet stvari, racunalnistvo v oblaku, mikrostoritve, komple-
ksna obdelava dogodkov, Esper, Kubernetes.
Chapter 1
Introduction
Nowadays, we are surrounded by a wide range of electronic devices used for
collecting and generating data. These devices are usually interconnected and
form a network, which also includes information systems and sometimes other
entities too. That is a simplistic and minimal description of what we call the
Internet of Things (IoT) [18]. In a broader sense, the Internet of Things has
many meanings and by using that term we are essentially referring to one or
many of the following [45]:
• the underlying network connecting these devices (usually wireless tech-
nologies and the Internet);
• the diverse devices themselves (e.g. sensors, actuators, RFIDs);
• applications and services that enable processing of devices’ data.
In an Internet of Things network we have numerous devices, each gener-
ating a massive number of events. Event could be defined as anything that
happens, or is contemplated as happening [12]. Broadly, event could be an
incoming email, a financial trade, a tire puncture, an indication that a light
is turned on or off, etc. In IoT networks, an event could be and often is a
reading from a sensor.
Integration and analysis of the generated data are essential. Thereby, the
data could give us valuable insights, so we can reach conclusions and make
1
2 Naum Gjorgjeski
data-based decisions. One would immediately point to Big data, as we use
this term to denote large and complex quantities of data. The increases in
storage capacities enabled storage of huge amounts of data. At the same time,
advancements in computing power and artificial intelligence contributed for
more efficient analysis of the data. However, the data generated by IoT
networks is not only quantitatively big, but also fast, since it is being created
at enormous rates. Therefore, the term Fast data emerges. If we store the
data in order to process it later, we are missing precious opportunities to
react as the data is being generated. Consequently, not being able to act in
real-time causes the data to lose some of its value [13].
To process events in terms of Fast data, we use complex event processing
(CEP) techniques. Complex event processing is processing of events from
multiple sources in order to identify meaningful events and respond to them
as quickly as possible [4]. A simple example might be a building’s fire de-
tection system. We can place a set of sensors which measure temperature
across the whole building. The sensors send their measurements to an in-
formation system which might do simple pattern matching and check if few
recent measurements are much higher than the previous ones. We can use
more complex CEP techniques as well. If the system detects unusual increase
of temperature in a short amount of time, it raises an alert. The main task
of CEP is to transform a set of base events (e.g. measurements) into one or
more complex events carrying logical semantic content. The set consists of
a different number of base events, varying from a few to thousands of them.
Furthermore, the extracted complex events can be used to derive even more
complex events [4].
Complex event processing systems must enable high throughput of events
and on-the-fly event processing. In order to fulfill these demands, CEP sys-
tems must be extremely scalable. An increase in scalability can be achieved
by distributed systems. In the current era of cloud computing, cloud-based
systems are becoming a crucial architectural part and are a perfect fit for
manipulation of events generated by IoT networks. However, not all archi-
Bachelor’s thesis 3
tectures utilize the potential of cloud environments maximally. We shall see
why microservices architecture might be a better choice than the traditional
monolithic architecture for deployment of applications in cloud environments.
The structure of the thesis is as follows. In chapter 2 we review the
IoT and the challenges that arise in IoT networks. Then in chapter 3 we
describe the effects of the integration of IoT and cloud computing. We shall
see which issues of the IoT architecture are addressed by cloud computing,
with special consideration of scalability and elasticity. Chapter 4 defines the
microservices architecture. We compare it with the monolithic architecture
and explain why the microservices architecture is a better fit for deployment
in cloud environments. As scaling is a key requirement in IoT solutions, the
strategies of scaling are also discussed. Chapter 5 discusses the need for CEP
systems, their evolution and the most common constructs and operators of
CEP languages. In chapter 6 we develop a practical example of a scalable
and elastic IoT application, which is able to produce and consume events,
do complex event processing and display visualizations. Chapter 7 provides
a conclusion of the thesis.
4 Naum Gjorgjeski
Chapter 2
Overview of the Internet of
Things
To feel the added value brought by the Internet of Things, it would be useful
to look at some applications first. Then we will go through the challenges
we face in IoT networks. Potential solutions of some of the challenges are
also briefly reviewed. Since the IoT paradigm has grown recently, in practice
most of the solutions are either newly developed and still under research
or are enhancements and adaptations of existing solutions from other more
established technologies.
2.1 Applications
Internet of Things finds useful scenarios everywhere around us. Some ex-
amples are: home automation, supply chain management, assisted living,
environmental monitoring, traffic monitoring. Therefore, it makes our lives
more comfortable and protects our environment. The industry could also
utilize the potential of the IoT. As the main objective of companies is to
maximize their profits, IoT could play an important role in helping them
reduce costs.
In most cases, when people think about applications that are made pos-
5
6 Naum Gjorgjeski
sible by the ubiquitous devices, they initially recall of home automation, i.e.
smart homes. For instance, in our homes we can install sensors, which sense
and measure some physical conditions and actuators, which act so that they
change those conditions. Then, through an application on our mobile phone
or laptop we can remotely control many parameters. We can check if we
have forgotten to turn off the light bulbs, command home appliances, use
temperature sensors to get data about air conditions, use actuators to adjust
air conditions, use motion sensors to detect unwanted visitors, etc.
Another important application of the Internet of Things are healthcare
systems. For example, sensors which monitor various vital signs can be es-
pecially helpful to older people. Sensors could sense if something undesirable
happens and alarm the responsible person.
Also, a subject of current broad interest is traffic monitoring. Many
developed cities struggling with traffic bottlenecks have already tried to solve
this problem by investing in sensors that monitor the amount of traffic and
information systems that process the data, so that the traffic flow is optimized
or drivers are informed about the current conditions through an application.
The IoT might be the next big thing in the industry as well. Industrial
Internet of Things (IIoT) is emerging as a segment of the IoT which is en-
gaged in the industry only. Businesses find creative ways to use the IoT
and consequently improve their performance. The IoT could help them re-
duce the costs and increase the productivity of their employees. Common
examples that have been adopted in practice are supply chain and energy
management. The IIoT also opens lots of possibilities for development of
new business models [18].
2.2 Characteristics of the ”things”
In order to be aware of the difficulties in IoT networks, one must know what
the ”things” actually are and which are their characteristics and limitations.
A ”thing” in an IoT network is an object which possesses most or all of the
Bachelor’s thesis 7
following functionalities [45]:
• It must represent a physical object;
• It must be uniquely identifiable;
• It must be able to communicate and interact with other ”things” or
other entities in the network. In order to be reachable in the network
and support communication a ”thing” must have an address;
• It must have at least minor computing capabilities;
• It may have an ability to sense and measure some physical conditions
(e.g. temperature, humidity) and act so that it changes those condi-
tions.
2.3 Challenges in IoT networks
At the moment, the IoT is gaining momentum and is becoming increasingly
popular. The estimates of analysts about the growth of the number of de-
vices varies, but they all agree it would continue to increase at very fast pace
in the upcoming period. We could credit its recent popularity and viability
to the advancements in wireless technologies, standardization of communi-
cation protocols, the lowering prices of devices and increases in storage and
computing power [37].
However, when we set up an architecture for an IoT network, we face
different challenges, varying from low level limitations with the wireless net-
work itself to obstacles on the semantic level. During the process of con-
struction of IoT architectures, we must be aware of these challenges and
choose optimal solutions which will best suit our needs. The challenges are
the following [45] [35]:
Devices heterogeneity In an IoT network we usually have plenty of dif-
ferent devices, which may use different protocols for communication.
8 Naum Gjorgjeski
Because the IoT emerged recently, there is still no standardization for
the communication between entities in an IoT network. Consequently,
similar devices made by different vendors often use different protocols.
They also have different computing capabilities, ranging from devices
only able to do some minor computing to powerful devices that can
do a lot more. Depending on the level of heterogeneity, enabling com-
munication in such network and integration of the data from different
devices may be a problem.
Scalability IoT networks often consist of an enormous number of devices,
which are producing an enormous amount of events, therefore expos-
ing scalability as a central issue. Cloud computing establishes itself as
an optimal solution, providing virtually unlimited resources. Achiev-
ing scalability is one of the key points in IoT networks and is further
discussed in the following chapters.
Wireless networking technologies In an IoT network the data is being
generated at high rates and the data flow is often constant. However,
the available spectrum (e.g. the ISM radio band) is limited and is being
used by other technologies at the same time. For that reason, further
research for solutions like cognitive radio and dynamic spectrum man-
agement is carried out. For instance, cognitive radio detects the most
suitable and currently unused frequencies (including reserved frequen-
cies - TV, FM radio etc.). We can use these frequencies for our own
needs as long as they are free. Still, we should be aware that the usage
of reserved frequencies is often regulated by local laws.
Energy efficiency If the devices in an IoT network operate on battery,
they have low energy capacity. Therefore, resource efficiency for the
devices is compulsory in order to minimize the consumption. Also,
new substitutes for battery like micro solar panels can mitigate this
problem.
Autonomy An important challenge imposed by the complexity and the
Bachelor’s thesis 9
size of IoT networks is to make the ”things” smarter, so that they
could be as autonomous as possible, be able to respond to different
situations and circumstances and work without or with minimal human
intervention.
Privacy and security Privacy and security are amongst the biggest obsta-
cles preventing the IoT from expanding even faster. As we said, the
devices often have limited computing capabilities. Energy is a scarce
resource as well. However, we know that security depends heavily on
energy and computing capabilities (because of encryption). As more
complex security schemes cannot be used, security is often neglected
by manufacturers. To ensure privacy in an Internet of Things net-
work, authentication and data integrity have to be addressed. Since
the collected data is sometimes personal, we should have control of
who collects the data, when and what data is being collected. Another
possible attack is eavesdropping. Since in some cases the ”things” are
left unsecured, it is also possible to break them down physically.
Semantic interoperability When we collect data from IoT networks, we
might do real-time processing (i.e. CEP) or store the data in a database
and process it later. We want to infer something from the data, i.e.
turn the raw data into useful information. Since the amount of data is
enormous, we must support automated reasoning. In order to do that,
the data must be in a standardized format. Furthermore, it is recom-
mended that the raw data is equipped with meta-data, i.e. additional
data carrying information about the content of the raw data.
10 Naum Gjorgjeski
Chapter 3
Integration of the Internet of
Things and cloud computing
Although the Internet of Things and cloud computing are technologies that
have been developing independently from one another, there are strong rea-
sons why we should integrate them. In the previous chapter we reviewed the
difficulties we face in an IoT network. Most of them arise because of the
huge number of devices and their limited capabilities.
Offering virtually unlimited storage and computing capabilities, cloud
computing could help us tackle many of them successfully. On the other
hand, IoT is a promising technology that could alter the Internet, shifting its
main focus from communication between user devices to machine-to-machine
communication (e.g. communication between a sensor and a node that does
data aggregation). Being an integral part of such revolution would further
expand the usage of the already widely adopted cloud computing [36].
3.1 Cloud requirements for IoT solutions
In order for the integration to be successful, the cloud is expected to meet
the following IoT application requirements [41]:
Device management When a device is initialized, it must be able to con-
11
12 Naum Gjorgjeski
nect to the cloud and identify itself by its unique identifier. That way,
we know that the device is active and we also know which data is pro-
vided by the device. The communication is not one-way, as when the
device is active, it is important for the device to be managed by the
cloud (e.g. provide software updates).
Data ingestion Each device in an IoT network generates immense amount
of data. When this is multiplied by the number of devices in IoT
networks, we get unprecedented data volumes. Scalability is one of the
most important sore points in IoT solutions and the cloud is expected
to handle such incoming amounts of data.
Transformation and storage When the messages arrive in the cloud, the
cloud must enable selection and transformation of the raw data into
useful information. The transformation is done on-the-fly to get real-
time information or later to get retrospective information. CEP is the
most important player in on-the-fly processing and is described in detail
in chapter 5. Cloud should be able to provide computing resources for
CEP. If we want to do further processing later, the huge amounts of
data must be stored. In that case, the cloud must ensure reliable data
storage.
Real-time notifications By using CEP techniques, the raw data is contin-
uously analyzed and complex events are produced. The IoT application
must be able to report the complex events in real-time in form of noti-
fications.
Visualization The cloud must serve as a host for visualizations of the IoT
network’s status.
3.2 Types of clouds
After the decision to use the cloud infrastructure, we can choose between a
private, public and hybrid cloud. Each one has some advantages and draw-
Bachelor’s thesis 13
backs.
A private cloud offers us better control of the privacy because we use our
own servers. However, we have to maintain the infrastructure by ourselves,
which is not an easy task. Private clouds are not intended to sell services to
external customers, but to have full control of the data. It is an expensive
option usually chosen by big companies or even countries and isn’t a viable
solution for the others.
If we decide to use a public cloud, we won’t bother with the infrastructure
management. Since the data is on a public cloud, the cloud provider ensures
limited access and employs many security mechanisms for isolation. If the
data generated by IoT devices is not confidential, public cloud is suitable for
storage of that data and deployment of our applications.
Hybrid cloud is a combination of the two approaches. There are various
reasons for usage of hybrid clouds - e.g. we might want to control confidential
data in private clouds, while the rest is on public clouds. Some use public
clouds to maintain scalability of their applications during peak periods or as
a prevention from natural disasters and electricity blackouts, securing their
application availability in case they have their data center in one location
only.
The dominant Pay-As-You-Go pricing in public clouds is favorable, as it
allows us to pay only for the resources and services we actually use. Then,
if we decide to offer our services or applications to consumers, the same
pricing model could be applied too, enabling them to pay depending on
which services or applications they use and how much they use them. For
instance, we might get access to sensors’ data, either by setting up our own
network of IoT devices or through payable services offered by sensors’ data
providers. We can use that data to make an application which does CEP and
pushes real-time notifications to a data-based visualization application. CEP
provides additional insights by extracting information from the raw data,
while the visualization application provides nice intuitive visualizations. We
can then offer the application on a subscription basis. Thereby, the pricing
14 Naum Gjorgjeski
model also fits perfectly in the plan for integration of cloud computing and
the IoT [30] [3].
3.3 Cloud services
Cloud environments offer the following resources as on-demand services [36]:
Storage Cloud offers storage and enables us to store the raw data from
sensors and the higher level data gathered from analysing the raw data
(e.g. by CEP). We practically get storage on-demand, anytime, as
much as we need it. As we saw, if we are using public clouds, the
payment amount is proportional to the amount of storage we use.
Computational resources It is impossible to use IoT devices for complex
computations because of their low computing capabilities. Therefore,
they must transfer their data to a node which is able to do proper com-
plex computations (e.g. sensors must transfer their measurements).
Cloud is a favorable solution as the data from the devices can be trans-
ferred in the cloud, where we have practically unlimited computational
resources. In the cloud, we can choose whether we want to process
the data in real-time (e.g. use CEP techniques) or store the data and
process it later. To sum up, computational resources can be used for
computations, analysis and visualization of the data we get from IoT
devices.
3.4 Effects of the integration
The integration of IoT and cloud computing solves or diminishes some of the
challenges in IoT networks we pointed out in section 2.3 and satisfies the
needs of IoT applications discussed in section 3.1.
As we saw, we can completely move computations to the cloud and max-
imize the energy efficiency of IoT devices. Their primary task would be to
transfer the data to the cloud [36].
Bachelor’s thesis 15
As illustrated in figure 3.1, the cloud serves as a mediator between the
”things” and the applications [36]. The ”things” are service-oriented and
offer their data through web services (e.g. REST, SOAP). In this way, the
low-level ”things” world is abstracted. We are able to develop applications
only by knowing what the low-level networks of ”things” offer through their
services. The abstraction eases the process of application development.
This results in a loosely coupled system where different individuals can
contribute by offering their Infrastructure, Platform or Software services in
the cloud (i.e. IaaS, PaaS, SaaS). Separate individuals or companies could
offer: storage and computational resources, sensors’ data, algorithms for data
processing, visualization tools, etc [43].
Many standard problems in distributed systems (e.g. virtual-machine
escape) have to be addressed. In case we are using public clouds, this is done
by providers. However, depending on the sensitivity of the data, the trust in
the service provider may be questionable [36]. If the data from IoT devices is
not confidential, this issue doesn’t outweigh the benefits from the integration
of IoT and cloud computing (by using public clouds).
3.5 Scalability and elasticity
Although scalability and elasticity might look like synonyms, they represent
related concepts with a few differences between them.
Scalability is the capability to withstand increasing workloads. Scalability
doesn’t take into consideration the amount of resources we use. Basically, if
an application is scalable by using 5 servers, it will still be scalable if we use
10 of them. As long as the application is able to deal with the workload at any
time, it is scalable. Speed, frequency and granularity of the scaling actions
are not taken into account either. In figure 3.2a we can see an example of
a scalable and non-elastic application. In figure 3.2b the application is not
scalable, as it cannot handle the peaks of increased workload. In order for
an application to be elastic, it must be scalable.
16 Naum Gjorgjeski
Figure 3.1: The cloud enables easier development of IoT applications and
satisfies most of their requirements [43].
Elasticity reflects the level of adaptivity to the current workload by dy-
namic provisioning and deprovisioning of resources, either manually or auto-
matically. The level of adaptivity depends on the precision, i.e. the standard
deviation of the provisioned resources compared to the current resource de-
mand. The precision is influenced by the speed of scaling of applications,
i.e. the time needed to get from underprovisioned to optimal or overpro-
visioned amount of resources or to get from overprovisioned to optimal or
underprovisioned amount of resources [44]. An example of elastic application
is illustrated in figure 3.2c.
As we have seen before, IoT networks are generating huge amounts of
data. Therefore, scalability needs special attention. For that purpose, cloud
computing offers on-demand services. In cloud environments we get dynami-
cally allocated storage and computational resources at run-time. The amount
of resources could be set either by us or dynamically depending on the load
and the maximum budget, if we have chosen one. For example, if the number
Bachelor’s thesis 17
of events produced by sensors is too high and the current instances are not
capable to process all of the events, additional instances could be deployed
manually or automatically. In this way, high Quality of Service (QoS) is
sustained [43].
Although IoT applications require scalability in the first instance, a scal-
able and non-elastic IoT application is an impractical solution. Scalable and
non-elastic application wastes resources unnecessarily. A waste of resources
results in a waste of money and lower performance, which consequently leads
to uncompetitive IoT solution. The demand of resources in an IoT appli-
cation might fluctuate due to: addition of new sensors, fail of a group of
sensors, group of sensors only operating during certain period of the day,
increased or decreased amount of users of the visualization application etc.
Thus, we want both a scalable and elastic IoT solution. Whether we
achieve that objective depends heavily on the choice of architecture. Not
all architecture approaches combined with cloud computing guarantee both
scalability and elasticity. Therefore, in the next chapter we will look at a
favorable and suitable architecture for IoT solutions — the microservices
architecture.
18 Naum Gjorgjeski
(a) The amount of resources is enough for the application to handle peaks and the
application maintains scalability. However, a lot of the resources remain unused
when the workload is low [34].
(b) If we decide to use lower amount of resources, it is not enough for the applica-
tion to handle peaks. Therefore, the application is not scalable [34].
Figure 3.2: Illustrations of the concepts of scalability and elasticity of an
application.
Bachelor’s thesis 19
(c) It is not possible to fully adapt to resource demands. Elastic applications try
to minimize the difference between the provisioned resources and the demand. As
deviation is minimized, QoS and resource efficiency are maximized [44].
Figure 3.2: Illustrations of the concepts of scalability and elasticity of an
application.
20 Naum Gjorgjeski
Chapter 4
Microservices
In order to get the most of the integration of IoT and cloud computing, the
use of microservices is advisable. This chapter provides answers to why this
is so.
4.1 Definition of microservices
Microservices are an architectural approach to developing applications as a
set of small services, where each service is running as a separate process,
communicating through simple mechanisms [29].
Most of the advantages of the microservices architecture stem from its
main feature — decomposition of a service or an application into smaller
components, i.e. microservices. The result of the decomposition should be
many independent units (i.e. components). Each of these components should
implement a specific functionality. As a result, we can develop, deploy, up-
grade and scale every microservice independently. Since not every microser-
vice has equal workload, each microservice is scaled separately. That enables
us to use optimal amount of resources and makes microservices architecture
a natural fit for achieving both scalability and elasticity. We control every
microservice separately. Since they are smaller, they are easily manageable.
Developing them separately enables us to use different technologies (e.g. dif-
21
22 Naum Gjorgjeski
ferent programming languages) for each microservice. When we want to
release an update for a part of our application or service, we don’t redeploy
the whole application, but only the corresponding microservice.
Microservices communicate through web services (such as REST) or use
remote procedure calls (RPC). However, communication between processes
is costly. We should prefer to avoid it and reduce it to the minimum.
All of the mentioned benefits and drawbacks coming from the microservice
architecture are described in detail in the following sections.
4.2 Comparison with monolithic architecture
The advantages of microservices architecture are best identified when we
compare it to the traditional monolithic architecture. This chapter explains
why the microservices architecture is a better fit for deployment in cloud
environments.
Monolithic application has all of its components packed together. For in-
stance, monolithic web applications have the client-side, the server-side and
the database in a single logical executable. Similarly, monolithic IoT applica-
tions have the whole logic for: communication with IoT devices, processing of
devices’ data, eventual communication with databases and visualization, in a
single logical executable as well. Although we might develop an application
in a modular way (e.g. have separate Maven modules for different tasks), at
the end it is still packed as a single logical executable.
In figure 4.1 we can spot the differences between the monolithic and the
microservices architecture. The application illustrated on the left side is de-
veloped by the monolithic approach and has all of its components packed
into one container. To achieve scalability and elasticity in monolithic appli-
cations, more instances of the whole application must be deployed or ter-
minated. However, different application functionalities rarely have an equal
share of the workload. In IoT solutions, we cannot expect the workload to be
continuously equal both in the component communicating with sensors and
Bachelor’s thesis 23
the component for visualization. Therefore, we mustn’t scale them together.
Similarly, in a typical web application for online sale of goods, we can predict
that the catalogue lists will be used more frequently than a purchase com-
pletion functionality because most of the people go through catalogues first
and then decide to checkout. With the microservices approach, as shown on
the right side of figure 4.1, every microservice is packed as an independent
component. We can scale every microservice independently and change the
number of instances for each microservice separately. Since microservices are
fine-grained components, it is the essential approach if we want to have a
fully elastic application. We scale the application according to the workload
of each of the microservices, instead of scaling all of the components together
like we have to do in monolithic applications. In this way, we get a scalable,
elastic and resource efficient application.
Figure 4.1: The application illustrated on the left side adopts the monolithic
approach. It has all of its components packed in a single logical executable
and runs in a single process. On the right side, we see the same application
built using microservices. Each component is a separate process and can be
developed, deployed, upgraded and scaled independently [29].
24 Naum Gjorgjeski
Figure 4.1 shows another difference of the monolithic and the microser-
vices architecture. Monolithic applications usually run as a single process. If
we want to release an update of the application, the whole application must
be redeployed for the changes to take effect. It doesn’t matter which compo-
nent we have changed. With the microservices approach, we would redeploy
the appropriate microservice, where we have made a change. The update
of one microservice should cause no or minor changes to the other microser-
vices. Here we can spot a major challenge of the microservices approach. We
mentioned that the communication between components is expensive and
should be minimized. If a change in one microservice imposes many changes
in other microservices, the benefits of the microservices architecture are lost.
For instance, a change in the way the application communicates with IoT
devices and gets data from them must have no or minimal impact on how we
process the data. Therefore, we should try to componentize the application
into microservices in a way that would allow for the communication between
the microservices to be minimal. The approaches to componentization of IoT
applications are discussed in section 4.3.1.
Monolithic applications could become so big that managing them could
be a nightmare. If a monolithic application is big, it is more vulnerable and
harder to update. With that in mind, it is more likely to make a mistake.
Bugs in monolithic applications could be really expensive, as they cause the
whole application to crash, whereas in microservices applications only the
corresponding microservice collapses. That means the application is still
running and only the specific functionality implemented by the microservice
is unavailable. The importance of this behavior is even more apparent in
IoT applications. For instance, if a microservice which communicates with a
certain group of sensors crashes, that won’t affect or stop the processing of
the data provided by microservices which communicate with other sensors.
The other components of the application will still be up and running.
Nonetheless, the monolithic architecture finds its applications, where it
might actually be better than the microservices architecture. Simple and
Bachelor’s thesis 25
small applications are both easily manageable and easily scalable. In that
case, we wouldn’t bother with the adoption of the microservices architecture
since it is an overkill for simple applications. However, we know that applica-
tions in the IoT world are rarely simple. The advantages of the microservices
architecture are revealed when the complexity starts to grow [29] [40].
4.3 Impact of microservices architecture on
IoT solutions
IoT applications have high requirements regarding scalability, what funda-
mentally pushes us towards distributed architecture. Even though we have
virtually unlimited storage and computing resources in the cloud, we saw
that IoT applications still require a different approach than the monolithic
one.
The microservices architecture has emerged in recent years and conse-
quently is not fully defined. However, most of the microservices applications
share the same characteristics. In general, the microservices architecture is
adaptable to the requirements of IoT applications. In this section we will an-
alyze the characteristics of the microservices architecture that directly affect
IoT applications.
4.3.1 Componentization via services
When developing applications, it is a good practice to break down the ap-
plication into several components. Such components that programmers fre-
quently use are libraries. The concept of services in microservices architecture
is similar to the libraries concept. However, there is one big difference. Li-
braries are linked to the program and when the program is running, there is
only one process. On the other hand, the microservices architecture tends to
componentize a project into services, where each service is running in its own
separate process. Thereby, each microservice could be deployed and scaled
26 Naum Gjorgjeski
independently [29].
By componentization into (micro)services the problem with the vast het-
erogeneity of devices could be addressed. We can have distinctive microser-
vices for devices that communicate using different protocols. These microser-
vices might act as a proxy. The issue of adding new devices, which commu-
nicate using a protocol we don’t support, is usually resolved by adding a
microservice acting as a proxy between protocols [47].
In general, there are two approaches for decomposition of applications
into microservices: verb-based and noun-based [33]. Verb-based is the de-
composition of an application around single use cases. In IoT applications,
such scenarios are rare. If we are dealing with multiple groups of devices,
we might group the logic for communication with certain type of devices
(e.g. temperature sensors), the data processing logic and the visualization
logic for this certain group of sensors in one microservice. However, that
approach is not a natural fit for IoT applications, as the scaling of different
modules is dependent upon different factors. For instance, the communica-
tion with devices is most dependent upon the number of devices and the
amount of data they generate, while the visualization application must take
into account the number of users which access it simultaneously. On the other
hand, noun-based is the decomposition in which a microservice is responsible
for every operation related to a certain functionality. That is the approach
we have mentioned throughout the chapter. We can have one microservice
which communicates with the devices and exchanges data with them, other
microservice which processes the ”hot” data (e.g. CEP), third microservice
might store the data in a database for eventual later processing, while fourth
microservice might be responsible for visualization. This leads to a dynamic
application, where we can orchestrate and scale each functionality separately.
A combination of the verb-based and noun-based approach is also possible.
First, we do a verb-based decomposition. Then, as we described before, we
might decomponentize the microservice responsible for the communication
with devices into many microservices in a noun-based way. In this manner,
Bachelor’s thesis 27
each microservice is acting as a proxy and is responsible only for a group of
devices.
Microservices are reusable components. Once developed, a microservice
can be integrated into other applications [28]. For instance, if we develop
a microservice for communication with a certain type of sensors, it can be
integrated in other applications that need to communicate with the same
type of sensors.
4.3.2 Decentralized governance
Monolithic applications are usually written in one programming language.
The approach to write them in different languages is possible, but that might
bring additional problems. Lots of companies decide to use the same tech-
nologies for most of their projects. However, we know that a specific language
or technology doesn’t fit all projects and specifications. That often leads to
poor and slow applications with many bugs.
On the other hand, microservices enable a decentralized approach. In
fact, the microservices architecture is encouraging developers to write differ-
ent microservices using different technologies. This characteristic is especially
helpful to IoT applications. For instance, it enables us to use different tech-
nologies for communication with devices and for data processing. We can
use an alternative technology for visualizations as well. Microservices en-
able all of the technologies to be integrated without having to worry about
compatibility issues.
Microservices are completely independent. If a microservice wants to
communicate with other microservice (e.g. the data processing component
communicates with the component for communication with devices), it does
that through standardized protocols (e.g REST, SOAP). Microservices are
agnostic. A microservice is unaware of what the core of an other microservice
consists of, as it is a different process.
Decentralized governance results in faster adoption of newer and better
technologies, which is not the case with monolithic applications. This is
28 Naum Gjorgjeski
especially important for fast developing fields like the IoT. However, having
freedom of choice doesn’t mean that we should use a different technology
for every microservice. We should use a different technology if it brings
enhancements to our application [29] [40].
4.3.3 Design for failure
Since microservices are independent components, they should be tolerant of
failure of other services. A service call failing because of unavailability of
other service occurs often.
This is one of the rare drawbacks of using microservices instead of the
opposite monolithic architecture. Since in microservices architectures we
must be particularly aware of failures of other services, a slight increase in
complexity is necessary.
In IoT applications, quick detection of failures is needed. Microservices
applications tackle this problem by real-time semantic monitoring, which
enables us to quickly detect a problem and mobilize the staff to resolve it.
Useful parameters for semantic monitoring in IoT applications might be cur-
rent throughput or latency of sensor events [29].
4.3.4 Evolutionary design
The IoT is a fast developing technology and we can’t know what the future
brings. Probably we would want to integrate new devices in the system. Due
to the advancements in artificial intelligence, in the future we might want to
add new analyses, so that we get additional insights. As monoliths get big
and complex, this is not an easy task. Therefore, microservices serve as a
tool to control changes in a faster pace [29].
Bachelor’s thesis 29
4.4 Scaling of microservices
The microservices architecture brings more freedom for deployment and man-
agement of applications in cloud infrastructures. In order for an applica-
tion to be elastic, we should ensure that every microservice uses an optimal
amount of resources.
For scaling of microservices both IaaS and PaaS can be used. However,
PaaS environments are particularly interesting for us. They offer a platform,
which is responsible for low-level operations like management of virtual ma-
chines, application deployment, load balancing, etc. PaaS enables easier
management of applications and is especially suitable for microservices. It
enables us to focus on the IoT aspect and not on the essential low-level part
of the application. On the contrary, we could hardly find a PaaS provider
that supports monolithic applications. In case of a monolithic application,
we must do the job of configuration and management of the application. In
the following sections, the different strategies of scaling are explained with
special consideration of microservices [40].
4.4.1 Vertical scaling
Vertical scaling (or scaling up) is an addition of resources (i.e. memory,
CPU) to a single specific node. That gives the node additional capabilities
to handle application requirements successfully. However, this approach is
very limited in its nature, as we can’t add resources forever because nodes
hit some physical constraints. IoT applications are often dealing with a huge
number of events and require lots of resources, which a single node often
can’t provide. Therefore, this approach is not feasible if it is not combined
with other horizontal scaling approaches [17].
4.4.2 X-axis scaling
The following approaches to scaling are actually the three dimensions of the
scale cube [42] [40]. The scale cube is illustrated in figure 4.2.
30 Naum Gjorgjeski
Figure 4.2: The scale cube and its dimensions [14]. X-axis scaling is classic
horizontal scaling by adding more instances. Y-axis scaling is scaling by
decomposition of the application to microservices. Z-axis scaling introduces
the concept that one instance is responsible for a subset of the data or the
requests.
X-axis scaling is a typical horizontal scaling (or scaling out). We use
several application instances in order to distribute the workload evenly. It is
most common for monolithic applications to use this kind of scaling. Hav-
ing N instances of an application, each instance gets 1/N of the workload.
Since monolithic applications often maintain user state, the same server must
process requests from a certain user.
That is not the case with microservices, which are stateless. Each mi-
croservice is scaled separately. PaaS environments further facilitate the scal-
ing of microservices, as they allow us to specify the number of instances for
a specific microservice. Some PaaS providers also offer automatic scaling,
i.e. microservices are dynamically scaled depending on the current workload.
Bachelor’s thesis 31
If we use automatic scaling, the degree of elasticity depends on the scaling
rules we set. Otherwise, we must monitor the state of the application and
try to achieve scalability and elasticity manually.
4.4.3 Y-axis scaling
Y-axis scaling is defined as the functional decomposition of an application
into services. If we look back at section 4.1, we see that microservices repre-
sent this concept by definition. In section 4.3.1 we discussed the functional
decomposition of IoT applications as well. In practice, a combination of both
X-axis and Y-axis scaling is most commonly used for scaling of IoT applica-
tions. It is also the most popular scaling option for microservices applications
overall.
4.4.4 Z-axis scaling
Z-axis scaling is based on the sharding principle. It is similar to X-axis scal-
ing, except for one variation. An instance is responsible only for a subset
of the requests or the data (in case the microservice uses and communicates
with a database). A common use case where Z-axis scaling is applicable
is division according to the user category. For example, requests of users
who have payed for a service are routed to different, more powerful servers.
Premium users usually have higher Service Level Agreements (SLA) than
non-premium users. Therefore, their requests are not routed together with
the requests of non-premium users, as higher performance must be ensured.
Better performance is usually provided by less restrictive policy on the cre-
ation of new instances or redirection to more powerful instances. Premium
users might also have access to additional services. For example, we might
process and visualize data from additional sensors, available to premium
users only. Therefore, they will be redirected to servers which are hosting
these additional services. A component for routing might be implemented
as an additional microservice. However, Z-axis scaling is adding complexity
32 Naum Gjorgjeski
to an application and should be used only if it is crucial for user division or
achievement of higher performances.
In figure 4.3 we can see the X-axis, Y-axis and Z-axis scaling in a practical
example.
Figure 4.3: The dimensions of the scale cube in a practical example. We first
componentize the application into microservices (Y-axis scaling). Which
instance would handle a request depends on which subset of the data is
needed (Z-axis scaling). Routing of requests in order to achieve user division
is not illustrated in this figure. We then replicate each microservice according
to our needs (X-axis scaling) [14].
Chapter 5
Complex event processing
Event-Driven architecture (EDA) is an architecture based on production,
detection, consumption of and reaction to events [11]. CEP is a subset of
EDA. CEP’s task is to filter, aggregate and match low-level events to generate
higher-level, i.e. complex events [46].
5.1 The need for CEP
We collect events (i.e. raw data) from IoT devices in order to turn them into
useful information. One way to do that is to store the events in a database
and subsequently use algorithms to get data-based insights. However, this
approach has a significant shortcoming. If we store the data in a database in
order to process it later, we lose the opportunity to analyze the data while it is
still fresh. In most cases we want to turn the raw events from IoT devices into
meaningful information in real time. We know that events in IoT networks
are being created at unprecedented velocity. That is why the term Fast data
is so popular nowadays. Even though we might use a database for near real-
time event processing, it is not a favorable approach. The queries against
a database are significantly affecting processing performance. This issue is
even more obvious in IoT networks because of the high number of events and
the high velocity of event creation. Therefore, this approach doesn’t fit the
33
34 Naum Gjorgjeski
needs of IoT networks.
Let’s take the example of a fire detection system we mentioned in chap-
ter 1. We want to have a notification of a fire as soon as it starts. Time
in such situations is critical and the applicability of a fire detection system
depends upon it. In fire detection systems there is no need to store any of the
raw events at all. If a fire is detected, a complex event is fired. The complex
event includes all the relevant information for the detected fire (e.g. location
of the fire) [38]. Therefore, use of a database in such scenarios is redundant.
We can conclude that a different approach is needed.
CEP helps us transform raw events into relevant information on-the-fly.
IoT is not the only field where CEP is used. CEP can be used to analyze:
streams of transactions to detect credit card frauds, financial markets to
detect trends, network traffic to detect intrusions, etc [38]. In this chapter,
we are going to describe its evolution, comparing it to its predecessors. Then,
we will go through a list of the most common constructs and operators used in
CEP languages. By using combinations of the constructs and the operators,
we can analyze an event stream in real time and extract various information.
5.2 Evolution of information flow processing
systems
The need to process huge amount of data in real time has resulted in develop-
ment of systems for information flow processing (IFP). The general high-level
structure of IFP systems is illustrated in figure 5.1.
Sources are entities that produce data. The data from the sources flows
forward to the IFP engine. Sources might be various entities. In IoT net-
works, sources are the IoT devices, which produce events. Source might be
other IFP engine as well. As the stream of data flows into the IFP engine,
the data is being processed (e.g. filtered, aggregated, matched, etc.) accord-
ing to the defined processing rules. Processing rules are added, managed
and removed by rule managers. The output, produced by the IFP engine in
Bachelor’s thesis 35
Figure 5.1: A high-level view of a typical IFP system. The data is produced
by producers, i.e. sources. Then it is being processed according to the defined
processing rules. The produced output flows forward and is transferred to
the recipients, i.e. sinks [38].
accordance with the processing rules, is transferred to the sinks. Amongst
them, there might be an IFP engine which does further processing of the
output.
We can see that IFP systems must satisfy some key requirements of IoT
networks. First, they must be able to process the data in real-time and
be able to cope with enormous data amounts. Second, they must offer a
language for data processing, which will enable us to specify complex rela-
tionships between the data in an efficient manner [38].
In the following subsections we present the evolution of IFP systems:
active database systems, data stream management systems and the most
popular — complex event processing systems.
36 Naum Gjorgjeski
5.2.1 Active database systems
We get data from the traditional database management systems (DBMS)
in a Human-Active Database-Passive way, i.e. we write queries and get
data. Active database systems complement DBMS, trying to move the active
behavior on the database level instead. However, the basis for these systems
is still persistent storage. That means they still have problems dealing with
huge amounts of data and a big number of processing rules [38].
5.2.2 Data stream management systems
Data stream management systems (DSMS) try to overcome the issues that
active database systems have. We can install continuous (standing) queries
which are deployed on database systems and provide results until we remove
them. Therefore, we get notifications as the stream is processed, without
having to run many queries. We use these systems in a Human-Passive
Database-Active way. However, DSMS are not able to use sequencing and
ordering relations, as they are still a database extension [38].
5.2.3 Complex event processing systems
Unlike DSMS, which is a database extension, CEP is an extension of publish-
subscribe messaging systems. In publish-subscribe systems, when we sub-
scribe to a topic, we receive single events, as publish-subscribe systems don’t
maintain history of events and don’t discover connections between them.
CEP extends this behavior and enables us to subscribe to complex events.
In order to offer us subscription to complex events, CEP addresses the main
shortcomings of DSMS systems. DSMS systems don’t have the ability to
specify complex relationships between the data, as time-consuming opera-
tions such as ordering are very slow in a database extension. On the other
hand, CEP systems can do complex pattern matching, filtering, aggregation,
sequencing, ordering, etc. CEP gives us much more freedom in specifying
complex processing rules. CEP tries to find a relationship between the in-
Bachelor’s thesis 37
coming events according to the processing rules we defined. If it does find a
relationship, a complex event is fired [38].
CEP may take into account event hierarchies as well. Complex events
can be processed further and higher-level complex events may be generated.
We can have as many levels of hierarchy as we want [46].
5.3 Constructs and operators of CEP
languages
Almost every CEP engine has its own pattern-based event processing lan-
guage. Pattern-based languages specify conditions and actions to be per-
formed if the conditions are met. Conditions are usually represented by
patterns, which process stream partitions using various constructs and oper-
ators. Transforming rules (e.g. produce two streams from one stream) are
specific for transforming languages, which are mostly used in DSMSs. Mod-
ern CEP languages (e.g. Esper’s Event Processing Language) enable us to
specify transforming rules as well [38]. Most of the languages do not possess
all of the language constructs and operators described in this section, but
only a subset of them. We will try to cover the most common constructs and
operators in CEP languages (adapted from [38]).
5.3.1 Single-item operators
Single-item operators are amongst the simplest operators in CEP systems.
As their name suggests, they process single events independently, one after
another. There are two classes of single-item operators:
Selection operators These are classic filtering operators, throwing away
events that don’t satisfy specified criteria. For instance, in a fire detec-
tion system, selection operators might filter events which indicate that
the temperature is higher than a certain threshold.
38 Naum Gjorgjeski
Elaboration operators These operators are used for transformation of
events. Some examples are projection and renaming operators. Projec-
tion operators extract only a part of the properties of the events (e.g.
extract only the location of sensor readings). Renaming operators are
used to change the name of an event property.
5.3.2 Logic operators
Logic operators are used for detection of event combinations. The order of
events is not important for these operators. There are four groups of logic
operators:
Conjunction A conjunction of events E1, E2, ..., En means that all of the
events E1, E2, ..., En have occurred.
Disjunction A disjunction of events E1, E2, ..., En means that any of the
events E1, E2, ..., En have occurred.
Repetition A repetition of event E with degree (m,n) means that the event
has occurred between m and n times.
Negation A negation of event E means that the event E didn’t occur.
We can combine these operators (e.g. use both conjunction and disjunction)
to form more complex patterns. Also, logic operators are frequently used in
combination with windows, which we cover in one of the following subsec-
tions. Windows are used in order to set boundaries, so that logic operators
take into account only a partition of the stream (e.g. conjunction of two
events in the last three seconds).
5.3.3 Sequences
Sequences are very similar to logic operators. The difference is that the
order of events is vital for sequences. A conjunction sequence of events
E1, E2, ..., En means that all of the events E1, E2, ..., En have occurred in the
stated order.
Bachelor’s thesis 39
5.3.4 Iterations
In iterations we specify an iteration condition. The sequences of events must
satisfy the specified iteration condition. The ordering of events is important
for iterations as well. Iterations are not included in all event processing lan-
guages because they can be expressed with other event processing constructs
(e.g. combination of sequences and recursion).
5.3.5 Windows
Windows are language constructs which enable operators to process only
a part of the event stream. They are frequently used with various opera-
tors. Their purpose depends on whether the operators are blocking or non-
blocking. Blocking are the operators which have to consume the whole stream
of events before they generate output (e.g. negations and repetitions with
specified higher bound). On the other hand, non-blocking operators generate
output as the stream of events flows and they can stop generating output as
soon as they detect the needed events (e.g. conjunctive and disjunctive oper-
ators). Therefore, windows are vital for usage of blocking operators because
they specify a finite part of a stream, so that blocking operators are able to
produce output. As far as non-blocking operators are concerned, windows
make them more powerful by specifying which part of the stream they should
look at.
There are two types of windows classifications. According to the first
one, windows are divided into logical (i.e. time-based, such as the events in
the last three seconds) and physical (i.e. count-based, such as the last three
events).
The second type of classification depends upon the way boundaries of a
window move. According to this classification, windows can be:
Fixed windows Bounds of fixed windows do not move. We should specify
both the lower and the upper bound.
Landmark windows The lower bound of landmark windows is specified in
40 Naum Gjorgjeski
advance and does not move. The upper bound moves as new events
enter the CEP engine.
Sliding windows They are the most frequently used type of windows. Like
in landmark windows, the upper bound moves as events flow. In sliding
windows, the lower bound moves with the same tempo as the upper
bound. Thus, standard sliding windows enable us to look at the last
three events or the events in the last three seconds, as the lower bound
moves together with the upper bound. Sliding windows also have some
variations (e.g. tumbling windows).
5.3.6 Stream management operators
As their name suggests, stream management operators are used to manage
the streams. Stream management operators are:
Join The join operator is a blocking operator used to merge two streams (or
parts of two streams) into one.
Bag operators Bag operators use the standard set operations to manipu-
late streams. The union operator merges many streams into one, which
has unique events from all of the streams. The intersection operator
merges many streams into one, which has events detected in each of the
streams. The except operator accepts two streams and removes events
from the first stream that are contained in the second stream. The
remove-duplicate operator takes one stream and removes the duplicate
events.
Duplicate It copies the same stream, so that a copy is provided to each
distinctive sink (e.g. for further processing).
Group-by Groups events of a stream partition according to an event prop-
erty.
Bachelor’s thesis 41
Order-by Orders events of a stream partition according to an event prop-
erty.
5.3.7 Parametrization
A language must be able to offer parametrization in one way or another.
We often need to process events in one stream in relation with the events in
other stream. For instance, in a fire detection system, we might produce a fire
alert if we detect both high temperature and smoke. Therefore, we need to
filter the temperature sensors’ stream in correlation with the smoke sensors’
stream. Different CEP languages handle this issue differently. Most of the
modern languages offer parametrization through the conjunction operator.
For instance, in order to detect a fire, we can specify that the temperature
must be above 45◦C and there must be smoke in the last 10 seconds. Then,
the CEP engine does parametrization internally. In some older languages,
we have to use the join operator and merge the streams.
5.3.8 Aggregates
Aggregates are used for aggregation of events in a stream partition. Typical
examples of aggregates are minimum, maximum and average. There are two
types of aggregates:
Detection aggregates They are used during the evaluation of the condi-
tions of the processing rules. For instance, we might want to detect the
events that are below the average temperature of the last ten sensor
readings.
Production aggregates These aggregates are used for production of higher-
level events. A simple example might be generation of events containing
an average of the last ten temperature readings.
Besides the typical aggregates (minimum, maximum, etc.), some lan-
guages also enable us to define custom aggregates.
42 Naum Gjorgjeski
Chapter 6
A practical example
The complete source code of the practical example is available at https:
//github.com/NaumGj/diploma.
6.1 Goals of the practical example
Our goal is to build both a scalable and elastic IoT application. It must
be able to consume high amount of IoT events and adjust the amount of
allocated resources to the amount of data. It must be able to process the
stream of events in real time, i.e. infer useful knowledge from it. At the
end, it should visualize both some of the stream’s properties and the inferred
knowledge from the stream. In order to demonstrate this, we introduce three
different scenarios, described in section 6.3.
6.2 Structure of the practical example
In this section we are going to make a high-level overview of the structure of
the practical example. Every component of the practical example is described
in detail in the following sections.
In chapter 4 we discussed the microservices architecture and the benefits
arising from its adoption. Our IoT application follows the principles of the
43
44 Naum Gjorgjeski
microservices architecture. It consists of six microservices:
Apache Kafka producer In our case, Kafka producers simulate an event
stream. We need lots of devices to generate amounts of data such as
the amounts generated in IoT networks. Since we don’t have so many
IoT devices, we will need to simulate an event stream. In order to do
that, we use Kafka producers for event production.
Apache Kafka message broker Message brokers are delivering messages
from the producers to the consumers.
Apache Kafka consumer Kafka consumers consume the messages pro-
duced by producers (through message brokers).
Apache ZooKeeper ZooKeeper is used by producers, message brokers and
consumers for managing and sharing state [31]. It is also used by
the CEP adapter. The CEP adapter gets events from the consumers
through REST services. It uses ZooKeeper in order to discover every
consumer instance and get all of the events.
Complex event processing microservice The CEP microservice consists
of adapter, statements and listeners. The adapter sends events into the
CEP engine. Statements process the events and produce output. The
output is then handled by listeners. For CEP, we use the Esper engine
for Java.
Visualization (front-end) web application The inferred knowledge from
the events is visualized. The event throughput and the latency of events
between Kafka producers and consumers are visualized as well.
The architecture of the IoT application is illustrated in figure 6.1. We
can have one or many Kafka producers, simulating an event stream. These
events are transferred to the consumers by message brokers. A Kafka mes-
sage broker can transfer an enormous amount of events to many consumers.
In most cases, we only need a few of them. However, that is not the case
Bachelor’s thesis 45
with the consumers. This is the bottleneck for achieving scalability and the
key point for achieving elasticity. We should be able to change the number of
consumers dynamically according to the amount of events. Message brokers
and consumers use ZooKeeper to agree which consumers should read from
a particular broker. Since every consumer receives unique events, the CEP
adapter uses ZooKeeper to discover all of the consumers and receive events
from all of them. This is particularly important because the number of con-
sumers changes as we dynamically deploy and terminate consumer instances.
The events from the adapter are processed by statements, whose output is
then handled by the listeners. The visualization component communicates
with the listeners through REST services and consumes the meaningful out-
put produced by the statements. We can also check how our fire detection
and shoplifting scenarios work, by sending custom events through front-end
forms.
The visualization microservice is written using JavaScript, the AngularJS
framework and HTML. We won’t discuss its source code in the thesis, as it
is not complex and every programmer who has used these technologies will
find it easy to understand. We will only see how we used the forms and how
the visualizations look like. All of the other microservices are written in Java
EE and are using Maven for build automation. The KumuluzEE framework
enables us to pack the Java EE components into microservices as standalone
JARs. A Docker image is built for every microservice. The Docker images
are then pushed to the Google Cloud Engine, where we can deploy, scale and
manage our containerized microservices using Kubernetes.
46 Naum Gjorgjeski
Figure 6.1: The architecture of our application consists of various compo-
nents. Kafka producers simulate event production. The events are trans-
ferred to consumers through one or many message brokers. The adapter
then collects the events and sends them to the CEP engine, which processes
the events according to the processing rules (i.e. statements) and produces
meaningful output. Then, the output is visualized in the front-end compo-
nent of the application.
Bachelor’s thesis 47
6.3 Scenarios
6.3.1 Test of the throughput and the latency of events
in Apache Kafka
The goal of this scenario is to test the bottleneck for achieving scalability and
elasticity. We use producers to generate events. As we dynamically deploy
and terminate producer and consumer instances, we observe the number and
latency of events in the last 10 seconds in real time. The latency is computed
in the bottleneck only, i.e we measure the time from the production of an
event in one of the producers to its arrival in one of the consumers. In order
to get the throughput and the latency, we will create two Esper statements.
6.3.2 Fire detection
In this scenario, we try to detect a fire as soon as it breaks out. We receive
data from two types of sensors: temperature and smoke sensors. We detect
a fire if:
• The temperature is above 40 ◦C;
• The level of smoke is above 5 % obscuration per meter 1;
• The abnormal temperature and smoke sensor readings are coming from
the same room;
• The abnormal temperature and smoke readings have both been de-
tected in the last 25 seconds.
We can observe the active fires in the front-end component. Also, through
forms on the front-end, we can send custom temperature and smoke sensor
events to one of the producers using POST (REST) method. We should
1The unit indicating level of smoke depends on the type of the smoke detector. We
obtained this unit from [32]. Although it might not be the most accurate one, the purpose
of this scenario is to show how to combine data from different types of sensors.
48 Naum Gjorgjeski
specify the room number and the sensor reading value (temperature or level
of smoke). This way, we can easily test the scenario.
6.3.3 Shoplifting
In this scenario, we demonstrate pattern matching. We receive 3 types of
events: events indicating that a product is removed from the shelf, events
indicating that a product is paid and events indicating that a product exits
the store. We detect that a product is stolen if we have received an event
indicating that the product is removed from the shelf and then receive an-
other event indicating that the product exits the store, without receiving an
event indicating that the product is paid. In the front-end component, we
can observe products that are shoplifted. Also, we can send custom events
indicating that a product is removed from shelf, is paid or exits the store.
6.4 Apache Kafka
6.4.1 Topics, partitions and replication factor
Apache Kafka is a distributed, partitioned, replicated commit log service [1].
In Kafka, we publish messages in categories called topics. Topics are divided
into many partitions. The number of partitions is specified by us. The
messages can be assigned to a partition based on some semantic properties
of the message or simply in a round-robin fashion. Replication factor is the
number of replicas of each partition. When the replication factor is bigger
than one, the replicas of the partition are spread across different message
brokers (if there are multiple message brokers). The replicas of a partition
are completely identical, as their purpose is to make a topic resilient to broker
fails. Data in a partition is lost only if all brokers holding replicas of a certain
partition fail. In order for the replicas to be identical, a leader broker for each
partition is elected. Replicas of a certain partition are synchronizing with
the replica that is hosted by the leader of that partition [31]. The relation
Bachelor’s thesis 49
between topics, partitions and replicas is best illustrated in figure 6.2.
Figure 6.2: Topics, partitions and replication factor [31].
6.4.2 Consumers
In general, there are two messaging patterns: message queue and publish-
subscribe. In the message queue pattern, a message is consumed by one of
the consumers. In the publish-subscribe pattern, a message is received by
all of the consumers that have subscribed to a topic. Kafka enables us to
use both of these patterns by the consumer group concept, illustrated in
figure 6.3.
In a consumer group partitions are consumed by one consumer only. If
there is one consumer group only, we get a traditional message queue. This
characteristic is particularly important for achieving scalability because the
load is balanced among the consumers belonging to a certain consumer group.
While a partition is consumed by one consumer only, the consumer can still
consume many partitions at once. In this way, we use partitions to achieve
parallelism and increase throughput. This means that the maximum number
50 Naum Gjorgjeski
Figure 6.3: Kafka’s consumer group concept. A consumer group consumes
each of the partitions, but only one consumer in a consumer group consumes
a certain partition [1].
of useful consumers is equal to the number of partitions of a topic. In that
case, a consumer consumes messages from one partition only. On the other
hand, if every consumer belongs to a separate consumer group, we get the
publish-subscribe pattern. The consumer group concept enables combina-
tions of the two messaging patterns as well. Such example is illustrated in
figure 6.3.
6.4.3 Implementation of the producers
For the implementation of the producers and the consumers we used the
Kafka API for Java, its documentation [1] and the help of a web article [16].
We have three different producers for the three different scenarios de-
scribed in 6.3. Although we might have implemented one producer that
generates all types of events, it seemed more natural to have separate pro-
ducers. In this section, we describe only the producer of events for the event
throughput and latency scenario. The other producers differ in the types of
Bachelor’s thesis 51
events they send. Also, the producers for the fire detection and shoplifting
scenarios offer injection of custom events (from the front-end application)
through REST services implemented using JAX-RS.
First, among the other needed dependencies, we should specify the de-
pendencies for Kafka and KumuluzEE in module’s pom.xml file:
<dependency>
<groupId>com . kumuluz . ee</groupId>
<a r t i f a c t I d >kumuluzee−core</a r t i f a c t I d >
</dependency>
<dependency>
<groupId>org . apache . kafka</groupId>
<a r t i f a c t I d >kafka−c l i e n t s </a r t i f a c t I d >
</dependency>
KumuluzEE will pack our producer component in a microservice. The
versions of the dependencies are specified in the parent, i.e. the root pom.xml.
The producers are implemented as application scoped beans. The production
of events is initialized by the following method:
pub l i c void i n i t ( @Observes @ I n i t i a l i z e d (
↪→ Appl icat ionScoped . c l a s s ) Object i n i t ) {in i tT imer ( ) ;
}
The initTimer() method contains initialization logic. In this method we
schedule the production of events. Before starting event production, we need
to initialize the Kafka producer through the Kafka API. To do that, we placed
a configuration file named producer.props in the resources map. The most
important configuration setting is the specification of the bootstrap.servers
property. Here we specify the IP addresses of the message brokers in a Kafka
cluster and the ports they listen on. As we will see later, passing a fixed IP
is not a problem thanks to the Kubernetes services. Once configured, the
producer can start sending events:
52 Naum Gjorgjeski
producer . send (new ProducerRecord<Str ing , Str ing >(” f a s t−↪→ messages ” , mapper . wr i teValueAsStr ing ( event ) ) ) ;
The first parameter (in our case ”fast-messages”) is the name of the topic we
send messages to. The second parameter is an event serialized to the String
class. Here, mapper is Jackson’s Object Mapper, which helps us serialize and
deserialize event classes to JSON or String and vice versa. Since we haven’t
specified to which partition the message should be sent, the producer will
send our messages in a round-robin fashion.
6.4.4 Implementation of the consumer
Unlike the producers, we have one universal consumer which consumes each
event type we have defined. We should include the same dependencies in the
pom.xml file of the consumer module, as we did in the pom.xml files of the
producers.
Similar to the producers, we first need to read consumer’s configuration
file (named consumer.props). We will have a look at the two most vital
properties:
bootstrap.servers This property is having the same role as in the pro-
ducer’s configuration file.
group.id Here we specify the ID of the consumer group. Later, when we
will deploy many consumer instances, they will belong to the same con-
sumer group. That means the instances will balance the load among
themselves in a message queue manner and enable us to achieve scala-
bility.
Then, we should subscribe to the topics we want the consumer to consume
(in our case the ”fast-messages” topic):
consumer . sub s c r i b e ( Arrays . a s L i s t (” f a s t−messages ”) ) ;
Bachelor’s thesis 53
When we have subscribed to one or more topics, we can poll events from
them:
ConsumerRecords<Str ing , Str ing> r e co rd s = consumer . p o l l
↪→ (200) ;
If the consumer doesn’t receive any events, it sleeps for 200 milliseconds and
tries to poll again. We can then iterate through the records and deserialize
them:
SimpleEvent event =
mapper . treeToValue (msg , SimpleEvent . c l a s s ) ;
The events are added to an array which is emptied every time the CEP
adapter consumes the events from it.
We can see above that we deserialize and convert each event type to the
SimpleEvent class. We can do this thanks to Jackson’s polymorphism. As
we have mentioned before, we have a root event type, implemented in the
SimpleEvent class which is located in the module-models Maven module. All
of the other event types extend this class, inherit its properties and add new
event-specific properties. Jackson deserializes each event to its actual type.
It knows to which type an event should be deserialized because we registered
each event type through the following annotations in the SimpleEvent class:
@JsonSubTypes ({@JsonSubTypes . Type (
va lue = TemperatureSensorEvent . c l a s s ,
name = ” temperature ”) ,
@JsonSubTypes . Type ( value = SmokeSensorEvent . c l a s s ,
name = ”smoke ”) ,
@JsonSubTypes . Type ( value = Shel fEvent . c l a s s ,
name = ” s h e l f ”) ,
@JsonSubTypes . Type ( value = PaidEvent . c l a s s ,
name = ” paid ”) ,
@JsonSubTypes . Type ( value = ExitEvent . c l a s s ,
54 Naum Gjorgjeski
name = ” e x i t ”)
})
One more notable thing in the implementation of the consumer is the use
of ZooKeeper. The consumer implements @PostConstruct and @PreDestroy
methods, invoked just after the bean is constructed and before it is destroyed
respectively. In the @PostConstruct method, the consumer registers itself
as a service in ZooKeeper, while in the @PreDestroy method it unregisters
the service. The CEP adapter uses ZooKeeper to discover each consumer
instance which is currently deployed and consume events from each of them.
The module implementing the communication with the ZooKeeper server
(module-utils) is taken from [39].
6.5 Esper
In chapter 5 we discussed the need to analyze the events as the stream of
events flows, instead of storing them in a database and then running queries
against a database. In order to be able to analyze events with CEP tech-
niques, we decided to use Esper.
Esper is a standalone tool for CEP and event series analysis. It is an
open-source tool, which comes with API for Java and .NET. It is designed to
enable complex computations in event-driven architectures in real time, while
maintaining high throughput and low latency. It offers its own language for
event processing called Event Processing Language (EPL). This language has
most of the constructs and operators we discussed in section 5.3: aggregators,
filtering operators, windows, etc. In general, Esper processes events through
event patterns or event stream queries [9]. We will use both of them: an
event pattern in the shoplifting scenario and event stream queries in the
other scenarios.
A simple CEP architecture is shown in figure 6.4. The adapter sends
raw events to the CEP engine, which processes them using a user-defined
statement. The statement’s output are complex events, which are transferred
Bachelor’s thesis 55
to the listener. This is a traditional CEP architecture in its simplest form.
We can then add many statements vertically or horizontally. For instance,
we might have many parallel statements or the output of one statement is
then the input of another statement. Such situations are demonstrated in
our scenarios as well. Also, a statement might have many update listeners
or have no update listener at all. For instance, one listener of a statement
might do logging only, while another one might offer the complex events as
REST services.
Figure 6.4: Simple CEP architecture. The raw events from the adapter are
processed using statements. The output of the statements is then reported
to listeners.
6.5.1 Implementation of the adapter
The microservice for CEP is having a common pom.xml file for the adapter,
the statements and the listeners. In addition to the Esper dependency, we
should also include some additional dependencies: ANTLR, CGLIB and
Apache Commons logging. These libraries are required by Esper. More
information on their usage is provided in [9].
In the initialization part, we should first define all of the event types. For
example, the event generated by smoke sensors is added with the following
code:
c o n f i g u r a t i o n . addEventType (” SmokeEvent ” ,
↪→ SmokeSensorEvent . c l a s s . getName ( ) ) ;
56 Naum Gjorgjeski
The other event types are added in the same way. Then, we get an instance
of Esper’s service provider:
EPServiceProvider epSe rv i c e = EPServiceProviderManager .
↪→ getProv ider (” RestAdapter ” , c o n f i g u r a t i o n ) ;
With the help of Esper’s service provider, we can now register our statements
and attach listeners to them. Here is an example of the registration of the
statement that computes latency and the listener that receives updates from
the statement:
LatencyStatement latencyStmt = new LatencyStatement (
↪→ epSe rv i c e . getEPAdministrator ( ) ) ;
latencyStmt . addLis tener (new LatencyLi s tener ( ) ) ;
Once we have initialized Esper, we can schedule the adapter to call the REST
services offered by the consumers:
ScheduledFuture<?> runnableHandle = schedu l e r .
↪→ scheduleAtFixedRate ( runnable , 5000 , 500 , TimeUnit
↪→ .MILLISECONDS) ;
After the initial delay of 5000 milliseconds, the REST services are called every
500 milliseconds. As we said, the adapter uses ZooKeeper to discover every
consumer instance and get its events through GET requests. The instance
of ZooKeeper’s service registry is injected in the adapter’s bean through
Contexts and Dependency Injection (CDI).
When the adapter receives some events, we can send each of them to the
CEP engine:
epSe rv i c e . getEPRuntime ( ) . sendEvent ( event ) ;
6.5.2 Implementation of the statements
The creation of statements in Esper is pretty straightforward. Thus, the four
statements we have are created in this manner:
Bachelor’s thesis 57
pub l i c Statement ( EPAdministrator admin ) {St r ing stmt = ”<The statement >”;
EPStatement statement = admin . createEPL ( stmt ) ;
}
Hereinafter we will discuss the content of every statement.
Number of events in the last 10 seconds
In Esper, we can almost always get the desired results in many ways. This
is also the case with the computation of the number of events in the last 10
seconds. In our application, we have first registered the following context:
c r e a t e context Ctx10Seconds i n i t i a t e d @now and
pattern [ every t imer : i n t e r v a l (10) ]
terminated a f t e r 10 sec
That means Esper will do an action every 10 seconds, without keeping events
in memory. Then, we can get the actual number of events with the following
statement:
context Ctx10Seconds s e l e c t count (∗ ) as cnt
from SimpleEvent output snapshot when terminated
In this statement, we used Esper’s count aggregation function. The output
is sent to the listener when the pattern terminates, i.e. every 10 seconds.
We could have adopted a simpler approach and could have used a sliding
window:
s e l e c t count (∗ ) as cnt
from SimpleEvent . win : time (10 sec )
However, this statement would produce output every time an event enters or
leaves the sliding window. Furthermore, Esper would keep all of the events
that arrived in the last 10 seconds in its memory.
Using a batch looks simpler than our solution as well:
58 Naum Gjorgjeski
s e l e c t count (∗ ) as cnt
from SimpleEvent . win : t ime batch (10 sec )
According to this statement, Esper would accumulate events and send the
output to the listener every ten seconds, just like above. However, it would
still keep events in memory. Therefore, the solution we chose fits our needs
better than the other solutions [10].
Average latency (between the producers and the consumers) in the
last 10 seconds
All of the events are timestamped when they are sent from one of the pro-
ducers and when they arrive at one of the consumers. Since each event has
these two properties, we can easily calculate the average latency of the events
that arrived into the CEP engine in the last 10 seconds:
context Ctx10Seconds
s e l e c t avg ( timestampConsumed − timestamp ) as avgLatency
from SimpleEvent output snapshot when terminated
Note that we use the same context we already defined to count the number of
events. Esper can register a given context only once and we mustn’t redefine
it.
Fire detection
In section 6.3.2, we described the fire detection scenario and the conditions
that need to be met for a fire to be detected. At first, we might go straight-
forward and define the following statement:
s e l e c t temperature . roomNumber as room ,
temperature . ce l s iusTemperature as temp ,
smoke . obscurat i on as obscur
from TemperatureEvent . win : time (25 sec ) as temperature ,
SmokeEvent . win : time (25 sec ) as smoke
Bachelor’s thesis 59
where temperature . roomNumber = smoke . roomNumber
and temperature . ce l s iusTemperature > 40 .0
and smoke . obscurat i on > 5 .0
The statement actually does detect fire. However, there is one problem.
For example, if the CEP engine receives two abnormal temperature and two
abnormal smoke events, it outputs the four possible combinations of these
events, i.e. we get four complex events. When a fire really breaks out, we
might get thousands of abnormal events in 25 seconds. So, if we have T
abnormal temperature events and S abnormal smoke events in the last 25
seconds, the output would be T ∗ S complex events. We can see that a
different approach is needed.
Granularity of statements is recommended in Esper’s documentation. If
a complex statement can be partitioned into many smaller and more com-
prehensible statements, we should do that. Therefore, we will define three
statements. As the events flow, the first statement inserts the abnormal
temperature events into another stream, called HighTemperature:
i n s e r t i n to HighTemperature
s e l e c t ∗ from TemperatureEvent as temperature
where temperature . ce l s iusTemperature > 40 .0
Similarly, the second one inserts the abnormal smoke events in a separate
stream, called LotSmoke:
i n s e r t i n to LotSmoke
s e l e c t ∗ from SmokeEvent as smoke
where smoke . obscurat i on > 5 .0
At the end, we combine the two new streams in order to detect a fire:
s e l e c t temperature . roomNumber as room ,
temperature . ce l s iusTemperature as temp ,
smoke . obscurat i on as obscur from
60 Naum Gjorgjeski
HighTemperature . s td : groupwin ( roomNumber ) . win : time (25
↪→ s e c ) as temperature ,
LotSmoke . std : groupwin ( roomNumber) . win : time (25 sec ) as
↪→ smoke
where temperature . roomNumber = smoke . roomNumber
In this statement, we take advantage of Esper’s std:groupwin function. It
enables us to create sub-views [9]. By specifying the room number as a key,
we will get only the last abnormal temperature and smoke events from a
specific room for the last 25 seconds. If there are no abnormal events for
a certain room in the last 25 seconds, no events are taken into account for
that room. In this way, only combinations of the last abnormal events in a
certain room are created.
Shoplifting detection
In section 6.3.3, we described the shoplifting scenario. In this scenario, we
will demonstrate the use of pattern matching. We detect that a product is
stolen by the following statement:
s e l e c t ∗ from pattern
[ every s=Shel fEvent −>( ExitEvent ( productId = s . productId )
and not PaidEvent ( productId = s . productId ) )
where t imer : with in (12 hours ) ]
We detect every event where a product is removed from the shelf, which is
then followed by an event indicating that the product exits the store without
being paid. We keep events for the last 12 hours.
6.5.3 Implementation of the listeners
The listeners are similar to each other. In our case, they all gather the
complex events produced by the statements and offer them to the front-end
Bachelor’s thesis 61
component through REST services. For example, the fire detection listener
adds the active fires in an array. Then, we inject the listener and offer its
events through REST services:
@RequestScoped
@Path(” f i r e s ”)
@Consumes( MediaType .APPLICATION JSON)
@Produces ( MediaType .APPLICATION JSON)
pub l i c c l a s s R e s t S e r v i c e F i r e s {
@Inject
p r i v a t e F i r e D e t e c t i o n L i s t e n e r f i r e L i s t e n e r ;
@GET
pub l i c Response g e t F i r e s ( ) {
List<FireObject> f i r e s = f i r e L i s t e n e r . g e t F i r e s
↪→ ( ) ;
r e turn Response . ok ( ) . e n t i t y ( f i r e s ) . bu i ld ( ) ;
}
}
We must also take care of Cross Origin Resource Sharing (CORS), so that we
can access the REST services from the front-end. This is done by specifying
a CORS filter in the webapp/WEB-INF/web.xml file located in the resources
map.
62 Naum Gjorgjeski
6.6 Deployment and management of the mi-
croservices using Kubernetes
6.6.1 Project setup and build of images
The process of deployment and management of microservices with Kuber-
netes is adapted to our needs from the tutorial [27] and the documenta-
tion [20] of Kubernetes.
In order to be able to use the Google Cloud Platform, we must first create
a Google account. For the purposes of this thesis, we used the Google Cloud
SDK command-line interface. Also, we had to install Docker and the kubectl
Kubernetes component, as described in [27].
An illustration of what we should achieve is displayed in figure 6.5. We
must build a Docker image for each of our microservices or use a ready
prebuilt Docker image from the Docker Hub. Then, we will deploy these
images in a Kubernetes cluster, where we can manage and replicate them as
needed. We explain how to do this in the rest of the section.
We use our Google account to log into Google’s cloud console (https:
//console.cloud.google.com/). Then, we create a new Google Cloud
Platform project and we assign a unique ID to it, as shown in figure 6.6.
The ID is particularly important as we will be using it to push and deploy
our microservices in the cloud.
Docker helps us containerize our microservices. In order to be able to
build Docker images, we should start a Docker machine. We use Google
Cloud SDK to run Docker commands. However, Docker commands can be
run in other shells as well (e.g. Docker Quickstart Terminal). If a Docker
machine is not created, we should first create it, as described in [15]. To start
a Docker machine named default, we should run the following command:
docker−machine s t a r t d e f a u l t
In order to get and run the commands to set up the Docker environment for
the machine named default, we run the following command [5]:
Bachelor’s thesis 63
Figure 6.5: The process of deployment of our microservices in a Kubernetes
cluster. We should build an image or use a prebuilt image for each of our
microservices. Then, each image is pushed to the Google Container Reg-
istry. We can then manage and scale the containerized images in the Google
Container Engine using Kubernetes [27].
docker−machine env d e f a u l t
At this point, we have a configured Docker machine and we can build im-
ages for our microservices. In order to build images automatically, we need
a Dockerfile, which is actually ”a Docker script”. It contains the commands
we could have run directly, one by one, in a command line interface [8]. The
Dockerfile shown below is taken from [39] and is similar for each microservice,
except for the front-end. The Dockerfile in the module-producer map that
should automate the build of our producer looks like this:
64 Naum Gjorgjeski
Figure 6.6: Creation of a new project in Google’s cloud console.
FROM java : openjdk−8u45−jdk
MAINTAINER t i l e n . faganel@me . com
RUN apt−get update −qq && apt−get i n s t a l l −y wget g i t
RUN wget http :// mi r ro r s . s on i c . net /apache/maven/maven
↪→ −3/3.3.9/ b i n a r i e s /apache−maven−3.3.9−bin . ta r . gz
↪→ && \ta r −zx f apache−maven−3.3.9−bin . ta r . gz && rm
↪→ apache−maven−3.3.9−bin . ta r . gz && \mv apache−maven−3.3.9 / usr / l o c a l && ln −s / usr /
↪→ l o c a l /apache−maven−3.3.9/ bin /mvn / usr / bin /
↪→ mvn
RUN mkdir /app
WORKDIR /app
ADD . /app
RUN mvn c l ean package
ENV JAVA ENV=PRODUCTION
EXPOSE 8080
Bachelor’s thesis 65
CMD [ ” java ” , ”− s e r v e r ” , ”−cp ” , ”module−producer / t a r g e t /
↪→ c l a s s e s : module−producer / t a r g e t /dependency /∗” , ”
↪→ com . kumuluz . ee . EeAppl icat ion ” ]
We take the Java JDK image and install git and Maven. Then, we add
the application in the app folder. Maven cleans, compiles and packages our
project. Before the final CMD command, we expose port 8080. We can have
only one CMD command in a Dockerfile. In our case, we run our microservice,
which is packaged by KumuluzEE. Dockerfiles for other microservices differ
only in the last line, where the respective module is specified instead of the
module-producer module.
The Dockerfile for the front-end microservice looks differently:
FROM ubuntu
RUN apt−get update −qq
RUN apt−get i n s t a l l −y bui ld−e s s e n t i a l node j s npm
↪→ nodejs−l egacy vim
RUN mkdir /myangularapp
ADD www /myangularapp
WORKDIR /myangularapp
RUN npm i n s t a l l −g http−s e r v e r
EXPOSE 8080
CMD [ ” http−s e r v e r ” ]
It starts from the latest Ubuntu image, installs npm and its lightweight http-
server, exposes port 8080 and runs the HTTP server.
Since we have the Dockerfiles, we can finally build our images. We should
build six images: three for the producers (one for each scenario), one for the
consumer, one for CEP and one for the front-end. In order to deploy Apache
ZooKeeper and Apache Kafka message brokers, we will use existing images
from the Docker Hub, as we will see later. Therefore, we won’t build these
66 Naum Gjorgjeski
images ourselves. We run the following command six times to build the six
images we need:
docker bu i ld −t gcr . i o / io t−micros/<name−of−the−↪→ micro se rv i c e> −f <map−conta in ing−the−Docke r f i l e−↪→ f o r−the−micro se rv i c e >/D o c k e r f i l e .
The -t parameter specifies the image tag, while the -f parameter specifies
the location of the Dockerfile. Note that iot-micros is the ID of the Google
Cloud Platform project we created. The name of a microservice is specified
by us. The names should be unique within a project. Once we have built
the images, we can push all of them to the Google Container Registry (we
use the tag that we have specified in the previous step):
gc loud docker push gcr . i o / io t−micros/<name−of−the−↪→ micro se rv i c e>
Then, we should create a Kubernetes cluster. This can be done ei-
ther through the console with the gcloud command, as described in [27],
or through the GUI in Google’s cloud console (in the Container Engine sec-
tion). Amongst other parameters, we specify the name of the cluster, the
number of nodes and the machine types in the cluster. When the cluster is
created, we should get credentials to be able to access and manage it from
the Google Cloud SDK:
gcloud conta ine r c l u s t e r s get−c r e d e n t i a l s <c l u s t e r−name
↪→ >
6.6.2 Pods, replication controllers and services in
Kubernetes
Before we deploy our microservices, let’s take a look at the building blocks
of Kubernetes that we need. As described in the Kubernetes documenta-
tion [22], pods are the smallest deployable unit in Kubernetes. They can
Bachelor’s thesis 67
contain one or more containers. However, if we put many containers in a
pod, we must scale them together. Therefore, we decided to have one con-
tainer in each of our pods. However, if deploy our applications as ”naked”
pods, in case of a pod, node or some other kind of failure, the pod is not
redeployed. That is where replication controllers come in handy [23]. In
replication controllers, we can specify the number of pod replicas we want to
maintain. If a pod fails for some reason, a new pod is automatically deployed.
Every pod has its own IP address, visible only in the cluster. It is also
possible to expose the pod and get an external IP and external access to the
pod. The ZooKeeper server and the Kafka brokers run in pods. In order to
connect to ZooKeeper or the brokers, we need their IPs. The brokers and the
microservice for CEP need the IP of ZooKeeper, while the producer and the
consumer microservices need the IPs of the brokers. However, we mentioned
that a pod might fail. The replication controller ensures that a number of
pod replicas are running and starts a new pod if some pod fails. The newly
created pod would have a different IP from the pod that failed. A good
practice in order to connect to ZooKeeper and the brokers is to use services.
Kubernetes services expose a set of pods using a policy we define. Services
get a stable IP, regardless of changes in the set of pods [26].
To run Apache Kafka and Apache ZooKeeper in a Kubernetes cluster,
we used the service and the replication controller files from an open-source
repository [2]. In these replication controllers, ZooKeeper and the brokers
are deployed from existing images in the Docker Hub [6] [7]. We will also
have both a service and a replication controller for our CEP and front-end
microservices. The consumer doesn’t need a service, since the CEP adapter
implements custom logic to discover all of the consumer instances programat-
ically (still through ZooKeeper, as we explained at the end of section 6.4.4).
Therefore, it will only have a replication controller. As far as producers are
concerned, we could have written replication controllers for them as well.
However, it is sufficient to deploy them as ”naked” pods since we deploy and
terminate them regularly in order to test how our application works. In order
68 Naum Gjorgjeski
to inject custom events from the front-end, we also expose the producers for
the fire detection and the shoplifting scenarios, as shown later. All service
and replication controller files are located in the Services and Replication
Controllers map of our GitHub project. They are implemented similarly
as the replication controllers and services in [2]. We should have a look at
consumer’s replication controller, as it is a little specific:
ap iVers ion : v1
kind : R e p l i c a t i o n C o n t r o l l e r
metadata :
l a b e l s :
app : kafka−e n t i t i e s
component : consumer
name : consumer
spec :
r e p l i c a s : 1
template :
metadata :
l a b e l s :
app : kafka−e n t i t i e s
component : consumer
spec :
c o n t a i n e r s :
− name : consumer
image : gcr . i o / io t−micros /consumer : l a t e s t
env :
− name : MY POD IP
valueFrom :
f i e l d R e f :
f i e l d P a t h : s t a t u s . podIP
The kind object tells Kubernetes to which building block this file refers to.
In the metadata object we specify the name of the replication controller and
Bachelor’s thesis 69
the app and component objects, used for targeting by other resources. Then,
in the spec object we specify the number of replicas we want to maintain.
The spec object nested in the template object specifies the pod. It tells
Kubernetes it should build the pod from the latest version of the gcr.io/iot-
micros/consumer image. The name of the image is actually the tag that we
used in order to build the image and then push it to the Google Container
Registry [24]. To this point, the other replication controllers are similar. The
environment variable MY POD IP is specific for the consumer replication
controller. It contains the IP of the pod in which an instance of the consumer
is deployed. We remember that the consumer has a @PostConstruct method,
in which it should register itself in ZooKeeper, so that the CEP microservice
can discover every available consumer instance. What the consumer actually
registers, is the IP of the pod in which the consumer instance is deployed. In
the @PostConstruct method, we take the value of this environment variable
and store it in ZooKeeper:
S t r ing endpointURI = System . getenv (”MY POD IP”) ;
// s t o r e the endpointURI in ZooKeeper
Finally, let’s deploy our microservices. We should first deploy the ZooKee-
per server and an Apache Kafka broker. Then, we can deploy the CEP and
front-end microservices. In order to do that, we move to the Services and
Replication Controllers map and run the following commands for each of the
four mentioned microservices:
kubect l c r e a t e −f <s e r v i c e−f i l e >
kubect l c r e a t e −f <r e p l i c a t i o n−c o n t r o l l e r−f i l e >
We can then deploy a consumer instance. We run only the second command,
as the consumer doesn’t have a service file. At the end, we can deploy a
producer as a ”naked” pod. We will show how to deploy and expose the
producer which is generating events for the fire detection scenario. The
procedure is the same for the other two producers. Assuming the image of
the producer is called gcr.io/iot-micros/fire-producer, we run the following
70 Naum Gjorgjeski
commands:
kubect l run f i r e−producer −−image=gcr . i o / io t−micros /
↪→ f i r e−producer
kubect l expose deployment f i r e−producer −−type=”
↪→ LoadBalancer ” −−port =8080
We are exposing the producer as a load balancer, since it is not important
in which producer instance (in case we deploy many producer instances) we
inject the custom events from the front-end. Instead of exposing it in this
way, we can also build a service from a file, as we did for some of the other
microservices.
6.6.3 The scenarios and the front-end
In the previous section, we created a service for the front-end component.
The following command would produce a list of the services we created:
kubect l get s e r v i c e s
In the output of this command, we can see the service for our front-end
microservice and we can read its external IP. We can then access the front-
end in a browser at <external-ip>:8080.
Test of the throughput and the latency of events
The front-end offers us visualizations of the number and the latency of events
in the last 10 seconds. We will see how to scale the consumer both manually
and automatically. First, we will scale the consumer manually, so that we
can see better how the number of consumer instances affects the scalability
of our application. Then, we will automate this process and the consumer
instances will be deployed or terminated automatically, depending on the
current load.
We can scale the consumer and the producer manually by running the
respective command:
Bachelor’s thesis 71
kubect l s c a l e rc consumer −−r e p l i c a s=<number−of−↪→ i n s tance s>
kubect l s c a l e deployment <name−of−the−producer> −−↪→ r e p l i c a s=<number−of−i n s tance s>
An alternative way to scale the consumer would be to change the number
of replicas in the replication controller file and then run a rolling update, as
described in the Kubernetes documentation [25].
In figure 6.7, we can see the front-end view. The red arrows represent a
change in the number of consumer instances, i.e. we deployed new or termi-
nated some of the existing consumer instances. The blue arrows represent
a change in the number of producer instances. Above the arrows, we can
see the number of instances for the respective deployment. The black arrows
represent that around that point of time the Kafka broker was rebalancing
the partitions of the topic, consequently causing a bigger latency. So, the
topic to which we were sending events has six partitions. This is specified
in the replication controller of the message broker. Since we wanted to test
the bottleneck for achieving scalability and elasticity, we wanted to keep the
other things simple. We have deployed one message broker. The partitions
of the topic were not replicated, i.e. they had a replication factor of one.
We started with three consumers and one producer. We tried to achieve
elasticity, so we destroyed two consumer instances. Meanwhile, we launched
one more producer instance. The first peak happened when we launched an
additional producer. At the same moment, the broker was rebalancing the
partitions of the topic and that caused the latency to grow. At that point,
the broker assigned all of the six partitions to our single consumer. Then,
we launched a third producer. Since one consumer couldn’t handle the large
amount of events, the latency started to grow. Therefore, we quickly de-
ployed two more consumer instances in order to handle the increased load
and achieve scalability. We get the second peak as a new rebalance of par-
titions was needed. This time, each consumer instance got two partitions of
72 Naum Gjorgjeski
the topic. However, we weren’t sure that we have achieved elasticity. There-
fore, when the latency decreased, we destroyed one consumer instance. After
the third rebalance, where each of our two consumer instances got three par-
titions to consume, we see that the latency decreases again. This time, we
have achieved both scalability and elasticity for the current load of events.
As we can see on the throughput graph below, the number of received events
increased each time we have deployed a new producer.
Now that we have seen how the number of consumer instances affects the
scalability of our application, we can automate the process of deployment
and termination of consumer instances. Assuming the name of the consumer
replication controller is consumer (it is specified in the replication controller
file of the consumer), we can run the following command:
kubect l a u t o s c a l e rc consumer −−max=6 −−cpu−percent=85
Since the minimum number of instances is by default one, we don’t have
to specify it explicitly. We specify that the maximum number of consumer
instances is six, as our topic has six partitions. Adding more consumers
wouldn’t help us, since our consumers follow the message queue pattern (each
partition is consumed by one consumer at a time). We also specify that a new
instance should be deployed, if the average CPU utilization in our consumer
pods exceeds 85% [21]. In general, the formula Kubernetes uses to calculate
the target number of pods at a certain moment is the following [19]:
TargetNumberOfPods =
ceil(sum(CurrentPodsCPUUtilization)/TargetCPUUtilization) (6.1)
So, our consumer will be scaled automatically depending on the current
load. In order to see the current autoscalers and the average CPU utilization
for each of them we can run the following command:
kubect l get hpa
A sample output is shown in figure 6.8.
Bachelor’s thesis 73
Figure 6.7: Front-end view of the test of throughput and latency of events
between the producers and the consumers. The arrows are added addition-
ally and are not part of the visualization. The red arrows represent a change
in the number of consumer instances. The blue arrows represent a change in
the number of producer instances. Above the arrows we can see the current
number of consumer or producer instances. The black arrows represent a re-
balance of the partitions, which occurs shortly after new consumer instances
are deployed or some of the existing consumer instances are terminated.
74 Naum Gjorgjeski
Figure 6.8: A sample output of the command kubectl get hpa.
In order to see the current number of consumer instances (assuming that
the replication controller is named consumer), we run the following com-
mand:
kubect l get rc consumer
We tested how autoscaling works in our application. We deployed two
producer instances, one by one. Since the CPU utilization got too high,
our autoscaler deployed two additional consumer instances, as shown in fig-
ure 6.9. In order to check the number of consumer instances we used the
above-mentioned command. Let us note that sometimes we have to wait for
a few minutes for the CPU utilization to stabilize. Therefore, after the de-
ployment of the second producer, we allowed the CPU utilization to stabilize.
The autoscaler maintained three consumer instances, which were capable of
handling the load from the two producers. Our application maintained low
and constant latency of the events coming from the two producer instances.
At the same time, the current CPU utilization didn’t allow the autoscaler to
try to terminate one consumer instance and scale down.
Figure 6.9: In order to handle the amount of events from two producer
instances, the autoscaler maintained three consumer instances.
So, the autoscaler maintained sufficient number of consumer instances and
achieved scalability. Then, we decided to terminate one producer instance
in order to see whether the autoscaler will try to achieve elasticity. We
Bachelor’s thesis 75
checked the number of consumer instances regularly and after a short time
the autoscaler terminated one of the three consumer instances. The front-
end view is shown in figure 6.10. We can see that the latency is constant,
except for the time when one of the consumer instances was terminated and
a rebalance of the partitions was needed.
Then, we terminated the one producer instance that was left. The au-
toscaler terminated one consumer instance as well, so that it maintained only
a minimum of one consumer instance.
Figure 6.10: After one producer instance was terminated, the autoscaler
terminated one consumer instance as well. When the consumer instance was
terminated, the broker had to do rebalancing. At that point, we see a peak
in the latency of events. The arrows and the text above and beyond them
are added additionally.
76 Naum Gjorgjeski
Fire detection
We can send both temperature and smoke sensor events through the front-
end, by specifying the room number and the respective measurement. Below
we get real-time notifications of the active fires.
The front-end view for the fire detection scenario is shown in figure 6.11.
For the rooms with IDs 3 and 7, we have sent events that contain high tem-
perature measurements and then events that contain a high level of smoke.
Since the time between the respective temperature and smoke sensor events
was less than 25 seconds, we got a fire alert for these two rooms. For example,
if we have sent an additional event for room 7, indicating a temperature of
60 ◦C, the temperature field in the existing notification would have changed.
Figure 6.11: Front-end view of the fire detection scenario. Through the
forms, we can send custom temperature and smoke sensor events. At the
bottom, we can see the current active fires.
Bachelor’s thesis 77
Shoplifting
Through the front-end, we can also send each of the event types for the
shoplifting scenario, by specifying the ID of a product. Below we get real-
time notifications of the stolen products. In figure 6.12, we can see how this
looks. We have sent an event that the product with id perfume-12345 is
removed from the shelf and an event indicating that the same product exits
the store. Because we didn’t send an event that the product is paid, we got
a real-time warning that the product is stolen.
Figure 6.12: Front-end view of the shoplifting scenario. Through the forms,
we can send each of the event types for the shoplifting scenario. At the
bottom, we can see the stolen products.
78 Naum Gjorgjeski
Chapter 7
Conclusion
IoT is a promising technology that may change the way we live. Since it is
relatively new, it faces many challenges. In order to realize the IoT vision,
achieving scalability is one of the key challenges to overcome. We can achieve
scalability if we integrate IoT solutions and cloud computing. Cloud com-
puting offers virtually unlimited storage and computational resources and is
a perfect fit for IoT applications.
Throughout the thesis, we saw that not all architectures are suitable for
cloud deployments of IoT applications. The traditional monolithic architec-
ture is resource inefficient, as the workload for different functionalities of IoT
applications depends on different factors. We want to achieve elasticity and
deploy, scale and manage the functionalities of an IoT application separately.
The microservices approach to building IoT solutions does exactly what we
want. We concluded that the microservices architecture is suitable for cloud
deployments of IoT applications.
Still, achieving scalability only in order to be able to store the stream of
events in a database is not sufficient. Most of the real-world scenarios (such as
the fire detection and shoplifting scenarios we introduced) require real-time
actions. Complex event processing helps us process huge amount of events
on-the-fly. As the stream of events flows, we are able to extract information
from the events in real time. We saw that different CEP techniques can be
79
80 Naum Gjorgjeski
used for event analysis.
In order to show all of this in practice, we developed a practical exam-
ple. We showed how we can achieve scalability by changing the number of
consumer instances, both manually and automatically. We developed three
scenarios and used Esper to analyze the events for all of the scenarios in
real time. We used Google’s public cloud and deployed our application in a
Kubernetes cluster in the Google Container Engine. We described how the
microservices of our IoT application are developed and how to build Docker
images in order to containerize our microservices. At the end, we managed
our microservices through Kubernetes and tested our scenarios through the
front-end.
Further work might be the integration of Esper with Apache Storm, which
would distribute the complex event processing done by Esper. Some scenarios
require later processing of events as well. Therefore, another upgrade could
be to introduce an option to store the events in a database and extensively
use cloud’s storage capabilities. In this way, we would be able to get both
real-time and retrospective information.
Bachelor’s thesis 81
82 Naum Gjorgjeski
Bibliography
[1] Apache Kafka documentation. https://kafka.apache.org/
documentation.html. [Accessed: 20. 7. 2016].
[2] Apache Kafka in a Kubernetes cluster. https://github.com/
CloudTrackInc/kubernetes-kafka. [Accessed: 31. 7. 2016].
[3] Cloud types: private, public and hybrid. http://www.asigra.com/
blog/cloud-types-private-public-and-hybrid. [Accessed: 18. 7.
2016].
[4] Complex event processing. https://en.wikipedia.org/wiki/
Complex_event_processing. [Accessed: 10. 3. 2016].
[5] Docker env. https://docs.docker.com/machine/reference/env/.
[Accessed: 28. 7. 2016].
[6] Docker image for Apache Kafka. https://hub.docker.com/r/
wurstmeister/kafka/. [Accessed: 31. 7. 2016].
[7] Docker image for Apache ZooKeeper. https://hub.docker.com/r/
digitalwonderland/zookeeper/~/dockerfile/. [Accessed: 31. 7.
2016].
[8] Dockerfile reference. https://docs.docker.com/engine/reference/
builder/. [Accessed: 29. 7. 2016].
[9] Esper reference. http://www.espertech.com/esper/release-5.4.0/
esper-reference/html_single/index.html. [Accessed: 15. 4. 2016].
83
84 Naum Gjorgjeski
[10] Esper: Solution patterns. http://www.espertech.com/esper/
solution_patterns.php. [Accessed: 29. 7. 2016].
[11] Event-driven architecture. https://en.wikipedia.org/wiki/Event-
driven_architecture. [Accessed: 20. 7. 2016].
[12] Event processing glossary. http://www.complexevents.com/2011/08/
23/event-processing-glossary-version-2/. [Accessed: 10. 3. 2016].
[13] Fast data: the next step after Big data. http://www.infoworld.com/
article/2608040/big-data/fast-data--the-next-step-after-
big-data.html. [Accessed: 5. 3. 2016].
[14] From monoliths to microservices: An architectural strategy. http://
thenewstack.io/from-monolith-to-microservices/. [Accessed: 11.
7. 2016].
[15] Get started with Docker Machine and a local VM. https://docs.
docker.com/machine/get-started/. [Accessed: 27. 6. 2016].
[16] Getting started with sample programs for Apache Kafka 0.9.
https://www.mapr.com/blog/getting-started-sample-programs-
apache-kafka-09. [Accessed: 21. 7. 2016].
[17] Horizontal and vertical scaling. https://en.wikipedia.org/wiki/
Scalability#Horizontal_and_vertical_scaling. [Accessed: 11. 7.
2016].
[18] Internet of things. https://en.wikipedia.org/wiki/Internet_of_
things. [Accessed: 3. 3. 2016].
[19] Kubernetes: Autoscaling algorithm. https://github.com/
kubernetes/kubernetes/blob/master/docs/design/horizontal-
pod-autoscaler.md#autoscaling-algorithm. [Accessed: 4. 9. 2016].
[20] Kubernetes documentation. http://kubernetes.io/docs/. [Accessed:
2. 8. 2016].
Bachelor’s thesis 85
[21] Kubernetes: Horizontal Pod Autoscaling. http://kubernetes.io/
docs/user-guide/horizontal-pod-autoscaling/walkthrough/.
[Accessed: 4. 9. 2016].
[22] Kubernetes: Pods. http://kubernetes.io/docs/user-guide/pods/.
[Accessed: 3. 7. 2016].
[23] Kubernetes: Replication controller. http://kubernetes.io/docs/
user-guide/replication-controller/. [Accessed: 3. 7. 2016].
[24] Kubernetes: Replication controller operations. http://kubernetes.
io/docs/user-guide/replication-controller/operations/. [Ac-
cessed: 7. 8. 2016].
[25] Kubernetes: Rolling updates. http://kubernetes.io/docs/user-
guide/rolling-updates/. [Accessed: 15. 8. 2016].
[26] Kubernetes: Services. http://kubernetes.io/docs/user-guide/
services/. [Accessed: 4. 7. 2016].
[27] Kubernetes walkthrough. http://kubernetes.io/docs/hellonode/.
[Accessed: 5. 8. 2016].
[28] Microservice: Cloud and IoT applications force the CIO to cre-
ate novel IT architectures. http://analystpov.com/cloud-
computing/microservice-cloud-and-iot-applications-force-
the-cio-to-create-novel-it-architectures-25117. [Accessed:
10. 7. 2016].
[29] Microservices. http://martinfowler.com/articles/microservices.
html. [Accessed: 1. 7. 2016].
[30] Private vs. public cloud: What’s the difference? https:
//www.expedient.com/blog/private-vs-public-cloud-whats-
difference/. [Accessed: 5. 7. 2016].
86 Naum Gjorgjeski
[31] Running a multi-broker Apache Kafka 0.8 cluster on a single
node. http://www.michael-noll.com/blog/2013/03/13/running-
a-multi-broker-apache-kafka-cluster-on-a-single-node/. [Ac-
cessed: 24. 7. 2016].
[32] Smoke detectors. https://en.wikipedia.org/wiki/Smoke_detector.
[Accessed: 16. 8. 2016].
[33] The scale cube. http://microservices.io/articles/scalecube.
html. [Accessed: 15. 7. 2016].
[34] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph,
Randy H Katz, Andrew Konwinski, Gunho Lee, David A Patterson,
Ariel Rabkin, Ion Stoica, et al. Above the clouds: A Berkeley view of
cloud computing. 2009.
[35] Luigi Atzori, Antonio Iera, and Giacomo Morabito. The Internet of
Things: A survey. Computer networks, 54(15):2787–2805, 2010.
[36] Alessio Botta, Walter De Donato, Valerio Persico, and Antonio Pescape.
On the integration of cloud computing and Internet of Things. In Future
Internet of Things and Cloud (FiCloud), 2014 International Conference
on, pages 23–30. IEEE, 2014.
[37] Michael Chui, Markus Loffler, and Roger Roberts. The Internet of
Things. McKinsey Quarterly, 2(2010):1–9, 2010.
[38] Gianpaolo Cugola and Alessandro Margara. Processing flows of informa-
tion: From data stream to complex event processing. ACM Computing
Surveys (CSUR), 44(3):15, 2012.
[39] Matjaz B. Juric, Tilen Faganel. KumuluzEE: Building microser-
vices with Java EE. http://www.javamagazine.mozaicreader.com/
JanFeb2016#&pageSet=80&page=0. [Accessed: 20. 6. 2016].
Bachelor’s thesis 87
[40] Tilen Faganel. Ogrodje za razvoj mikrostoritev v Javi in njihovo skali-
ranje v oblaku. Diploma thesis, Faculty of Computer and Information
Science, University of Ljubljana, 2015.
[41] Bob Familiar. Microservices, IoT and Azure: Leveraging DevOps and
Microservice Architecture to Deliver SaaS Solutions. 2015.
[42] Martin L. Abbott, Michael T. Fisher. The art of scalability: scalable
web architecture, processes, and organizations for the modern enterprise.
Addison-Wesley, 2015.
[43] Jayavardhana Gubbi, Rajkumar Buyya, Slaven Marusic, and
Marimuthu Palaniswami. Internet of Things (IoT): A vision, archi-
tectural elements, and future directions. Future Generation Computer
Systems, 29(7):1645–1660, 2013.
[44] Nikolas Roman Herbst, Samuel Kounev, and Ralf Reussner. Elasticity
in cloud computing: What it is, and what it is not. In Proceedings of
the 10th International Conference on Autonomic Computing (ICAC 13),
pages 23–27, 2013.
[45] Daniele Miorandi, Sabrina Sicari, Francesco De Pellegrini, and Imrich
Chlamtac. Internet of Things: Vision, applications and research chal-
lenges. Ad Hoc Networks, 10(7):1497–1516, 2012.
[46] D Robins. Complex event processing. In Second International Workshop
on Education Technology and Computer Science. Wuhan. Citeseer, 2010.
[47] Tomislav Vresk and Igor Cavrak. Architecture of an Interoperable IoT
Platform Based on Microservices.