This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779899. It is the property of the SecureIoT consortium and shall not be distributed or reproduced without the formal approval of the SecureIoT Management Committee.
Project Acronym: SecureIoT
Grant Agreement number: 779899 (H2020-IoT03-2017 - RIA)
Project Full Title: Predictive Security for IoT Platforms and Networks of Smart
Objects
DELIVERABLE D3.1 – Security Information
Storage and Analytics Infrastructure
Deliverable Number: D3.1
Deliverable Name: Security Information Storage and Analytics Infrastructure
Dissemination level: Public
Type of Document: Report
Contractual date of delivery: 30/09/2018
Deliverable Leader: AIT
Status & version: Final-1.2
WP / Task responsible: WP3/Task T3.1 (AIT)
Keywords: IoT Security, SecureIoT infrastructure, Data collection, Data streaming, Big data analytics
Abstract (few lines): The document presents the infrastructure that will be used in
SecureIoT for the planned trials. Different technologies for its
various components are presented and arguments are given for
the selected ones.
Deliverable Leader: Athens Information Technology (John Soldatos, Sofoklis
Efremidis)
Contributors: John Soldatos (AIT), Sofoklis Efremidis (AIT), Daniel Calvo
Alonso (ATOS)
Reviewers: Sofianna Menesidou (UBI), Mariza Konidi (INTRA)
Approved by: George Koutalieris (INTRA)
Project Title: SecureIoT Contract No. 779899 Project Coordinator: INTRASOFT International S.A.
Version: v1.2 - Final, Date 29/09/2018
Executive Summary
This document presents the infrastructure that will be employed in SecureIoT. The
infrastructure is aligned with the overall architecture of the project and comprises a set of
interconnected components for collecting security data from the target IoT system, streaming
the data for storage and processing, applying predictive IoT security analytics techniques for
the timely detection of security issues at the target IoT system, and visualizing the collected
data. The document presents requirements for the parts of the infrastructure and arguments
for the selection of its components.
Document History
Version Date Contributor(s) Description
0.10 04/6/2018 John Soldatos (AIT) Initial structure of the document for discussion during the Bilbao meeting
0.11 08/6/2018 John Soldatos (AIT) Revised/updated structure
0.12 11/6/2018 John Soldatos (AIT) Updates based on feedback from ATOS
0.13 13/7/2018 Sofoklis Efremidis (AIT) Updates on Chapters 2, 3, 4
0.15 13/7/2018 Sofoklis Efremidis (AIT) Information fine-tuning in Chapters 2 & 3
0.16 27/7/2018 Sofoklis Efremidis (AIT) Updates on Chapters 1 and 2
0.17 6/9/2018 Sofoklis Efremidis, John Soldatos (AIT) Alignment with D2.4, Chapter 2
0.18 20/9/2018 Daniel Calvo (ATOS) IoT application data modelling, subsection 3.1.4
0.20 21/9/2018 Sofoklis Efremidis (AIT) Updates to Chapter 4
0.21 22/9/2018 Sofoklis Efremidis, John Soldatos (AIT) Content harmonization
0.90 24/9/2018 Sofoklis Efremidis (AIT), John Soldatos (AIT) Document edits
0.91 25/9/2018 Sofoklis Efremidis (AIT) Updates to Chapter 4
1.00 26/9/2018 Sofoklis Efremidis (AIT) Document final edits
1.00 28/9/2018 Sofianna Menesidou (UBI) Reviewer’s comments received
1.00 28/9/2018 Mariza Konidi (INTRA) Reviewer’s comments received
1.10 28/9/2018 Sofoklis Efremidis (AIT) Edits incorporating reviewers’ comments
1.20 29/9/2018 Sofoklis Efremidis (AIT) Final document polishing for submission
Table of Contents

Executive Summary ......................................................................................................................... 2
Definitions, Acronyms and Abbreviations ...................................................................................... 6
1 Introduction ............................................................................................................................. 7
1.1 Scope and Purpose.......................................................................................................... 7
1.2 Background and Vision ................................................................................................... 7
1.3 Methodology ................................................................................................................... 8
1.4 Document Structure ....................................................................................................... 9
2 SecureIoT Data Storage and Analytics Requirements ........................................................... 10
2.1 The four Vs of SecureIoT BigData ................................................................................. 10
2.1.1 Data Characteristics .................................................................................................. 10
2.1.2 Data Types ................................................................................................................. 11
2.2 Data streaming requirements ....................................................................................... 12
2.3 Data analytics requirements ......................................................................................... 13
2.3.1 Simple Analytics – Rule-Based .................................................................................. 13
2.3.2 Machine Learning ..................................................................................................... 14
2.3.3 Deep Learning ........................................................................................................... 14
2.4 Alignment to WP2 and the SecureIoT Architecture ..................................................... 14
3 Information Streaming and Storage Infrastructure .............................................................. 20
3.1 SecureIoT Information Modelling ................................................................................. 20
3.1.1 IoT Assets Modelling ................................................................................................. 20
3.1.2 Attack Modelling ....................................................................................................... 21
3.1.3 IoT Security Data Modelling ...................................................................................... 22
3.1.4 IoT Application Data Modelling ................................................................................ 23
3.1.5 IoT Security Templates & Rulesets Modelling .......................................................... 28
3.2 Data Collection Infrastructure ...................................................................................... 29
3.3 Data Streaming Infrastructure ...................................................................................... 30
3.3.1 Request Reply ........................................................................................................... 31
3.3.2 Publish Subscribe ...................................................................................................... 32
3.3.3 SecureIoT Streaming Infrastructure ......................................................................... 34
3.4 Data Storage Infrastructure .......................................................................................... 37
4 SecureIoT Analytics Infrastructure ........................................................................................ 39
4.1 Data Analytics in SecureIoT........................................................................................... 39
4.1.1 Analytics Layers ......................................................................................................... 39
4.1.2 Data analytics requirements ..................................................................................... 39
4.2 Data Analytics Framework in SecureIoT ....................................................................... 39
4.2.1 Apache Hadoop ......................................................................................................... 40
4.2.2 Apache Spark ............................................................................................................ 40
4.2.3 SecureIoT Data Analytics framework ........................................................................ 41
5 Prototype Implementation and Demonstration ................................................................... 42
5.1 Data collection .............................................................................................................. 42
5.2 Data storage .................................................................................................................. 47
6 Conclusions ............................................................................................................................ 50
References .................................................................................................................................... 51
Table of Figures

FIGURE 1: OVERVIEW OF THE SECAAS PARADIGM. 15
FIGURE 2: OVERVIEW OF SECUREIOT ARCHITECTURE. 16
FIGURE 3: ARCHITECTURE OF THE DATA COLLECTION AND ACTUATION LAYER. 17
FIGURE 4: ARCHITECTURE OF THE SECURITY INTELLIGENCE LAYER. 18
FIGURE 5: OVERVIEW OF THE SECUREIOT INFRASTRUCTURE. 19
FIGURE 6: ASSET MODEL. 21
FIGURE 7: ATTACK MODEL. 22
FIGURE 8: SECUREIOT PROBES COLLECTING AND HARMONIZING APPLICATION-DATA INFORMATION FROM MULTIPLE IOT PLATFORMS. 24
FIGURE 9: UML CLASS DIAGRAM FOR NGSI. 25
FIGURE 10: LOGICAL VIEW OF OPENMTC CONNECTOR FOR FIWARE ORION CONTEXT BROKER. 25
FIGURE 11: NGSI-LD ONTOLOGY APPLIED IN AN EXAMPLE. 27
FIGURE 12: SSN/SOSA CONCEPTUAL MODULES, CLASSES AND PROPERTIES FOR OBSERVATION PERSPECTIVE. 28
FIGURE 13: OVERVIEW OF THE REQUEST-REPLY ARCHITECTURE. 31
FIGURE 14: OVERVIEW OF THE REQUEST-REPLY ARCHITECTURE WITH MULTIPLE CLIENTS. 32
FIGURE 15: OVERVIEW OF THE PUBLISH-SUBSCRIBE ARCHITECTURE. 32
FIGURE 16: OVERVIEW OF THE PUBLISH-SUBSCRIBE ARCHITECTURE WITH MULTIPLE DATA PRODUCERS AND CONSUMERS. 33
FIGURE 17: OVERVIEW OF BROKER ARCHITECTURE. 34
FIGURE 18: KAFKA PARTITIONS AND READ-WRITE OPERATIONS. 35
FIGURE 19: OVERVIEW OF RABBITMQ ARCHITECTURE. 36
FIGURE 20: OVERVIEW OF THE INFRASTRUCTURE SETUP. 42
Definitions, Acronyms and Abbreviations
Acronym Title
AMQP Advanced Message Queuing Protocol
API Application Programming Interface
BRMS Business Rule Management System
CRUD Create Read Update Delete
CVSS Common Vulnerability Scoring System
DAG Directed Acyclic Graph
ECU Electronic Control Unit
ETL Extract Transform Load
ESB Enterprise Service Bus
ETSI European Telecommunications Standards Institute
ETSI CIM ETSI Context Information Management
ETSI NGSI-LD ETSI Next Generation Service Interfaces-Linked Data
GDPR General Data Protection Regulation
HDFS Hadoop Distributed File System
IIoT Industrial Internet-of-Things
IoT Internet-of-Things
OEM Original Equipment Manufacturer
RDD Resilient Distributed Dataset
RDF Resource Description Framework
SECaaS Security as a Service
SOSA Sensor, Observation, Sample and Actuator
SSN Semantic Sensor Network
W3C World Wide Web Consortium
1 Introduction

1.1 Scope and Purpose

SecureIoT [1] plans to architect, implement, and demonstrate a standards-based open
end-to-end security framework for securing cross-platform, dynamic, and decentralized IoT systems.
The security framework will be aligned with international standards and initiatives and will
support the development, integration and deployment of cross platform IoT services that may
involve multiple dynamic autonomous and intelligent smart objects and devices. Security
support will be provided through select security services for risk assessment and mitigation, for
seamless development of secure IoT services, as well as for auditing and compliance, as a set of
add-on services (as opposed to built-in ones), following the Security as a Service (SECaaS)
paradigm. The SecureIoT framework targets application developers, platform providers,
solution integrators and IoT OEMs.
The SECaaS services will be heavily based on predictive IoT security functionalities, whereby
security related data that are generated from different nodes of the target IoT services will be
communicated to and continuously analyzed by an analytics engine implementing targeted and
sophisticated machine learning algorithms, which will provide continuous and timely
monitoring and alerting.
Security data are communicated through a scalable, flexible and efficient communication
infrastructure for further storage and analytics processing. The analytics engine makes use of
security related knowledge bases that contain templates and rules (relating to historical
security aspects of IoT services) that are applied to the monitored data, raising alerts when
security breaches to the target IoT service are suspected.
SecureIoT will conduct three trials demonstrating the IoT security services that will be
developed. This document presents the infrastructure that will be used in the course of the
project for setting up the project trials, and in particular the data collection, data transfer, data
storage, and the analytics parts of it.
1.2 Background and Vision

The vision of SecureIoT is to secure the next generation of dynamic, decentralized,
multi-platform IoT systems, which may include intelligent and (semi)autonomous objects or things.
The project will realize this vision by providing a set of SECaaS services that will support the
operation of target IoT systems or deployments based on predictive analytics. The envisaged
services comprise
(a) Risk assessment and mitigation, by applying well established approaches like the NIST
Common Vulnerability Scoring System (CVSS) for identifying risks and providing solutions
for their mitigation.
(b) Compliance auditing and recommendations, by providing tools that support security
and privacy controls at various levels of the IoT deployment, including controls that
pertain to the enforcement of regulations like the GDPR, NIS and ePrivacy directives.
(c) Developers’ support through a set of programming language level annotations and their
mappings to runtime functionalities for monitoring and policy enforcement.
The runtime support functionalities of SecureIoT comprise the collection and filtering of
security related data, as well as the use of predictive analytics techniques for the proactive
and timely identification of security related issues, like attacks and incidents, in the target
IoT deployment.
1.3 Methodology

The specification of the SecureIoT infrastructure follows the overall project architecture and the
requirements that have to be met for securing the target IoT services. The following steps were
followed for defining the infrastructure that will be put in place.
First, the nature and the properties of the collected security data are identified. Security related
data in SecureIoT are collected from different levels of the IoT service, namely, the device,
edge, cloud, and application levels. The SecureIoT security framework supports cross-platform
deployments of IoT services, so collected security data may originate from a number
of platforms on which the target IoT service may be deployed. Practically, security data are
generated by probes that are associated with select nodes of the target IoT service.
Second, after the properties of the security data are identified, their schema is defined. The
schema is generic enough to accommodate the diverse types of data that may be generated by
a number of sources and may assume a number of forms.
Third, the infrastructure for the collection of the security data is specified. Collection of data
has to be very efficient, and flexible. It is accomplished through a number of probes that are
deployed along select nodes of the IoT system, which collect the data of interest from the IoT
nodes and communicate them to the analytics modules.
Fourth, the transfer and storage of the collected security data is specified. Transfer of large
amounts of data has to be very efficient and flexible and impose the least possible overhead to
the operation of the target IoT service. Moreover, data storage must also be efficient and
flexible and allow the query and processing of data that are characterized by their large
volumes and diversity of types.
For the specification of the infrastructure that will be put in place in the context of SecureIoT,
some key requirements related to its performance, availability, and reliability are stated. In the
sequel, some key platforms are presented and justification is given for the ones that will be used
as part of the SecureIoT infrastructure.
1.4 Document Structure

The document is structured as follows: Chapter 2 presents the data storage and analytics
requirements for the platform. The chapter first lists the characteristics of the security data that
are collected from the target IoT system, i.e., their high volume, speed, diversity, and quality. It
then gives a generic model that intends to encompass all different types of collected data.
Based on the identified properties of the security data, the chapter lists the requirements that
the infrastructure has to meet regarding their collection, streaming, storage and processing.
Finally, the architecture of the infrastructure that will be used is presented and its alignment
with the overall SecureIoT architecture is shown.
Chapter 3 presents the information streaming and storage infrastructure. The same chapter
also presents the information collection components of the infrastructure. Different
technologies are presented and arguments are given for the selection of those that meet the
project’s requirements and will be used for its trials.
Chapter 4 presents the analytics infrastructure. Similar to Chapter 3, different analytics
technologies that are candidates to be part of the project’s infrastructure are presented.
Arguments for the one that will be used in the project’s trials are also presented.
Chapter 5 gives a prototype implementation of the infrastructure that has been put in place.
The overall setup of the infrastructure is presented along with its different components and
their configurations. Sample runs with artificially collected data and corresponding screenshots
are also presented in that chapter. The infrastructure will be used for running the project trials
and will be updated according to their needs.
Finally, Chapter 6 concludes the document.
2 SecureIoT Data Storage and Analytics Requirements

2.1 The four Vs of SecureIoT BigData

SecureIoT aspires to provide a set of security services for target IoT systems based on security
analytics. The functioning of these services will be based on the security data that are collected
from probes that are deployed along the target IoT system at different layers of the supporting
IoT platform, i.e., network, device, edge, cloud, and application layers, as well as across
different IoT platforms that may support the target IoT system and applications over it.
The security oriented data that are collected by the SecureIoT services possess similar
characteristics to Big Data [2] that are encountered in other contexts, in particular:
Volume: SecureIoT services make use of large volumes of security data. These data may
be either historical security data that are used to both train and fine tune the analytics
algorithms, or large volumes of security data that are collected during the operation of
the target IoT system from its different architectural layers. The latter are used to flag
abnormal behavior of the target IoT system so that actions can be taken to guard its
security.
Variety: Security data that are collected by the SecureIoT services come from a number
of feeds that are deployed along with the target IoT system. Different types of IoT
devices generate different types of security data. Moreover, different IoT architectural
layers (network, field, edge, cloud, application) generate different types of security data.
Finally, different IoT deployments also generate different types of security data.
Velocity: Security data are generated with different speeds by the deployed data
probes. Both streamed and non-streamed data are used by the analytics algorithms.
Moreover, streamed data are collected and transferred with varying speeds depending
on the required accuracy of the monitoring and reaction processes.
Veracity: The quality and trustworthiness of the collected security data is a prerequisite
for the quality of the results that will be produced by the analytics algorithms. Security
data that are collected by IoT deployments in the context of SecureIoT will be filtered
before they are processed, so as to guarantee an adequate quality level for the security
services to be provided.
2.1.1 Data Characteristics

SecureIoT services depend on data collected from a number of probes that are deployed along
a number of architectural layers of the target IoT system. In particular, security data are
collected from the network, the field devices, the edge devices, the cloud servers and the IoT
application itself and are fed to the analytics module that realizes the runtime services of
SecureIoT. The analytics module makes use of data analytics algorithms, which are trained and
tuned based on large amounts of historic security data.
Characteristics of the collected security data that are used for implementing the envisaged
security services are similar to those of big data in other contexts and include:
Large volumes: SecureIoT services make use of large volumes of both streaming and
non-streamed data, the former coming from the target IoT deployment while the latter
are used for training the analytics algorithms. Security data are collected from a large
number of elements of the target IoT system, including devices, edge servers, the cloud
and the IoT application itself. The precision and quality of the results produced by the
analytics algorithms depend on the volumes of processed data, with larger volumes
allowing for better trained algorithms and more precise alarms. As a consequence, the
infrastructure that supports the IoT security services must be able to handle large
volumes of collected data.
Large variety: The security analytics modules are fed with data coming from a variety of
probes that are deployed along the target IoT system. Different probes generate
different types of data, with different formats, thus resulting in a large variety of
generated data that must be transferred and fed to the security analytics modules.
Moreover, for the SecureIoT services to be widely applicable they must scale and be
adaptable to new types of IoT devices, which implies that they should allow new types
of generated data to be handled seamlessly.
High speeds: SecureIoT services depend on data generated in real time in order to
provide alerts in real-time incident identification scenarios. Streaming data are
generated continuously and with an adjustable generation rate. These data are
streamed to the analytics modules through well-established platforms like Apache
Kafka.
Veracity: SecureIoT data are generated by probes along the IoT deployment. The
generated data are typically an exact representation of the current security state of the
various IoT components. The SecureIoT services make use of technologies that
guarantee the integrity of the generated data and as a consequence their quality when
they reach the analytics modules.
As a consequence, the infrastructure that will support the security services of SecureIoT should
be able to handle with efficiency, flexibility, and reliability the collection, transfer, storage and
processing of the security data collected from the target IoT system.
2.1.2 Data Types
SecureIoT services are based on security data that are collected from a number of different
components of IoT deployments. In particular, field devices, edge devices, and cloud based
servers generate different types of security data that are subsequently fed into the security
analytics modules that implement the SecureIoT services. Moreover, application specific
security data are generated by the IoT application components.
The variety of the different types of data that may be generated from components of an IoT
deployment dictates an equal number of corresponding specifications. Such an approach
does not scale well: when new IoT components are introduced, new data specifications
result. Moreover, the interfaces through which these data will be communicated need
to be enhanced and adapted to the new types of data. Alternatively, a generic and normalized
data representation may be defined along with mappings with the different native
representations. The advantage of this approach is that data type specificities are contained at
the points where data are produced or consumed. Following the latter approach a generic data
model (or type) for the security related data may be defined as follows:
type securityData {
    String sourceID;
    enum {perf, status, usage, alert, …} typeOfData;
    DateTime timestamp;
    HashMap<String, Object> properties;
    HashMap<String, Object> data;
    String comments;
    String reserved;
}
This is a rather generic data model that intends to encompass a wide variety of security related
data types in the context of SecureIoT. Specialization of the model may be defined for specific
purposes or project trials. The various fields of this abstract data model are rather self-
explanatory. The properties field in the model above is intended to capture properties or
attributes of the data source entity. For example, properties such as whether a device is mobile
and its current location may be included in this field. Infrastructure-specific properties may also be
included in this field. For example, as will be explained in later chapters, Apache Kafka will be
used in the context of the project as the data streaming component of its infrastructure. Kafka
defines the concept of a topic to partition data into thematic queues. In such a case the pair
<”topic”, “topic name”> will appear in the properties field. This approach makes the data model
independent of its infrastructure. If, for example, at a later stage Apache Kafka is replaced by
another streaming engine, the data model will remain valid.
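As an illustration, a record conforming to this generic model could be represented as follows in Python. All identifiers and values (the probe name, topic name, and field contents) are hypothetical; the sketch only shows how infrastructure-specific details stay confined to the properties field.

```python
from datetime import datetime, timezone

# Hypothetical record following the generic securityData model above;
# field names mirror the model, values are illustrative only.
record = {
    "sourceID": "edge-probe-01",             # identifier of the emitting probe
    "typeOfData": "alert",                   # one of: perf, status, usage, alert, ...
    "timestamp": datetime(2018, 9, 29, 12, 0, tzinfo=timezone.utc).isoformat(),
    "properties": {
        "mobile": False,                     # attribute of the data source entity
        "topic": "security-alerts",          # infrastructure-specific <"topic", "topic name"> pair
    },
    "data": {"failedLogins": 17, "window": "60s"},
    "comments": "",
    "reserved": "",
}

# Swapping the streaming engine only changes the infrastructure-specific
# entries inside "properties"; the rest of the record stays valid.
topic = record["properties"].get("topic", "default")
```

Note that the consumer side reads the topic (or any other transport detail) out of the properties map, so the core fields of the record never need to change when the streaming infrastructure does.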
2.2 Data streaming requirements

Collected security data need to be streamed to the persistence and analytics modules of the
SecureIoT platform. These data carry information that is directly related to the security of the
target IoT system, for example, unauthorized attempts for remote access, unauthorized
attempts for software updates, deviating behavior of system components and so on. Based on
these data and the applicable rules, the analytics modules will raise alarms for early detection of
security related issues at the target IoT system. As early detection of such issues constitutes a
core functionality of the SecureIoT services and is thus highly important, the streaming of
these data, and as a consequence the components of the SecureIoT infrastructure that
implement it, have to meet some stringent requirements.
High volume: SecureIoT data streaming has to be able to cope with high volumes of data.
Generated security data come from different levels of the target IoT system, the IoT
application itself as well as the IoT platforms on which the application is deployed.
Fast transfer: security data have to reach their destination as fast as possible so as to allow
for early detection of security issues at the target IoT system. Combined with the high
volume requirement for security data, this implies that the SecureIoT data streaming
component has to support very high throughput.
Reliable communication: security data have to be delivered reliably to their destination.
This requirement implies that no such data are allowed to be lost while in transit.
Therefore, the data streaming component has to guarantee at-least-once semantics,
meaning that each datum will be delivered at least once, even in the presence of failures.
High availability: the services provided by the streaming component have to be highly
available to guarantee that downtime is minimal and security data are always delivered to
their destination.
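As an illustration of how the reliability and at-least-once requirements map onto configuration, the sketch below shows producer-side settings of Apache Kafka, the streaming engine adopted later in this document; the values are illustrative, not project configuration:

```properties
# Illustrative Kafka producer settings targeting at-least-once, durable delivery
acks=all                    # leader acknowledges only after all in-sync replicas have the record
retries=2147483647          # retry transient send failures instead of dropping data
enable.idempotence=true     # avoid duplicates that retries would otherwise introduce
```

With these settings a send is acknowledged only once the record is replicated, so a datum is redelivered rather than lost when a broker or connection fails.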
2.3 Data analytics requirements
The security analytics of the SecureIoT services make use of complementary technologies to
flag security breaches of the target IoT system. Both rule-based and machine-learning analytics
techniques are used for this purpose, as exemplified in the sequel.
2.3.1 Simple Analytics – Rule-Based
Rule-based decision making is the simplest form of analytics. It is based on static
rules that specify what action is to be taken when certain conditions are met on a stream of inputs.
The rules have the form
rule: condition → action
where condition is a predicate on some input values. Conditions may also be time dependent,
so that values that appeared in the past can be expressed through appropriate predicates. In
addition, predicates on sequences of values may also be specified, for example, a set of
constantly increasing values for a certain duration.
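The condition → action form above can be sketched as follows; the event fields, threshold, and rule are illustrative assumptions, not SecureIoT code:

```python
# Minimal sketch of the "condition -> action" rule form described above.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # predicate over input values
    action: Callable[[dict], None]     # performed when the rule fires

def apply_rules(rules, event):
    # every rule whose condition holds on the event fires
    for rule in rules:
        if rule.condition(event):
            rule.action(event)

alerts = []
rules = [Rule(condition=lambda e: e.get("syn_count", 0) > 100,   # illustrative threshold
              action=lambda e: alerts.append(("possible SYN flood", e["source"])))]
apply_rules(rules, {"source": "edge-1", "syn_count": 250})
```

A production rule engine would additionally index conditions (e.g., via the Rete algorithm) instead of scanning the ruleset linearly for every input.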
A rule engine monitors a stream of inputs coming from a variety of sources like sensors,
devices, probes, etc. and tries to determine when conditions of the specified ruleset are met.
When a condition is met the corresponding rule fires and its action is performed. The most
widely used algorithm by rules engines is the Rete algorithm. Systems that implement such
functionalities are called Business Rule Management Systems (BRMS). Typically, network
intrusion detection systems are rule- and signature-based and operate on the traffic
flowing at the network boundary. Such detection is part of the functionality provided by the
SecureIoT SECaaS services.
It follows that, in the context of SecureIoT, the analytics engine should possess rule-driven and
signature-based functionality. For a given ruleset, the engine should be able to monitor the data
stream coming from the target IoT system, including network data, and apply the rules, firing
those whose preconditions are satisfied or whose signatures are matched.
2.3.2 Machine Learning
Machine learning is an approach to mechanized learning that is based on statistical techniques
applied to data. It provides a more sophisticated approach to decision making, since data
models, patterns, and rules are automatically extracted from already seen data and are
subsequently applied to new feeds of generated data. Two broad categories of machine
learning algorithms exist: (a) supervised learning, in which the given input and output data have
already been labelled and the objective is to discover the rules that map inputs to outputs, and
(b) unsupervised learning, in which there is no labelling of data, in which case the objective is to
discover rules and patterns from the given data.
Careful observation of patterns that appear in the dataset under processing may result in the
extraction of rules, which may be formulated given a set of metrics like the maximum desired
length of a rule and the confidence and support it has in the given dataset. Several algorithms
for rule extraction have appeared in the literature. Rule extraction is a
computationally intensive process, and for large datasets efficient parallelization of the algorithms
becomes mandatory.
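The support and confidence metrics mentioned above can be illustrated with a toy extractor over hypothetical security event sets; this is a sketch of the idea, not a production miner (real datasets need the parallel algorithms discussed in the text), and the event names are invented:

```python
# Toy association-rule extraction showing support and confidence metrics.
from itertools import combinations

transactions = [                       # hypothetical per-window security event sets
    {"ssh_fail", "port_scan"},
    {"ssh_fail", "port_scan", "fw_update"},
    {"ssh_fail"},
    {"port_scan"},
]

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def extract_rules(min_support=0.5, min_confidence=0.6):
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):        # rules of maximum length 2
        joint = support({a, b})
        if joint >= min_support and joint / support({a}) >= min_confidence:
            found.append((a, b, joint, joint / support({a})))  # rule a -> b
    return found
```

Here only the rule "port_scan → ssh_fail" survives both thresholds, since the pair occurs in half of the windows and in two of the three windows containing a port scan.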
In the context of SecureIoT the analytics engine should be able to support machine learning
algorithms for extraction of rules from large datasets and later apply the detected rules to
security data coming from the target IoT system. Such algorithms (e.g., [3]) typically require
large amounts of computing power; they are parallelizable and are typically implemented on clusters.
2.3.3 Deep Learning
Deep learning is a special case of machine learning which uses an array of layers comprising
nonlinear processing elements for feature extraction and transformation of the input data. The
layers are arranged so that each layer receives as input the output of the previous layer. The
learning process itself can be either supervised or unsupervised. Deep learning approaches
include deep neural networks, deep belief networks, and recurrent neural networks. Deep
learning algorithms for detecting security events are still at the research stage; [4] presents an
approach to using deep learning techniques for learning unknown network intrusions.
2.4 Alignment to WP2 and the SecureIoT Architecture
This section briefly presents the SecureIoT architecture, to which the infrastructure
supporting the SECaaS services is aligned. The SECaaS services are offered to IoT
system/platform owners or operators, in line with the paradigm shown in Figure 1. Security
data are collected by the SecureIoT platform from the target IoT system, which may be
deployed on a number of IoT platforms. The SecureIoT platform in turn provides a number of
services, like Risk Assessment, Compliance and Auditing, and Developer's Support, to the IoT
system operators, deployers, and developers. Moreover, the platform provides runtime
support during the operation of the target IoT system by monitoring the collected security data
and generating alerts when security issues are detected at the target IoT system, or generating
visualizations of them.
Figure 1: Overview of the SECaaS paradigm.
The SecureIoT architecture comprises a set of layers that communicate through well-defined
interfaces. The layers, as detailed in [5] and shown in Figure 2, are as follows:
IoT systems layer, which comprises the target IoT system
Data collection and actuation layer, which is responsible for interacting with the
components of the target IoT system for collecting data and configuring its elements
Security intelligence layer, which is responsible for analysing the collected data by employing
data analytics and machine learning techniques and detecting security-related issues
Security services (SECaaS), namely Risk Assessment, Compliance Auditing, and Developer
Support
Security use cases, which apply the security services to three representative scenarios
Figure 2 shows an overview of the SecureIoT architecture.
Figure 2: Overview of SecureIoT architecture.
The architecture of the data collection and actuation layer is shown in Figure 3. Details of the
components of the layer are presented in [5].
Figure 3: Architecture of the data collection and actuation layer.
Figure 4 shows the architecture of the security intelligence layer. Details of the components
of the layer are presented in [5].
Figure 4: Architecture of the security intelligence layer.
The realization of the SecureIoT architecture will be supported by an infrastructure that will be
put in place and used in the context of the project’s trials. The main components of the
infrastructure concern:
Collection of security data from selected nodes of the target IoT deployment, the IoT
platforms that support the IoT application, and the IoT application itself.
Streaming of security data from their source to the analytics engine.
Processing and storage of the collected security data using data analytics techniques for
identifying and flagging any security issues at the target IoT system and application as
well as visualizing them.
The overall architecture of the infrastructure is shown in Figure 5. The infrastructure is
compatible with the SecureIoT architecture shown in Figure 2, in which the data
collection, data streaming, and data processing components appear. In addition, Figure 5
shows the technologies that will be used in the context of the project. These technologies
are presented in the following sections, where their selection is justified.
Figure 5: Overview of the SecureIoT infrastructure.
(The figure depicts Elastic Beats probes for data collection from the IoT systems, Apache Kafka
for data streaming, Elasticsearch for data storage, and Apache Spark for data analytics.)
3 Information Streaming and Storage Infrastructure
3.1 SecureIoT Information Modelling
3.1.1 IoT Assets Modelling
An asset is defined as a physical or logical object owned by or under the custodial duties of an
organization, having either a perceived or actual value to the organization [6]. Assets can be
either material or immaterial, and include:
Physical objects
Software
Documents
Intellectual property (licenses, patents)
Humans
Services
SecureIoT assets and their relationships are modeled as a graph, in which their properties are
represented. Each asset is modeled as a node of the graph, having a set of properties, while a
relationship is modeled as an edge of the graph having a single property. Nodes can have a
number of labels, while edges can have a single label. Labels are used to narrow searches and
navigations through the graph. Moreover, both graph nodes and edges can have multiple
properties, which are represented as [key, value] pairs. Graph databases like neo4j may be used
for implementing the asset models of a target IoT deployment and navigating through them.
Figure 6 shows a graph model of the assets of a hypothetical company. It models a plant that
contains a bolting robot, which in turn comprises two sensors: a proximity sensor and a torque
one. The model shows also the relations between the various assets.
Figure 6: Asset model.
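The asset graph of Figure 6 can be sketched as labelled nodes and property-carrying edges; in practice a graph database such as neo4j would hold the model, but the structure and a simple navigation query look like this (node names and properties taken from the figure):

```python
# Sketch of the Figure 6 asset graph: labelled nodes with properties, edges with a property.
nodes = {
    "BigC": {"labels": ["Company"], "props": {"Indus": "Automotive"}},
    "ManA": {"labels": ["Plant"],   "props": {"Addr": "Sunville"}},
    "RobA": {"labels": ["Robot"],   "props": {"Manufr": "Kuka", "Action": "Bolting"}},
}
edges = [
    ("BigC", "owns",     "ManA", {"Since": "1/1/2018"}),
    ("ManA", "contains", "RobA", {"Since": "1/1/2010"}),
]

def reachable(start):
    # navigate outgoing edges transitively, i.e., a query over the asset model
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, _, dst, _ in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen
```

A query such as `reachable("BigC")` then answers "which assets does the company (transitively) own or contain", which in neo4j would be a variable-length path match.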
3.1.2 Attack Modelling
[6] lists some of the most widely known attacks for IoT based systems. They include
Wired and wireless scanning and mapping attacks
Protocol attacks
Eavesdropping attacks (loss of confidentiality)
Cryptographic algorithm and key management attacks
Spoofing and masquerading (authentication attacks)
Operating system and application integrity attacks
Denial of service and jamming
Physical security attacks (for example, tampering, interface exposures)
Access control attacks (privilege escalation)
As noted by the authors, most of these attacks are customised to a particular IoT system
vulnerability. A list like the one above cannot be fixed, as new attack types are expected to
appear and put into use in the future.
In addition to identified attacks, standard quantifications of their impact have been specified. In [7],
the Common Vulnerability Scoring System (CVSS) provides an open framework for
communicating the characteristics and impacts of IT vulnerabilities by defining metrics,
thus allowing an accurate estimate of their impact.
Attacks are classified using attack trees, which model how an asset may be attacked. Each node
of the tree models an attack and has children that model the sub-attacks that must be
performed for the attack to succeed. For example, an attack on an industrial robot may be
modelled with a tree as shown in Figure 7.
Figure 7: Attack model.
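The semantics just described, where an attack succeeds only if all of its sub-attacks succeed, can be sketched as follows; the node names come from Figure 7, but the exact tree layout assumed here is an illustrative reading of the figure:

```python
# One possible reading of the Figure 7 attack tree: a node succeeds only if
# all of its children (sub-attacks) succeed; leaves succeed if achieved.
attack_tree = {
    "Attack a robot": ["Penetrate firewall", "Identify robot",
                       "Send attack code", "Launch attack"],
    "Send attack code": ["Install code", "Execute attack code"],
}

def succeeds(node, achieved):
    children = attack_tree.get(node, [])
    if not children:                 # leaf sub-attack
        return node in achieved
    return all(succeeds(c, achieved) for c in children)
```

Evaluating the root against the set of sub-attacks an adversary has achieved then tells whether the overall attack is possible under this model.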
3.1.3 IoT Security Data Modelling
Security data are generated by the probes that are deployed along the target IoT system and
are fed to the analytics modules of the SecureIoT services. As stated above, a generic and
scalable approach to modeling security relevant data is to abstract away from the specificities
of the various devices, sensors, edge nodes, etc. of the target IoT system and allow for a generic
data type for representing security related data, as shown below
type securityData {
    String sourceID;
    enum {perf, status, usage, alert, …} typeOfData;
    DateTime timestamp;
    HashMap<String, Object> properties;
    HashMap<String, Object> data;
    String comments;
    String reserved;
}
This type of unstructured data falls in the NoSQL category, for which technologies and tools
exist for efficient manipulation and search. The Elasticsearch engine, for example, can
handle efficient indexing and searching of large volumes of data of the types shown above.
Moreover, the engine provides RESTful HTTP interfaces for their manipulation.
An example securityData record represented in JSON is shown below:
{
  "text": "…",
  "version": "…",
  "ts": "2018-06-25T17:00:00.000Z",
  "properties": [ { "topic": "…" }, { "location": "…" }, … ],
  "data": [ { "key1": "…" }, { "key2": "…" }, …, { "keyn": "…" } ],
  "comment": "…",
  "reserved": "…"
}
3.1.4 IoT Application Data Modelling
Interoperability at the application data level is one of the most challenging aspects
currently limiting the full take-off of IoT technologies: although the number of use cases
with clear, successful, and consolidated business value is growing exponentially, it is still difficult to
find relevant examples of deployments where the information gathered by the same groups of
sensors or devices is used to create advanced end-user services across multiple verticals or
domains. The IoT interoperability problem also introduces additional complexity in the
potential co-creation of complex applications and services relying on devices or sensors
belonging or connected to multiple IoT platforms managed by different organizations, i.e.,
business domains. Thus, resolving the interoperability problem has become a very active
research field, with diverse approaches like federation based on semantics, as proposed by
F. Carrez et al. in [8].
Within SecureIoT, the use of application data may be critical in order to detect anomalies and
to implement predictive security services. As explained in SecureIoT D2.1 [9], a great
number of threats in the main application domains of IoT technologies do not involve attack
patterns that affect the network traffic, the communication protocols, or software
vulnerabilities. For instance, in the case of the connected vehicle scenario, detecting a
compromised onboard Electronic Control Unit (ECU) may be possible by checking for
inconsistencies between correlated fields (e.g., speed versus acceleration versus gear) or even
by comparing the data received from multiple cars driving simultaneously along the same
route. Thus, SecureIoT must be able to collect, store and analyse application data generated at
the different tiers of the IoT stack, from field-level devices and smart objects to platform
components.
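The cross-field consistency idea mentioned for the connected vehicle scenario can be sketched as a simple plausibility check; the field names, gears, and speed limits below are illustrative assumptions, not project data:

```python
# Hypothetical plausibility check on correlated vehicle fields (gear vs. speed),
# as suggested above for spotting a compromised ECU; limits are illustrative.
GEAR_SPEED_KMH = {1: (0, 40), 2: (10, 70), 3: (30, 110), 4: (50, 160), 5: (70, 220)}

def plausible(sample):
    # a sample is plausible if the reported speed fits the reported gear's range
    lo, hi = GEAR_SPEED_KMH.get(sample["gear"], (0, float("inf")))
    return lo <= sample["speed_kmh"] <= hi
```

A sample like gear 5 at 20 km/h would be flagged as inconsistent and could feed an alert, whereas gear 3 at 60 km/h passes; real checks would combine several correlated fields and data from multiple vehicles.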
In this subsection, an analysis of some initiatives and solutions to address the interoperability
burden is presented. It is important to highlight the impact of this technological choice not
only from a technological perspective, due to its consequences on the high-level components of
the SecureIoT architecture, but also from the business point of view, since the capacity to work
with as many solutions as possible, or even to easily expand compatibility to new systems,
is essential for a successful exploitation strategy.
In the present deliverable, the analysis is constrained to the presentation and review of the
alternatives that could be exploited by SecureIoT. A final decision will be taken and described in
the next version (D3.2), considering the progress and inputs of all the documents that must be
released in milestones MS2 (M9) to MS9 (M21).
In general, from a logical point of view, data probes deployed at the different tiers of an IoT
stack will collect application-level data from components of all the layers of the stack. The SecureIoT
Global Storage component must contain harmonized information, so the translation from the
corresponding data model will be done by the probes. This approach is shown in Figure 8.
Figure 8: SecureIoT probes collecting and harmonizing application-data information from multiple IoT platforms.
3.1.4.1 FIWARE NGSI and data models
As also explained in SecureIoT D2.4 [5], the central component of the FIWARE IoT platform is
the Context Broker, whose deployment is mandatory. The main role of the Context Broker is
the large-scale management of context information by means of the implementation of a Next
Generation Service Interface (NGSI). Detailed documentation for the Context Broker and NGSI
API can be found at [10] and [11].
FIWARE NGSI enables the virtual or digital representation of entities (e.g., a vehicle, a room in a
house or a device), which include multiple context attributes (e.g., speed, temperature,
humidity, etc.) and metadata. The attributes’ values may come from IoT devices and smart
objects but also from other sources like web-services, chatbots, IoT platforms or even humans.
A comprehensive diagram that illustrates the NGSI classes is shown in Figure 9 (extracted from
[11]). JSON syntax is proposed to represent structured data based on the entity-attribute data
model.
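For illustration, a minimal NGSIv2-style entity for a vehicle might be represented as follows; the identifier and attribute values are illustrative, not taken from the FIWARE data models:

```json
{
  "id": "vehicle-001",
  "type": "Vehicle",
  "speed": { "value": 90, "type": "Number", "metadata": {} }
}
```

Each attribute carries its own value, type, and metadata, which is what allows the Context Broker to manage heterogeneous context information uniformly.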
Figure 9: UML class diagram for NGSI.
In addition, FIWARE complements NGSI with a set of harmonized schemas, vocabularies, and
ontologies that aim to guarantee portability and interoperability. FIWARE data models are
created leveraging the experience acquired during real experimentation and large-scale
projects in fields like smart cities, transportation, or environmental monitoring. They are also
strongly based on already existing standardization activities like Schema.org
(https://schema.org/) or SAREF (http://ontology.tno.nl/saref/). The complete list of FIWARE
data models is available at [12].
The development and deployment of adaptation or translation components is the main
mechanism to achieve interoperability in multi-platform interactions involving FIWARE based
systems. This approach is currently implemented by the IDAS (also known as IoT Agents)
FIWARE Generic Enabler to interconnect devices which use specific IoT communication
protocols (e.g., LoRaWAN, MQTT, CoAP) and data models (e.g., CayenneLpp, IETF CBOR,
LWM2M). Another representative example is the incubated Generic Enabler that provides
integration with the OpenMTC (based on oneM2M) IoT middleware. A specific connector, shown
in the middle of Figure 10, has been developed to translate data to/from NGSI and to enable
bi-directional data flows between the OpenMTC backend and the FIWARE Context Broker [13].
Figure 10: Logical view of OpenMTC connector for FIWARE Orion Context Broker.
3.1.4.2 ETSI Context Information Management (CIM)
The European Telecommunications Standards Institute (ETSI) has an Industry Specification
Group dedicated to working on cross-cutting Context Information Management (ISG CIM). The
first specification was released in April 2018 [14]. The working group includes reference
industrial organizations like Telefonica, Orange, NEC, and British Telecommunications, with the
European Commission as a counsellor. As stated on page 11 of [14], ETSI CIM "leverages on
the former OMA NGSI 9 and 10 interfaces and FIWARE NGSIv2 to incorporate the latest
advances from Linked Data”.
ETSI CIM aims to standardize the following aspects:
NGSI-LD: an information model to structure context information.
Possible architectures to use the NGSI-LD API.
NGSI-LD data representation based on JSON-LD.
NGSI-LD query language to retrieve entities and apply filters.
The specification of the API operations.
The specification of the API HTTP binding.
In comparison with NGSI, the adoption of a Linked Data approach formalizes the representation
of relationships between entities and adds useful context information regarding the specific
ontology applied to each of them. It must be noted that the use of JSON-LD syntax
makes possible a smooth transition from classical NGSI entities to the new format.
An example of NGSI-LD ontology and its instantiation to model a vehicle is included in Figure 11
(p. 19 of [14]).
Figure 11: NGSI-LD ontology applied in an example.
At the meta-model level, NGSI-LD introduces the Resource Description Framework (RDF) concepts
of Properties and Relationships. At the cross-domain ontology level, additional common properties
are defined (Geolocation, Temporal Property, unitCode), as well as possible values for these
properties: TimeInterval (used by Temporal Property) and Geometry (used by Geolocation). Finally, for
each domain it is possible to derive new entities (e.g., parking, street, gate or car), relationships
(e.g., adjacentTo, hasOpening) and properties (hasState, reliability).
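As a sketch of these concepts, an NGSI-LD entity carrying one Property and one Relationship could look as follows; the entity and attribute names are illustrative assumptions, and the JSON-LD @context is omitted for brevity:

```json
{
  "id": "urn:ngsi-ld:Vehicle:A4567",
  "type": "Vehicle",
  "speed": { "type": "Property", "value": 55, "unitCode": "KMH" },
  "isParkedIn": { "type": "Relationship", "object": "urn:ngsi-ld:Parking:P12" }
}
```

The Relationship's object points at another entity by URN, which is what enables the relationship-exploiting queries discussed next.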
On the one hand, the main benefit of ETSI CIM / NGSI-LD with respect to the previous NGSI
approach will be the possibility of performing advanced queries that exploit the relationships
between entities, e.g., to get all the entities and attributes of all the vehicles in a parking lot.
On the other hand, it must also be taken into account that achieving good performance and
scalability with RDF databases is a complex issue that may affect the overall system. Moreover, from
a practical position, there is not yet an available implementation of a Context Broker that
supports NGSI-LD.
3.1.4.3 Semantic Sensor Network (SSN) Ontology
The SSN initiative of the World Wide Web Consortium (W3C) pursues the same goal as
ETSI CIM / NGSI-LD: to apply the Linked Data paradigm to the information collected by IoT
devices and to propose a common ontology. It comprises two different specifications: the Semantic
Sensor Network (SSN) and the Sensor, Observation, Sample and Actuator (SOSA) ontologies. The
latter is a minimal version of SSN, similar in spirit to the Schema.org vocabularies, which has been
designed to simplify the adoption process.
Figure 12 (extracted from [15]) shows how SOSA and SSN specify multiple conceptual modules
(black boxes), classes and properties considering only the observation perspective. SOSA and
SSN elements are depicted in green and blue colour respectively. Similar diagrams are also
available for actuation and sampling perspectives.
Figure 12: SSN/SOSA conceptual modules, classes and properties for observation perspective.
Therefore, SSN/SOSA can be considered an alternative to ETSI CIM / NGSI-LD, although the
former is more focused on IoT device and sensor aspects (e.g., sensing mechanisms,
observations, etc.). This could be seen as a benefit in terms of the level of detail that could be
captured by SecureIoT data probes, but it also adds non-essential information that could further
slow down query performance. In fact, this problem has led to the formulation of
lightweight versions of SSN/SOSA [15].
3.1.5 IoT Security Templates & Rulesets Modelling
The SecureIoT Analytics module is the core component of the overall SecureIoT architecture, as
it detects security-related issues that may arise in the target IoT deployment. The two main
inputs to the Analytics module are streaming data generated by the probes deployed at
selected target IoT nodes, and contextualization data that are stored in permanent storage and
comprise security templates and rulesets.
Rules can either be specified manually or extracted from collections of security related data
after employing machine learning techniques. Rules have the generic form
antecedent → consequent
The rule antecedent is the conjunction of predicates over the values of some security
attributes. It has the form
P_1 ∧ … ∧ P_n,  n ≥ 1
where each P_i = P_i(a_{i1}, a_{i2}, …, a_{in_i}). The attributes that appear in the predicate are names of
security data that are generated by the SecureIoT probes; in practical terms, they are keys of the
data field of the securityData model presented above. For evaluating a predicate, the values
that correspond to the names that appear in it are used.
When the rule antecedent evaluates to true (i.e., all of its predicates are satisfied), the
rule fires and its consequent is executed. The consequent may involve an action, like raising an
alert, setting a flag, creating a log entry, and so on. The Analytics module is responsible for
applying the specified or discovered rules to the input security data, executing the
corresponding consequent when a rule fires. In general, more than one rule may fire after
receiving input data; in such cases the consequents of all firing rules will be executed.
For performing its task, the Analytics module makes use of security templates, which contain
historical data, rules that may be extracted from them, as well as conditions or exceptions for
applying the rules. For example, in a specific IoT deployment it may be normal to receive ten
SYN messages within one minute; in this hypothetical scenario, rules that specify raising an
alert under these conditions should be ignored. In this sense, the security templates specify the
context in which rules may be enabled or disabled.
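The template-driven gating of rules just described can be sketched as follows; the structure of the template and the rule identifiers are illustrative assumptions, not the SecureIoT template format:

```python
# Sketch of template-driven rule gating: the template supplies the context
# that enables or disables rules (e.g., the SYN-rate example above).
def active_rules(rules, template):
    disabled = set(template.get("disabled_rules", []))
    return [r for r in rules if r["id"] not in disabled]

# In this deployment, ten SYNs per minute is normal, so that rule is disabled.
template = {"disabled_rules": ["syn_rate_alert"]}
rules = [{"id": "syn_rate_alert"}, {"id": "remote_login_alert"}]
```

Applying `active_rules(rules, template)` before rule evaluation leaves only the remote-login rule in force for this context.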
3.2 Data Collection Infrastructure
The SecureIoT SECaaS services are provided as add-ons to existing IoT systems, including legacy
systems or ones that have to meet strict efficiency or throughput requirements. Following the
SecureIoT architecture presented in D2.4 [5], security-related data are collected from selected
nodes of target IoT systems through probes or agents that are deployed along them. As the
collection of security data has to impose the least possible overhead on the nodes of the target
IoT system, the data collection agents have to be lightweight and their interactions with the
target IoT system must remain minimal.
A consequence of this requirement is that data should be pushed to the SecureIoT analytics
engine, as opposed to being pulled by it. Pulling data imposes extra overhead on the data
source, which has to act as a server: listen for incoming connection requests, establish new
connections when a request arrives, and send data to the requesting client when asked to do
so. This contrasts with pushing data, in
which case data are forwarded as they are generated, thus making the data producer more
efficient and lightweight.
There are several lightweight technologies for collecting and pushing data. In the sequel, two of
them are presented: Elastic Beats and Sematext Logagent.
Elastic Beats [17] is a free and open-source platform of lightweight data collection and
forwarding probes or agents, which can be installed at different nodes of a distributed system,
including the devices of an IoT system. Each Beat is a separately installable agent. The Beats API
specifies how data may be collected and shipped to a data sink. There are a number of
predefined Beats, as follows:
Auditbeat: for auditing data, mainly from Linux-based systems, by communicating
directly with the Linux audit framework.
Filebeat: for forwarding and centralizing logs and log files.
Heartbeat: for detecting the availability of a server. It issues requests to one or more
URLs and waits for the replies; it then reports on the aliveness of the sites along
with the response times.
Metricbeat: for reporting on a set of system metrics, including CPU and memory
utilization, system load, and so on.
Packetbeat: for reporting network traffic.
Winlogbeat: for reporting Windows event logs.
In addition, custom Beats may be implemented by using the libbeat library.
Sematext Logagent [18] is a lightweight open-source log shipper similar to Elastic Beats, and
more precisely to Elastic Filebeat. It provides support for log parsing, log routing, log
enrichment, and disk buffering of data, and supports two-way SSL authentication.
Logagent supports a number of inputs, for example files, streams, sockets, and databases, as well as
filtering of input data. It can output to Apache Kafka and Elasticsearch, while its output filters
support aggregation of parsed data and data enrichment.
In the context of SecureIoT, Elastic Beats will be used as the data collection platform, the main
reason being that custom Beats can be developed for the specialized nodes of the
target IoT system, the IoT application, and the IoT platform for the collection of the pertinent
security data.
3.3 Data Streaming Infrastructure
This section presents the streaming infrastructure that will be used in SecureIoT. The security
services that are offered by SecureIoT depend on a highly distributed and decoupled
infrastructure. The target IoT system comprises several levels, from the low-level devices and
smart objects to the supporting IoT platforms and the IoT applications. The communication and
data exchange between the various components is a core functionality that has to be
implemented efficiently and flexibly, and remain transparent to the target IoT system.
A key observation is that the messages that are collected from the nodes of the target IoT
deployment are actually events. For example, an attempt at a remote connection to a node
will typically generate an event. Similarly, a flood of SYN messages will result in the generation
of an event. As a consequence, the integration technology that will be used as part of the
SecureIoT infrastructure has to be event oriented. This requirement sets it apart from other
integration technologies like Enterprise Service Buses (ESB) and Extract-Transform-Load (ETL)
tools.
The remainder of this section first presents the two main approaches for communicating real-time,
application-level data from a source to a destination in a distributed setting. In the context of SecureIoT,
security data are communicated from the probes that are deployed with the target IoT system to the
analytics module for further processing. The two approaches presented are request-reply
and publish-subscribe. In the sequel, the section focuses on the Apache Kafka platform, outlining some of
its capabilities and giving the main arguments for its selection as the platform that will be used
for communicating security data in SecureIoT.
3.3.1 Request Reply
In the request-reply model of communicating data between two entities A and B, when A needs
some data that are produced by B, it makes a request to B, and B replies with the data, as shown
in Figure 13 below. In practical terms, when client A needs some data from B, it opens a
connection to B and sends a request. B, on the other hand, waits for client requests; when it
receives the next one, it prepares a response to be sent back to A.
Figure 13: Overview of the request-reply architecture.
For example, if B is a server that provides the current temperature in a specific area, when A
needs to know that temperature it will send a request to B and will do the same every time it
needs that temperature. Temperatures may change between A’s requests, but A maintains the
choice of issuing a request whenever it needs a piece of data from B, effectively receiving data
at its own pace. A does not know when new data are available at B. Therefore, if A is interested
in getting updates (at every temperature change, say), it may need to issue frequent requests,
possibly placing an overhead on the communications network and in several cases getting back
nothing new. On the other hand, if A decides not to place overhead on B by reducing the
frequency of its requests, it runs the risk of missing some data updates.
In a more complex scenario when multiple (distributed) clients like A request data from a
number of servers like B, then each server B has to keep track of and satisfy each request,
which makes B’s logic more complicated. The corresponding architecture is shown in Figure 14.
Figure 14: Overview of the request-reply architecture with multiple clients.
3.3.2 Publish Subscribe
The publish-subscribe model of communicating data between two entities A and B introduces a
third entity between them, the broker. Entities that are interested in new data coming from
B register with the broker to receive such data. Whenever a new datum is published by B,
the broker duplicates and forwards the new datum to all entities
that have subscribed for data coming from B, as shown in Figure 15.
Figure 15: Overview of the publish-subscribe architecture.
Continuing the example of the temperature server B, when a client A wants to receive new
temperatures from B, it will first register with broker R its interest in B’s data. When B produces
a new temperature, it will send the value to the broker, and the broker will forward it to all
entities like A that have registered with R their interest in B’s values. If multiple such entities exist,
then B’s value will be replicated. In this scenario, B simply announces a new value when such a
value has been produced. Broker R already maintains a list of interested clients like A, so it only
copies and transmits B’s value to each one of them. In the publish-subscribe model, A will only
receive data from B whenever B has produced something to be announced.
In a more complicated scenario in which multiple clients like A request data from B, B’s logic
remains simple. All interested entities register their interest with the broker and receive data
when they are generated by B; there is no need to make repeated requests to B to receive new data.
B, on the other hand, generates data at its own pace without having to afford the overhead of
replying to individual requests by clients. The corresponding architecture is shown in Figure 16. It
is evident that the publish-subscribe model of communicating data scales very well, and much
better than the request-reply one.
Figure 16: Overview of the publish-subscribe architecture with multiple data producers and consumers.
The broker implements the core functionality for communicating data from producers to
consumers. Typically, it contains a routing component and a number of output queues as
shown in Figure 17. Data producers send their data to the routing component, which, in turn,
places the data into one or more queues, possibly replicating them. Consumers, on the other
hand, read data from the queue they have registered with. Depending on the platform, data
may persist in the various queues or be transient.
Figure 17: Overview of Broker architecture.
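The broker behaviour described above, a routing component that replicates each published datum into the queue of every registered consumer, can be sketched with a minimal in-process model in Python. This is an illustration of the concept only, not any specific broker implementation.

```python
from collections import defaultdict, deque

class Broker:
    """Minimal in-process broker: the routing component places each
    published datum into the queue of every subscriber of the topic."""
    def __init__(self):
        self.queues = defaultdict(dict)   # topic -> {subscriber: deque}

    def subscribe(self, topic, subscriber):
        self.queues[topic][subscriber] = deque()

    def publish(self, topic, datum):
        # Replicate the datum into all queues registered for the topic.
        for q in self.queues[topic].values():
            q.append(datum)

    def consume(self, topic, subscriber):
        q = self.queues[topic][subscriber]
        return q.popleft() if q else None

broker = Broker()
broker.subscribe("temperature", "A1")
broker.subscribe("temperature", "A2")
broker.publish("temperature", 21.5)          # producer B announces a value
print(broker.consume("temperature", "A1"))   # 21.5
print(broker.consume("temperature", "A2"))   # 21.5
```

Note that the producer issues a single publish call; the duplication for the two consumers happens entirely inside the broker, which is what keeps B's logic simple.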
SecureIoT security data producers are the probes that are deployed with the target IoT system,
which produce security-related data according to the defined configurations and policies. As the
probes have to impose the least possible overhead on the target IoT system and remain
lightweight, a publish-subscribe platform will be used for relaying the data they generate to
the analytics modules; Apache Kafka is one candidate such platform.
3.3.3 SecureIoT Streaming Infrastructure
The large amount of security data that are produced by probes that are deployed along select
nodes of the target IoT system has to be communicated in a flexible and efficient way for
subsequent storage and analytics processing. Therefore, efficiency is a key requirement for the
streaming infrastructure that will be put in place in the context of SecureIoT. Moreover, the
streaming infrastructure has to implement the publish-subscribe model of data communication,
whose advantages were emphasized in the previous subsections.
Security data that are produced by the nodes of the target IoT system are actually events, as
opposed to data that are exchanged between parts of a distributed system for its functioning.
They are generated when interesting things happen at the nodes of the target IoT system, for
example, an attempt at a remote connection, a component’s firmware update, the detection of a
flood of SYN messages, and so on. Event-driven streaming infrastructures are clearly
distinguished from other integration solutions, including ESBs and ETL tools.
There exist a number of platforms that implement the publish/subscribe paradigm, for example
Apache Kafka, ZeroMQ, ActiveMQ, JBoss Messaging, RabbitMQ, and HornetQ. The rest of this
section presents two of the most widely used streaming platforms, namely Apache Kafka and
RabbitMQ.
Apache Kafka [19] is a streaming platform for handling large volumes of event-based data flows.
It is easily scalable, flexible in accommodating multiple data sources and destinations while
effectively decoupling data producers from data consumers, and can easily be integrated with a
number of other technologies. Kafka has been designed and tuned for efficient, high-throughput,
low-latency, and scalable real-time streaming of large amounts of event-based
data. The platform implements the publish-subscribe model of data streaming for
communicating data from multiple data producers to multiple data consumers. Kafka defines
the concept of a topic, which is a subject of interest. Producers produce data for one or more
topics, and consumers register their interest in one or more topics to receive data. Within a
topic, data are partitioned, and messages in each partition are ordered and timestamped.
Partitions are replicated and distributed over the nodes of the Kafka deployment cluster.
A key characteristic of Kafka is persistence of streaming data. When data are published to
Kafka, they are written to the filesystem and remain there for a configurable amount of time.
The advantage of this approach is that clients are able to reload parts of a data log, and newly
arriving clients can catch up by loading the whole history of the logged data. Moreover, clients
can read data independently of one another, each at their own speed. Data items are
indexed (starting at index 0) as shown in Figure 18 and reside in partitions that may be
replicated across the nodes of a cluster.
Figure 18: Kafka Partitions and read-write operations.
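The combination of persistence and per-consumer read positions can be mimicked with a simple in-memory model, in which each consumer keeps its own offset into an append-only partition. This is an illustration of the concept, not the Kafka client API.

```python
class PartitionLog:
    """Append-only log: data persist, and each consumer tracks its own offset."""
    def __init__(self):
        self.entries = []
        self.offsets = {}            # consumer -> next index to read

    def append(self, datum):
        self.entries.append(datum)   # written once, kept for replay

    def read(self, consumer):
        i = self.offsets.get(consumer, 0)
        if i >= len(self.entries):
            return None              # nothing new for this consumer
        self.offsets[consumer] = i + 1
        return self.entries[i]

log = PartitionLog()
for event in ["login", "syn-flood", "fw-update"]:
    log.append(event)

print(log.read("fast"))   # login
print(log.read("fast"))   # syn-flood
print(log.read("slow"))   # login  (independent of the other consumer)
```

Because the entries remain in the log, a consumer that joins late simply starts at offset 0 and replays the whole history at its own pace.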
Apache Kafka originated as an internal project at LinkedIn [20], implemented in Scala, but is
now an open source stream processing platform under the Apache Software Foundation [21].
Benchmarking of the Kafka platform appears in [22].
Kafka provides its services over a number of APIs, as follows:
Producer API: Allows applications to produce streams of data
Consumer API: Allows applications to consume streams of data
Connector API: Allows the definition of connectors for reading and writing data from and to other
applications.
Streams API: Allows stateful processing of stream data, including operations like
filtering, mapping, aggregation, and joins.
Kafka depends on Apache Zookeeper [23], which is a centralized service for maintaining
configuration information and providing naming, distributed synchronization, and group services.
RabbitMQ [24] is a data streaming platform similar to Apache Kafka that has been developed in
Erlang and implements a variety of messaging protocols including the Advanced Message
Queuing Protocol (AMQP). AMQP originated at JPMorgan Chase and is well tuned for
performance, scalability, and reliability, primarily for applications in the financial sector but also
for ones of broader scope. A high-level architecture of RabbitMQ is shown in Figure 19.
Figure 19: Overview of RabbitMQ architecture.
An exchange is a data router. Exchanges are bound to queues based on platform configuration.
Different types of exchanges are supported by RabbitMQ. Direct exchanges send data to a
specified output queue. Topic exchanges apply matching rules to incoming data to decide
to which output queue to send them. Fanout exchanges copy and send data to all output queues
they are linked with. Finally, headers exchanges decide the output queue based on the data
headers. Compared to Kafka, RabbitMQ does not provide persistence of streaming data. Instead, it makes
use of smart queues, which monitor data consumption by consumers and retain data only
until they have been consumed.
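To make the topic-exchange idea concrete, the following sketch implements AMQP-style routing-key matching, where a pattern word `*` matches exactly one dot-separated word and `#` matches zero or more words. This is an illustrative re-implementation, not RabbitMQ code.

```python
def topic_match(pattern: str, key: str) -> bool:
    """AMQP-style topic matching: '*' = exactly one word, '#' = zero or more."""
    p, k = pattern.split("."), key.split(".")

    def walk(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":  # '#' absorbs zero or more words
            return walk(i + 1, j) or (j < len(k) and walk(i, j + 1))
        if j == len(k):
            return False
        return p[i] in ("*", k[j]) and walk(i + 1, j + 1)

    return walk(0, 0)

print(topic_match("kern.*", "kern.critical"))     # True
print(topic_match("#.critical", "a.b.critical"))  # True
print(topic_match("kern.*", "kern.a.b"))          # False
```

A topic exchange evaluates such patterns for every bound queue and forwards the incoming datum to each queue whose binding pattern matches the routing key.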
There have been several comparisons and benchmarks between Kafka and RabbitMQ. [25]
provides a thorough such comparison and concludes with a guide for selecting one
or the other platform based on a number of criteria. In the context of SecureIoT, where security
data are to be communicated both fast and reliably, the most relevant of the criteria listed in
that paper are the following:
Very large system throughput
Very large throughput per topic
At-least-once delivery semantics (in case of failures, the platform guarantees that no data
get lost)
High availability
Long-term data storage is desirable but not a critical requirement for SecureIoT. The table
in the paper shows that, under these requirements, Kafka with replication should be the
platform of choice. Similar conclusions are reached in [26], where it is stated that Kafka should
be preferred over RabbitMQ when high data flows are expected, high availability is required,
and guarantees against data loss are needed.
Therefore, Apache Kafka will be used as the streaming platform for transferring security related
data from the probes to the analytics engine.
3.4 Data Storage Infrastructure
Data that are collected from the target IoT system are permanently stored for subsequent
analysis and further training of the security analytics algorithms. Several alternatives are
examined below for the persistent storage of these data.
The simplest and most primitive form of permanent data storage is provided by plain files. Files
can easily store data sequentially by appending new data as they arrive. The disadvantage of
using plain files is that they provide no inherent support for searching or otherwise processing
data, except for sequentially scanning them. Nevertheless, some primitive structuring of
relatively small amounts of data may be provided by the files themselves. For example,
different files may be defined for containing data from different time periods or different types
of data, and this may be reflected in the names of the files themselves.
Typically, databases are employed for storing large amounts of data. Depending on the nature
of the data either SQL or NoSQL databases may be used. SQL databases are used for storage
and retrieval of structured data that can be modeled in tabular forms. Tables are used to store
either the data themselves or relations between data. Table columns represent data attributes
while each table row contains a data record. It follows that data that fit into this model must be
very well structured, with each data record having a fixed number of attributes and each
attribute being of a specified type. Moreover, methodological approaches to structuring SQL
databases so as to remove redundancy and improve integrity allow no multivalued attributes.
NoSQL databases take a different approach by allowing storage and retrieval of non-structured
data. They have become popular in big data applications, which need to process large amounts
of data coming from different sources and of different types. For such data with no
particular structure, NoSQL databases are well suited, as they provide for efficient storage,
indexing, and retrieval of the data.
The following paragraphs briefly present two widely used NoSQL database systems, MongoDB
and Elasticsearch.
MongoDB [27] is a widely used distributed NoSQL database. It stores data as JSON-like documents,
which means that documents may have arbitrary fields, nested at arbitrary levels;
different documents may have different structures or fields, and the document structure may
change over time. MongoDB supports indexing of documents to facilitate subsequent efficient
retrieval. Queries may be expressed by referencing document fields. For example, queries may
require fields to have specific values, fields whose values fall within a certain range, or fields
with values that match a regular expression. MongoDB divides its indexed documents into
shards, i.e., horizontal partitions of the data. Shards are maintained and replicated in different
servers; for each shard, one or more replicas may be maintained. MongoDB may be used for
holding security data in the context of SecureIoT, as different IoT probes may generate different
structures of security data, which may be stored as MongoDB documents.
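The query style described above can be illustrated with a small matcher for Mongo-like query documents. The operators shown ($gte, $regex) do exist in MongoDB's query language, but this matcher is a simplified in-memory sketch, not the MongoDB driver, and the event fields are made up.

```python
import re

def matches(doc: dict, query: dict) -> bool:
    """Check a document against a simplified Mongo-style query document."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):          # operator form, e.g. {"$gte": 5}
            for op, arg in cond.items():
                if op == "$gte" and not (value is not None and value >= arg):
                    return False
                if op == "$regex" and not (isinstance(value, str)
                                           and re.search(arg, value)):
                    return False
        elif value != cond:                 # exact-value form
            return False
    return True

event = {"node": "gw-01", "severity": 7, "msg": "SYN flood detected"}
print(matches(event, {"severity": {"$gte": 5}}))   # True
print(matches(event, {"msg": {"$regex": "SYN"}}))  # True
print(matches(event, {"node": "gw-02"}))           # False
```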
Elasticsearch [28] is a distributed search and analytics engine for JSON documents, in contrast
to MongoDB, which is primarily a document-store database. It provides a REST API for real-time
data collection and search. It supports both structured and unstructured data, including numbers,
text, and geolocations, and achieves good efficiency by appropriately indexing them.
Elasticsearch is based on the popular Lucene information retrieval library. It supports the
modern architectural style of multitenancy, i.e., having a single Elasticsearch deployment
supporting multiple tenants, as opposed to a single deployment per tenant. It is part of the
integrated Elastic Stack suite, which also includes the data collection engine Logstash and the
visualization tool Kibana. Similar to MongoDB, the data are partitioned into shards, which are
replicated among servers.
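By way of illustration, a search in Elasticsearch's JSON query DSL that combines a full-text match with a structured time-range filter could look as follows; the field names are hypothetical and not part of the SecureIoT data model.

```json
{
  "query": {
    "bool": {
      "must":   [ { "match": { "message": "remote connection" } } ],
      "filter": [ { "range": { "@timestamp": { "gte": "now-1h" } } } ]
    }
  }
}
```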
Elasticsearch provides support to make applications GDPR compliant by incorporating a number
of features as follows [29]:
Access Controls: role-based access control, down to the field level, may be implemented
for ensuring that only authorized persons can access GDPR Personal Data in the
Elasticsearch cluster.
Monitor Access and Breaches: Elasticsearch audit and access logs may be combined with
machine learning and alerting jobs for access monitoring and breach detection.
Pseudonymization: the Logstash fingerprint filter may be used to replace personal data
with hashed values.
Encryption: TLS/SSL may be enabled for securing data in transit against snooping and
tampering.
Elasticsearch will be used as part of the SecureIoT infrastructure for storing security data that
are collected from SecureIoT probes.
4 SecureIoT Analytics Infrastructure
This chapter presents the analytics infrastructure that will be put in place for the support of the
SecureIoT services. The analytics module is a core component of the SecureIoT architecture and
is responsible for generating alerts when security issues with the target IoT system are
detected. It makes use of custom-made analytics algorithms that monitor security-related data
collected by probes from the target IoT system, as well as templates and rulesets that are
maintained in a knowledge base.
4.1 Data Analytics in SecureIoT
4.1.1 Analytics Layers
SecureIoT makes use of predictive security analytics for identifying security issues at the target
IoT system and the corresponding IoT application. The analytics components of the SecureIoT
architecture cooperatively provide the core real time functionalities of the SecureIoT platform.
The SecureIoT analytics components are distinguished in layers as follows:
Edge analytics components. They are lightweight components that are deployed at the
edge nodes of the target IoT system. These components implement simple analytics
functions like data aggregations, statistical calculations, or other application specific
calculations, and they stream the results to the core analytics components. Streaming of
the results is done similarly to the streaming of security data from other probes of the
SecureIoT platform. The data model presented in previous sections is generic enough to
accommodate data that are generated by the edge analytics components.
Core analytics components. They implement the predictive security analytics functions of
the SecureIoT services. As shown in the SecureIoT architecture in Figure 4, they make use
of large data sets and security templates to provide their services. The core analytics
components run on top of an analytics platform.
4.1.2 Data analytics requirements
The major requirement for the data analytics framework that will be used in SecureIoT is its
efficiency. Security-related incidents and issues that may take place at the target IoT system and
application have to be detected as quickly as possible, given also that the analytics engine has
to process large amounts of security-related data, either stored or streamed.
4.2 Data Analytics Framework in SecureIoT
There are a number of widely used big data platforms available, several of them
being free and open source. This section gives a short overview of two of the most popular,
Hadoop and Apache Spark and gives arguments for the selection of the latter in the context of
SecureIoT.
4.2.1 Apache Hadoop
Apache Hadoop [30] is a framework that allows for the distributed processing of large data sets
across clusters of computers using simple programming models. It started as a Yahoo project in
2006 and later became an Apache open source project. Hadoop comprises three main
modules: the Hadoop Distributed Filesystem (HDFS), the coordination and scheduling module
YARN, and the MapReduce programming model for processing large amounts of data. Hadoop is mainly
oriented towards batch processing of data, as it makes heavy use of HDFS.
The core of Hadoop is MapReduce, a programming paradigm for
processing big data sets that allows for parallelism and distribution. As the name implies,
MapReduce comprises a Map step, which performs filtering and transformation tasks, followed by a
Reduce step, which performs aggregation tasks on the outputs of the Map step.
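The Map and Reduce steps can be illustrated with the canonical word-count example in plain Python; the distribution of tasks across machines, which MapReduce frameworks handle, is omitted here.

```python
from itertools import groupby
from operator import itemgetter

lines = ["syn syn login", "login syn"]

# Map step: emit a (key, 1) pair for every word in every input line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the intermediate pairs by key.
mapped.sort(key=itemgetter(0))
grouped = groupby(mapped, key=itemgetter(0))

# Reduce step: aggregate the values of each group.
counts = {key: sum(v for _, v in pairs) for key, pairs in grouped}
print(counts)   # {'login': 2, 'syn': 3}
```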
Hadoop is a highly fault-tolerant platform as it replicates data across many machines. Each file
is split into blocks, which are replicated across several machines, so that if a single machine
fails, the file can be rebuilt from other block replicas that reside in other machines.
Hadoop uses Apache Mahout [31] for data processing and machine learning. Mahout is a
distributed linear algebra framework that allows the implementation of distributed scalable
machine learning algorithms, mainly for collaborative filtering, clustering and classification, all
of which run on top of MapReduce.
4.2.2 Apache Spark
Apache Spark [32] was initiated at the University of California, Berkeley, and is now an Apache
project. It is a highly efficient analytics engine for large-scale data processing of either
streaming or stored data. Spark can run as a standalone platform or on a cluster on top of the JVM;
alternatively, it can run on top of Hadoop YARN. Spark allows applications to be written in Java,
Python, Scala, R, and SQL.
All data to be processed are maintained in memory, hence the high performance of
the platform. The same holds for any generated data; in-memory data are transferred to
permanent storage only when the programmer explicitly requests the transfer. According to [32],
Spark performs 100 times faster than Hadoop. Further reports state that Spark won the 2014
Gray Sort Benchmark [33] (Daytona 100TB category), sorting 100TB of data in 23 minutes on a
cluster of 206 nodes, with the previous world record being 72 minutes, set by a Hadoop
MapReduce cluster of 2100 nodes.
Spark makes use of a fault-tolerance technology called the Resilient Distributed Dataset (RDD),
an immutable collection of data amenable to parallel processing, which allows Spark to run on
a cluster. Along with RDDs, Spark maintains a Directed
Acyclic Graph (DAG) that models the relationships between data operations. The combination of
RDDs and the corresponding graph gives Spark its fault-tolerance capabilities: if some data are
corrupted, or a machine that hosts them fails, the lost partitions can be recomputed from the
lineage recorded in the graph or recovered from replicas residing on other nodes.
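The lineage idea can be sketched with a toy dataset class that records transformations lazily and only computes results when an action is invoked. PySpark's real RDD API has the same map/filter/collect shape, but this class is purely illustrative.

```python
class ToyRDD:
    """Toy immutable dataset: transformations build a lineage (a chain of
    recorded operations) and are only applied when an action like collect() runs."""
    def __init__(self, data, lineage=()):
        self._data = data
        self._lineage = lineage      # recorded, not yet executed

    def map(self, f):
        return ToyRDD(self._data, self._lineage + (("map", f),))

    def filter(self, p):
        return ToyRDD(self._data, self._lineage + (("filter", p),))

    def collect(self):
        # Recompute from the original data by replaying the lineage;
        # such replay is also what makes recovery after a loss possible.
        out = list(self._data)
        for kind, f in self._lineage:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

rdd = ToyRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.collect())   # [20, 30, 40]
```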
Spark overcomes the limitations of Hadoop, which is based on the map-reduce model of data
processing, whose main drawback is that it imposes a linear way of processing data, something
that may be a limitation for certain types of applications. Spark, on the other hand, allows the
development of iterative analytics algorithms that may provide for more sophisticated
processing of data. Spark includes libraries for SQL, streaming data, machine learning (MLlib),
and graph computation (GraphX). It easily interfaces with a number
of other platforms, including MongoDB, Elasticsearch, HDFS, and so on.
MLlib is Spark’s scalable machine learning library, designed for applications that operate on
in-memory data.
4.2.3 SecureIoT Data Analytics framework
SecureIoT will use Apache Spark as the platform for its core analytics component. The main
argument in favor of Spark is its high speed and efficiency: as security-related issues at the
target IoT system should be detected as quickly as possible, the speed of the analytics engine is
a highly ranked requirement.
5 Prototype Implementation and Demonstration
This chapter presents a prototype implementation and demonstration of the storage and
analytics infrastructure that will be used for the SecureIoT trials, i.e., the components for data
collection, transfer, storage, and analytics processing, along with their configurations and setup.
The use and operation of the infrastructure is also presented. Sample data are generated and
their collection and subsequent transfer to the storage and analytics modules are
demonstrated.
Figure 20 shows an overview of the infrastructure setup. Data collection executes Beats on
the nodes, in containers. The Beats collect data and ship them to Logstash. Logstash transforms the
data into the SecureIoT internal format and finally sends them to both Elasticsearch and Kafka, where they
can be queried by static or dynamic analysis tools.
Figure 20: Overview of the infrastructure setup.
The following sections give details of the interfaces among components of the infrastructure.
5.1 Data collection
Data collection has been implemented as a Spring Boot application. The component uses
MongoDB to maintain its internal state, which comprises (1) the metrics it can collect, (2) the
platforms and the nodes it collects metrics from, and (3) the collectors it has spawned for that
purpose.
Data collection exposes the following REST API that allows users to monitor different metrics on
different nodes that belong to different platforms.
Title Create a collector
Description Creates a collector for the given metric on the given node.
URL /collectors
Method POST
Request headers
Content-Type: application/json
Request body
{ "metric": "...", "node": "..." }
metric: The metric to collect.
node: The node to collect the metric from.
Status code
201 (Created): The collector was created.
400 (Bad Request): The request was invalid (e.g. the metric was missing).
500 (Internal Server Error): The collector failed to be created.
Response headers
Content-Type: application/json
Location: …
Response body
{ "id": "...", "metric": "...", "node": "...", "status": "..." }
id: The ID of the collector.
metric: The metric that the collector collects.
node: The node where the collector collects the metric from.
status: The status of the collector.
Example
Request
{
    "metric": "d6c25553-9bca-4334-b08b-eedd62155599",
    "node": "82a26f51-0695-4cbd-9736-81291f354fc0"
}
Response
{
    "id": "bb142fea-a242-4f7c-a6e1-e87a70099755",
    "metric": "d6c25553-9bca-4334-b08b-eedd62155599",
    "node": "82a26f51-0695-4cbd-9736-81291f354fc0",
    "status": "stopped"
}
Notes
The collector is only created; it is not started.
The status of the new collector is stopped.
Title Start a collector
Description Starts the collector with the given ID.
URL /collectors/:id/start
Method POST
Request parameters id: The ID of the collector.
Status code
204 (No Content): The collector was started.
404 (Not Found): The collector with the given ID was not found.
409 (Conflict): The collector was in an invalid state.
500 (Internal Server Error): The collector failed to be started.
Notes
The status of the collector must be stopped, before it can be started.
Once it has been started, the status of the collector is changed to running.
Title Stop a collector
Description Stops the collector with the given ID.
URL /collectors/:id/stop
Method POST
Request parameters id: The ID of the collector.
Status code
204 (No Content): The collector was stopped.
404 (Not Found): The collector with the given ID was not found.
409 (Conflict): The collector was in an invalid state.
500 (Internal Server Error): The collector failed to be stopped.
Notes
The status of the collector must be running, before it can be stopped.
Once it has been stopped, the status of the collector is changed to stopped.
Title Delete a collector
Description Deletes the collector with the given ID.
URL /collectors/:id
Method DELETE
Request parameters id: The ID of the collector.
Status code
204 (No Content): The collector was deleted.
404 (Not Found): The collector with the given ID was not found.
409 (Conflict): The collector was in an invalid state.
500 (Internal Server Error): The collector failed to be deleted.
Notes
The collector must be stopped before it can be deleted.
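The lifecycle rules enforced by the start, stop and delete endpoints above can be sketched as a small state machine. This is an illustrative model only; the class and method names below are not taken from the actual SecureIoT code base, and invalid transitions stand in for the 409 (Conflict) responses.

```python
class ConflictError(Exception):
    """Raised when an operation is attempted in an invalid state (maps to HTTP 409)."""


class Collector:
    """Illustrative model of the collector lifecycle described in the API above."""

    def __init__(self, metric, node):
        self.metric = metric
        self.node = node
        self.status = "stopped"  # a newly created collector is only created, not started

    def start(self):
        # The collector must be stopped before it can be started.
        if self.status != "stopped":
            raise ConflictError("collector must be stopped before it can be started")
        self.status = "running"

    def stop(self):
        # The collector must be running before it can be stopped.
        if self.status != "running":
            raise ConflictError("collector must be running before it can be stopped")
        self.status = "stopped"

    def delete(self):
        # The collector must be stopped before it can be deleted.
        if self.status != "stopped":
            raise ConflictError("collector must be stopped before it can be deleted")
```

For example, create followed by start, stop and delete succeeds, whereas deleting a running collector raises `ConflictError`, mirroring the 409 status code of the REST API.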
Title Search for collectors
Description Searches for collectors that match the given criteria.
URL /collectors/search
Method POST
Request headers Content-Type: application/json
Request body
{
  "metric": "...",
  "node": "...",
  "platform": "...",
  "status": "..."
}
metric: The metric that the collector collects.
node: The node where the collector collects the metric from.
platform: The platform where the collector collects the metric from.
status: The status of the collector.
Status code
200 (OK): Collectors were retrieved.
500 (Internal Server Error): Collectors failed to be retrieved.
Response headers Content-Type: application/json Location: …
Response body
{
  "collectors": [
    {
      "id": "...",
      "metric": "...",
      "node": "...",
      "status": "..."
    },
    ...
  ]
}
collectors: The collectors that match the given criteria.
id: The ID of the collector.
metric: The metric that the collector collects.
node: The node where the collector collects the metric from.
status: The status of the collector.
Example
Request
{
  "metric": "d6c25553-9bca-4334-b08b-eedd62155599"
}
Response
{
  "collectors": [
    {
      "id": "bb142fea-a242-4f7c-a6e1-e87a70099755",
      "metric": "d6c25553-9bca-4334-b08b-eedd62155599",
      "node": "82a26f51-0695-4cbd-9736-81291f354fc0",
      "status": "stopped"
    }
  ]
}
Apart from the above endpoints, the data collection component also provides endpoints that allow users to create, update, delete, and search for platforms, nodes, and metrics.
Each collector is currently implemented as a Beat with the appropriate configuration. For example, a collector that collects system-level CPU usage from a server is a Metricbeat instance deployed on that server and configured to collect CPU usage. We are already experimenting with running Beats in containers.
All Beats are configured to ship their data to Logstash, which in turn forwards them to both Elasticsearch and Kafka using the corresponding output plugins. That way, analysis can be performed both on data at rest (Elasticsearch) and on data in transit (Kafka).
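The dual output described above could be configured in the Logstash pipeline along the following lines. This is a sketch only: the host names, index pattern and topic name are illustrative placeholders, not the actual deployment values.

```
# Logstash output stage: ship every event both to data at rest and to data in transit.
output {
  elasticsearch {                        # data at rest, queried via the data storage component
    hosts => ["elasticsearch:9200"]      # placeholder host
    index => "secureiot-%{+YYYY.MM.dd}"  # placeholder daily index pattern
  }
  kafka {                                # data in transit, consumed by the analytics components
    bootstrap_servers => "kafka:9092"    # placeholder broker
    topic_id => "secureiot-metrics"      # placeholder topic
  }
}
```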
More information about the data collection component can be found at
https://gitlab.atosresearch.eu/secure-iot/data-collection.
5.2 Data storage
Data storage has also been implemented as a Spring Boot application. The component serves as an abstraction layer over Elasticsearch.
Data storage exposes the following REST API that allows users to query stored data.
Title Query data
Description Queries stored data.
URL /collectors
Method POST
Request headers Content-Type: application/json
Request body
{
  "query": ...
}
query: The query to execute.
Status code
200 (OK): Data were retrieved.
500 (Internal Server Error): Data failed to be retrieved.
Response headers Content-Type: application/json Location: …
Response body
{
  "data": [
    {
      "platform": "...",
      "node": "...",
      "metric": "...",
      "time": "...",
      "value": ...
    },
    ...
  ]
}
data: The data that match the given criteria.
platform: The platform where the data were collected from.
node: The node where the data were collected from.
metric: The metric that the data are about.
time: The date and time when the data were collected.
value: The value.
Example
Request
{
  "query": {
    "match": {
      "node": "82a26f51-0695-4cbd-9736-81291f354fc0"
    }
  }
}
Response
{
  "data": [
    {
      "platform": "6b2f9297-6c5e-4859-b82d-da5dbbaabd3f",
      "node": "82a26f51-0695-4cbd-9736-81291f354fc0",
      "metric": "d6c25553-9bca-4334-b08b-eedd62155599",
      "time": "2018-08-25T08:00:00+0000",
      "value": 5.00
    }
  ]
}
The endpoint currently accepts queries in the Elasticsearch Query DSL. We may reconsider this approach in future versions.
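As an illustration of what the endpoint accepts, the match query of the example above could be combined with a time filter using the standard bool/range constructs of the Elasticsearch Query DSL. The field names follow the response schema documented above; the exact index mapping is an assumption.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "node": "82a26f51-0695-4cbd-9736-81291f354fc0" } },
        { "range": { "time": { "gte": "2018-08-25T00:00:00+0000" } } }
      ]
    }
  }
}
```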
More information about the data storage component can be found at
https://gitlab.atosresearch.eu/secure-iot/data-storage.
6 Conclusions
This document presents the Security Information Storage and Analytics Infrastructure that will be put in place for running the trials of SecureIoT. The infrastructure is aligned with the SecureIoT architecture as defined in D2.4 and comprises a number of open-source components.
Requirements for the parts of the infrastructure are expressed, and alternative technologies are presented for each. Based on the requirements, the document argues for the selection of the most appropriate technologies to be used in the context of the project. The last chapter presents a prototype setup of the infrastructure based on the selected components and gives some examples of its operation. The infrastructure will be configured for running the planned trials of the project and will be refined according to their needs. The final refined version of the infrastructure will be presented in a follow-up document at the end of the project.
References
[1] https://secureiot.eu/
[2] https://en.wikipedia.org/wiki/Big_data
[3] Ioannis T. Christou, Emmanouil Amolochitis, Zheng-Hua Tan. “A Parallel/Distributed
Algorithmic Framework for Mining All Quantitative Association Rules”, April 2018,
https://arxiv.org/abs/1804.06764
[4] Quamar Niyaz, Weiqing Sun, Ahmad Y Javaid, and Mansoor Alam. “A Deep Learning
Approach for Network Intrusion Detection System”, IEEE Transactions on Emerging
Topics in Computational Intelligence, 2018.
[5] SecureIoT “D2.4 – Architecture and Technical Specifications”. J. Soldatos et al., 2018.
[6] Brian Russell, Drew Van Duren. “Practical Internet of Things Security”, Packt Publishing, 2016.
[7] https://nvd.nist.gov/vuln-metrics/cvss
[8] F. Carrez, T. Elsaleh, D. Gómez, L. Sánchez, J. Lanza and P. Grace. “A Reference
Architecture for federating IoT infrastructures supporting semantic interoperability”,
2017 European Conference on Networks and Communications (EuCNC), Oulu, 2017,
pp. 1-6. doi: 10.1109/EuCNC.2017.7980765
[9] SecureIoT “D2.1 – Reference Scenarios and Use Cases”. K. Kalaboukas et al., 2018.
[10] FIWARE Orion Context Broker documentation in Read The Docs. https://fiware-orion.readthedocs.io/en/master/index.html
[11] FIWARE NGSI API specification. http://telefonicaid.github.io/fiware-orion/api/v2/stable/
[12] FIWARE data models. https://github.com/Fiware/dataModels
[13] T. Günter. “OpenMTC – An open source implementation of the oneM2M standard”, FIWARE Global Summit, 2018, https://es.slideshare.net/FI-WARE/fiware-global-summit-openmtc-a-open-source-implementation-of-the-onem2m-standard
[14] ETSI GS CIM 004. “Context Information Management (CIM); Application Programming Interface (API)”, https://www.etsi.org/deliver/etsi_gs/CIM/001_099/004/01.01.01_60/gs_CIM004v010101p.pdf
[15] Semantic Sensor Network Ontology. https://www.w3.org/TR/vocab-ssn/
[16] M. Bermudez-Edo, T. Elsaleh, P. Barnaghi and K. Taylor. “IoT-Lite: A Lightweight
Semantic Model for the Internet of Things”, 2016 Intl. IEEE Conferences on Ubiquitous
Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and
Communications, Cloud and Big Data Computing, Internet of People, and Smart World
Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, 2016, pp. 90-97.
doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0035
[17] https://www.elastic.co/products/beats
[18] https://sematext.com/logagent/
[19] http://kafka.apache.org/documentation.html
[20] https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin
[21] http://apache.org/
[22] https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
[23] https://zookeeper.apache.org/
[24] http://www.rabbitmq.com
[25] Philippe Dobbelaere and Kyumars Sheykh Esmaili. “Kafka versus RabbitMQ: A
comparative study of two industry reference publish/subscribe implementations.
Industry Paper”. DEBS '17 Proceedings of the 11th ACM International Conference on
Distributed and Event-based Systems, pp. 227-238.
[26] Nicolas Nannoni. “Message-Oriented Middleware for Scalable Data Analytics
Architectures”, Master’s Thesis, KTH, Sweden, 2015.
[27] https://www.mongodb.com/
[28] https://www.elastic.co/
[29] https://www.elastic.co/gdpr
[30] https://hadoop.apache.org/
[31] https://mahout.apache.org/
[32] http://spark.apache.org/
[33] http://sortbenchmark.org/