Towards Efficient and Lightweight Security Architecture for Big
Sensing Data Streams
by
Deepak Puthal
M. Tech. (National Institute of Technology Rourkela)
A thesis submitted to
Faculty of Engineering and Information Technology
University of Technology, Sydney
for the degree of
Doctor of Philosophy
April 2017
To my family and friends
CERTIFICATE OF ORIGINAL AUTHORSHIP
I certify that the work in this thesis has not previously been submitted
for a degree nor has it been submitted as part of requirements for a
degree except as fully acknowledged within the text.
I also certify that the thesis has been written by me. Any help that I have
received in my research work and the preparation of the thesis itself has
been acknowledged. In addition, I certify that all information sources
and literature used are indicated in the thesis.
Signature of Student:
Date:
Acknowledgement
I sincerely express my deep gratitude to my principal coordinating supervisor, Prof.
Jinjun Chen, for his experienced supervision and continuous encouragement
throughout my PhD study. I also wish to express my sincere appreciation to my
co-supervisors, Dr. Surya Nepal and Dr. Rajiv Ranjan from CSIRO, for their
supervision and encouragement. Without their consistent support and supervision, I
would not have been able to complete this thesis. I am especially grateful to Dr.
Ranjan for his financial support; without him it would have been difficult for me to
travel to Australia for my PhD study.
I thank the Commonwealth Scientific and Industrial Research Organisation
(CSIRO) for offering me a full Scholarship throughout my doctoral program. I also
thank the University of Technology Sydney (UTS) and the Faculty of Engineering and
IT (FEIT) for providing me with an IRS Scholarship throughout my doctoral program.
My thanks also go to staff members, research assistants, previous and current
colleagues, and friends at UTS, and CSIRO for their help, suggestions, friendship
and encouragement; in particular, Dr. Priyadarsi Nanda, Prof. Sean He, Eryani
Tjondrowalujo, Chang Liu, Xuyun Zhang, Chi Yang, Adrian Johannes, Nazanin
Borhan, Ashish Nanda, Jongkil Kim, Nan Li, Danan Thilakanathan, Mian Ahmed
Jan, and Usman Khan.
Last but not least, I am deeply grateful to my parents, Kartik Ch. Puthal and
Shakuntala Puthal, and to my brother, sisters and brothers-in-law for their support
in my studying abroad, and for their understanding, encouragement and help. Most
importantly, I would like to sincerely express my deepest gratitude to almighty God.
Abstract
A large number of mission critical applications from disaster management to health
monitoring are contributing to the Internet of Things (IoT) by deploying a number of
smart sensing devices in a heterogeneous environment. Resource constrained
sensing devices are being used widely to build and deploy self-organising wireless
sensor networks for a variety of critical applications. Many such devices sense the
deployed environment and generate a variety of data and send them to the server for
analysis as data streams. The key requirement of such applications is the need for
near real-time stream data processing in large scale sensing networks. This trend
gives birth to an area called big sensing data streams. One of the key problems in big
data is to ensure end-to-end security: because the medium of communication is
untrusted, a Data Stream Manager (DSM) must always verify the security of the data
(i.e., its confidentiality, integrity, authenticity, availability and freshness)
before executing a query. A malicious adversary may access or tamper with the data
in transit. One of the challenging tasks in such applications is to
ensure the trustworthiness of collected data so that any decisions are made on the
correct data, followed by protecting the data streams from information leakage and
unauthorised access. In this thesis, end-to-end means from the source sensors to the
cloud data centre. Although some security issues are not new, the situation is
aggravated due to the features of the five Vs of big sensing data streams: Volume,
Velocity, Variety, Veracity and Value. Providing data security for big sensing data
streams in the context of near real-time analytics therefore remains a significant
challenge.
This thesis mainly investigates the problems and security issues of big sensing
data streams from the perspective of efficient and lightweight processing. The
advantages of big data stream computing, including efficient and lightweight
real-time processing, are exploited to address the problem, aiming at high
scalability and effectiveness. Specifically, the thesis examines three major
properties in the lifecycle of security in big data stream environments:
authenticity, integrity and confidentiality, also known as the AIC triad, which
differs from the CIA triad (confidentiality, integrity, availability) used in
general data security. Accordingly, a lightweight security framework is proposed to
maintain data integrity, together with a selective encryption technique to maintain
data confidentiality over big sensing data streams. These solutions provide data
security from the source sensing devices to the processing layer
of the cloud data centre. The thesis also proposes a lattice-based information flow
control model to protect data against information leakage and unauthorised access
after security verification at the DSM. By integrating this access control model,
the thesis provides end-to-end security for big sensing data streams, i.e., from the
source sensing device to the processing layer of the cloud data centre. This thesis
demonstrates that our solutions not only strengthen the
data security but also significantly improve the performance and efficiency of big
sensing data streams compared with existing approaches.
The Author’s Publications
So far, I have published nine refereed papers, including one book chapter, one IEEE
magazine article, one ERA-ranked A* journal paper, one ERA-ranked A journal paper,
three ERA-ranked A conference papers, one ERA-ranked B conference paper and other
papers. The publications, as well as one paper that is under review, are listed
below in detail. The impact factor (IF) of each journal paper is also stated.
(ERA is a publication ranking framework in Australia; see
http://www.arc.gov.au/era/era_2010/archive/era_journal_list.htm for the ranking
tiers. The 2010 version is used herein: for journal papers, A* denotes the top 5%
and A the next 15%; for conference papers, which have no A* rank, A denotes the top
20%. For the Impact Factor, see http://wokinfo.com/essays/impact-factor/.)
Book Chapter:
1. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "End-to-End
Security Framework for Big Sensing Data Streams." in Big Data Management,
Architecture, and Processing, CRC Press, to be published 2017.
Journal Articles:
2. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Dynamic
Prime Number Based Efficient Security Mechanism for Big Sensing Data
Streams." Journal of Computer and System Sciences (JCSS). Vol. 83(1), pp. 22-
42, 2017. (A*, IF: 1.583)
3. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "DLSeF: A
Dynamic Key Length based Efficient Real-Time Security Verification Model for
Big Data Streams." ACM Transactions on Embedded Computing Systems
(TECS), Vol. 16(2), pp. 51:1-51:24, 2016. (A*, IF: 1.19)
4. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "Threats to
Networking Cloud and Edge Datacenters in the Internet of Things." IEEE Cloud
Computing. Vol. 3(3), pp. 64-71, 2016.
5. Deepak Puthal, Surya Nepal, Rajiv Ranjan, Xindong Wu, and Jinjun Chen.
"SEEN: A Selective Encryption Method to Ensure Confidentiality for Big
Sensing Data Streams." IEEE Transactions on Big Data (TBD), Minor revision,
February 2017.
Conference Papers:
6. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Synchronized
Shared Key Generation Method for Maintaining End-to-End Security of Big
Data Streams." in 50th Hawaii International Conference on System Sciences
(HICSS-50), Hawaii, USA. 2017. (A)
7. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "IoT and Big Data:
An Architecture with Data Flow and Security Issues." in 2nd International
Conference on Cloud, Networking for IoT Systems (CN4IoT), Brindisi, Italy,
2017.
8. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Secure Big
Data Streams Analytics Framework for Disaster Management on Cloud." in 18th
IEEE International Conferences on High Performance Computing and
Communications (HPCC 2016), Sydney, Australia. 2016 (B)
9. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Dynamic Key
Length based Approach for Real-Time Security Verification of Big Sensing Data
Streams." in 16th International Conference on Web Information System
Engineering (WISE 2015), Miami, Florida, USA. 2015. (A)
10. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "DPBSV – An
Efficient and Secure Scheme for Big Sensing Data Streams." in 14th IEEE
International Conference on Trust, Security and Privacy in Computing and
Communications (IEEE TrustCom-15), Helsinki, Finland. 2015. (A)
11. Deepak Puthal, Surya Nepal, Cecile Paris, Rajiv Ranjan, and Jinjun Chen.
"Efficient Algorithms for Social Networks Coverage and Reach." in IEEE
BigData Congress, New York, USA, 2015.
Table of Contents
Figures xiii
Tables xv
Algorithms xvi
Chapter 1 Introduction 1
1.1 Background ············································································ 1
1.1.1 Big Data with Security Issues ············································· 3
1.1.2 Cloud Computing ··························································· 5
1.2 Motivation: Securing Big Sensing Data Streams ································ 6
1.3 Overview of the Work ······························································· 9
1.3.1 Methodology ································································· 9
1.3.2 Contributions ······························································ 11
1.4 Thesis Organisation ································································ 13
Chapter 2 Background Studies and Related Work 15
2.1 General Research Trend ··························································· 15
2.2 Review of Reviews ································································· 17
2.2.1 Data Centre Security ······················································ 17
2.2.2 Network Security ·························································· 19
2.2.3 IoT Security ································································ 21
2.3 IoT Generated Data Stream Architecture ······································· 23
2.3.1 IoT Architecture ··························································· 23
2.3.2 Security Threats of Each Layer ········································· 28
2.4 Big Data Stream Security ························································· 38
2.4.1 Security Requirements ··················································· 40
2.4.2 CIA Triad Properties ····················································· 41
2.4.3 Confidentiality of Big Data Streams ··································· 42
2.4.4 Integrity of Big Data Streams ··········································· 45
2.4.5 Availability of Big Data Streams ······································· 51
2.5 Comparison ·········································································· 56
2.6 Summary ············································································ 59
Chapter 3 Security Verification Framework for Big Sensing Data Streams 61
3.1 Introduction ········································································· 62
3.2 Preliminaries to the Chapter ······················································ 64
3.3 Research Challenges and Research Motivation ································ 65
3.3.1 Research Challenges ······················································ 66
3.3.2 Research Motivation ······················································ 67
3.4 Dynamic Prime-Number Based Security Verification ························ 70
3.4.1 DPBSV System Setup ···················································· 70
3.4.2 DPBSV Handshaking ···················································· 72
3.4.3 DPBSV Rekeying ························································· 72
3.4.4 DPBSV Security Verification ··········································· 74
3.5 Security Analysis ··································································· 76
3.5.1 Security Proof ····························································· 76
3.5.2 Forward Secrecy ·························································· 81
3.6 Experiment and Evaluation ······················································· 81
3.6.1 Sensor Node Performance ··············································· 82
3.6.2 Security Verification······················································ 83
3.6.3 Performance Comparison ··············································· 86
3.6.4 Required Buffer Size ····················································· 87
3.7 Summary ············································································· 88
Chapter 4 Lightweight Security Protocol for Big Sensing Data Streams 89
4.1 Introduction ········································································· 89
4.2 Preliminaries to the Chapter ······················································ 92
4.3 Research Challenges and Research Motivation ································ 94
4.3.1 Research Challenges ······················································ 94
4.3.2 Research Motivation ······················································ 96
4.4 DLSeF Lightweight Security Protocol ·········································· 96
4.4.1 DLSeF System Setup ····················································· 97
4.4.2 DLSeF Handshaking ···················································· 100
4.4.3 DLSeF Rekeying ························································ 101
4.4.4 DLSeF Key Synchronisation ··········································· 104
4.4.5 DLSeF Security Verification ·········································· 108
4.5 Security Analysis ·································································· 110
4.5.1 Security Proof ···························································· 111
4.6 Experiment and Evaluation ······················································ 115
4.6.1 Sensor Node Performance ·············································· 115
4.6.2 Security Verification····················································· 117
4.6.3 Performance Comparison ·············································· 120
4.6.4 Required Buffer Size ···················································· 121
4.7 Summary ············································································ 123
Chapter 5 Selective Encryption Method to Ensure Confidentiality of Big Sensing Data Streams 124
5.1 Introduction ········································································ 125
5.2 Design Consideration ····························································· 127
5.2.1 System Architecture ····················································· 128
5.2.2 Adversary Model ························································· 130
5.2.3 Attack Model ····························································· 131
5.3 Research Challenges and Research Motivation ······························· 132
5.3.1 Research Challenges ····················································· 132
5.3.2 Research Motivation ····················································· 134
5.4 Selective Encryption Method for Big Data Streams ························· 135
5.4.1 Initial System Setup ····················································· 136
5.4.2 Rekeying ·································································· 138
5.4.3 New Node Authentication ·············································· 139
5.4.4 Reconfiguration ·························································· 141
5.4.5 Encryption/Decryption ·················································· 142
5.4.6 Tradeoffs ·································································· 143
5.4.7 Required Resources for SEEN ·········································· 144
5.5 Theoretical Analysis ······························································· 147
5.5.1 Security Proof ···························································· 147
5.5.2 Forward Secrecy ························································· 150
5.6 Experiment and Evaluation ······················································ 150
5.6.1 Security Verification····················································· 151
5.6.2 Performance Comparison ·············································· 153
5.6.3 Required Buffer Size ···················································· 154
5.6.4 Network Performance ··················································· 155
5.7 Summary ············································································ 157
Chapter 6 Access Control Framework for Big Sensing Data Streams 158
6.1 Introduction ········································································ 158
6.2 Background Studies ······························································· 161
6.2.1 Stream Processing························································ 161
6.2.2 Stream Security ··························································· 162
6.2.3 Chinese Wall Policy ····················································· 163
6.3 Design Consideration ····························································· 163
6.3.1 System Architecture ····················································· 163
6.3.2 Definition ································································· 166
6.3.3 QoS Requirements ······················································· 167
6.3.4 Adversary Model ························································· 169
6.4 Access Control Model ···························································· 170
6.5 Experimental Evaluation ························································· 173
6.5.1 System Setup ····························································· 173
6.5.2 Results Discussion ······················································· 175
6.6 Summary ············································································ 176
Chapter 7 Conclusion and Future Work 177
7.1 Conclusion ·········································································· 177
7.2 Future Work ········································································ 181
Bibliography 183
Figures
Figure 1-1 Typical Lifecycle of Security Framework for Big Sensing Data Streams 6
Figure 2-1 Cloud computing security architecture ······································ 19
Figure 2-2 Layer-wise IoT security architecture ········································· 22
Figure 2-3 Layer-wise IoT architecture from IoT device to cloud data centre ······· 26
Figure 2-4 Communication protocol in IoT ·············································· 28
Figure 2-5 Cloud computing security threats, attacks and vulnerabilities············ 38
Figure 2-6 CIA triad of data security for data in transit or at rest ·················· 41
Figure 3-1 A simplified view of a DSMS to process and analyse input data stream 62
Figure 3-2 Overlay of our architecture from sensing device to data centre ·········· 65
Figure 3-3 Pair of dynamic relative prime number generation ························ 68
Figure 3-4 The sensors used for experiment ·············································· 81
Figure 3-5 Estimated power consumption during the key generation process ······ 83
Figure 3-6 Scyther simulation environment result page ································ 84
Figure 3-7 Performance comparison of the security scheme ··························· 85
Figure 3-8 Performance comparison of minimum buffer size required ·············· 87
Figure 4-1 High level of architecture from source sensing device to big data
processing centre ······························································· 93
Figure 4-2 Secure authentication of Sensor and DSM ································· 100
Figure 4-3 Neighbour node discovered to get the key generation properties ······· 105
Figure 4-4 Neighbour discovery with all possible conditions ························ 107
Figure 4-5 Performance computation of two different sensors ······················· 116
Figure 4-6 Energy consumption by using COOJA in Contiki OS ···················· 116
Figure 4-7 Scyther simulation environment result page ······························· 118
Figure 4-8 Security verification results of Scyther during neighbour authentication
··················································································· 119
Figure 4-9 Performance comparison ······················································ 121
Figure 4-11 Efficiency comparison of minimum buffer size required to process · 121
Figure 5-1 High level architectural diagram for SEEN protocol ····················· 130
Figure 5-2 Initial authentication method with a 4-step process ······················· 138
Figure 5-3 Key Selection ··································································· 139
Figure 5-4 Shared key management for robust clock skew ··························· 140
Figure 5-5 Method to determine the data sensitivity level ··························· 141
Figure 5-6 Scyther simulation result page of security verification ··················· 152
Figure 5-7 Performance comparison of the SEEN method ····························· 153
Figure 5-8 Efficiency comparison based on required buffer size ····················· 154
Figure 5-9 Energy consumption ··························································· 155
Figure 6-1 Overview of access control of big data streams using lattice model ··· 166
Figure 6-2 Lattice model for data access ················································· 171
Figure 6-3 Experiment Setups ····························································· 172
Figure 6-4 Mapping time for HT Sensor Dataset ······································· 173
Figure 6-5 Mapping time for Twin Gas Sensor Dataset ······························· 174
Tables
Table 2-1 Network layer security threats ················································· 31
Table 2-2 Possible threats of IoT-generated big data streams in CIA triad
representation ··································································· 57
Table 2-3 Comparison of IoT generated big data stream security threats and
solutions according to CIA triad method ··································· 58
Table 3-1 DPBSV Notations ································································ 69
Table 3-2 Time for a symmetric key (AES) algorithm to enumerate all possible
keys using the most advanced Intel i7 processor ··························· 77
Table 4-1 Notations used in this DLSeF model ·········································· 98
Table 5-1 SEEN Notations ································································· 135
Table 5-2 Performance and Properties of Security Solutions ························· 156
Table 5-3 Communication overhead of SEEN protocol ······························· 156
Table 6-1 Machine specification ·························································· 174
Table 6-2 Dataset information ····························································· 174
Algorithms
Algorithm 3-1 Security Framework for Big Sensing Data Stream ···················· 74
Algorithm 3-2 Dynamic Prime Number Generation ···································· 78
Algorithm 4-1 Synchronisation of Dynamic Key Length Generation ··············· 102
Algorithm 4-2 Key Generation (Rekeying) Process ···································· 107
Algorithm 4-3 Lightweight Security Protocol for Big Sensing Data Stream ······· 109
Algorithm 5-1 Rekeying process ·························································· 140
Algorithm 5-2 Selective encryption method for big sensor data streams ··········· 145
Chapter 1
Introduction
This chapter mainly introduces the research background and motivation, as well as a
brief summary of the work. Specifically, Section 1.1 briefly introduces the notions
of cloud computing and big data as research background knowledge. Section 1.2
motivates the research on securing big sensing data streams. Section 1.3 summarises
the work and outlines its contributions. Finally, Section 1.4 presents the
organisation of the thesis.
1.1 Background
Nowadays, we have entered a big data era of petabytes. Big data is
widespread in both industry and scientific research applications, where data is
generated with high Volume, Velocity, Variety, Veracity and Value and is difficult
to process using existing database management tools or traditional data processing
applications. Big data sets arise in many areas, including meteorology,
connectomics, complex physics simulations, genomics, biological studies, gene
analysis and environmental research [1-2]. According to the literature [1-2], since
the 1980s the amount of data generated worldwide has doubled roughly every 40
months. For example, in 2012 about 2.5 quintillion (2.5 × 10^18) bytes of data were
generated every day. Currently, data volume is measured in exabytes; in 2015, around
10,000 exabytes of digital data were generated. Following this digital data
explosion, the size of big data is expected to surpass
40,000 exabytes by the year 2020 [1-3]. Hence, how to process big data has become
a fundamental and critical challenge for modern society, and more and more research
interest and effort have been directed at big data and its related issues. This
thesis concentrates on data security and access control [4-9] technologies for big
datasets from modern sensing systems. Big data streams are gaining even more
research attention these days, since many applications, including healthcare,
military and natural disaster applications, require stream data analysis to detect
events.
Existing security solutions for protecting data can be classified into two classes:
communication security [10-12] and server-side data security [13-16]. Communication
security solutions protect data in motion, whereas server-side security solutions
protect data at rest. Communication security primarily protects against network-
and communication-related attacks, which are broadly divided into external attacks
and internal attacks; to counter these potential attacks, security solutions have
been proposed for each TCP/IP layer. Server-side data security is mainly designed
for physical data centres, where data is at rest and accessed through applications.
Although several solutions have been proposed to secure both data in communication
and data stored on a server, they are not necessarily applicable to a big data
stream environment.
In addition to the above two approaches, there is also a need to address the
security aspects of big data streams. Stream data processing is generally used to
make quick decisions, or even save lives, in several critical applications such as
those stated above. In such situations, it is important to protect big data streams
before they are evaluated and to restrict access to authorised users and query
processors only. Another major motivation is to perform security verification in
near real time, in order to keep pace with the processing speed of Stream
Processing Engines (SPEs) [43]. Stream data analysis performance should not degrade
because of security processing time, and there are several applications where data
analysis must be performed in real time. Given the features of a big data stream
(i.e. the 4Vs), existing security solutions require a huge buffer to perform
security verification.
Cloud computing and big data, two disruptive trends at present, offer a large
number of business and research opportunities, and likewise pose heavy challenges
for the current information technology (IT) industry and research communities [17-
18]. The next section briefly introduces the notions of security issues in big data and
cloud computing.
1.1.1 Big Data with Security Issues
Big Data refers to datasets that traditional data architectures are unable to
handle efficiently [20]. Big data collection in applications has been growing
tremendously and becoming increasingly complicated, so that traditional data
processing tools are incapable of handling the data processing pipeline
(collection, storage, processing, mining, sharing, etc.) within a tolerable elapsed
time. Big data is characterised by the broadly recognised 3Vs proposed by Douglas
Laney, namely Volume, Velocity and Variety ("3D Data Management: Controlling Data
Volume, Velocity and Variety", Application Delivery Strategies, Gartner, February
2001), plus a fourth widely cited property, Veracity (see
http://www.villanovau.com/university-online-programs/what-is-big-data/, accessed
April 2015). These 4Vs are detailed as follows.
Volume (i.e. the size of the dataset): It refers to the huge amount of data
generated every second. About 90% of the world’s data has been generated
in the last two years. The high volume of data being generated and collected
daily creates an immediate challenge for real time processing in several
applications. The typical examples include emails, social media messages,
photos, video clips and sensing data that we produce and share every second.
This data explosion makes datasets too large to store and analyse using
traditional database technology.
Security: Retargeting traditional relational database security to
non-relational databases has been a challenge. An emergent phenomenon
introduced by Big Data variety that has gained considerable
importance is the ability to infer identity from anonymized datasets
by correlating with apparently innocuous public databases.
Velocity (i.e. rate of flow): It refers to the speed at which new data is
generated and the speed at which data moves to the cloud data centre. The New
York Stock Exchange captures about 1 terabyte of trade information daily.
Reacting fast enough and analysing the streaming data is critical to
businesses, with speeds and peak periods often inconsistent. Big data
filtering technology should be able to analyse the data without accessing
traditional databases.
Security: As with non-relational databases, distributed programming
frameworks such as Hadoop were not developed with security as a
primary objective.
Variety (i.e. data from multiple repositories, domains, or types): It refers
to the different types of data that could be encountered. In the past we
focused on structured data that neatly fits into tables or relational databases.
In fact, 80% of the world's data is unstructured and often heterogeneous,
so it cannot simply be put into tables or relational databases. For
example, big data sets can have data from images, graphs, video sequences
or social media updates at the same time. With big data technology we
should harness different types of data including messages, social media
conversations, photos, sensor data, video/voice data together.
Security: The volume of Big Data has necessitated storage in
multi-tiered storage media. The movement of data between tiers has led to a
requirement of systematically analyzing the threat models and
research and development of novel techniques.
Veracity: It refers to the data uncertainty and impreciseness. With many
types of big data, quality and accuracy are less controllable; consider, for
example, Twitter posts with hashtags, abbreviations, typos and colloquial speech. Big
data and analytics technology should allow people to work with all these
types of data. The volumes often make up for the lack of quality or accuracy.
Data stream: Uninterrupted flow of a long sequence of data, such as in
audio and video data files.
Big data is not only about the data characteristics themselves, but also about a whole
new big data technology architecture including new storage, computation models
and analytic tools, in search of appropriate problems in big data applications to solve.
Advances in big data storage, processing and analysis include new parallel and
distributed computing paradigms such as the Apache Hadoop ecosystem. These
technologies are evolving rapidly.
1.1.2 Cloud Computing
Nowadays, cloud computing is one of the most hyped IT innovations, providing a
new way of delivering computing resources and services and having sparked plenty
of interest in both the IT industry and academic research communities. Recently, IT
giants such as Amazon, Google, IBM and Microsoft have invested huge sums of
money in building up their public cloud products, and indeed they have developed
their own cloud offerings, e.g., Amazon Web Services, Google Compute, IBM
Cloud and Microsoft Azure. Several corresponding open-source cloud computing
solutions have also been developed, such as Eucalyptus, OpenStack and Apache
Hadoop. The core technologies on which cloud computing is principally built include
web service technologies and standards, virtualization, novel distributed
programming models such as MapReduce [21], and cryptography.
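As an aside, the MapReduce model mentioned above can be illustrated with a minimal, self-contained Python sketch of its map and reduce phases; this is purely illustrative and not tied to Hadoop or any particular framework.

```python
# Minimal illustration of the MapReduce programming model (pure Python;
# real deployments use frameworks such as Apache Hadoop).
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs; here: word -> 1 for a word count."""
    for record in records:
        for word in record.split():
            yield word, 1

def reduce_phase(pairs):
    """Group pairs by key and aggregate the values for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data streams", "big sensing data"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 2, 'streams': 1, 'sensing': 1}
```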
The cloud computing definition published by the U.S. National Institute of
Standards and Technology (NIST) comprehensively covers the commonly agreed
aspects of cloud computing [22]. Accordingly, cloud computing is defined as a
model for enabling convenient, on-demand network access to a shared pool of
configurable computing resources (e.g. networks, servers, storage, applications,
services) that can be rapidly provisioned and released with minimal management
effort or interaction with service providers. In terms of this definition, the cloud
model consists of five essential characteristics, three service delivery models and
four deployment models. Specifically, the five key features encompass on-demand
self-service, broad network access, resource pooling (multi-tenancy), rapid elasticity
and measured services. The three service delivery models are Cloud Software as a
Service (SaaS), e.g. Google Docs, Cloud Platform as a Service (PaaS), e.g. Google
App Engine, and Cloud Infrastructure as a Service (IaaS), e.g. Amazon EC2 and
S3 cloud services. The four deployment models include private cloud, community
(Product URLs, accessed April 2015: http://aws.amazon.com/; https://cloud.google.com/; http://www.ibm.com/cloud-computing/au/en/; http://www.azure.microsoft.com/en-us/; https://www.eucalyptus.com/; https://www.openstack.org/; http://hadoop.apache.org/; https://docs.google.com/; https://appengine.google.com/)
cloud, public cloud and hybrid cloud, where hybrid cloud can contain the other three
types of cloud.
1.2 Motivation: Securing Big Sensing Data Streams
Data Stream Management Systems have been increasingly used to support a
wide range of real-time applications such as military and battlefield applications,
network monitoring, sensor networks, health monitoring, and financial monitoring
[23]. These applications need real-time processing of data streams, for which the
traditional "store-and-process" method is of limited use [24]. Most of the
above applications need to protect sensitive data from unauthorised access. For
example, in battlefield monitoring, the position of soldiers should only be accessible
to the battleground commanders. Even if the data is not sensitive, there may still
be commercial value in restricting access to it. Another example is real-time health
monitoring applications, where privacy protection of personal health data is crucial. A
patient may be living at home with a monitoring device attached to him, which can
detect early health abnormalities and transmit alert signals to relevant personnel.
However, the patient may prefer only certain users, such as his doctor or a nurse, to
have access to his streaming data and prevent access to any third-parties (e.g.
insurance companies or other hospitals). Only if his vital signs go far above the
norm and he is in imminent danger, needing urgent care, would the closest hospital
gain access to his streaming data. As a result, a new security verification module
needs to be developed to clean and drop modified/unwanted data before data streams
are evaluated in SPEs.
[Figure 1-1: Typical Lifecycle of Security Framework for Big Sensing Data Streams]
SPEs deal with specific types of challenges and are intended to process data
streams with minimal delay [23, 25-27]. In SPEs, data streams are processed in real
time (i.e. on-the-fly) rather than batch processing after storing the data in the cloud as
shown in Figure 1-1. The above specified applications require real-time processing of
very high-volume and high-velocity data streams (also known as big data streams).
The complexity of big data streams is defined through 4Vs (i.e. volume, variety,
velocity, and veracity). These features introduce huge opportunities as well as
enormous difficulties for big data stream computing. A big data stream is continuous in nature
and it is important to perform real-time analysis as the lifetime of the data is often
very short (data is accessed only once) [28-29]. As the volume and velocity of the
data is so high, there is not enough space to store and process; hence, the traditional
batch computing model is not suitable. Cloud computing has become a platform of
choice due to its extremely low-latency and massively parallel processing
architecture [30]. It supports the most efficient way to obtain actionable information
from big data streams [28, 31-33].
Big data stream processing has become an important research topic in the current
era, whereas data stream security has received little attention from researchers.
Some of these data streams are analysed and used in very critical applications (e.g.
surveillance data, health monitoring, military applications), where data streams need
to be secured in every aspect to detect malicious activity. The problem is exacerbated
when thousands to millions of small sensors in self-organising wireless networks
become the sources of the data stream. How can we provide security for big data
streams? In addition, compared to conventional store-and-process, these sensors will
have limited processing power, storage, bandwidth, and energy. Furthermore, data
streams ought to be processed on-the-fly in a prescribed sequence. This chapter
addresses these issues by designing an efficient architecture for real-time processing
of big sensing data streams, and the corresponding security scheme.
Streaming data security can be broadly divided into two types of security
punctuations: (i) the “data security punctuations” (dsps) describing the data-side
security, and (ii) the “query security punctuations” (qsps) representing the query-side
security [82]. We introduce a new module called the Data Stream Manager (DSM),
which performs security verification of data streams against dsps before data
analysis, followed by qsps for secure query processing.
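The following Python sketch illustrates how such security punctuations could be interleaved with data tuples and enforced at a DSM; the record layout and policy fields are hypothetical, chosen only for illustration.

```python
# Illustrative sketch (not the thesis's implementation): a stream interleaving
# data tuples with "security punctuations" that a DSM checks before analysis.
# The record layout and policy fields here are hypothetical.
from dataclasses import dataclass

@dataclass
class DataSecurityPunctuation:   # dsp: data-side policy for subsequent tuples
    allowed_roles: frozenset

@dataclass
class QuerySecurityPunctuation:  # qsp: query-side restriction
    max_sensitivity: int

@dataclass
class Record:
    sensor_id: str
    value: float
    sensitivity: int

def dsm_filter(stream, user_role):
    """Yield only records permitted by the most recent dsp/qsp seen."""
    dsp, qsp = None, None
    for item in stream:
        if isinstance(item, DataSecurityPunctuation):
            dsp = item                      # update data-side policy
        elif isinstance(item, QuerySecurityPunctuation):
            qsp = item                      # update query-side policy
        else:
            if dsp and user_role not in dsp.allowed_roles:
                continue                    # drop: role not authorised
            if qsp and item.sensitivity > qsp.max_sensitivity:
                continue                    # drop: too sensitive for query
            yield item

stream = [DataSecurityPunctuation(frozenset({"doctor"})),
          QuerySecurityPunctuation(max_sensitivity=2),
          Record("s1", 36.6, sensitivity=1),
          Record("s1", 120.0, sensitivity=3)]
print(list(dsm_filter(stream, "doctor")))   # only the first record passes
```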
One of the security threats is the man-in-the-middle attack, in which a malicious
attacker can access or modify the data stream from sensors. This situation arises as it
is not possible to monitor a large number of sensors deployed in the untrusted
environment. We need to maintain an end-to-end security. The common approach is
to apply a cryptographic model. Keeping data encrypted is the most common and
safe choice to secure data in transmission, if encryption keys are managed properly.
There are two common types of cryptographic encryption methods: asymmetric and
symmetric. Asymmetric-key encryption algorithms (e.g. RSA, ElGamal, DSS, YAK,
Rabin) perform a number of exponentiation operations over a large finite field and
are therefore on the order of 1000 times slower than symmetric-key cryptography
[34-35]. Efficiency becomes an issue if asymmetric-key infrastructure such as the
Public-Key Infrastructure (PKI) [36-37] is applied to big data streams. Thus,
symmetric-key encryption is the most efficient cryptographic solution for such
applications. However, even symmetric-key algorithms (e.g. DES, AES, IDEA, RC4)
fail to meet the requirements of real-time, on-the-fly processing of big data
streams and cannot keep pace with recent advanced stream processing engines. Hence,
there is a need for an efficient scheme for securing big data streams. The possible types of
attacks in big data streams are attacks on authenticity, confidentiality, integrity and
availability.
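The symmetric/asymmetric speed gap noted above can be checked with a small benchmark. The sketch below is illustrative only; it assumes the third-party Python cryptography package is installed, and the exact ratio depends on the operations compared and on the machine.

```python
# Rough micro-benchmark of the symmetric/asymmetric speed gap discussed above.
# Illustrative sketch assuming the third-party 'cryptography' package
# (pip install cryptography); exact ratios depend on the machine.
import os, time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

msg = os.urandom(64)                              # one small stream record
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ct = priv.public_key().encrypt(msg, oaep)
t0 = time.perf_counter()
for _ in range(200):                              # RSA private-key operation
    priv.decrypt(ct, oaep)
rsa_per_op = (time.perf_counter() - t0) / 200

aes = AESGCM(AESGCM.generate_key(bit_length=128))
t0 = time.perf_counter()
for _ in range(20000):
    # fixed nonce ONLY for timing; never reuse a nonce in real use
    aes.encrypt(b"\x00" * 12, msg, None)
aes_per_op = (time.perf_counter() - t0) / 20000

print(f"RSA-2048 decrypt: {rsa_per_op * 1e6:.0f} us/op, "
      f"AES-128-GCM: {aes_per_op * 1e6:.1f} us/op, "
      f"ratio ~ {rsa_per_op / aes_per_op:.0f}x")
```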
Another important issue is to maintain data privacy and confidentiality over big
sensing data streams. Protecting data for a longer time always demands strong
encryption and hence more processing power. However, not all data carry the same
sensitivity level and require strong encryption, and strong encryption takes longer
to process during security verification. It is therefore important to determine the
data sensitivity level before applying encryption or decryption, so that strong
encryption is applied to high-sensitivity data and weaker encryption to
low-sensitivity data. Identifying the sensitivity level of data in big data
streams, however, is always a challenging task. Controlling access by unauthorised
users or query processors is also challenging in big data stream environments:
there is a high chance of information leakage when giving access for data
processing or evaluation. Access control over big data streams is therefore
attracting a lot of interest globally.
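To make the idea of sensitivity-driven selective encryption concrete, the following minimal Python sketch chooses cipher strength per record. It assumes the third-party cryptography package, and the thresholds and key sizes are illustrative assumptions, not the SEEN parameters of Chapter 5.

```python
# Minimal sketch of sensitivity-driven selective encryption: stronger (slower)
# ciphers only for more sensitive records. Thresholds and key sizes are
# illustrative assumptions, not the thesis's SEEN parameters.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY_128 = AESGCM.generate_key(bit_length=128)   # weaker/faster
KEY_256 = AESGCM.generate_key(bit_length=256)   # stronger/slower

def encrypt_record(payload: bytes, sensitivity: int):
    """Pick the cipher strength from the record's sensitivity level."""
    nonce = os.urandom(12)
    if sensitivity == 0:
        # public data: integrity protection only (GMAC-style: empty
        # plaintext, payload authenticated as associated data)
        tag = AESGCM(KEY_128).encrypt(nonce, b"", payload)
        return sensitivity, nonce, payload + tag
    key = KEY_256 if sensitivity >= 2 else KEY_128
    return sensitivity, nonce, AESGCM(key).encrypt(nonce, payload, None)

level, nonce, blob = encrypt_record(b"patient heart rate: 71", sensitivity=2)
print(level, len(blob))
```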
This thesis presents solutions for these different types of threats and attacks
in Chapters 3 to 6.
1.3 Overview of the Work
This section gives an overview of our architecture and research
methodology, followed by the research contributions.
1.3.1 Methodology
In order to address the challenges mentioned above in an organised and
comprehensive manner, we investigate the whole lifecycle of a security
framework for big sensing data streams, i.e. end-to-end security from source
sensors to the cloud data centre. The research problems in each phase of the
lifecycle are identified and corresponding solutions are put forth. A typical
lifecycle is shown in Figure 1-1. Brief descriptions of the lifecycle of big
sensing data streams and its security aspects follow.
We have divided the complete architecture into five lifecycle phases: collection,
evaluation, collation, analysis, and dissemination, as shown in Figure 1-1. Figure
1-1 shows the complete architecture of a big data stream, starting from the source
sensors up to the DSM security framework, followed by access control for the query
processor and end user. Data transfer to stream, clustering and Bayesian networks
are standard data processing steps, but we do not address these in our
architecture. The five steps are defined as follows.
Collection: Data are collected from different sources, such as sensors, for data
analysis and event detection at the cloud data centre.
Evaluation: Stream data are verified for security to maintain the originality of
the data before going to online stream query processing.
Collation: Data evaluated by different DSMs are aggregated for access by the
query processor or end user. Data also move to the cloud for batch
processing after this step.
Analysis: The data are analysed to determine their sensitivity level, and access
is given to the query processor accordingly; the query processor then analyses
the data streams for event detection.
Dissemination: This step is the output of the data analysis and distributes
emergency alert messages if necessary. We have not highlighted dissemination in
our architecture because it falls outside the scope of our work.
We now describe the complete architecture in terms of these five steps, beginning
with data collection and ending with alert dissemination. The proposed architecture
is applicable to different applications, although our description is based on a
disaster management application.
In the collection step, data are collected from various sources for analysis and
event detection. In our architecture, we consider sensors as the sources of our data
streams. These collected data streams move to the STREAM collection system [23]
after security verification at the DSM. In the evaluation step, there are always two
types of evaluation processes in big data: batch processing and stream processing. In this
thesis, we focus on stream processing to detect emergency events in real-time. In the
evaluation step, we address security evaluation before data analysis. Sources
generally use an untrusted medium to transfer sensed data to the cloud for
evaluation/analysis, so security verification is an important feature that must be
addressed for big data streams in order to filter out unwanted and modified data.
The DSM processes data streams on-the-fly and is designed to handle high-volume,
bursty data streams with a large number of complex continuous queries. In the
collation step, evaluated data from DSM are further processed for access control. In
the analysis step, data are structured based on sensitivity level and mapped to the
respective query processor and end user. The access control mechanism protects data
from unauthorised access and information leakage. Nowadays data sources generate
terabytes to petabytes of data on a daily basis [41]. Given the volume of data being
generated, real-time computation has become a major challenge. A scalable real-time
computation system that we have used effectively is the open-source Apache Storm
tool, which was developed at Twitter and is sometimes referred to as “real-time
Hadoop”. In the dissemination phase, alert messages are disseminated after data are
evaluated from stream data processing.
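The following highly simplified Python sketch mirrors the five-phase flow described above, with each phase expressed as a generator stage; the stage bodies are placeholders rather than the thesis's DSM or Storm implementation.

```python
# Highly simplified sketch of the five-phase lifecycle described above.
# Each stage is a generator transformation; the bodies are placeholders,
# not the actual DSM/Storm implementation.
def collection(sensors):
    for reading in sensors:                  # phase 1: gather raw readings
        yield reading

def evaluation(stream, verify):
    for rec in stream:                       # phase 2: drop tampered records
        if verify(rec):
            yield rec

def collation(streams):
    for stream in streams:                   # phase 3: merge DSM outputs
        yield from stream

def analysis(stream, classify):
    for rec in stream:                       # phase 4: attach sensitivity level
        yield classify(rec), rec

def dissemination(stream, threshold=2):
    for level, rec in stream:                # phase 5: emit alerts if needed
        if level >= threshold:
            print("ALERT:", rec)

readings = [{"id": 1, "temp": 21.0}, {"id": 2, "temp": 95.0}]
verified = evaluation(collection(readings), verify=lambda r: "temp" in r)
labelled = analysis(collation([verified]),
                    classify=lambda r: 2 if r["temp"] > 60 else 0)
dissemination(labelled)                      # prints an alert for sensor 2
```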
1.3.2 Contributions
The main contributions of this thesis are presented in four contribution chapters.
Firstly, a security solution is proposed for big sensing data streams that is
synchronised with the speed and performance of the stream processing engine at the
cloud data centre. Secondly, a more efficient, two-dimensional security solution is
proposed, which makes it harder for an attacker or intruder to guess the secret
shared key, together with a synchronisation technique that lets a node obtain the
key generation properties from its neighbours without communicating with the DSM.
Thirdly, a selective encryption technique is proposed to protect data streams based
on their data sensitivity level. Finally, an access control technique is proposed
that gives access to big data streams only to authorised and authenticated query
processors or end users. The framework and its implementation have been reported in
the author's publications (see the section The Author's Publications for details).
A large number of mission critical applications ranging from disaster
management to smart cities are built on the Internet of Things (IoT) platform
by deploying a number of smart sensors in a heterogeneous environment. The
key requirement of such applications is the need for near real-time stream data
processing in large scale sensing networks. This trend gives birth to an area
called big data streams. Securing big data streams is very challenging because
of their 4V properties, which prevent existing security solutions from being
applied directly. We therefore propose a new security solution that updates a
dynamic prime number at both the source and the DSM. This scheme is based on a
common shared key that is updated dynamically without further communication
after the handshaking process; a toy illustration of this synchronised rekeying
idea appears after this list. Moreover, the proposed security mechanism not
only reduces the verification time and buffer size at the DSM, but also
strengthens the security of the data by constantly changing the shared keys.
The results of this chapter have been reported in the author's publications 1,
2 and 10.
One of the key problems in big data streams is to ensure end-to-end security.
We refer to this as an online security verification problem. To address it, we
propose a Dynamic Key Length based security solution built on a shared key
derived from synchronised prime numbers; the key is dynamically updated at
short intervals to thwart potential attacks and ensure end-to-end security. One
major shortcoming of such methods is that they assume synchronisation of the
shared key; we later solve this synchronisation problem and integrate the
solution with the main security framework. The results of this chapter have
been reported in the author's publications 3, 6 and 9.
Many sensing devices are deployed in the environment to generate a variety
of data and send them to the server for analysis as data streams. A DSM at
the server collects the data streams (often called big data) to perform real
time analysis and decision-making for these critical applications. To ensure
the confidentiality of collected data, we need to prevent sensitive information
from reaching the wrong people while ensuring that the right people can access
it. We therefore propose a Selective Encryption (SEEN) method to secure big
sensing data streams that satisfies the desired multiple levels of
confidentiality and integrity. This method protects data against several
attacks according to the data's sensitivity level. The results of this chapter
have been reported in the author's publications 4 and 5.
Another important step is to control the information leakage of big data
streams. We refer to this as an access control or information flow control
problem over big sensing data streams. To address this, we propose a lattice
based information flow control over big sensing data streams. We consider
static lattices to process the information flow model faster, because we are
dealing with big data streams i.e. high volume and velocity of data streams.
The results of this chapter have been reported in the author’s publication 7, 8
13
and another paper is about lattice based secure information flow control in
big sensing data streams under preparation to submit.
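As a concrete illustration of the synchronised rekeying idea underlying the first two contributions (referenced from the list above), the following Python sketch shows how two parties that share only an initial seed can independently derive the same sequence of primes and session keys without exchanging further key material. It is a toy model of the concept only, not the actual DPBSV or DLSeF algorithm.

```python
# Toy illustration of synchronised rekeying: after an initial handshake
# establishes a shared seed, sensor and DSM independently derive the same
# sequence of primes and session keys with no further key exchange. A sketch
# of the concept only, not the thesis's DPBSV/DLSeF algorithms.
import hashlib

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def next_prime(n: int) -> int:
    while not is_prime(n):
        n += 1
    return n

def session_key(seed: bytes, interval: int) -> bytes:
    """Derive the key for a given rekeying interval from the shared seed."""
    digest = hashlib.sha256(seed + interval.to_bytes(8, "big")).digest()
    prime = next_prime(int.from_bytes(digest[:4], "big"))  # synchronised prime
    return hashlib.sha256(prime.to_bytes(8, "big") + seed).digest()

seed = b"established-during-handshake"
# Both ends compute identical keys for each interval, with no communication.
assert session_key(seed, 7) == session_key(seed, 7)
print(session_key(seed, 7).hex()[:16], "!=", session_key(seed, 8).hex()[:16])
```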
1.4 Thesis Organisation
The rest of this thesis is organised as follows:
Chapter 2 provides the basic background knowledge relevant to our research
to facilitate the discussion, including IoT basics, big data stream models,
security models and stream data processing basics. This is followed by an
in-depth literature review of state-of-the-art techniques for security issues
in the big data context. We classify the security reviews according to the
basic CIA triad. We also survey recent developments in security processing
techniques along with their processing speeds. Finally, we compare existing
security solutions to establish the need for a security protocol for big
sensing data streams.
Chapter 3 investigates the problem of achieving an efficient security
scheme for big sensing data streams that keeps pace with the stream
processing engine. We propose a Dynamic Prime Number Based Security
Verification (DPBSV) scheme for big data streams. Our scheme is based on
a common shared key that is updated dynamically by generating
synchronised prime numbers.
Chapter 4 explores the problem of achieving a scalable and secure solution
by proposing a multidimensional security scheme. We propose a Dynamic
Key Length Based Security Framework (DLSeF) based on a shared key
derived from synchronised prime numbers; the key is dynamically updated at
short intervals to thwart potential attacks and ensure end-to-end security.
This is followed by a proposed synchronisation technique that obtains the
key generation properties from a source node's neighbours without further
contact with the DSM.
Chapter 5 studies the problem of achieving data confidentiality and
integrity based on the data sensitivity level in big sensing data streams. We
propose a Selective Encryption (SEEN) method to secure big sensing data
streams that satisfies the desired multiple levels of confidentiality and
integrity. Our method is based on two key concepts: common shared keys
that are initialised and updated by the DSM without requiring retransmission,
and a seamless key refreshment process that does not interrupt the data stream
encryption/decryption.
Chapter 6 presents the problem of controlling unauthorised access to, and
the information flow of, big sensing data streams. We propose a lattice-based
information flow control model over big sensing data streams, using static
lattices to process the information flow model faster, since we are dealing
with big data streams, i.e. data with high volume and a rapid arrival rate.
A toy sketch of the lattice check appears below.
Chapter 7 concludes the thesis and points out future work.
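The following toy Python sketch illustrates the kind of lattice check used in Chapter 6's information flow model. The security levels, categories and label set are illustrative assumptions, not the thesis's actual lattice.

```python
# Toy sketch of lattice-based information flow control: security labels form
# a lattice, and data may flow from label a to label b only if b dominates a.
# Levels, categories and the label set below are illustrative assumptions.
from itertools import product

LEVELS = {"public": 0, "internal": 1, "secret": 2}   # totally ordered levels
CATEGORIES = {"health", "military"}                  # compartment sets

def dominates(b, a):
    """b = (level, categories) dominates a iff flow a -> b is allowed."""
    return LEVELS[b[0]] >= LEVELS[a[0]] and b[1] >= a[1]

def join(a, b):
    """Least upper bound: the label of data combined from a and b."""
    level = max(a[0], b[0], key=LEVELS.get)
    return (level, a[1] | b[1])

src = ("internal", frozenset({"health"}))
qry = ("secret", frozenset({"health", "military"}))
print(dominates(qry, src))          # True: the query may read this stream
print(join(src, ("public", frozenset({"military"}))))  # combined label

# With a static lattice the dominance relation can be precomputed once,
# so per-tuple checks at stream rate become O(1) table lookups.
labels = [(lvl, frozenset(cats)) for lvl in LEVELS
          for cats in (set(), {"health"}, {"health", "military"})]
FLOW_OK = {(a, b): dominates(b, a) for a, b in product(labels, labels)}
print(FLOW_OK[(src, qry)])          # True, via table lookup
```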
Chapter 2
Background Studies and Related Work
The security concerns in big sensing data streams have drawn considerable attention
from research communities, but relatively little work has been done in this area.
This chapter presents an in-depth literature review of existing work related to our
research. Section 2.1 reviews general research trends in terms of the IoT and
IoT-generated big sensing data streams. Section 2.2 presents a review of reviews,
covering data centre security, network-related data security and IoT security.
Section 2.3 then describes the architecture of IoT-generated big sensing data
streams: the complete architecture is divided into layers in terms of the IoT, and
the properties of each layer are highlighted, followed by their security issues and
solutions. Section 2.4 reviews the security issues and existing solutions that can
be applied to big data streams. Section 2.5 classifies the existing solutions
according to their associated properties. Finally, Section 2.6 summarises this
chapter.
2.1 General Research Trend
The Internet of Things (IoT) is a widely used expression, although still a fuzzy
one, mostly due to the large number of concepts it encompasses. The IoT materialises
a vision of a future source of data in which any sensing device possessing computing
and sensorial capabilities is able to communicate with other devices using Internet
communication protocols, in the context of sensing applications. Many such
applications are expected to employ a large number of sensing and actuating devices,
so device cost will be an important factor. At the same time, cost restrictions
impose constraints on the resources available in sensing platforms, such as memory
and computational power, while the unattended deployment of many devices will also
require the use of batteries for energy storage. Overall, these factors motivate the
design and adoption of communications and security mechanisms optimised for
constrained sensing platforms, capable of providing their functionalities
efficiently and reliably.
Several of these applications are approaching the bottleneck of current data
streaming infrastructures and require real-time processing of very high-volume and
high-velocity data streams (also known as big data streams). The complexity of
big data is defined through the 5Vs: 1) volume, referring to terabytes, petabytes,
or even exabytes (1000^6 bytes) of stored data; 2) variety, referring to
unstructured, semi-structured and structured data from different sources such as
social media (Twitter, Facebook, etc.), sensors, surveillance, images or video,
medical records, etc.; 3) velocity, referring to the high speed at which data is
handled in and out for stream processing; 4) variability, referring to the differing
characteristics and data values with which the data stream is handled; and
5) veracity, referring to the quality of the data. These features introduce huge
opportunities as well as enormous difficulties for big data stream computing. A big
data stream is continuous in nature and it is important to perform real-time
analysis, as the lifetime of the data is often very short (data is accessed only
once) [4-5, 42-43]. As the volume and velocity of the data are so high, there is
not enough space to store and process them; hence, the traditional batch computing
model is not suitable.
Even though big data stream processing has become an important research topic in
the current era, data stream security has received little attention from researchers [4-
5]. Some of these data streams are analysed and used in very critical
applications (e.g. surveillance data, military applications, Supervisory Control
and Data Acquisition (SCADA), etc.), where data streams need to be secured
in order to detect malicious activities. The problem is exacerbated when
thousands to millions of small sensors in self-organising wireless networks become
the sources of the data stream. How can we provide security for big data streams?
In addition, compared to conventional store-and-process systems, these sensors
have limited processing power, storage, bandwidth, and energy. Furthermore,
data streams ought to be processed on-the-fly in a prescribed sequence. This
thesis addresses these issues by designing an efficient architecture for real-time
processing of big sensing data streams, together with the corresponding security scheme.
Throughout this survey we focus on the security issues of big data streams, analysing
both the solutions available against the various security threats, starting from the
communication technologies of IoT devices, and those proposed in the
literature. We also identify and discuss the open challenges and possible strategies
for future research in the area. As our focus is on standardized communication
protocols for the IoT, our discussion is guided by the protocol stack enabled by the
various IoT communication protocols available or currently being designed. We
also discuss the security aspects of big data streams by following the CIA triad,
dividing the whole of big data stream security into three directions (i.e. the CIA triad).
In our discussion we include works available both as published research proposals
and as currently active Internet-Draft (I-D) documents submitted for
discussion in the relevant working groups.
2.2 Review of Reviews
This section reviews existing security reviews across different research aspects, from
IoT-generated data sources up to the cloud data centre. Following the architecture
in Figure 1-1, we divide the security-related reviews into cloud security (data
centre security), network security, and IoT security in the following subsections.
2.2.1 Data Centre Security (Cloud Security)
Recent advances have given rise to the popularity and success of cloud computing.
However, outsourcing data and business applications to a third party causes
security and privacy issues to become a critical concern. In reality, the reliance on
a cloud service provider is identified as the core scientific problem that separates
cloud computing security from other topics in computing security. In the last few years,
the research community has been focusing on the non-functional aspects of the cloud
paradigm, among which cloud security stands out. Several approaches to security have been
described and summarised in general surveys on cloud security techniques. Ardagna
et al. focus on the interface between cloud security and cloud security
assurance [44]. The authors classified vulnerabilities, threats, and attacks by
service layer, considering the five most common security properties, i.e.
confidentiality, integrity, availability, authenticity, and privacy. These five properties
in cloud terms are listed as follows.
Confidentiality: The capability of limiting information access and disclosure
to authorised clients only.
Integrity: The capability of preserving structure and content of information
resources.
Availability: The capability of guaranteeing continuous access to data and
resources by authorised clients.
Authenticity: The capability of ensuring that clients or objects are genuine.
Privacy: The capability of protecting all information pertaining to the
personal domain.
A basic review of cloud security identifies unique security requirements
and then presents a viable solution that eliminates these potential threats at the
individual service layers [13]. A layer-by-layer security survey, together with
solutions and solution directives, is given in [47]. Xiao et al. identified the five most
representative security and privacy attributes (i.e. confidentiality, integrity,
availability, accountability, and privacy-preservability) of cloud computing and
discussed the vulnerabilities which may be exploited by adversaries in order to
perform various attacks [45]. Service delivery is the most important feature of cloud
computing over distributed computing. Subashini et al. [46] surveyed the security
issues by identifying several potential security elements and threats, focusing only
on the software service level. In the same way, Huang et al. [49] surveyed the security
mechanisms of the infrastructure service layer of the cloud system; they examined
academic and industrial IaaS security mechanisms separately and then identified the
relations between them. Rong et al. [48] emphasised three areas of particular cloud
services, namely SLAs, trusted data sharing, and accountability in the cloud. In this
review, they outlined ongoing work on security SLAs for cloud computing, and
briefly presented a scheme to address the security and privacy issues in the cloud. In
[50], the authors provide a comprehensive review of intrusion detection techniques
in the cloud, classifying the various intrusion detection methods in cloud
computing and the possible attacks. Focusing on applications, the authors of [51]
reviewed the mobile cloud system. The possible security threats at each cloud
service layer are shown in Figure 2-1.

Figure 2-1: Cloud computing security architecture
2.2.2 Network Security
Sensor nodes are an important data source of the IoT for many types of applications. There
are always unique security requirements for sensor nodes and sensor networks
because of a node's limited energy and processing power. As sensor networks
become widespread, security issues become a central concern, especially in
emergency applications. Chen et al. [52] surveyed the threats and vulnerabilities of
Wireless Sensor Networks (WSNs) and summarised the defence methods based on
the networking protocol layer. The authors divided the issues and analysed them in
seven different categories: cryptography, key management, attack detection and
prevention, secure routing, secure location, secure data fusion, and other
security issues.
security issues. A comprehensive survey of WSNs security listed the security
requirements, security challenges, attacks with existing key management solutions
[53]. The authors focused on individual security threats and solutions instead of
communication layer wise security and later on compared and evaluated security
Figure 2-1: Cloud computing security architecture
20
protocols based on each of these five categories. WSNs concept turns to Visual
sensor networks (VSNs) when source devices are sensors, adequate processing
power, and memory. In [54], the authors presented an overview of the characteristics
of VSN applications, the involved security threats and attack scenarios, and the
major security challenges. Their central contribution in this survey is the
classification of VSN security aspects into data-centric, node-centric, network-
centric, and user-centric security. They identified and discussed the individual
security requirements and presented a profound overview of related work for each
class.
Mobile devices are another major data source for IoT. Security in mobile ad hoc
networks is difficult to achieve, notably because of the vulnerability of wireless
links, the limited physical protection of nodes, the dynamically changing topology,
the absence of a certification authority, and the lack of a centralised monitoring or
management point. Earlier studies on mobile ad hoc networks (MANETs) aimed at
proposing protocols for some fundamental problems, such as routing, and tried to
cope with the challenges imposed by the new environment [87, 89]. These protocols,
however, fully trust all nodes and do not consider the security aspect. They are
consequently vulnerable to attacks and misbehaviour. A complete review of MANET
security problems at the different network layers, along with the proposed
solutions (as of July 2005), is given in [55]. The authors consider security issues
including routing and data forwarding, medium access, key management and
intrusion detection systems (IDSs). Abusalah et al. [56] reviewed different
routing protocols with a particular focus on security aspects. The authors chose four
representative routing protocols for analysis and evaluation: Ad hoc On-demand
Distance Vector routing (AODV), Dynamic Source Routing (DSR),
Optimized Link State Routing (OLSR) and the Temporally Ordered Routing Algorithm
(TORA). Secure ad hoc networks have to meet five security requirements:
confidentiality, integrity, authentication, non-repudiation and availability [89].
Chopra et al. [57] reviewed state-of-the-art peer-to-peer security and solutions,
noting the suitability and drawbacks of the different schemes. The authors
classified the security requirements into file-sharing applications and real-time
communication applications. Information-centric networking (ICN) is a new
communication paradigm that focuses on content retrieval from a network regardless
of the storage location or physical representation of this content. AbdAllah et al. [58]
provide a survey of attacks on ICN architectures and other generic attacks that have
an impact on ICN. The survey also provides a taxonomy of these attacks in ICN,
classified into four main categories, i.e. naming, routing, caching, and other
miscellaneous related attacks. The authors then present the severity levels of
ICN attacks and discuss the existing ICN security solutions. The security aspects of
Long Term Evolution (LTE) networks are reviewed in [59]. The authors presented an
overview of the security functionality of LTE, followed by its security
vulnerabilities, and then classically reviewed the existing solutions to these problems.
2.2.3 IoT Security
The IoT is enabled by the latest developments in RFID, smart sensors,
communication technologies, and Internet protocols. The basic premise is to have
smart sensors collaborate directly, without human involvement, to deliver a new class
of applications [20]. As security will be a fundamental enabling factor of most IoT
applications, mechanisms must also be designed to protect communications enabled
by such technologies. Granjal et al. [20] produced one of the first survey articles on IoT
communication security. Other surveys do exist that, rather than analysing the
technologies currently being designed to enable Internet communications with
sensing and actuating devices, focus on the identification of security requirements
and on the discussion of approaches to the design of new security mechanisms [60],
or, at the other end, discuss the legal aspects surrounding the impact of the IoT on the
security and privacy of its users [61].
In their survey, Granjal et al. [20] analysed existing protocols and
mechanisms for securing communications in the IoT, and examined how
existing approaches ensure fundamental security requirements and protect
communications in the IoT. The authors classified the security requirements,
security threats, and security solutions by communication layer and compared them.
IoT architecture is generally divided into three layers: the perception layer,
network layer, and application layer. Some systems treat the network support
technology (such as network processing, computing technology, middleware
technology, etc.) as a processing layer [62]. Figure 2-2 shows the layer-wise IoT
security architecture, which extends an idea from [63]. We highlight the
security measures as follows:
The security problems of perception layer data collection and
transmission: Sensor nodes come in many varieties and are highly heterogeneous,
and they generally have a simple structure and processor. This means they cannot
support complex security protection capabilities.
The traditional security issues of the network layer: Although Internet
security architecture is very mature, there are still many means of attack. For
example, if a large number of malicious nodes send data at the same time, this
leads to a DoS attack, so networks should be built specifically to fit
IoT information transmission.
The application layer security problems: Each application field brings its own
complex and varied security issues.
Figure 2-2: Layer wise IoT Security architecture
2.3 IoT Generated Data Stream Architecture
2.3.1 IoT Architecture
The connection of physical things to the Internet makes it possible to access remote
sensor data and to control the physical world from a distance. The Internet of Things
is based on this vision. A smart object, which is the building block of the Internet of
Things, is just another name for an embedded system that is connected to the
Internet [64]. Al-Fuqaha et al. [65] clearly defined the individual elements of the IoT,
which include identification, sensing, communication, computation, services, and
semantics. RFID is another technology that points in the same direction. The novelty
of the Internet of Things (IoT) is not in any new disruptive
technology, but in the pervasive deployment of smart objects. A critical requirement
of the IoT is that the things in the network must be interconnected. The IoT system
architecture must guarantee the operations of the IoT, bridging the gap between
the physical and the virtual worlds, and the IoT should possess a decentralised and
heterogeneous nature. Because things may move geographically and need
to interact with others in real time, the IoT architecture should be adaptive, enabling
devices to interact with other things dynamically and supporting unambiguous
communication of events [66]. We broadly divide the complete architecture of the IoT
into three different layers: source smart sensing devices, the communication
(network) layer and the cloud data centre, as shown in Figure 2-3. These layers can be
related to the service layers of the IoT, where the service layer and interface layer are
integrated into the data centre in our architecture. The service-level architecture of
the IoT consists of four different layers with corresponding functionality: the sensing layer,
network layer, service layer, and interface layer [66-67].
Sensing layer: This layer is integrated with available hardware objects
(sensors, RFID, etc.) to sense and control the status of things.
Network layer: This layer supports the infrastructure for networking over
wireless or wired connections.
Service layer: This layer creates and manages service requirements
according to the user's need.
Interfaces layer: This layer provides interaction methods to users and
applications.

Figure 2-3: Layer wise IoT architecture from IoT device to cloud data centre
2.3.1.1 Sensing Layer
The IoT is expected to be a world-wide, physically interconnected network in which
things are connected seamlessly and can be controlled remotely. In this layer, as more
and more devices are equipped with RFID or intelligent sensors, connecting things
becomes much easier [68]. The smart systems on tags or sensors are able to
automatically sense the environment and exchange data among devices. Each individual
object in the IoT holds a digital identity, which helps to track it easily in the domain. The
technique of assigning a unique identity to an object is called a universally unique
identifier (UUID). In particular, UUIDs are critical to successful service deployment
in a huge network like the IoT; the identifiers might refer to names and addresses.
There are a few aspects that need to be considered in the sensing layer, such as
deployment (devices may need to be deployed randomly or incrementally), heterogeneity
(devices have different properties), communication (devices need to communicate with
each other in order to get access), network (devices maintain different topologies for the
data transmission process), cost, size, resources and energy consumption. As the use
of the IoT increases day by day, a large number of hardware and software components
are involved in it. The IoT should have these two important properties: energy efficiency
and protocols [66].
Energy efficiency: Sensors should be active all the time to acquire real-time
data. This creates the challenge of supplying power to the sensors; high energy
efficiency allows sensors to work for a longer period of time without
discontinuity of service.
Protocols: Different things in the IoT provide multiple system functions.
The IoT must support the coexistence of different communication protocols such
as ZigBee, 6LoWPAN, etc.
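To make the UUID-based device identity discussed above concrete, the following minimal Python sketch (our own illustration; the class and field names are hypothetical) assigns a universally unique identifier to each sensing device and uses it to key a simple device registry:

    import uuid

    class SensingDevice:
        """A minimal model of an IoT sensing device with a UUID identity."""
        def __init__(self, kind, location):
            # uuid4() draws from a random source; collisions are negligible
            # even at IoT scale, so no central registry is needed to assign IDs.
            self.device_id = uuid.uuid4()
            self.kind = kind
            self.location = location

    # A registry mapping each device's UUID to the device object, standing in
    # for name/address resolution in the sensing layer.
    registry = {}
    for kind in ("temperature", "humidity"):
        device = SensingDevice(kind, location="field-7")
        registry[device.device_id] = device

    for device_id, device in registry.items():
        print(device_id, device.kind, device.location)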
2.3.1.2 Networking Layer
The role of the networking layer is to connect all things together and allow things to
share information with other connected things. In addition, the networking layer is
capable of aggregating information from existing IT infrastructures [33]; data can
then be transmitted to the cloud data centre for high-level complex services. The
communication in the network may involve Quality of Service (QoS) mechanisms to
guarantee reliable services for different users or applications [36]. Automatic
assignment of the devices in an IoT environment is one of the major tasks, as it enables
devices to perform tasks collaboratively. Some issues related to the
networking layer are listed below [66]:
Network management technologies including managing fixed, wireless,
mobile networks
Network energy efficiency
Requirements of QoS
Technologies for mining and searching
Data and signal processing
Security and privacy
Among these issues, information confidentiality and privacy are critical because of
IoT device deployment, mobility, and complexity. For information
confidentiality, the existing encryption technology used in WSNs can be extended
and deployed in the IoT; however, this may increase the complexity of the IoT. The existing
network security technologies can provide a basis for privacy and security in the IoT,
but more work still needs to be done. Granjal et al. [20] divided the communication
stack for IoT applications into five different parts: the physical layer, MAC layer,
adaptation layer, network/routing layer, and application layer. They also identified the
associated protocols for energy efficiency, as shown in Figure 2-4.

Figure 2-4: Communication protocol in IoT.
2.3.1.3 Service Layer
A main activity in the service layer involves the service specifications for
middleware, which are being developed by various organisations. A well-designed
service layer will be able to identify common application requirements.
The service layer relies on middleware technology, which provides the
functionality to integrate services and applications in the IoT. The middleware
technology provides a cost-effective platform, where the hardware and software
platforms can be reused. The services in the service layer run directly on the network
to effectively locate new services for an application and to retrieve metadata about
services dynamically. Most of the specifications are undertaken in various
standards developed by different organisations; however, a universally accepted
service layer is important for the IoT. A practical service layer consists of a minimum
set of common application requirements, application programming interfaces
(APIs), and protocols supporting the required applications and services.
All of the service-oriented activities, such as information exchange and storage,
management of data, ontology databases, search engines and communication, are
performed at the service layer. The activities are conducted by the following
components:
Service discovery finds objects that can provide the required service and
information in an effective way.
Service composition enables interaction among connected things. The
discovery exploits the relationships of things to find the desired service, and
the service composition schedules or re-creates a more suitable service to
obtain the most reliable services.
Trustworthiness management aims at understanding how the information
provided by other services has to be processed.
Service APIs provide the interactions between services required by users.
2.3.1.4 Interface Layer
In the IoT, a large number of devices are involved; those devices can be provided by
different vendors and hence do not always comply with the same standards. The
compatibility issue among heterogeneous things must be addressed to enable the
interactions among things. Compatibility involves information exchange,
communication, and event processing. There is a strong need for an effective
interface mechanism to simplify the management and interconnection of things. An
interface profile (IFP) can be seen as a subset of service standards that allows
minimal interaction with the applications running on application layers. The
interface profiles are used to describe the specifications between applications and
services. An illustration of the interface layer is the implementation of Universal
Plug and Play (UPnP), which specifies a protocol for seamless interactions among
heterogeneous things.
2.3.2 Security Threats of Each Layer
This subsection lists the security threats and security issues in each individual layer,
as divided in the subsections above.
2.3.2.1 Sensing Layer
The sensing layer is responsible for frequency selection, carrier frequency
generation, signal detection, modulation, and data encryption [20, 69]. An adversary
may possess a broad range of attack capabilities. A physically damaged or
manipulated node used for attack may be less powerful than a normally functioning
node, while destabilised nodes that interact with the network only through software are as
powerful as other nodes. Nodes in a sensor network use wireless communication
because the network's ad hoc, large-scale deployment makes anything else
impractical. Base stations or uplink nodes can use wired or satellite communication,
but limitations on their mobility and energy make them scarcer. As with any radio-based
medium, there exists the possibility of jamming in a sensor network. In addition,
nodes in sensor networks may be deployed in hostile or insecure environments
where an attacker has easy physical access. Network jamming and source device
tampering are the major types of possible attack in the sensing layer. The features of
the sensing layer follow from Figure 2-4.
Jamming: Interference with the radio frequencies the network's nodes are using.
Tampering: Physical compromise of nodes.
Solutions: spread-spectrum communication, jamming reports, and accurate and complete
design of the node's physical package.
2.3.2.2 Network Layer
The security mechanisms designed to protect communications with the previously
discussed protocols must provide appropriate assurances in terms of confidentiality,
integrity, authentication and non-repudiation of the information flows. Other
relevant security requirements are privacy, anonymity, liability and trust, which will
be fundamental for the social acceptance of most future IoT applications
employing Internet-integrated sensing devices. Following the communication
protocol stack of the IoT, we divide it into five different layers, as shown in Figure 2-4.
Table 2-1 classifies the security threats in each individual communication layer. We list
four different layers: the MAC layer, adaptation layer, network/routing layer, and
application layer [20]. We treat the physical layer as the sensing layer, which was
discussed in the previous subsection.
MAC Layer
The MAC layer manages, besides the data service, other operations, namely
accesses to the physical channel, network beaconing, validation of frames,
guaranteed time slots, node association and security. The standard distinguishes
sensing devices by its capabilities and roles in the network. A full-function device
(FFD) is able to coordinate a network of devices, while a reduced-function device
(RFD) is only able to communicate with other devices (of RFD or FFD types). By
using RFD and FFD devices, IEEE 802.15.4 can support network topologies such as
peer-to-peer, star and cluster networks. The mechanisms defined in IEEE 802.15.4e
will be part of the next revision of the IEEE 802.15.4 standard, and as such open the
door to the usage of Internet communication technologies in the context of time-critical
(e.g. industrial) applications [70].
Network Layer
One fundamental characteristic of the Internet architecture is that it enables packets
to traverse interconnected networks using heterogeneous link-layer technologies,
and the mechanisms and adaptations required to transport IP packets over particular
link-layer technologies are defined in appropriate specifications. With a similar goal,
the IETF IPv6 over Low-power Wireless Personal Area Networks (6LoWPAN)
working group was formed in 2007 to produce a specification enabling the
transportation of IPv6 packets over low-energy IEEE 802.15.4 and similar wireless
communication environments. 6LoWPAN is currently a key technology to support
Internet communications in the IoT, and one that has changed a previous perception
of IPv6 as being impractical for constrained low energy wireless communication
environments. No security mechanisms are currently defined in the context of the
6LoWPAN adaptation layer, but the relevant documents include discussions on the
security vulnerabilities, requirements and approaches to consider for the usage of
network layer security.
Routing Layer
The Routing Over Low-power and Lossy Networks (ROLL) working group of the
IETF was formed with the goal of designing routing solutions for IoT applications.
The current approach to routing in 6LoWPAN environments is materialized in the
Routing Protocol for Low power and Lossy Networks (RPL) [71] Protocol. Rather
than providing a generic approach to routing, RPL provides in reality a framework
that is adaptable to the requirements of particular classes of applications. In the
following discussion we analyse the internal operation of RPL, and later the security
mechanisms designed to protect communications in the context of routing operations.
The information in the Security field indicates the level of security and the
cryptographic algorithms employed to process security for the message. What this
field does not include is the security-related data required to process security for the
message, for example a Message Integrity Code (MIC) or a signature. Instead, the
security transformation itself states how the cryptographic fields should be
employed in the context of the protected message.
Application Layer
As previously discussed, application-layer communications are supported by the
CoAP [72] protocol, currently being designed by the Constrained RESTful
Environments (CoRE) working group of the IETF. We next discuss the operation of
the protocol as well as the mechanisms available to apply security to CoAP
communications. The CoAP Protocol [72] defines bindings to DTLS (Datagram
Transport-Layer Security) [73] to secure CoAP messages, along with a few
mandatory minimal configurations appropriate for constrained environments.
Table 2-1: Network layer security threats

Communication layer | Security threats
MAC | Confidentiality; data integrity; data authenticity; message replay attacks; access control mechanisms; time-synchronised communications
Adaptation layer | Security vulnerabilities
Network/routing layer | Selective forwarding; integrity and data authenticity; replay attacks; sinkhole; Sybil attack
Application | Confidentiality; authentication; integrity; non-repudiation; replay attacks
2.3.2.3 Service Layer (Middleware Security)
Due to the very large number of technologies normally in place within the IoT
paradigm, a middleware layer is employed to enable seamless integration of
devices and data within the same information network. Within such middleware,
data must be exchanged respecting strict protection constraints. IoT applications are
vulnerable to security attacks for several reasons: first, devices are physically
vulnerable and are often left unattended; second, it is difficult to implement
security countermeasures due to the large scale and the decentralised paradigm;
finally, most IoT components are devices with limited resources that cannot
support complex security schemes [74]. The major security challenge in IoT
middleware is to protect data from integrity, authenticity, and confidentiality
attacks [75]; access control is also an issue.
Both the networking and the security issues have driven the design and development
of the VIRTUS middleware, an IoT middleware relying on the open XMPP protocol
to provide secure event-driven communications within an IoT scenario [74].
Leveraging the standard security features provided by XMPP, the middleware offers
a reliable and secure communication channel for distributed applications, protected
with both authentication (through the TLS protocol) and encryption (SASL protocol)
mechanisms.
Security and privacy are responsible for confidentiality, authenticity, and
nonrepudiation. Security can be implemented in two ways: (i) secure high-level
peer communication, which enables higher layers to communicate among peers in a
secure and abstract way, and (ii) secure topology management, which deals with the
authentication of new peers, permissions to access the network, and protection of
routing information exchanged in the network [76]. Other approaches to implementing
security and privacy in IoT middleware are trust management, device
authentication, integrity services and access control. The major IoT security
requirements are data authentication, access control, and client privacy [61].
The AmI framework called Otsopack provides two core features: (i) it is
designed to be simple, modular and extensible, and (ii) it runs on different
computational platforms, including Java SE and Android [77]. As regards security,
given the data-centric nature of the framework, there are two core
requirements: (i) a data provider may grant access to certain data only to a certain set
of users, and (ii) a data consumer may trust only a set of providers for a certain set of
acquired data. A derived issue is how the parties authenticate each other in such a dynamic
scenario. To support the first requirement, an OpenID-based solution has
been built. An identity provider securely identifies data consumers to the data
providers, and data providers can establish which graphs can be accessed by which users.
Therefore, the provider will return a restricted graph only if a valid user is
requesting it. In other words, the same application can get different amounts of
information depending on whether it provides credentials or not.
The authors in [78] suggest the use of lightweight symmetric encryption (for data)
and asymmetric encryption protocols (for key exchange) in the Trivial File Transfer
Protocol (TFTP). The target implementation of TFTP is embedded devices such
as Wi-Fi Access Points (APs) and remote Base Stations (BSs), which could be
attacked by malicious users or malware through the installation of malicious code (e.g.
backdoors). The authors emphasise finding a solution for strengthening the
communication protocol between APs and BSs [78]. To verify this proposal, the authors
used U-Boot (Universal Boot Loader). Two schemes are employed: AES,
used to protect personal and sensitive data, and DHKE (Diffie-Hellman Key
Exchange), for exchanging cryptographic keys between two entities that do not
know each other.
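As a rough illustration of this hybrid pattern (a sketch of the general technique, not the implementation in [78]), the code below pairs an ephemeral Diffie-Hellman-style exchange, here the modern X25519 variant rather than classic DHKE, with AES-GCM for the data, using the Python cryptography package:

    import os
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Each side generates an ephemeral key pair and exchanges public keys.
    ap_private = X25519PrivateKey.generate()   # Wi-Fi access point
    bs_private = X25519PrivateKey.generate()   # remote base station

    # Both sides compute the same shared secret from the peer's public key.
    ap_shared = ap_private.exchange(bs_private.public_key())
    bs_shared = bs_private.exchange(ap_private.public_key())
    assert ap_shared == bs_shared

    # Derive a 128-bit AES key from the raw shared secret.
    key = HKDF(algorithm=hashes.SHA256(), length=16, salt=None,
               info=b"tftp-session").derive(ap_shared)

    # Symmetric encryption of the actual file block (AES in GCM mode).
    nonce = os.urandom(12)
    block = AESGCM(key).encrypt(nonce, b"firmware chunk 1", None)
    print(AESGCM(key).decrypt(nonce, block, None))

The asymmetric step runs once per session to agree on a key, after which the cheap symmetric cipher protects every block; this division of labour is what makes the combination attractive for constrained devices.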
In [79] a Naming, Addressing and Profile Server (NAPS) is presented as a
middleware to bridge different platforms in IoT environments. Given the massive
number of heterogeneous devices deployed across different platforms, NAPS serves
as a key module at the back-end data centre to aid the upstream, the content-based
data filtering and matching, and the downstream from applications. The system deals
with Authentication, Authorization and Accounting (AAA). Although it is not the
focus of that work, the design can largely leverage the Network SEcurity Capability
(NSEC) SC in the ETSI M2M service architecture. Note that the device domain is
organised in a tree structure. It uses a key hierarchy composed of a root key, service
keys and application keys. The root key is used to derive service keys through
authentication and key agreement between the device or gateway and the M2M SCs
at the M2M Core. An application key, derived from a service key, is unique to each M2M
application.
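A minimal sketch of such a root-service-application key hierarchy, using an HMAC-based one-way derivation (our own simplification; the actual ETSI M2M key schedule differs):

    import hmac, hashlib, secrets

    def derive(parent_key, label):
        """Derive a child key from a parent key and a public label (HMAC-SHA256)."""
        return hmac.new(parent_key, label, hashlib.sha256).digest()

    root_key = secrets.token_bytes(32)            # provisioned per device/gateway

    # Service keys are derived from the root key, one per service domain.
    service_key = derive(root_key, b"service:telemetry")

    # Application keys are derived from a service key, one per M2M application.
    app_key_a = derive(service_key, b"app:meter-readings")
    app_key_b = derive(service_key, b"app:alarms")

    # HMAC is one-way in the key, so compromising one application key reveals
    # nothing about its siblings or about the keys above it in the tree.
    assert app_key_a != app_key_b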
Several recent works have tried to address the presented issues. For example, [80] deals
with the problem of task allocation in the IoT. In more detail, cooperating
nodes must interoperate towards the collaborative deployment of
applications, taking into account the available resources, such as energy,
memory, processing, and each object's capability to perform a given task. To
address this issue, a resource allocation middleware for the deployment of
distributed applications in the IoT is proposed. On top of this component, a
consensus protocol for cooperation among network objects in performing the
target application is added, which aims to distribute the burden of application
execution so that resources are adequately shared. The work exploits a
distributed mechanism and demonstrates better performance than its centralised
counterpart.
2.3.2.4 Cloud Security
Cloud computing is a merger of several known technologies, including grid and
distributed computing, utilising the Internet as a service delivery network. The
public cloud environment is extremely complex when compared to a traditional data
centre environment [22]. Under the paradigm of cloud computing, an organisation
surrenders direct control over major aspects of security, conferring a substantial
level of trust onto the cloud provider. A recent survey regarding the use of cloud
services conducted by IDC highlights that security is the greatest challenge for the
adoption of the cloud [13, 46, 47]. Figure 2-5 extends an idea outlined in [47]
and shows the vulnerabilities, threats and possible attacks in cloud environments.
The following subsections describe these.
2.3.2.4.1 Vulnerabilities in the Cloud Environment
This section discusses major cloud-specific vulnerabilities which pose serious
threats to cloud computing.
Vulnerabilities in virtualization/multi-tenancy
Virtualization/multi-tenancy serves as the basis of the cloud computing architecture.
There are mainly three types of virtualization in use: OS-level virtualization,
application-based virtualization, and hypervisor-based virtualization.
Vulnerabilities in Internet protocols
Vulnerabilities in Internet protocols may prove to be an implicit way of attacking the
cloud system; they include common types of attacks such as man-in-the-middle attacks,
IP spoofing, ARP spoofing, DNS poisoning, RIP attacks, and flooding. ARP
poisoning is one of the best-known vulnerabilities in Internet protocols.
Unauthorised access to management interface
In Cloud, users have to manage their subscription including Cloud instance, data
upload or data computation through a management interface. Unauthorised access to
such a management interface may become very critical for a Cloud system.
Injection vulnerabilities
Vulnerabilities like SQL injection flaw, OS injection flaw, and Lightweight
Directory Access Protocol (LDAP) injection flaw are used to disclose application
components. Such vulnerabilities are the outcomes of defects in design and
architecture of applications.
Vulnerabilities in browsers and APIs
Cloud providers publish a set of software interfaces (or APIs) that customers can use
to manage and interact with Cloud services. Service provisioning, management,
orchestration, and monitoring are performed using these interfaces via clients (e.g.
Web browser).
2.3.2.4.2 Attacks on Cloud Computing
By exploiting vulnerabilities in the cloud, an adversary can launch the following attacks.
Zombie attack
Through the Internet, an attacker tries to flood the victim by sending requests from
innocent hosts in the network. These types of hosts are called zombies. In the Cloud,
the requests for Virtual Machines (VMs) are accessible by each user through the
Internet. An attacker can flood a large number of requests via zombies.
Service injection attack
The cloud system is responsible for determining and eventually instantiating a free-to-use
instance of the requested service, and the address for accessing that new instance is
communicated back to the requesting user. An adversary may inject a
malicious service or a new virtual machine into the cloud system and thereby provide
malicious services to users. Cloud malware affects cloud services by changing
(or blocking) cloud functionalities.
Virtualization attack
There are mainly two types of attacks performed over virtualization: VM Escape and
Rootkit in hypervisor. These types of attacks are possible with the cloud
virtualization concept.
Man-in-the-middle attack
In the cloud, an attacker may be able to access the data communication among data centres.
Proper SSL configuration and data communication tests between authorised parties
can be useful to reduce the risk of man-in-the-middle attacks.
Metadata spoofing attack
In this type of attack, an adversary modifies or changes the service’s Web Services
Description Language (WSDL) file where descriptions about service instances are
stored.
Phishing attack
Phishing attacks are well known for manipulating a web link and redirecting a user
to a false link to get sensitive data. In Cloud, it may be possible that an attacker uses
the cloud service to host a phishing attack site to hijack accounts and services of
other users in the Cloud.
Backdoor channel attack
This is a passive attack which allows hackers to gain remote access to the
compromised system. Using backdoor channels, hackers can control the victim's
resources and turn it into a zombie for attempting a DDoS attack.
2.3.2.4.3 Threats to Cloud Computing
The Cloud Security Alliance presented a primary draft of the threats relevant to the security
architecture of cloud services. We discuss here some potential threats relevant to
the cloud and the corresponding mitigation directives.
Changes to business model
Cloud computing changes the way in which IT services are delivered. As servers,
storage and applications are provided by off-site external service providers,
organisations need to evaluate the risks associated with the loss of control over the
infrastructure. This is one of the major threats which hinder the usage of Cloud
computing services.
Abusive use of Cloud computing
Cloud computing provides several utilities, including bandwidth and storage
capacities. Some vendors also give a predefined trial period to use their services.
However, they do not have sufficient control over the attackers, malicious users or
spammers who can take advantage of the trials. These trials can often allow an intruder to
plant a malicious attack, and the cloud can prove to be a platform for serious attacks.
Insecure interfaces and API
Cloud providers often publish a set of APIs to allow their customers to design an
interface for interacting with Cloud services. These interfaces often add a layer on
top of the framework, which in turn would increase the complexity of Cloud. Such
interfaces allow vulnerabilities (in the existing API) to move to the Cloud
environment. Improper use of such interfaces would often pose threats such as clear-
text authentication, transmission of content, improper authorizations, etc.
Malicious insiders
Most organisations hide their policies regarding employees' levels of access
and their recruitment procedures for employees. However, with a higher
level of access, an employee can gain access to confidential data and services. Due
to the lack of transparency in a cloud provider's processes and procedures, insiders often
have this privilege. Insider activities are often bypassed by a firewall or Intrusion
Detection System (IDS), which assumes them to be legal activity.
Shared technology issues/multi-tenancy nature
In multi-tenant architecture, virtualization is used to offer shared on-demand
services. The same application is shared among different users having access to the
virtual machine. However, as highlighted earlier, vulnerabilities in a hypervisor
allow a malicious user to gain access and control of the legitimate users’ virtual
machine.
Data loss and leakage
Data may be compromised in many ways, including deletion or
modification. Due to the dynamic and shared nature of the cloud, such a
threat could prove to be a major issue leading to data theft.
Service hijacking
Service hijacking may redirect the client to an illegitimate website. User accounts
and service instances could in turn form a new base for attackers. Phishing,
fraud, exploitation of software vulnerabilities, and reused credentials and passwords
may enable service or account hijacking.
Risk profiling
Cloud offerings make organisations less involved with ownership and maintenance
of hardware and software. This offers significant advantages. However, this makes
them unaware of internal security procedures, security compliance, hardening,
patching, auditing, and logging process and exposes the organisation to greater risk.
Identity theft
Identity theft is a form of fraud in which someone pretends to be someone else, to
access resources or obtain credit and other benefits. The victim (of identity theft) can
suffer adverse consequences and losses and be held accountable for the perpetrator’s
actions. Relevant security risks include weak password recovery workflows,
phishing attacks, key loggers, etc.
2.4 Big Data Stream Security
Applications dealing with large data sets obtained via simulation or from actual real-time
sensor networks and social networks are increasingly abundant [81]. The data obtained
from real-time sources may contain certain discrepancies arising from the
dynamic nature of the source. Furthermore, certain computations may not require all
the data, and hence the data must be filtered before it can be processed. By installing
adaptive filters that can be controlled in real time, we can pass through only the relevant
parts of the data, thereby improving the overall computation speed.
Figure 2-5: Cloud computing security threats, attacks and vulnerabilities.
Nehme et al. [82] proposed a system, StreamShield, designed to address the problem
of security and privacy in data streams. They clearly highlighted the need for
two types of security metadata in a data stream: (1) "data security punctuations" (dsps),
describing the data-side security policies, and (2) "query security punctuations"
(qsps), describing the query-side policies. The advantages of such a stream-centric
security model include flexibility, dynamicity and speed of enforcement. A stream
processor can adapt not only to data-related but also to security-related selectivity,
which helps reduce the waste of resources when few subjects have access to the
streaming data.
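To illustrate how security punctuations can be interleaved with ordinary tuples (a schematic of our own, not StreamShield's actual interface), consider the following sketch, in which a dsp changes the access policy mid-stream:

    from dataclasses import dataclass

    @dataclass
    class StreamTuple:
        sensor_id: str
        value: float

    @dataclass
    class DataSecurityPunctuation:
        """A dsp: from this point on, only the listed roles may read the stream."""
        allowed_roles: frozenset

    def filter_stream(stream, role):
        """Yield only the tuples the given role is cleared for, per the latest dsp."""
        allowed = frozenset()                   # deny by default until a dsp arrives
        for item in stream:
            if isinstance(item, DataSecurityPunctuation):
                allowed = item.allowed_roles    # policy can change mid-stream
            elif role in allowed:
                yield item

    stream = [
        DataSecurityPunctuation(frozenset({"nurse", "doctor"})),
        StreamTuple("hr-01", 72.0),
        DataSecurityPunctuation(frozenset({"doctor"})),   # tightened policy
        StreamTuple("hr-01", 171.0),
    ]
    print(list(filter_stream(stream, role="nurse")))      # only the first tuple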
Security verification is very important in a data stream in order to avoid
unwanted and corrupted data.
Another important problem to address is performing the security
verification in near real-time.
Security verification should not degrade the performance of the stream
processing engine, i.e. security verification should be considerably faster
than the stream processing engine itself.
There are several applications where sensor nodes act as the source of the data
stream, such as real-time health monitoring (health care), industrial monitoring,
geo-social networking, home automation, war-front monitoring, smart city
monitoring, SCADA, event detection, disaster management and emergency management.
In all of the above applications, data needs to be protected from malicious
attacks so as to maintain the originality of the data before it reaches a data processing
centre [83]. As the data sources are sensor nodes, it is always important to propose
lightweight security solutions for data streams [83].
These applications require real-time processing of very high-volume data streams
(also known as big data streams). The complexity of big data is defined through the 4Vs,
i.e. volume, variety, velocity and veracity. These features present significant
opportunities and challenges for big data stream processing. A big data stream is
continuous in nature, and it is important to perform real-time analysis as the lifetime
of the data is often very short (applications can access the data only once) [4-5].
2.4.1 Security Requirements
The goal of security services in a big data stream is to protect the information and
resources from malicious attacks and misbehaviour. The security requirements in a
big data stream include:
Availability: ensures that the data stream is accessible to authenticated users
and specific applications according to the predefined features.
Authorization: ensures that only authorised users can be involved in data
analysis and modification.
Authentication: ensures that the data come from legitimate sources by
providing end-to-end security services; data should not come from any
malicious source.
Confidentiality: ensures that a given message cannot be understood by
anyone other than the desired recipients.
Integrity: ensures that a message sent from the sources to the data centre
(cloud) is not modified by a malicious intermediary.
Nonrepudiation: denotes that a source cannot deny sending a message it has
previously sent.
Freshness: implies that the data is recent and ensures that no adversary can
replay old messages.
Moreover, forward and backward secrecy should also be considered when we receive
data from a new set of sources (see the sketch after the following definitions):
Forward secrecy: a source device should not be able to read any future
messages after it leaves the network.
Backward secrecy: a joining source device should not be able to read any
previously transmitted messages.
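The sketch below shows one simple way to obtain both properties: rotate the shared group key on every membership change (our own illustration; secure distribution of the new key to the remaining members is abstracted away):

    import secrets

    class GroupKeyManager:
        """Rotates the shared stream key whenever group membership changes."""
        def __init__(self):
            self.members = set()
            self.key = secrets.token_bytes(32)

        def _rotate(self):
            # A fresh random key is unrelated to every previous key, so a departed
            # device cannot read future traffic (forward secrecy) and a new device
            # cannot read recorded past traffic (backward secrecy).
            self.key = secrets.token_bytes(32)

        def join(self, device_id):
            self.members.add(device_id)
            self._rotate()      # joining device must not decrypt old messages

        def leave(self, device_id):
            self.members.discard(device_id)
            self._rotate()      # leaving device must not decrypt new messages

    mgr = GroupKeyManager()
    mgr.join("sensor-a"); key_after_a = mgr.key
    mgr.join("sensor-b"); key_after_b = mgr.key
    assert key_after_a != key_after_b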
By considering the above security requirements, we divide the security issues,
threats, and solutions of IoT-generated big data streams based on the CIA
(Confidentiality, Integrity, and Availability) triad. The following sections give the
details.
2.4.2 CIA Triad Properties
Confidentiality, integrity and availability, also known as the CIA triad, form a
model designed to guide policies for information security within big data streams.
The CIA triad is shown in Figure 2-6. The model is also sometimes referred to as
the AIC triad (availability, integrity and confidentiality) to avoid confusion with the
Central Intelligence Agency. The elements of the triad are considered the three most
crucial components of security.
Confidentiality – the secrecy of the data, either in transit or at rest; our data stream
work deals with data in transit. Definition: confidentiality is a set of rules or a promise
that limits access or places restrictions on certain types of information in the data stream.
Partial confidentiality arises when there is considerable informational
disclosure in some situations [84]. Weak confidentiality is the case where some
parts of the original data blocks can be reconstructed explicitly from fewer than m
pieces [85]; Information Dispersal Algorithms (IDAs) have weak confidentiality, as
an eavesdropper can reconstruct some segments of the original file F explicitly from
fewer than m pieces.
Systems measure the impact on confidentiality of a successful exploit of a
vulnerability on the target system as:
Partial: there is considerable informational disclosure.
Complete: a total compromise of critical system information.
Figure 2-6: CIA triad of data security either data in transit or at rest.
Integrity – integrity, in terms of big data streams, is the assurance that information
can only be accessed or modified by authorised users or applications.
Measures taken to ensure integrity include controlling the physical environment of
networked terminals and servers, restricting access to data, and maintaining rigorous
authentication practices. The authentication process is part of the integrity
process.
Systems measure the impact on integrity of a successful exploit of a vulnerability
on the target system as:
Partial: a considerable breach of integrity.
Complete: a total compromise of system integrity.
Authentication: measures whether or not an attacker needs to be authenticated in a
big data stream in order to exploit a vulnerability; here, authentication is required to
access and exploit the vulnerability.
Availability – data availability, in terms of big data streams, ensures that data
continues to be available at a required level of performance in situations ranging
from normal through disastrous. In general, data availability is achieved through
providing access to authenticated users and/or applications.
Systems measure the impact on availability of a successful exploit of a
vulnerability on the target system as:
Partial: considerable lag in, or interruptions to, resource availability.
Complete: total shutdown of the affected resource.
2.4.3 Confidentiality of Big Data Streams
Confidentiality is the ability to conceal messages from a passive attacker so that
information in big data streams remains confidential. This is one of the most
important issues for several applications, such as health care and military applications.
Confidentiality must be maintained throughout the entire lifetime of the data, from the
source smart IoT device to long-term archiving in a cloud data centre.
Confidentiality is typically achieved via data encryption. The threat here is a kind of
passive attack, where unauthorised attackers monitor and listen to the communication
medium; attacks against data privacy are passive in nature.
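As a deliberately minimal example of encryption-based confidentiality for a single stream record (our own sketch using the Python cryptography package; key distribution between the source and the DSM is out of scope here):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)   # shared by source device and DSM
    aesgcm = AESGCM(key)

    record = b'{"sensor": "hr-01", "value": 72}'
    nonce = os.urandom(12)                      # must be unique per message
    ciphertext = aesgcm.encrypt(nonce, record, b"stream-42")   # AAD binds stream id

    # A passive eavesdropper sees only (nonce, ciphertext); the recipient decrypts.
    assert aesgcm.decrypt(nonce, ciphertext, b"stream-42") == record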
Access Authorization [54] — Access to confidential data must be limited to a group
of legitimate users who analyse the data. An access authorization scheme must ensure that
only persons with adequate security clearance get access to stream data. For access
to especially sensitive data, the involvement of more than one operator should be
required to prevent misuse. If a video stream contains different levels of information
(e.g. full video, annotations), access should be managed separately for each level.
Additionally, all attempts to access confidential data should be securely logged.
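A toy sketch of per-level access control with logged attempts, in the spirit of the requirements above (clearance levels and role names are our own):

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("stream-access")

    # Clearance required for each information level of a video stream.
    REQUIRED_CLEARANCE = {"annotations": 1, "full_video": 2}
    USER_CLEARANCE = {"operator": 1, "supervisor": 2}

    def access(user, level):
        """Grant access only if the user's clearance covers the level; log attempts."""
        granted = USER_CLEARANCE.get(user, 0) >= REQUIRED_CLEARANCE[level]
        log.info("user=%s level=%s granted=%s", user, level, granted)
        return granted

    print(access("operator", "annotations"))   # True
    print(access("operator", "full_video"))    # False: insufficient clearance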
Attack Against Privacy — In our classification, privacy is a subproperty of
confidentiality. Whereas confidentiality denotes the protection of all data against
external parties, privacy means the protection of sensitive data against misuse by
legitimate users (i.e., insiders). In fact, much of the information gathered by sensor networks
could probably also be collected through direct site surveillance [54]; rather, sensor
networks intensify the privacy problem because they make large volumes of
information easily available through remote access. Hence, adversaries need not be
physically present to maintain surveillance; they can gather information at low risk
in an anonymous manner [86]. For system operators who perform monitoring tasks,
behavioural information is usually sufficient and identity information is not required.
This can be achieved by automatic detection and removal of sensitive data from data
streams.
Monitor and Eavesdropping — This is the most common attack on confidentiality.
By snooping on the data, the adversary can easily discover the communication
contents. When the traffic conveys control information about the sensor network
configuration, which contains potentially more detailed information than is accessible
through the location server, eavesdropping can act effectively against privacy
protection [86].
Traffic Analysis — Even when the messages transferred are encrypted, there remains
a high possibility of analysing the communication patterns. Sensor activities can
potentially reveal enough information to enable an adversary to cause malicious
harm to the sensor network [86].
Camouflage Adversaries — An adversary can insert its own node, or compromise existing
nodes, to hide in the sensor network. These nodes can then act as normal nodes to attract
packets, misroute them, and conduct privacy analysis.
Related works on data confidentiality
Lou et al. [87] proposed a novel scheme, Security Protocol for REliable dAta
Delivery (SPREAD), focusing on confidentiality as a service. The proposed
SPREAD scheme aims to provide further protection for secret messages against
compromise (or eavesdropping) when data traverse an insecure network. The authors
presented the overall system architecture and investigated the design issues; as a
result, SPREAD is more secure and also provides a certain degree of reliability
without sacrificing security. Their simulation results show that SPREAD provides
more secure data transmission when messages are transmitted across an
insecure wireless medium. The Location Privacy Routing protocol (LPR) is able to
minimise the traffic direction information that an adversary can retrieve from
eavesdropping [88]. A novel anonymous on-demand routing protocol, named
MASK [89], can accomplish both MAC-layer and network-layer communications
without disclosing real IDs of the participating nodes in any adversary environment.
MASK offers the anonymity of senders, receivers, and sender-receiver relationships,
in addition to node unlocatability and untrackability and end-to-end flow
untraceability. Saini et al. [90] proposed a privacy protection method that adopts
adaptive data transformation, involving the use of selective obfuscation and global
operations, to provide robust privacy even with unreliable detectors in surveillance
environments. Experimental results show that the proposed method incurs 38% less
distortion of the information needed for surveillance in comparison to earlier methods
of global transformation, while still providing close to zero privacy loss. SPINS, a
suite of security protocols optimised for sensor networks, comprises two secure
building blocks: SNEP and μTESLA [91]. SNEP provides data confidentiality,
two-party data authentication, and evidence of data freshness.
Luo et al. [92] proposed a new approach to protect confidentiality against a parasitic
adversary that obtains measurements in an unauthorised way. Their low-complexity
solution, GossiCrypt, leverages the large scale of sensor networks to protect
confidentiality efficiently and effectively. GossiCrypt protects data by symmetric
key encryption at the source nodes and re-encryption at a randomly chosen subset
of nodes en route to the sink. The authors validate GossiCrypt analytically and with
simulations, showing that it protects data confidentiality with a probability of almost one.
Chan et al. [93] proposed key management schemes for data confidentiality when
data are disseminated to multiple destinations. The authors categorised these
schemes into four groups: key tree-based approaches, contributory key agreement
schemes supported by the Diffie-Hellman algorithm, computational number-theoretic
approaches, and secure multicast framework approaches. Through
examples, the authors describe the operation of the schemes and compare their
performance. Traffic analysis poses a serious threat while data are transmitted through
a wireless medium [91]. Strong encryption and traffic padding are often used to hide
message contents and maintain the confidentiality of the data. Jiang et al. [94] discussed
different methods of constructing a traffic cover mode, formulated an optimality
problem and presented a solution.
2.4.4 Integrity of Big Data Streams
In the following, we list the possible integrity attacks on big data streams.
Spoofed, Altered, or Replayed Data Stream Information — This is the most direct attack
against data in transit from IoT devices. An attacker may spoof, alter, or replay
information in order to disrupt traffic in the network [92]. These disruptions
include the creation of routing loops, attracting or repelling network traffic,
extending or shortening source routes, generating fake error messages, partitioning
the network, and increasing end-to-end latency [69].
A countermeasure against spoofing and alteration is to append a message
authentication code (MAC) to the message. By adding a MAC to the message, the
receivers can verify whether the messages have been spoofed or altered. To defend
against replayed information, counters or timestamps can be included in the
messages [94].
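A minimal sketch of this countermeasure, using HMAC-SHA256 from the Python standard library with a per-message counter to reject replays (all names are ours):

    import hmac, hashlib, secrets

    key = secrets.token_bytes(32)        # pre-shared between sensor and DSM

    def send(counter, payload):
        # The MAC covers the counter and the payload, so neither can be altered.
        msg = counter.to_bytes(8, "big") + payload
        return msg + hmac.new(key, msg, hashlib.sha256).digest()

    last_seen = -1

    def receive(packet):
        global last_seen
        msg, tag = packet[:-32], packet[-32:]
        if not hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest()):
            raise ValueError("spoofed or altered")   # MAC verification failed
        counter = int.from_bytes(msg[:8], "big")
        if counter <= last_seen:
            raise ValueError("replayed")             # stale counter value
        last_seen = counter
        return msg[8:]

    print(receive(send(1, b"reading=72")))           # accepted
    # receive(send(1, b"reading=72"))                # would raise: replayed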
Selective Forwarding — A significant assumption made in multihop networks is
that all intermediate nodes in the network will accurately forward received messages.
An attacker may introduce malicious nodes which selectively forward only certain
messages and simply drop others [63]. A specific form of this attack is the black
hole attack, in which a node drops all messages it receives. One defence against
selective forwarding attacks is to use multiple paths to send data [95]. A second
defence is to detect the malicious node, or assume it has failed, and seek an
alternative route [69].
Sinkhole — In a sinkhole attack, an attacker makes a compromised node look more
attractive to surrounding nodes by forging routing information [95-96], which influences how intermediate nodes are chosen to transfer the big data stream. The end result is that surrounding
nodes will choose the compromised node as the next node to route the data stream.
This type of attack makes selective forwarding very simple, as all traffic from a
large area in the network will flow through the adversary’s node [69].
Sybil — The Sybil attack is a case where one node presents more than one identity
in the source network [95, 97]. Protocols and algorithms which are easily affected
include fault-tolerant schemes, distributed storage, and network-topology
maintenance. For example, a distributed storage scheme may rely on there being
three replicas of the same data to achieve a given level of redundancy [69]. If a
compromised node pretends to be two of the three nodes, the algorithms used may
conclude that redundancy has been achieved while in reality it has not.
Wormholes — A wormhole is a low-latency link between two portions of the
network over which an attacker replays network information [95]. This link may be
established either by a single node forwarding messages between two adjacent but
otherwise non-neighbouring nodes or by a pair of nodes in different parts of the
network communicating with each other. The latter case is closely related to the
sinkhole attack, as an attacking node near to the data centre can provide a one-hop
link to that base station via the other attacking node in a distant part of the network
[69]. Hu et al. presented a novel and general mechanism called packet leashes for
detecting and defending against wormhole attacks [98]. Two types of leashes were
introduced: geographic leashes and temporal leashes. The proposed mechanisms can
also be used in sensor networks.
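A minimal sketch of the temporal-leash idea from [98] follows; the range bound and clock-error term are illustrative assumptions. The receiver bounds the distance a packet could have travelled from its send timestamp and rejects packets relayed over a long wormhole link.

```python
SPEED_OF_LIGHT = 3.0e8   # metres per second
MAX_RANGE = 300.0        # assumed maximum radio range in metres

def within_temporal_leash(t_sent: float, t_received: float,
                          max_clock_error: float = 1e-6) -> bool:
    # Upper-bound the distance the packet could have travelled; a wormhole
    # that tunnels the packet across the network will exceed this bound.
    travel_time = (t_received - t_sent) + max_clock_error
    return travel_time * SPEED_OF_LIGHT <= MAX_RANGE

print(within_temporal_leash(0.0, 0.0000008))  # True: ~240 m, plausible
print(within_temporal_leash(0.0, 0.00002))    # False: ~6 km, rejected
```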
Hello Flood Attacks — Many protocols which use HELLO packets make the naive
assumption that receiving such a packet means the sender is within radio range and
is therefore a neighbour. An attacker may use a high-powered transmitter to trick a
large area of nodes into believing they are neighbours of that transmitting node [69,
95]. If the attacker falsely broadcasts a superior route to the base station, all of these
nodes will attempt transmission to the attacking node, despite many being out of
radio range in reality.
Acknowledgment Spoofing — Routing algorithms used in sensor networks
sometimes require Acknowledgments to be used. An attacking node can spoof the
Acknowledgments of overheard packets destined for neighbouring nodes in order to
provide false information to those neighbouring nodes [69, 95]. An example of such
false information is claiming that a node is alive when in fact it is dead.
Desynchronisation — Desynchronisation refers to the disruption of an existing
connection [69]. An attacker may, for example, repeatedly spoof messages to a data
centre, causing missed frames. If timed correctly, an attacker may degrade or even
prevent the ability of the end hosts to successfully exchange data, thus causing them
to instead waste energy by attempting to recover from errors which never really
existed.
A possible solution to this type of attack is to require authentication of all packets
communicated between hosts [96]. Provided that the authentication method is itself
secure, an attacker will be unable to send the spoofed messages to the end hosts.
Time Synchronisation — Every source IoT device has its own local clock. For
example, to correlate events detected by multiple nodes, a common time base among
the participants of the data transmission is required. Since the clocks of the sensor
nodes operate independently, the time readings of the nodes will differ. These time
differences are increased further by the individual drifts of the nodes’ oscillators
[54]. Consequently, clock and time synchronisation is required to enable meaningful
comparison of observed events and to jointly solve distributed tasks. From a security
perspective, it is apparent that time synchronisation protocols are an attractive target
for attackers who want to disrupt the services of a big data stream. Boukerche et al.
[99] defined three groups of attackers on time synchronisation. The first group comprises malicious outsiders who can eavesdrop on the communication and inject messages. The second group can additionally jam the communication channel and delay and replay captured packets. Finally, the third group includes insiders who have managed to capture an IoT source device and therefore also have access to the cryptographic keys of the node. Protection against malicious outsiders is based on cryptographic techniques and is no different from protecting any other protocol in data streams.
Eavesdropping (Passive Attacks) — Data communication in IoT-generated big data streams takes place over a wireless medium, and the wireless channel is easily accessible. Moreover, promiscuous mode, in which packets are captured by a node that is not their intended destination, is permitted and employed by some protocols for operation or efficiency; e.g. a routing protocol may use this mode to learn routes [55]. These features can be exploited by malicious nodes to eavesdrop
on packets in transit, then analyse them to obtain confidential and sensitive
information. The obvious preventive solution to protect information is to encrypt
packets, but data encryption does not prevent malicious nodes from eavesdropping
and trying to break decryption keys [55]. Since breaking keys is always possible and
key revocation is problematic, as we will see later, eavesdropping remains a serious
attack against data forwarding.
Dropping Data Packets Attack — Since packets follow multi-hop routes, a
malicious node can participate in routing, include itself in routes, and drop all
packets it gets to forward. To do this, the malicious node first attacks the routing
protocol to gain participation in the routing, using one or more of the attacks
presented previously [55]. This attack has the same effects as the selfish
misbehaviour presented hereafter and the same solutions may be applied.
Selfish Behaviour on Data Forwarding — In many civilian applications, such as
networks of cars and the provision of communication facilities in remote areas in
IoT, the source devices typically do not belong to a single authority and do not
pursue a common goal. In such networks, forwarding packets for others is not in the
direct interest of nodes, so there is no good reason to trust nodes and assume that
they always cooperate. Indeed, a selfish node may try to preserve its resources,
particularly battery power, which is a precious resource, by dropping packets it is
asked to forward while using other nodes’ services and consuming their resources to
transmit its own packets toward remote nodes. This is not an intentional attack but a
selfish misbehaviour.
Related works on data integrity
A considerable body of work addresses protection against integrity attacks; we summarise the most relevant proposals below.
Perrig et al. [91] presented a suite of security protocols optimized for sensor
networks: SPINS. SPINS has two secure building blocks: SNEP and μTESLA.
SNEP includes: data confidentiality, two-party data authentication, and evidence of
data freshness. μTESLA provides authenticated broadcast for severely resource-
constrained environments. Their implementation shows that these protocols are practical even on minimal hardware: the performance of the protocol suite easily matches the network data rate. Security is important for many sensor network applications, and a particularly harmful attack against sensor and ad hoc networks is the Sybil attack [97], where a node illegitimately claims multiple identities. The authors of [97] systematically analyse the threat posed by the Sybil attack to wireless sensor networks and demonstrate that the attack can be exceedingly detrimental to
sensor networks and demonstrate that the attack can be exceedingly detrimental to
many important functions of the sensor network such as routing, resource allocation,
misbehaviour detection, etc. Then they propose several novel techniques to defend
against the Sybil attack, and analyse their effectiveness quantitatively. Many new
time synchronisation algorithms have been proposed, and a few of them provide
security measures against various degrees of attack. In [99], the authors reviewed the most commonly used time synchronisation algorithms and evaluated them based on factors such as their countermeasures against various attacks
and the types of techniques used. Deng et al. [83] introduced an INtrusion-tolerant
routing protocol for wireless SEnsor NetworkS (INSENS). INSENS is secure and
efficient, and constructs tree-structured routing for wireless sensor networks (WSNs).
The key objective of an INSENS network is to tolerate damage caused by an
intruder who has compromised deployed sensor nodes and is intent on injecting,
modifying, or blocking packets. To limit or localize the damage caused by such an
intruder, INSENS incorporates distributed lightweight security mechanisms,
including efficient one way hash chains and nested keyed message authentication
codes that defend against wormhole attacks. An enhanced single-phase version of INSENS scales to large networks and integrates bidirectional verification to defend against rushing attacks. Considering the importance of security issues, Du et al.
[100] summarise typical attacks on sensor networks and survey the literature on
several important security issues relevant to the sensor networks, including key
management, secure time synchronisation, secure location discovery, and secure
routing, where sensors are the data source of IoT.
Tubaishat et al. [101] designed a Secure Routing Protocol for Sensor Networks
(SRPSN) to safeguard the data packet passing on the sensor networks under
different types of attacks (integrity attack). The authors also proposed a group key
management scheme, which contains group communication policies, group
membership requirements and an algorithm for generating a distributed group key
for secure communication. They use highly efficient symmetric keys and a hierarchical architecture, which greatly lowers the computation and communication overhead. To counter adversaries that eavesdrop on and trace packet movement in the
network, Jian et al. [86] proposed a location privacy routing protocol (LPR) that is
easy to implement and provides path diversity. Combining with fake packet
injection, LPR is able to minimize the traffic direction information that an adversary
can retrieve from eavesdropping. The authors evaluate the system based on three
criteria: delivery time, privacy protection strength, and energy cost. The
performance of this protocol can be tuned through a couple of parameters that
determine the tradeoff between energy cost and the strength of location-privacy
protection.
Routing plays an important role in the security of the entire network. In general,
routing security in wireless MANETs appears to be a problem that is not trivial to
solve. Deng et al. [102] studied the routing security issues of MANETs and analysed in detail the black hole attack on MANETs. Their proposed solution addresses the black hole problem for the ad hoc on-demand distance vector (AODV) routing protocol. The limitation of this work is the assumption that malicious nodes do not collude, although this may happen in a real situation.
2.4.5 Availability of Big Data Streams
Data availability in big data streams concerns controlling access by end users or specific applications. This subsection classifies the different access control models with respect to data stream properties. The access control process must answer two questions: who may access, and what may be accessed?
Several access control models exist. Their corresponding access control
mechanisms—the concrete implementations of those access control models—can
take several forms, make use of different technologies and underlying infrastructure
components, and involve varying degrees of complexity. In some cases, the more
complicated models expand upon and enhance earlier models, while in other cases
they represent a rethinking of the fundamental manner in which access control
should be done. In many cases, the newer, more complicated models arose not from
deficiencies in the security that earlier models provide, but from the need for new
models to address changes in organisational structures, technologies, organisational
needs, technical capabilities, and/or organisational relationships [78].
Mandatory Access Control System (MAC) — Mandatory Access Control, or MAC,
relies on labels that correspond to the sensitivity levels of information for clients and
objects [103]. MAC policy compares the sensitivity label at which the user is
working to the sensitivity label of the object being accessed and refuses access
unless certain MAC checks are passed. MAC is mandatory because the labelling of information happens automatically, and ordinary users cannot change labels unless an administrator authorises them [79]. In big data streams, an authorised user or administrator can thus access the data for reading and analysis, even in near real time.
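A toy sketch of such a label check follows; the levels and the dominance rule are simplified assumptions, not a full MAC implementation.

```python
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top secret": 3}

def mac_read_allowed(subject_label: str, object_label: str) -> bool:
    # "No read up": the subject's sensitivity label must dominate the object's.
    return LEVELS[subject_label] >= LEVELS[object_label]

print(mac_read_allowed("secret", "confidential"))  # True
print(mac_read_allowed("confidential", "secret"))  # False
```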
Role-Based Access Control (RBAC) — Role-based access control (RBAC) models
are receiving increasing attention as a generalized approach to access control [104].
In an RBAC model, roles represent functions within a given organisation. For big
data streams, roles can be assigned to specific applications, and authorizations can be granted to roles rather than to single users. The authorizations
granted to a role are strictly related to the data objects and resources that are needed
for exercising the functions associated with the role. Users are thus simply
authorised to “play” the appropriate roles, thereby acquiring the roles’ authorizations
[105]. When users log in, they can activate a subset of the roles they are authorised
to play. We call a role that a user can activate during a session an enabled role.
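The following sketch illustrates the enabled-role idea (role names and permissions are hypothetical): only roles that are both assigned to the user and activated in the current session contribute permissions.

```python
ROLE_PERMS = {                       # hypothetical role -> permissions map
    "stream_analyst":  {"read_stream"},
    "stream_operator": {"read_stream", "tune_query"},
}
USER_ROLES = {"alice": {"stream_analyst", "stream_operator"}}

def allowed(user: str, enabled_roles: set, action: str) -> bool:
    # Only roles assigned to the user AND enabled in this session count.
    active = USER_ROLES.get(user, set()) & enabled_roles
    return any(action in ROLE_PERMS[role] for role in active)

print(allowed("alice", {"stream_analyst"}, "tune_query"))   # False
print(allowed("alice", {"stream_operator"}, "tune_query"))  # True
```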
View-Based Access Control (VBAC) — VBAC [106] extends RBAC by introducing a view as a static, typed language construct for describing fine-grained access rights, i.e. permissions or denials for operations of distributed objects. In essence, VBAC is based on the classical access matrix model with roles as subjects and views as matrix entries: access rights are not granted directly to individual subjects or roles; rather, a principal has access to an operation of an object if (s)he holds a view on the object that includes a permission for the operation.
Activity-based access control (AcBAC) — Activity-based access control is a seldom-used term and appears to be a precursor of attribute-based access control. An AcBAC model was introduced recently, designed for collaborative work environments [107]. A workflow is defined as a set of activities (tasks) that are connected to achieve a common goal. AcBAC separates access-right assignment for users from access-right activation: even if a user has been allocated access rights on the workflow template, he/she can exercise those rights only during the activation of the task in the specific workflow instance [107].
Attribute based access control — The attribute-based access control (AtBAC) model has the following characteristics [108]; a minimal policy-check sketch follows the list.
• Users have a set of identity attributes that describe properties of users. For example,
organisational role(s), seniority, applications and so on.
• Data is associated with AtBAC policies that specify conditions over identity
attributes.
• A user whose identity attributes satisfy the AtBAC policy associated with a data
item is allowed to access the data item.
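The sketch below checks a policy under these characteristics; the attribute names and the equality-only matching rule are simplifying assumptions.

```python
def atbac_allowed(user_attrs: dict, policy: dict) -> bool:
    # The user may access the item only if every attribute condition holds.
    return all(user_attrs.get(attr) == value for attr, value in policy.items())

policy = {"role": "analyst", "seniority": "senior"}   # hypothetical policy
print(atbac_allowed({"role": "analyst", "seniority": "senior"}, policy))  # True
print(atbac_allowed({"role": "analyst", "seniority": "junior"}, policy))  # False
```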
Proximity based access control — The model relies on the intuitive notion that "proximity" means users being present (or not) within the same physical space; this lack of a rigorous understanding of proximity can lead to surprising interpretations [109]. Users can be granted access to stream data based on their location or application, so the method suits application-oriented access to data streams. In PBAC, administrators can write policies that specify either the presence or absence of other users within a protected area. PBAC also distinguishes between policies that require authorization only once, prior to access, and policies that specify conditions that must continue to hold for as long as the permission is used.
Encryption based access control — Big data technologies are increasingly used for stream data analysis and other sensitive data. In order to comply with various regulations and policies, such data need to be stored encrypted, and access to them needs to be controlled based on the identity attributes of users. Nabeel et al. [110] proposed an efficient symmetric-key encryption scheme for access control over big data. Unlike a direct application of symmetric-key encryption, keys are not stored in the system; they are derived dynamically when data are to be decrypted. This approach is an order of magnitude more efficient than ABE-based approaches, as it is based on symmetric-key encryption and broadcast group key management. The main bottleneck of the approach is the key generation operation.
Privilege State based access control (PSAC) — To support fine-grained intrusion response, manually moving a privilege to the suspend state provides the basis for an event-based continuous authentication mechanism; similar arguments can be made for attaching the taint state to a privilege, which triggers auditing of the request in progress. An access control system whose decision semantics use such privilege states is called a privilege-state-based access control (PSAC) system [111]. For completeness of the access control decisions, a privilege assigned to a user or role in PSAC can exist in five states: unassign, grant, taint, suspend, and deny.
Risk based access control [112] — Fuzzy inference is a promising approach to implementing risk-based access control systems. However, its application to access control raises novel problems that had not previously been investigated. Risk-based access control, though it improves information flow and better addresses the requirements of critical organisations, may allow malicious users to cause damage before mitigating steps are taken. Moreover, the time required by a fuzzy inference engine to estimate risk may be quite high, especially when there are tens of parameters and hundreds of fuzzy rules, while an access control system may need to serve hundreds or thousands of users. Ni et al. [112] investigated these issues and presented solutions to them.
Discretionary access control (DAC) [113] — Discretionary access control, based on checking access requests against users' authorizations, does not provide any way of restricting the usage of information once it has been "legally" accessed [80]. This makes discretionary systems vulnerable to Trojan horses that maliciously leak information, so additional controls are needed to limit the indiscriminate flow of information in the system. Conventional authorization models enforcing discretionary policies are based on authorizations which specify, for each user or group of users in the system, the accesses he/she is allowed to execute on objects. Bertino et al. [113] proposed a model that allows the specification of temporal dependencies among authorizations; such dependencies allow the derivation of new authorizations based on the presence or absence of other authorizations in given time intervals. The model allows each authorization to be associated with a temporal constraint that restricts its validity.
Related works on data availability
This subsection presents representative related work on data availability, covering proposed solutions for the access control mechanisms specified above.
Bertino et al. [113] defined discretionary access control based on checking access requests against users' authorizations; it does not provide any way of restricting the usage of information once it has been "legally" accessed. Conventional authorization models enforcing discretionary policies are based on authorizations which specify, for each user or group of users in the system, the accesses he/she is allowed to execute on objects. In [113], the authors proposed an authorization model with temporal constraints that restrict authorization validity. The model allows temporal dependencies among authorizations, and these dependencies allow the derivation of new authorizations based on the presence or absence of other authorizations in given time intervals.
Ni et al. [112] proposed risk-based access control systems built on fuzzy inference. The authors show that fuzzy inference is a good approach for estimating access risk, and they investigate and solve specific problems concerning the application of fuzzy inference to access control. The time required by a fuzzy inference engine to estimate risk may be quite high, especially when there are tens of parameters and hundreds of fuzzy rules.
Kamra et al. [111] proposed a privilege-state-based access control model developed specifically to support fine-grained response actions, such as request suspension and request tainting, in the context of an anomaly detection system for databases. In their model, privileges assigned to a user or role have a state attached to them, resulting in a privilege-state-based access control (PSAC) system. PSAC is designed to take into account the role hierarchies that are often present in the access control models of current DBMSs. The authors implemented PSAC in the PostgreSQL DBMS and discussed relevant implementation issues.
Nabeel et al. [110] proposed a novel approach using attribute-based group key management. This approach is an order of magnitude more efficient than ABE-based approaches, as it is based on symmetric-key encryption and broadcast group key management. They utilise a MapReduce framework to improve the performance
of the key generation by generating intermediate keys during the Map phase and
generating the final key during the Reduce phase. The encryption is performed at the
granularity of HDFS (Hadoop Distributed File System) blocks.
Prox-RBAC was proposed as an extension of role-based access control that considers the relative proximity of other users with the help of a pervasive monitoring infrastructure [109]. Earlier work offered only an informal view of proximity and unnecessarily restricted the domain to spatial concerns; in this work, the authors present a more rigorous definition of proximity based on formal topological relations. The concept can also be applied to several additional domains, such as social networks, communication channels, attributes, and time, making this policy model and language more flexible and powerful than previous work.
DBMask is a novel solution that supports fine-grained cryptographically enforced
access control, including column, row and cell level access control, when evaluating
SQL queries on encrypted data [108]. Their solution requires no modifications to the database engine and thus maximises the reuse of existing DBMS infrastructure.
Oh et al. [107] proposed an integration model of RBAC and ABAC. The authors describe the basic concepts and limitations of the RBAC and ABAC models, introduce the concept of task classification, and use tasks as the connection between the two models; they also discuss the effect of the new integration model. An improved access control model for enterprise environments, the task-role-based access control (T-RBAC) model, is founded on this concept of task classification [107]. T-RBAC treats each task differently according to its class, and supports task-level access control and a supervision role hierarchy. Demurjian et al. [103] presented a constraint-based security model for distributed databases that enhances sensitive-data security under the Mandatory Access Control (MAC) system, increases communication among distributed sites in current government classified information systems, and supports data replication.
Another important way to control access is lattice-based information flow control [114]. The first work on access control over data streams [115] supports a very expressive access control model while remaining, as far as possible, independent of the target DSMS. The lattice structure is based on a typical lightweight big data stream; the lattice is designed to control the information flow and to map data from a source lattice to a destination lattice.
2.5 Comparison
Table 2-2 presents the possible threats and attacks on IoT-generated big data streams, classified according to the CIA (Confidentiality, Integrity, and Availability) triad introduced in the previous section. Having classified the security threats and solutions for big data streams under the CIA triad in Section 2.4, we summarise the classification in Table 2-2. As the table shows, every security threat to big data streams falls under at least one property of the CIA triad. The protocols in [54, 86, 87, 90, 92, 93] focus solely on data confidentiality; the protocols in [55, 63, 69, 91, 95-99] address threats that fall under integrity attacks on big data streams; and the protocols in [103-115] address threats against stream data availability.
Table 2-2: Possible threats to IoT-generated big data streams, in CIA triad representation.

Confidentiality (references [54, 86, 87, 90, 92, 93]): Access Authorization; Attack Against Privacy; Monitor and Eavesdropping; Traffic Analysis; Camouflage Adversaries.

Integrity (references [55, 63, 69, 91, 95-99]): Spoofed, Altered, or Replayed Data Stream Information; Selective Forwarding; Sinkhole; Sybil; Wormholes; Hello Flood Attacks; Acknowledgment Spoofing; Desynchronisation; Time Synchronisation; Eavesdropping (Passive Attacks); Dropping Data Packets Attack; Selfish Behaviour on Data Forwarding.

Availability (references [103-115]): Mandatory Access Control; Role-Based Access Control; View-Based Access Control; Activity-Based Access Control; Attribute-Based Access Control; Proximity-Based Access Control; Encryption-Based Access Control; Privilege-State-Based Access Control; Risk-Based Access Control; Discretionary Access Control; Information Flow Control Model.
Table 2-3 shows a comparative evaluation of the existing security literature based on the classification criteria defined in Section 2.4. Most security models offering potential solutions for big data streams are classified in Table 2-3. The table shows that the individual security properties (confidentiality, authentication, integrity and availability) have received uneven attention in the literature.
Table 2-3: Comparison of IoT-generated big data stream security threats and solutions according to the CIA triad. Each protocol is marked against confidentiality, authentication, integrity and availability; × denotes a property not addressed, and Partial a partially addressed one.

SPINS [91]: × ×
SPREAD [87]: × × ×
Robust Privacy Protection [90]: × × ×
GossiCrypt [92]: × × ×
Key Management for Data Confidentiality [93]: ×
Traffic Analysis [94]: × × ×
INSENS [83]: × ×
Secure Time Synchronisation [99]: × × ×
Routing Security [102]: × × ×
Security in WSN [100]: × ×
SRPSN [101]: × ×
LPR [86]: × × ×
Sybil Attack in Sensor Networks [97]: × × ×
MASK [89]: × ×
Lightweight and Secure TFTP [78]: × ×
NAPS [71]: × ×
MAC [103]: × ×
RBAC [104]: × × ×
TRBAC [105]: × × ×
T-RBAC [107]: × × ×
Integration Model of RBAC and ABAC [107]: × × ×
DBMask [108]: ×
Prox-RBAC [109]: ×
Encryption-Based Access Control [110]: × ×
PSAC [111]: × ×
Risk-Based Access Control [112]: × × ×
Discretionary Access Control [113]: × ×
DPBSV [4]: Partial ×
DLSeF [5]: Partial ×
SEEN: ×
2.6 Summary
A glimpse of the IoT may be already visible in current deployments where networks
of smart sensing devices are being interconnected with a wireless medium, and IP-
based standard technologies will be fundamental in providing a common and well
accepted ground for the development and deployment of new IoT applications.
According to the 4Vs features of big data, current data streams are evolving into what is now termed big data streams, whose sources are IoT smart sensing devices. Considering that security may be an enabling factor for many IoT applications, mechanisms to secure data streams while the data are in flow will be fundamental. With such aspects in mind, this survey performed an exhaustive analysis of the security protocols and mechanisms available to protect big data streams in IoT applications. We also addressed existing research proposals and challenges, providing
opportunities for future research work in the area.
Table 2-2 summarises the security threats to big data streams according to the CIA triad, and Table 2-3 summarises the main characteristics of the mechanisms and proposals analysed throughout the survey, together with their security properties and existing solutions supporting the CIA triad. In conclusion, we believe this survey provides an important contribution to the research community by documenting the current status of this important and very dynamic area of research, helping readers interested in developing new solutions to address security in the context of IoT-generated big data streams.
Chapter 3
Security Verification Framework for
Big Sensing Data Streams
From this chapter on, we begin to explore research problems with the solutions on
big sensing data stream security issues. While dealing with big sensing data streams
in sensor networks, a DSM must always perform the security verification (i.e.
authenticity, integrity, and confidentiality) of the data to ensure an end-to-end
security as the medium of communication is untrusted, and malicious attackers
could access and modify the data. Existing technologies for data security verification
are not suitable for data streaming applications, as the verification in real time
introduces a delay in the data stream. This chapter proposes a Dynamic Prime
Number Based Security Verification (DPBSV) scheme for big data streams. This
scheme is based on a common shared key that is updated dynamically by generating
synchronised prime numbers. The common shared key updates at both ends, i.e.
source sensor and DSM, without further communication after handshaking.
Moreover, the proposed security mechanism not only reduces the verification time and buffer size at the DSM, but also strengthens the security of the data by constantly changing the shared keys.
3.1 Introduction
A large number of application scenarios (e.g. telecommunications, network
security, large-scale sensor networks, SCADA) require real-time processing of data
streams, where the application of the traditional “store-and-process” method is
limited [24]. There is an extensive variety of applications for data stream processing
in the cloud (e.g. data from large scale sensors, information monitoring, web
exploring, data from social networks like Twitter and Facebook, surveillance data
analysis, financial data analysis). These applications are described at this very
moment, ongoing, and have a large volume of data input, and consequently require
an alternate ideal model of data processing. As a result, a new computing paradigm
based on Stream Processing Engines (SPEs) has appeared. SPEs deal with the
specific types of challenges and are intended to process data streams with a minimal
delay [23, 25 - 27]. In SPEs, data streams are processed in real time (i.e. on-the-fly)
rather than batch processing after storing the data in the cloud as shown in Figure 3-1.
Several applications, such as network monitoring and fraud detection from surveillance cameras, are approaching the bottleneck of current data streaming infrastructures [167]. These applications require real-time processing of very high-
volume and high-velocity data streams (also known as big data streams). A big data
stream is continuous in nature and it is important to perform real-time analysis as the
lifetime of the data is often very short (data is accessed only once) [28 - 29]. As the
volume and velocity of the data is so high, there is not enough space to store and
process; hence, the traditional batch computing model is not suitable. Cloud
computing has become a platform of choice due to its extremely low-latency and
massively parallel processing architecture [30]. It supports the most efficient way to
obtain actionable information from big data streams [28, 31 - 33].

Figure 3-1: A simplified view of a DSMS to process and analyse an input data stream [23].
Big data stream processing has become an important research topic in the current
era, whereas the data stream security has received little attention from researchers.
Some of these data streams are analysed and used in very critical applications (e.g. surveillance and military applications), where data streams need to be secured in
every aspect to detect malicious activity. The problem is exacerbated when thousands
to millions of small sensors in self-organising wireless networks become the sources
of the data stream. How can we provide security for big data streams? In addition,
compared to conventional store-and-process, these sensors will have limited
processing power, storage, bandwidth, and energy. Furthermore, data streams ought
to be processed on-the-fly in a prescribed sequence. This chapter addresses these
issues by designing an efficient architecture for real-time processing of big sensing
data streams, and the corresponding security scheme.
In order to address the challenge, we have designed and developed a Dynamic
Prime-Number Based Security Verification (DPBSV) scheme. This scheme uses a common shared key that is updated dynamically by producing synchronised prime numbers. The synchronised prime number generation at both the source sensing
device and DSM enables reduction of the communication overhead without
compromising security. Due to the reduced communication overhead, this scheme is
suitable for big data streams as it verifies the security on-the-fly (near real time). The
proposed scheme uses a smaller key length (64 bits). This enables faster security
processing at DSM without compromising security. The same level of security is
accomplished by changing the key progressively in a specific interval of time.
Dynamic key generation is based on random prime numbers, which initialise and
synchronise at source sensors and DSM without further communications between
them after handshaking. Due to the reduced key length, the scheme is suitable for
processing high volumes of data without any delay. This makes DPBSV highly
efficient at DSM for processing secured big data streams.
The remainder of this chapter is organised as follows: preliminaries are reviewed in the next section, Section 3.3 presents the research challenges and
research motivations, Section 3.4 describes the DPBSV key exchange scheme,
Section 3.5 presents the security analysis of the scheme formally, Section 3.6
evaluates the performance and efficiency of the scheme through experimental results
and Section 3.7 summarises the contributions in this chapter.
3.2 Preliminaries to the Chapter
One of the security threats is the man-in-the-middle attack, in which a malicious
attacker can access or modify the data stream from sensors. As described in the introduction, even symmetric-key solutions fail to meet the requirements of real-time processing of big sensing data streams, so there is a need for an efficient scheme for
securing big data streams. The possible types of attacks in big data streams are
attacks on authenticity, confidentiality and integrity. This chapter addresses the
authentication, confidentiality and integrity attacks and proposes a solution to process
efficient security verification of data streams in real-time.
The Data Encryption Standard (DES) has been a standard symmetric key
algorithm since 1977. However, it can be cracked quickly and inexpensively. In
2000, the Advanced Encryption Standard (AES) [31] replaced the DES to meet the
ever-increasing requirements of data security. The Advanced Encryption Standard
(AES), also known as the Rijndael algorithm, is a symmetric block cipher that can
encrypt data blocks of 128 bits using symmetric keys of 128, 192 or 256 bits [38 -
40]. AES was introduced to replace the Triple DES (3DES) algorithm, which had been used universally for a significant length of time. Hence, we
have compared the proposed solution against AES.
We also assume that deployed source nodes operate in two modes: trusted and
untrusted. In the trusted mode, the nodes operate in a cryptographically secure space
and adversaries cannot penetrate this space. Nodes can incorporate Trusted Platform
Module (TPM) to design trusted mode of operation. The TPM is a dedicated security
chip following the Trust Computing standard specification for cryptographic
microcontroller systems [134]. TPM provides a cost-effective way of "hardening" many recently deployed applications that were previously based on software encryption algorithms with keys kept on a host's disk [134]. It provides hardware-based trust, with cryptographic functionality such as key generation, storage, and management implemented in hardware; the detailed architecture is described in [134]. We assume that the
proposed prime number generation procedure Prime (Pi) and secret key calculation
operate in the trusted mode.
The proposed scheme is efficient in comparison to AES, as it reduces the
computational load and execution time significantly compared to the original AES;
furthermore, it also strengthens the security of the data, which is the main research
contribution of this chapter.
3.3 Research Challenges and Research Motivation
This section presents the research challenges and motivations in detail. We first highlight the challenges addressed by the proposed approach, followed by the motivations for the research problem, with reference to our architectural diagram shown in Figure 3-2.
Figure 3-2: Overlay of our architecture from sensing device to cloud data
processing centre.
3.3.1 Research challenges
As discussed earlier, a symmetric cryptographic solution is the best way to protect data with fast processing times. Existing symmetric cryptographic security solutions use either a static shared key or a centralised dynamic key. With a static shared key, we need a long key to defend against a potential
attacker, but key length is proportional to security verification time, and the required features of big data streams make it clear that security verification should happen in real time. With a centralised dynamic key, rekeying at the central processor and distributing keys to all the sources is a time-consuming process; a big data stream is continuous in nature and huge in size, which makes it impossible to halt the data for rekeying, distribution to the sources and synchronisation with the DSM. To address this problem, we propose a scheme for
big data stream security verification without the need for key exchange for rekeying.
The additional benefit of this is that it reduces the communication overhead and
increases the efficiency of the security verification process at DSM.
The common problem in the data flow between sensors and DSM is that attackers
may read the data in the middle while still in transit. Existing solutions to this
problem are based on symmetric-key algorithms. The periodic key-update messages in symmetric-key algorithms may disclose secret information, allowing an intruder to learn about the encryption process. Even when a nonce is used in the periodic packet, an intruder still learns when the server is going to change the key, which increases the chances of future attacks. In the proposed
scheme, key exchanges happen only once as described before, but the shared key is
updated periodically with equal time intervals. Synchronisation between a source and
DSM is important in dynamic symmetric key update; otherwise it will result in
unsuccessful security verification.
Buffer size for the security verification is another major issue because of the
volume and velocity of big data streams. According to the features of big data
streams (i.e. the 4Vs), we cannot hold the data for long before security verification. Doing so would require a larger buffer and could reduce the performance of SPEs, so reducing the buffer size is one of the major challenges for big data streams; proposed security solutions should work with a smaller buffer.
The proposed scheme is as follows: we use a common shared key for both sensors
and DSM. The key is updated dynamically by generating synchronised prime
numbers without having further communication between them. This reduces the
communication overhead, required by rekeying in existing methods, without
compromising security. Due to the reduced communication overhead, this scheme
performs the security verification with minimum delay and reduced buffer usage. The
communication is required at the beginning for the initial key establishment and
synchronisation because DSM sends all the keys and key generation properties to the
sources in this step. There will not be further communication between the source
sensor and DSM after handshaking, which increases the efficiency of the solution.
Based on the shared key properties, individual source sensors update their dynamic
key independently.
3.3.2 Research motivation
The four most important features of big data streams from the point of view of security verification are:
1. Security verification needs to be performed in near real time (on-the-fly).
2. Verification framework has to deal with high volume and high velocity data.
3. Data items can be read once in the prescribed sequence.
4. Unlike the store-and-process paradigm, original data is not available for
comparisons in the context of the stream processing paradigm.
In light of the above features and properties of big data streams, we classified
existing security systems into two classes: Communication Security [10 - 12] and
Server side data security [13 – 16]. Communication security deals with data security
when it is in motion and server side security deals with data security when it is at rest.
The security threats and solutions proposed in the literature outlined in the following
section are either dealing with the data stored at the server/cloud or the data flow.
They are not suitable to use in big data streams for the following reasons.
Communication security is primarily proposed for network communication and
communication related attacks are broadly divided into two types i.e. external and
internal. To avoid such attacks, security solutions have been proposed for every
individual TCP/IP layer. Several security solutions exist to avoid these
communication threats but are not suitable according to the properties of big data
streams stated above. Server-side data security is designed for physical data centres, where data at rest are accessed through applications. Several proposed solutions address server-side data security; they are suitable for the store-and-process model but not feasible for big data streams.
Another major motivation is to perform the security verification in near real time in order to keep up with the processing speed of SPEs [82]. Stream data analysis performance should not degrade because of security processing time, as several applications need to perform data analysis in real time. Given the features of big data streams, existing security solutions need a huge buffer to process security verification, and it is simply impossible to maintain such big buffers for data streams because of the continuous nature of the data. A lightweight security mechanism is therefore essential for security verification.

Figure 3-3: A pair of dynamic relative prime number generators, one at the DSM and another in the distributed sensor nodes, is maintained with a standard time interval. Information is communicated from the sensors to the DSM only if encrypted with the Pi-based secret key.
Table 3-1. Notations

Si: ith sensor's ID
Ki: ith sensor's secret key
SKi: ith sensor's session key
KA: generated key for the authentication
Ksh: secret (shared) key calculated by the sensor and DSM
E_Ki(·): encryption with the sensor's secret key, used for user authentication
C: calculated hash value
r: pseudorandom number generated by the sensors
t: interval time to generate the prime number
Pi: random prime number
Ks: secret key of the DSM
k: initial shared key for sensor and DSM authentication
j: integrity checking interval
DATAe: encrypted data for the integrity check
Ae: encrypted tag for the authenticity check
E(): encryption function
h(): one-way hash function
Prime(Pi): random prime number generation function
KeyGen: key generation procedure
⊕: bitwise XOR operation
||: concatenation operation
DATA: fresh data at the sensor before encryption
Re(): retrieve a key from the DSM database for a given source
Rand(): randomly generate a key
3.4 Dynamic Prime-Number Based Security Verification
This section describes the DPBSV scheme for big sensing data streams. Similar to
any secret key based symmetric key cryptography, the proposed DPBSV scheme
consists of four independent components: system setup, handshaking, rekeying, and
security verification. Table 3-1 provides the notations used in describing the scheme.
We next describe the security scheme in detail.
3.4.1 DPBSV System Setup
We have made several sensible and practical assumptions in designing the security scheme. First, we assume that the DSM holds all deployed sensors' identities (IDs) and secret keys at the time of deployment, because the network is fully untrusted. We
increase the number of key exchanges between the sensors and DSM for the starting
session key establishment process to accomplish better security. The main aim is to
make this session more secure because we transmit all the secret information of
KeyGen to individual source sensors. Second, we assume that each sensor node Si
knows the identity of its DSM and both maintain the same secret key i.e. k for initial
authentication process.
Step 1:
A sensor (Si) generates a pseudorandom number r and sends it to the DSM together with its own identity as {Si, r}. There are n sensors deployed in the area, S1, S2, S3, ..., Sn, where Si denotes the ID of the ith sensor. In this security scheme, sensors never communicate with each other, which reduces the communication overhead. The scheme also updates the dynamic shared key at both ends to prevent potential attacks or key compromise from traffic behaviour analysis. Initially, both the sensors and the DSM maintain a secret key k for the authentication process.
1) Si → DSM: {Si, r}.
Step 2:
When the DSM receives the request from a sensor, it retrieves the corresponding sensor's secret key, i.e. Ki = Re(Si), and selects a random session key SKi = Rand(). In order to share this key with the corresponding sensor (Si), the DSM generates a key from the selected session key and the corresponding sensor's secret key, i.e. KA = SKi ⊕ Ki. The DSM then encrypts the generated key with the session key and applies the hash function to generate C, i.e. C = h(E_SKi(KA)). Finally, the DSM sends the values of C and E_k(KA) to Si. The complete computational steps are listed as follows.

Ki = Re(Si),
SKi = Rand(), a randomly selected session key,
KA = SKi ⊕ Ki, C = h(E_SKi(KA)) (3-1)

2) Si ← DSM: {C, E_k(KA)}
Step 3:
The corresponding sensor receives the authentication packet from the DSM. It first recovers KA from the values of C and E_k(KA) by using the initial secret key k, i.e. KA = D_k(E_k(KA)), and then calculates its session key from KA based on its own secret key, i.e. SKi = KA ⊕ Ki. It then computes the hash from Equation 3-1 and checks whether it equals C. If the hashes are equal, Si authenticates the DSM; if not, Si terminates the protocol. Following the authentication, it transmits E_SKi(r) to the DSM, which also lets the DSM identify a sensor that failed to authenticate it.

KA = D_k(E_k(KA)), SKi = KA ⊕ Ki, to extract its own session key
h(E_SKi(KA)) = C, to authenticate the DSM (3-2)

3) Si → DSM: {E_SKi(r)}.
Step 4:
The DSM receives E_SKi(r) and compares it with the value it computes itself, using the session key from Equation 3-2 and the r received in Step 1, to see whether they are equal. If yes, the DSM authenticates Si; otherwise, the protocol is terminated. After both parties are authenticated, the DSM and the sensor share the session key SKi.

D_SKi(E_SKi(r)) = r (3-3)

4) Si ← DSM: {ACK}
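To make Steps 1-4 concrete, the following Python sketch condenses the mutual authentication under the notation above. It is only a sketch under stated assumptions: the XOR keystream stands in for the cipher E/D, the key sizes are arbitrary, and the message layouts follow the equations as reconstructed here rather than a reference implementation.

```python
import hashlib
import secrets

def E(key: bytes, msg: bytes) -> bytes:
    # Stand-in cipher: XOR with a SHA-256-derived keystream (illustration only).
    stream = hashlib.sha256(key).digest()
    return bytes(m ^ stream[i % 32] for i, m in enumerate(msg))

D = E  # an XOR stream cipher is its own inverse

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

k = b"initial-shared-key-k"      # initial shared key k, held by both ends
K_i = secrets.token_bytes(16)    # sensor's secret key, also known to the DSM

# Step 1: sensor -> DSM: {S_i, r}
r = secrets.token_bytes(16)

# Step 2: DSM picks SK_i, derives K_A = SK_i xor K_i, sends {C, E_k(K_A)}
SK_i = secrets.token_bytes(16)
K_A = xor(SK_i, K_i)
C = hashlib.sha256(E(SK_i, K_A)).digest()
msg2 = (C, E(k, K_A))

# Step 3: sensor recovers K_A and SK_i, then checks C to authenticate the DSM
K_A_sensor = D(k, msg2[1])
SK_i_sensor = xor(K_A_sensor, K_i)
assert hashlib.sha256(E(SK_i_sensor, K_A_sensor)).digest() == msg2[0]

# Step 4: sensor -> DSM: E_{SK_i}(r); DSM recovers r to authenticate the sensor
assert D(SK_i, E(SK_i_sensor, r)) == r
print("mutual authentication succeeded")
```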
3.4.2 DPBSV Handshaking
The DSM sends all its properties to the sensors (S1, S2, S3, …, Sn), protected with their individual session keys. Generally, the larger the prime number of secret shares used in the pairwise key establishment process, the better the security the pairwise key will achieve. However, using a larger prime number for the secret shares requires a greater computation time; in order to make the security verification lighter and faster, we reduce the prime number size.
The dynamic prime number generation function is defined in Theorem 2 later. We
calculate the prime number on both source and DSM sides to reduce communication
overhead and minimize the chances of disclosing the shared key.
Step 5:
Prime(Pi) computes the relative prime number at both ends with a time interval t. In the handshaking process, the DSM transmits all the procedures used to generate the key and the prime number, i.e. Prime(Pi), KeyGen, Ks and t, to the individual sensors, encrypted with the initial shared key (k).

5) Si ← DSM: {E_k(Prime(Pi) || KeyGen || Ks || t)}

In this step, the DSM sends all the parameters and properties of KeyGen to the source sensors. All of this transferred information is stored in the trusted part of the sensors (e.g. the TPM).
3.4.3 DPBSV Rekeying
We propose a novel rekeying concept by calculating prime numbers dynamically
on both source sensors and DSM. Figure 3-3 shows the synchronisation of the shared
key. In this security scheme, a smaller size of the key makes the security verification
faster, and we change the key frequently in the DPBSV rekeying process to ensure that the protocol remains secure. If any damage occurs at the source, the corresponding sensor becomes desynchronised with the DSM; the source sensor then follows Step 3 to reinitialise and synchronise with the DSM. As assumed, all secret information is stored in a trusted part of the sensor, so the sensor can reinitialise the synchronisation by sending its own identity to the DSM. Once the DSM authenticates the source sensor, it sends the current key and the time of key generation, and the authenticated sensor can update the next key using the key generation process in the sensor's secure module (TPM).
Rekeying is often accomplished by running initial exchanges all over again. The
following presents an alternative approach to rekeying and the corresponding
analysis in terms of efficiency.
Step 6:
The DPBSV handshaking process defined above makes the sensors aware of Prime(Pi) and KeyGen. We now describe the complete secure data transmission and verification process using those functions and keys. As mentioned above, this security scheme uses the synchronised dynamic prime number generation Prime(Pi) at both ends, i.e. sensors and DSM, as shown in Figure 3-3. At the end of the handshaking process, the sensors have their own secret keys, the initial prime number, and the initial shared key generated by the DSM. The next prime generation is based on the current prime number and the given time interval. Sensors generate the shared key Ksh from the prime number Pi and the DSM secret key Ks, i.e. Ksh = KeyGen(Pi, Ks). Each data block is associated with an authentication tag and contains two different parts: one is the encrypted DATA, based on the sensor's secret key and the shared key, for the integrity check (i.e. DATAe = E_(Ki ⊕ Ksh)(DATA || h(DATA))), and the other part is for the authenticity check (i.e. Ae = E_Ksh(Si || Ki)). The resulting data block is {DATAe, Ae}. The key generation and individual block encryption steps are listed as follows.

Ksh = KeyGen(Pi, Ks)
DATAe = E_(Ki ⊕ Ksh)(DATA || h(DATA))
Ae = E_Ksh(Si || Ki) (3-4)

6) Si → DSM: {DATAe, Ae}.
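To make the synchronised rekeying concrete, the sketch below has both ends step from the current prime to the next one at every interval t and derive the shared key by hashing it with the DSM secret Ks; no message is exchanged. The "next prime" successor rule and the hash-based KeyGen are illustrative assumptions, not the thesis's exact functions.

```python
import hashlib

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def next_prime(p: int) -> int:
    # Deterministic successor rule: both ends compute the same next prime.
    n = p + 1
    while not is_prime(n):
        n += 1
    return n

def keygen(P_i: int, K_s: bytes) -> bytes:
    # Illustrative KeyGen: hash the prime with the DSM secret, keep 64 bits.
    return hashlib.sha256(P_i.to_bytes(16, "big") + K_s).digest()[:8]

P = 104729                 # initial prime agreed at handshaking (illustrative)
K_s = b"dsm-secret"
for interval in range(3):  # one iteration per time interval t
    P = next_prime(P)
    print(f"interval {interval}: P_i={P}, K_sh={keygen(P, K_s).hex()}")
```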
3.4.4 DPBSV Security Verification
Security verification should be performed in real time (with minimal delay) based
on the features of big data streams stated above. In the following step, we perform the
security verification of the proposed scheme. In this step, the DSM verifies authenticity in each individual data block and integrity in specific selected data
blocks. The aim is to maintain the end-to-end security of the proposed scheme.
Step 7:
The DSM verifies whether the data have been modified and whether they come from an authenticated node. As the DSM holds the common shared key, it decrypts each block to obtain the individual parts for the integrity and authenticity checks. The DSM first checks authenticity in every data block, and checks integrity at random-interval data blocks. This random value is calculated from the corresponding prime number, i.e. j = Pi mod 7; the calculated value varies from 0 to 6, i.e. a maximum interval of six blocks, and if the value of j is 0 no data block is skipped. For the authenticity check, the DSM decrypts Ae with the shared key, i.e. (Si || Ki) = D_Ksh(Ae). Once Si is obtained, the DSM checks its source database and extracts the corresponding secret key Ki for the integrity check according to the value of j. Given Ki, the DSM decrypts the data and checks the embedded hash (MAC) for the integrity check, i.e. it verifies h(DATA) recovered from DATAe. The whole security verification process is based on the shared key Ksh from Equation 3-4.
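A sketch of the verification loop in Step 7 follows: authenticity is checked on every block, and the integrity check runs every j = Pi mod 7 blocks. The XOR keystream again stands in for the cipher, and the block layout follows Equation 3-4 as reconstructed above, so the details are assumptions rather than a reference implementation.

```python
import hashlib

def xor_cipher(key: bytes, msg: bytes) -> bytes:
    # Stand-in for the symmetric cipher E/D (XOR keystream; illustration only).
    stream = hashlib.sha256(key).digest()
    return bytes(m ^ stream[i % 32] for i, m in enumerate(msg))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def verify_stream(blocks, K_sh: bytes, key_db: dict, P_i: int):
    """blocks: iterable of (A_e, DATA_e) pairs built as in Equation 3-4,
    with 8-byte sensor IDs and 8-byte (64-bit) keys assumed throughout."""
    j = P_i % 7                  # integrity-check interval, between 0 and 6
    countdown = 0
    for A_e, DATA_e in blocks:
        plain = xor_cipher(K_sh, A_e)            # recover S_i || K_i
        S_i, K_i = plain[:8], plain[8:16]
        if key_db.get(S_i) != K_i:               # authenticity: every block
            raise ValueError("authenticity check failed")
        if countdown == 0:                       # integrity: every j blocks
            inner = xor_cipher(xor(K_i, K_sh), DATA_e)
            data, mac = inner[:-32], inner[-32:]
            if hashlib.sha256(data).digest() != mac:
                raise ValueError("integrity check failed")
            countdown = j
        else:
            countdown -= 1
```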
The complete mechanism, from source sensing device and DSM authentication through handshaking to security verification, is presented in algorithmic form in Algorithm 3-1, which describes the proposed mechanism step by step.
Algorithm 3-1. Security Framework for Big Sensing Data Stream

Description: Based on the dynamic prime number generation at both the source sensor end and the DSM end, the proposed security framework for big sensing data streams works more efficiently without compromising security strength.
Input: The prime generation process Prime(Pi), the key generation process KeyGen, the sensor and DSM secret keys, and the session key for handshaking.
Output: Successful security verification without any malicious attack, and comparatively faster security verification than the standard symmetric-key solution (AES).

Step 1: DPBSV system setup
1.1 Si → DSM: {Si, r}; the ith sensor sends a random number together with its identity.
1.2 Si ← DSM: {C, E_k(KA)}; the DSM identifies the sensor, generates a session key for it, and sends the encrypted key back to the ith sensor.
1.3 Si → DSM: {E_SKi(r)}; the ith sensor authenticates the DSM based on its own secret key. If the sender is not authenticated, it restarts the authentication transaction.
1.4 Si ← DSM: {ACK}; the DSM authenticates the last transaction and replies to the ith sensor in this format. Otherwise, the protocol terminates and a new process starts.

Step 2: DPBSV handshaking
The DSM sends its properties to the individual sensors based on their individual session keys, including the prime number generation procedure and the time interval for generation.
2.1 Si ← DSM: {E_k(Prime(Pi) || KeyGen || Ks || t)}; for details refer to Table 3-1.

Step 3: DPBSV rekeying
Keys are updated at both the source sensor and the DSM, which are both aware of Prime(Pi) and KeyGen. Sensors generate the shared key Ksh, and each data block is associated with two parts: DATAe, encrypted for the integrity check, and Ae for the authenticity check.
3.1 Si → DSM: {DATAe, Ae}; these blocks serve the authentication, integrity, and confidentiality checks.

Step 4: DPBSV security verification
The DSM checks authenticity in each data block and integrity at random-interval data blocks, where the random value is calculated from the corresponding prime number, i.e. j = Pi mod 7.
4.1 For the authenticity check, the DSM obtains the source ID. Once Si is obtained, the DSM checks its source database and extracts the corresponding secret key Ki for the integrity check according to the value of j.
4.2 Given Ki, the DSM decrypts the data and checks the MAC for the integrity check.
3.5 Security Analysis
This section provides theoretical analysis of the security scheme to show that it is
safe against attacks on authenticity, confidentiality and integrity.
3.5.1 Security Proof
Assumption 1: No participant in the scheme can decrypt data encrypted with a symmetric-key algorithm unless it holds the session/shared key that was used to encrypt the data at the source side.
Assumption 2: As the DSM is located at the cloud server side, we assume that the DSM is fully trusted and cannot be attacked.
Assumption 3: The sensor's secret key, Prime (Pi) and the secret key calculation procedures reside inside trusted parts of the sensor (such as a TPM), so they are not available to intruders.
Similar to most cryptographic analyses of key-based communication protocols, we now define the attack models used to verify authenticity, confidentiality and integrity.
Definition 1 (attack on authentication): A malicious attacker Ma is an adversary who is capable of monitoring, intercepting, and presenting itself as an authenticated source node in order to send data into the data stream.
Definition 2 (attack on confidentiality): A malicious attacker Mc is an unauthorised party who has the ability to access or view the data stream, without authorisation, before it reaches the DSM.
Definition 3 (attack on integrity): A malicious attacker Mi attacks integrity: an adversary capable of monitoring the data stream regularly and trying to access and modify the data blocks before they reach the DSM.
Theorem 1: Security is not compromised by reducing the size of the shared key (KSH).
Proof: We reduce the size of the prime number to make the key generation process faster and more efficient. The ECRYPT II recommendations on key length state that a 128-bit symmetric key provides the same strength of protection as a 3,248-bit asymmetric key. A shorter key can still provide strong protection in a symmetric key algorithm because the key is never shared publicly. An advanced processor (an Intel i7) takes about 1.7 nanoseconds to try one key on one block; at this speed it would take about 1.3 × 10^12 times the age of the universe to check all the keys in a 128-bit key space [35]. By reducing the size of the prime number, we fixed the key length at 64 bits to make the security verification at the DSM faster, using the data from Table 3-2. From Table 3-2, exhausting a 64-bit symmetric key space takes about 3.14 × 10^19 nanoseconds (far more than a month), so we fixed the interval for generating the prime number at one week (i.e. t = 168 hours). The dynamic shared key is calculated from the generated prime number. Based on this calculation, we conclude that an attacker cannot compute the shared key within the interval time t, and we change the shared key without exchanging any information between the sensors and the DSM. A brute-force attack might eventually recover the shared key once an intruder knows the key length, but the same possibility exists for a 128-bit cryptographic solution. The frequent change confuses malicious nodes that listen to the data flow continuously: the key has already been changed four times before an attacker can learn it, and the attacker is unaware of these changes. This leads to the conclusion that even though we reduce the key size to 64 bits, we obtain the same security strength by changing the key at time interval t.
Table 3-2: Time taken by the symmetric key (AES) algorithm to try all possible keys using the most advanced Intel i7 processor.

Key length (bits)       8        16          32           64            128
Key domain size         256      65,536      4.295e+09    1.845e+19     3.4028e+38
Time (nanoseconds)      435.2    1.11e+05    7.301e+09    3.1365e+19    5.7848e+38
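As a worked instance of the figures in Table 3-2 (taking the 1.7 ns-per-key estimate at face value; this arithmetic is ours, for illustration), the exhaustive-search time for a 64-bit key is

\[ 2^{64} \approx 1.845 \times 10^{19}\ \text{keys} \times 1.7\ \text{ns/key} \approx 3.14 \times 10^{19}\ \text{ns} \approx 3.1 \times 10^{10}\ \text{s} \approx 994\ \text{years}, \]

which is vastly longer than the one-week rekeying interval t = 168 hours chosen below; the key is retired long before an exhaustive search could complete.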
Theorem 2: The relative prime number Pi calculated in Algorithm 3-2 is synchronised between the source sensors (Si) and the DSM.
Proof: The standard primality screen is that every prime number greater than 3 has the form 6k ± 1, k ∈ N+ (an integer). We initialise the value of k on the basis of this primality test formula. Our prime generation method builds on this concept and extends the idea of [117]. In this security scheme, the input Pi is the currently used prime number (initialised by the DSM) and the returned Pi is the newly calculated prime number. Initially, Pi is initialised by the DSM during the DPBSV handshaking process, and the interval time is t seconds.
Algorithm 3-2. Dynamic Prime Number Generation Prime (Pi)
1. Pi is the currently used prime number (input)
2. Set k ← ⌊Pi / 6⌋
3. Set m ← 6k − 1
4. If m ≤ Pi then
5.   k ← k + 1
6.   GO TO: 3
7. If S(m) = 1 then
8.   GO TO: 14
9. Set m ← 6k + 1
10. If S(m) = 1 then
11.   GO TO: 14
12. k ← k + 1
13. GO TO: 3
14. Pi ← m
15. Return (Pi) // calculated new prime number
From Algorithm 3-2, we calculate the new prime number Pi based on the previous one. The complete prime number calculation is driven by the value of m, and m is initialised from the value k. The value of k is consistent at the source because it is calculated from the current prime number, and this process is initialised during DPBSV handshaking. Since the value of k is the same on both sides, the procedure Prime (Pi) returns identical values at the sensor and the DSM. In Algorithm 3-2, the value of S(m) is computed as follows.
\[ S(x) = \left\lfloor \cos^2\left( \pi \cdot \frac{(x-1)! + 1}{x} \right) \right\rfloor \qquad (3\text{-}5) \]

If S(x) = 1 from Equation 3-5 then x is prime; otherwise x is not prime. The following argument validates this property. If x is prime, then by Wilson's theorem x divides (x − 1)! + 1, so for x = 6k ± 1 with k within the specified range, i.e. k ∈ N+,

\[ \frac{(x-1)! + 1}{x} \in \mathbb{Z} \qquad (3\text{-}6) \]

The cosine term is then also 1, since cos²(πn) = 1 for any integer n, as implied by Equation 3-6, and hence

\[ S(x) = \lfloor 1 \rfloor = 1 \qquad (3\text{-}7) \]

Hence, the property of S(x) is proved.
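A compact way to realise Algorithm 3-2 in code is sketched below. It follows the 6k ± 1 search described above, but replaces the factorial-based indicator S(m) of Equation 3-5 with ordinary trial division, since computing (m − 1)! is impractical for keys of realistic size; this substitution is our assumption for the sketch, not part of the thesis algorithm.

public final class DynamicPrime {
    /** Deterministic primality test used in place of S(m): trial division over 6k±1 candidates. */
    private static boolean isPrime(long x) {
        if (x < 2) return false;
        if (x % 2 == 0) return x == 2;
        if (x % 3 == 0) return x == 3;
        for (long d = 5; d * d <= x; d += 6) {
            if (x % d == 0 || x % (d + 2) == 0) return false;
        }
        return true;
    }

    /** Returns the next prime after pi, scanning candidates of the form 6k-1 and 6k+1. */
    public static long next(long pi) {
        long k = pi / 6 + 1;                 // start just above the current prime
        while (true) {
            long m = 6 * k - 1;
            if (m > pi && isPrime(m)) return m;
            m = 6 * k + 1;
            if (m > pi && isPrime(m)) return m;
            k++;
        }
    }

    public static void main(String[] args) {
        long p = 61;                         // any current prime; both ends run the same code
        for (int i = 0; i < 5; i++) {
            p = next(p);
            System.out.println(p);           // identical sequence at sensor and DSM
        }
    }
}

Because the search is a pure function of the current prime, a sensor and a DSM that start from the same Pi always derive the same next prime, which is exactly what Theorem 2 requires.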
Theorem 3: An attacker Ma cannot read the secret information from sensor node (Si)
or introduce itself as an authenticated node in DPBSV.
Proof: Following Definition 1, we know that an attacker Ma may gain access to the shared key KSH by monitoring the network thoroughly, but Ma cannot obtain secret information such as Prime (Pi) and KeyGen. Considering the computational hardness of secure modules (such as a TPM), we know that Ma cannot get the secret information for Pi generation, Ki and KeyGen. Hence there is no way for a malicious node to trap a sensor and operate on its behalf, although Ma can present itself as an authenticated node and send its own information. In this security scheme, the sensor (Si) sends data blocks in which the second part is used for the authentication check. The DSM decrypts this part of the data block for the authentication check, retrieves Si after decryption, and matches the corresponding Si within its database. If the recovered Si matches an entry in the DSM database, the node is accepted; otherwise it is rejected as a source, because it is not an authenticated sensor node. All the secured information required for the prime number and key generation procedures is stored in the trusted part of the sensor node (i.e. the TPM). According to the features of the TPM, an attacker cannot extract this information, as discussed before. Hence we conclude that attacker Ma cannot attack big data streams.
The proposed scheme drops data blocks that come from malicious sources with minimum computation time, by processing only the authentication part. The proposed mechanism is thereby also able to mitigate DDoS attacks.
Theorem 4: An attacker Mc cannot access or view the data stream without authorisation in the proposed DPBSV scheme.
Proof: Following Algorithm 3-2, the prime number Pi is generated at the sensors and the DSM dynamically, without any further communication. The shared secret key KSH is calculated from the generated prime number. Considering the computational hardness of secure modules (such as a TPM), we know that Mc cannot get the secret information for Pi generation, Ki and KeyGen within the time frame. Following Definition 2, we know that an attacker Mc may gain access to the shared key KSH but to no other information. In this security scheme, the source sensor (Si) sends data blocks in a two-part format, where the first part of the data block contains the original data. Recovering the original data from this part is infeasible, because Mc has no other information and, at the same time, the shared key is updated dynamically at equal intervals of time (t). As the data is protected and cannot be read within the time frame (i.e. before the update of the shared key), confidentiality is preserved.
Theorem 5: An attacker Mi cannot obtain the shared key KSH within the time interval t in the DPBSV scheme.
Proof: Following Definition 3, we know that an attacker Mi has full access to the network and can attempt to read the shared key KSH, but Mi cannot obtain the correct secret information, i.e. the current KSH. Considering the method described in Theorem 1, we know that Mi cannot get the currently used KSH within the time interval t, because the proposed scheme calculates Pi randomly after time t and then uses the value of Pi to generate KSH. For more details of the computational analysis, refer to Theorem 1.
3.5.2 Forward Secrecy
As with other symmetric key procedures, the shared keys used for encrypting communications are only used for a certain period of time (t), until the new prime number is generated. Thus, previously used shared keys or secret keying material are worthless to a malicious opponent, even when a previously used secret key becomes known to the attackers. This is one of the major advantages of frequently changing the shared key, and it is one of the reasons we did not choose a static symmetric key or an asymmetric-key encryption algorithm.
3.6 Experiment and Evaluation
The proposed DPBSV security scheme is generic, though in this chapter it is deployed for big sensing data streams. In order to evaluate the efficiency and effectiveness of the proposed architecture and protocol, even under adverse conditions, we observe each individual data block for authentication checks and selected data blocks for integrity attacks. The integrity attack verification interval is dynamic in nature, and the data verification is done at the DSM only.
To validate the proposed security scheme, we experimented in multiple simulation environments to confirm that the security mechanism works correctly for big sensing data streams. We first measured the performance of sensor nodes using COOJA in Contiki OS [118], then verified the security scheme using Scyther [119], and finally measured the efficiency of the scheme using JCE (Java Cryptographic Environment) [120]. We also checked the minimum buffer size required to run the proposed scheme and compared it with the standard AES algorithm using Matlab [121].
Figure 3-4: The sensors used for the experiments. (a) Z1 low power sensor. (b) TmoteSky ultra low power sensor.
3.6.1 Sensor Node Performance
We evaluated the performance of the sensors in the COOJA simulator under Contiki OS. We took the two most common types of sensor, i.e. the Z1 and TmoteSky sensors, for our experiment and performance checking, as shown in Figure 3-4. In this experiment, we checked the performance of the sensors while computing or updating the shared key.
Z1 sensor nodes are produced by Zolertia; the Z1 is a low-power WSN module designed as a general-purpose development platform for WSN researchers. It is designed for maximum backwards compatibility with the successful Tmote-like family of motes, while improving performance and providing maximum flexibility and expandability with regard to any combination of power supplies, sensors and connectors. It supports the open source operating systems currently most employed by the WSN community, such as Contiki [118]. COOJA is a network simulator for Contiki that provides realistic sensor node features for simulation.
A Z1 sensor node is equipped with the low-power microcontroller MSP430F2617, which features a powerful 16-bit RISC CPU with a 16 MHz clock speed, built-in clock factory calibration, 8 KB of RAM and 92 KB of Flash memory. The Z1 hardware selection guarantees maximum efficiency and robustness at low energy cost. TmoteSky is an ultra-low-power sensor equipped with the low-power microcontroller MSP430F1611, which has built-in clock factory calibration, 10 KB of RAM and 48 KB of Flash memory.
We successfully demonstrated in the COOJA simulator that our key generation process works in both types of sensor, i.e. the Z1 and the TmoteSky; both sensors support the security mechanism. The energy consumption during the key generation process is shown in Figure 3-5, which shows normal power consumption behaviour for the key generation process. From this experiment we conclude that the proposed DPBSV security verification mechanism is supported by the most common types of sensors and is feasible for big sensing data streams.
3.6.2 Security Verification
The scheme is written in the Scyther simulation environment using Security
Protocol Description Language (.spdl). According to the features of Scyther, we
define the role of D and S, where S is the sender (i.e. sensor nodes) and D is the
recipient (i.e. DSM). In our scenario, D and S have all the required information that
is exchanged during the handshake process. This enables D and S to update their
own shared key. S sends the data packets to D and D performs the security
verification. In the simulation, we introduce two types of attacks. The first type of
attack is defined for the transmission between S and D (integrity) and the second
attack is defined where an adversary acquires the property of S and sends the attack
data packets to D (authentication). In this experiment, we evaluated all packets at D
(DSM) for security verification. We experimented with 100 runs for each claim (also
known as bounds) and found out the number of attacks at D as shown in Figure 3-6.
Apart from these, we follow the default properties of Scyther.
Figure 3-5: Estimated power consumption during the key generation process.
Attack model: Many types of cryptographic attack can be considered. In this case, we focus on integrity attacks, confidentiality attacks and authentication attacks, as discussed above. In integrity attacks, an attacker can only observe encrypted data blocks/packets travelling on the network that contain information about the sensed data, as shown in Figure 3-2. The attacker can perform a brute-force attack on captured packets by systematically testing every possible key, and we assume that he/she is able to determine when the attack is successful. In a confidentiality attack, the attacker continuously observes the data flow and tries to read the data. In authentication attacks, an attacker can observe a source node and try to learn its behaviour; we assume that he/she is able to determine the source node's behaviour. In such cases, the attacker can introduce an authenticated node and act as the original source node. In our scheme, we use trusted modules in the sensors (such as a TPM) to store the secret information and the procedures for key generation and encryption.
Experiment model: In practice, attacks may be more sophisticated and efficient
than brute force attacks. However, this does not affect the validity of the proposed
DPBSV scheme as we are interested in efficient security verification without
periodic key exchanges and successful attacks. Here, we model the process as
described in the previous section and fixed the key size at 64 bits (see Table 3-2).
We used Scyther, an automatic security protocol verification tool, to verify the
proposed mechanism.
Figure 3-6: Scyther simulation environment with parameters and result page
of successful security verification at DSM.
Results: We ran the simulation with variable numbers of data blocks in each run. The experiments range from 100 to 1000 instances, in intervals of 100. We check authentication for each data block, whereas the integrity check is performed on selected data blocks. As the secure information, such as Prime (Pi), Ki and KeyGen, is stored within the trusted module of the sensor, no one can access that information except the corresponding sensor. Without this information, attackers cannot produce authentic encrypted data blocks; hence, we did not find any attacks in the authentication checks. For integrity attacks, it is hard to obtain the shared key (KSH), as we frequently change the shared key based on the dynamic prime number at both the source sensor (Si) and the DSM. In the experiment, we did not encounter any attack in the integrity check. As the shared key changes with time interval t, an attacker cannot read a data stream within that interval, which leads to the conclusion that the proposed mechanism provides weak confidentiality. Figure 3-6 shows the result of the security verification experiments in the Scyther environment. It shows that the scheme is secure against integrity and authentication attacks even after reducing the key size. As the rekeying process runs at equal intervals of time, we found the security scheme to be secure with a 64-bit key length. From the observations above, we can conclude that the proposed scheme is secure.
Figure 3-7: Performance of the security scheme compared in efficiency with 128-bit AES and 256-bit AES.
3.6.3 Performance Comparison
Experiment model: It is clear that the actual efficiency improvement brought by this scheme depends strongly on the key size and on rekeying without further communication between the sensors and the DSM. We performed experiments with different sizes of data blocks; the results are given below.
We compare the performance of the proposed DPBSV scheme with the Advanced Encryption Standard (AES), the standard symmetric key encryption algorithm [38 - 39]. The scheme's efficiency is compared with two standard symmetric key configurations, namely 128-bit AES and 256-bit AES. This performance comparison was carried out in JCE (Java Cryptographic Environment), and we compared the processing time for different data block sizes. The comparison is based on the features of JCE in the 64-bit Java virtual machine version 1.6. JCE is the standard extension to the Java platform that provides a framework implementation for cryptographic methods. We experimented with many-to-one communication: all sensor nodes communicate with a single node (the DSM). All sensors have similar properties, whereas the destination node has the properties of the DSM (more powerful, to initialise the process). The rekeying process is executed at all nodes without any intercommunication. The processing time of data verification is measured at the DSM node. The experimental results are shown in Figure 3-7; they validate the theoretical analysis presented in Section 3.5.
Results: The performance of the scheme is better than that of the standard AES algorithm for all data block sizes considered. Figure 3-7 shows the processing time of the proposed DPBSV scheme in comparison with the baseline 128-bit AES and 256-bit AES for different data block sizes. The comparison shows that the proposed scheme is more efficient and faster than the baseline AES protocols.
We also calculated the time taken for DPBSV encryption and decryption on an AMD K7 700 MHz processor and compared it with the standard 128-bit AES algorithm [122]. Based on this calculation, DPBSV takes 3.2 microseconds and AES (128-bit) 35.8 microseconds for encryption, whereas DPBSV takes 3.3 microseconds and AES (128-bit) 36 microseconds for decryption.
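The JCE comparison can be reproduced with a micro-benchmark along the following lines. This is a hedged sketch of the measurement method only: the block size, iteration count and the use of AES/ECB are our choices for illustration, and the absolute numbers will differ across JVMs and hardware.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class AesTiming {
    public static void main(String[] args) throws Exception {
        for (int bits : new int[] {128, 256}) {        // the AES key sizes compared in Figure 3-7
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(bits);
            SecretKey key = kg.generateKey();
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            byte[] block = new byte[64 * 1024];        // one 64 KB data block (illustrative size)
            c.init(Cipher.ENCRYPT_MODE, key);
            c.doFinal(block);                          // warm-up run to reduce JIT effects
            long start = System.nanoTime();
            for (int i = 0; i < 1000; i++) {
                c.init(Cipher.ENCRYPT_MODE, key);
                c.doFinal(block);
            }
            long nsPerBlock = (System.nanoTime() - start) / 1000;
            System.out.println("AES-" + bits + ": " + nsPerBlock + " ns per 64 KB block");
        }
    }
}

Timing DPBSV itself would replace the loop body with the scheme's shorter-key decryption plus the interval-based MAC check, which is where its advantage over always-on AES verification comes from.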
3.6.4 Required Buffer Size
Experiment model: We experimented with the features of the DSM buffer using MATLAB as the simulation tool [121]. This evaluation is based on the processing time results shown in Figure 3-7. Here we compare the security scheme with standard 128-bit AES and 256-bit AES, as in the processing time comparison. We measured the minimum buffer size required to process security verification at the DSM for data rates from 50 to 200 MB/s, in intervals of 50 MB/s. This comparison was used to measure the efficiency of the proposed scheme (DPBSV).
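A first-order way to reason about the buffer requirement (our simplification, not the Matlab model used in the experiment) is that data arriving at rate \(\lambda\) must be held for the verification latency \(T_v\), so the steady-state buffer occupancy is roughly

\[ B_{\min} \approx \lambda \times T_v . \]

For example, at \(\lambda = 200\) MB/s, every millisecond of verification latency ties up about 0.2 MB of buffer, which is why the faster DPBSV verification of Figure 3-7 translates directly into the smaller buffer sizes reported in Figure 3-8.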
Results: The performance of the scheme is better than that of the standard AES algorithm at all data rates. Figure 3-8 shows the minimum buffer size required to process security verification at the DSM for the proposed DPBSV scheme compared with the baseline 128-bit AES and 256-bit AES. The performance comparison shows that the proposed scheme is efficient and requires less buffer to process security verification than the baseline AES protocols.
Figure 3-8: Performance comparison of the minimum buffer size required to process the security verification at various data rates to the DSM.
From the above experiments, we conclude that the proposed DPBSV scheme is secure (against authenticity, confidentiality and integrity attacks) and efficient (compared with standard symmetric algorithms such as 128-bit AES and 256-bit AES). The proposed scheme also needs less buffer to process the security verification.
3.7 Summary
This chapter discussed a novel authenticated key exchange scheme, namely Dynamic Prime-Number Based Security Verification (DPBSV), which aims to provide an efficient and fast (on-the-fly) security verification scheme for big data streams. The scheme is designed on the basis of symmetric key cryptography and random prime number generation. Through theoretical analyses and experimental evaluations, we showed that the DPBSV scheme provides a significant improvement in processing time, requires less buffer for processing, and prevents malicious attacks on authenticity, confidentiality and integrity. In this security method, we decrease the communication and computation overhead by dynamic key initialisation at both the sensor and DSM ends, which in effect eliminates the need for rekeying messages and decreases the communication overhead. The DSM performs this verification before stream data processing, as shown in the main architecture diagram. Several applications (e.g. emergency management and event detection) need to discard unwanted data and obtain the original data for stream data analysis. The proposed security verification scheme (i.e. DPBSV) performs in near real time, in synchronisation with the performance of the stream processing engine; the main aim is not to degrade the performance of stream processing systems such as Hadoop, S4 and Spark.
Chapter 4
Lightweight Security Protocol for Big
Sensing Data Streams
Chapter 3 addressed the first important step of security verification for big sensing data streams. The next important step is to make the security verification model more lightweight so as to satisfy the properties of big data streams. We refer to this as an online security verification problem. To address it, we propose a Dynamic Key Length Based Security Framework (DLSeF) based on a shared key derived from synchronised prime numbers; the key is dynamically updated at short intervals to thwart potential attacks and ensure end-to-end security. Theoretical analyses and experimental results for the DLSeF framework show that it can significantly improve the efficiency of processing stream data by reducing the security verification time and buffer usage without compromising security.
4.1 Introduction
A variety of applications, such as emergency management, SCADA (Supervisory
Control and Data Acquisition), remote health monitoring, telecommunication fraud
detection and large scale sensing networks, require real-time processing of data
streams, where the traditional store-and-process method falls short of the challenge
[24]. These applications have been characterized as producing high speed, real-time,
sensitive and large volume data input, and therefore require a new paradigm of data
processing. The data in these applications falls in the big data category, as its size is
beyond the ability of typical database software tools and applications to capture,
store, manage and analyse in real time [123]. More formally, the characteristics of big
data are defined by “4Vs” [124 - 125]: Volume, Velocity, Variety, and Veracity; the
streaming data from a sensing source meets these characteristics. This chapter
focuses on providing end-to-end security for real-time high volume, high velocity
data streams.
A big data stream is continuous in nature and it is critical to perform real-time
analysis as: (i) the lifetime of the data is often very short (i.e. the data can be accessed
only once) [28 - 29] and (ii) the data is utilised for detecting events (e.g. flooding of
highways, collapse of railway bridge) in real-time in many risk-critical applications
(e.g. emergency management). Since a big data stream in risk-critical applications
has high volume and velocity and the processing has to be done in real-time, it is neither economically viable nor practically feasible to store and then process the data (as done in the traditional batch computing model). Hence, stream processing engines (e.g. Spark,
Storm, S4) have emerged in the recent past that have the capability to undertake real-
time big data processing. Stream processing engines offer two significant advantages.
Firstly, they circumvent the need to store large volumes of data and secondly, they
enable real-time computation over data as needed by emerging applications such as
emergency management and industrial control systems. Further, integration of stream
processing engines with elastic cloud computing resources has further revolutionized
big data stream computation as stream processing engines can now be easily scaled
[28, 31, 33] in response to changing volume and velocity.
Although stream data processing has been studied in recent years within the
database research community, the focus has been on query processing [126],
distribution [127] and data integration. Data security related issues, however, have
been largely ignored. Many emerging risk-critical applications, as discussed above,
need to process big streaming data while ensuring end-to-end security. For example,
consider emergency management applications that collect soil, weather, and water
data through field sensing devices. Data from these sensing devices are processed in
real-time to detect emergency events such as sudden flooding, and landslides on
railways and highways. In these applications, compromised data can lead to wrong
decisions and in some cases even loss of lives and critical public infrastructure.
Hence, the problem is how to ensure end-to-end security (i.e. confidentiality,
integrity, and authenticity) of such data streams in near real-time processing. We
refer to this as an online security verification problem.
The problem in processing big data becomes extremely challenging when millions
of small sensors in self-organising wireless networks are streaming data through
intermediaries to the data stream manager. In these cases, intermediaries as well as
the sensors are prone to different kinds of security attacks such as Man in the Middle
Attacks. In addition, these sensors have limited processing power, storage, and
energy; hence, there is a requirement to develop lightweight security verification
schemes. Furthermore, data streams need to be processed on-the-fly in the correct
sequence. This chapter addresses these issues by designing an efficient model for
online security verification of big data streams.
The most common approach for ensuring data security is to apply cryptographic
methods. In the literature, the two most common types of cryptographic encryption
methods are asymmetric and symmetric key encryption. Asymmetric key encryption
(e.g. RSA, ElGamal, DSS, YAK, Rabin) performs a number of exponential
operations over a large finite field and is therefore 1000 times slower than symmetric
key cryptography [34 - 35]. Hence, efficiency becomes an issue if an asymmetric key
such as Public Key Infrastructure (PKI) [37] is applied to securing big data streams.
Thus, symmetric key encryption is the most efficient cryptographic solution for such
applications. However, existing symmetric key methods (e.g. DES, AES, IDEA,
RC4) fail to meet the requirements of real-time security verification of big data
streams because the volume and velocity of a big data stream is very high (refer to
the performance evaluation section for the performance values). Hence, there is a
need to develop an efficient and scalable model for performing security verification
of big data streams. The main contributions of this chapter can be summarised as
follows:
We have designed and developed a Dynamic Key Length Based Security Framework (DLSeF) to provide end-to-end security for big data stream processing. This model is based on a common shared key that is generated by
exploiting synchronised prime numbers. The proposed method avoids
excessive communication between data sources and Data Stream Manager
(DSM) for the rekey process. Hence, this leads to reduction in the overall
communication overhead. Due to this reduced communication overhead, this
model is able to do security verification on-the-fly (with minimum delay)
with minimal computational overhead.
The proposed model adopts a moving target approach, using a dynamic key
length from the set 128-bit, 64-bit, and 32-bit. This enables faster security
verification at DSM without compromising security. Hence, this model is
suitable for processing high volumes of data without any delay.
We compare the proposed model with the standard symmetric key solution
(AES) in order to evaluate the relative computational efficiency. The results
show that this security model performs better than the standard AES method.
The remainder of this chapter is organised as follows: preliminaries are reviewed in the next section; Section 4.3 presents the research challenges and motivations; Section 4.4 describes the DLSeF key exchange scheme; Section 4.5 presents a formal security analysis of the model; Section 4.6 evaluates the performance and efficiency of the model through experimental results; and Section 4.7 summarises the contributions of this chapter.
4.2 Preliminaries to the Chapter
Figure 4-1 shows the overall architecture for big data stream processing from
source sensing devices to the data processing centre including the proposed security
framework. Refer to [128] for further information on stream data processing in
datacentre clouds. In sensor networks, data packets from the sources are transmitted to the sink (data collector) through multiple intermediate hops (e.g. routers and gateways). Data collected at the sink nodes are then forwarded to the DSM as data streams and may pass through many untrusted intermediaries. The number of hops and intermediaries depends on the network architecture designed for a particular application. The intermediaries in the network may behave as malicious attackers by modifying and/or dropping data packets. Hence, traditional communication security techniques [52, 91, 129] are not sufficient to provide end-to-end security. In this
framework, both queries and data security related techniques are handled by DSM in
coordination with the on-field deployed sensing devices. It is important to note that
the security verification of streaming data has to be performed before the query
processing phase and in near real-time (with minimal delay) with a fixed (small)
buffer size. The processed data are stored in the big data storage system supported by
cloud infrastructure [30]. Queries used in DSM are defined as “continuous” since
they are continuously applied to the streaming data. Results (e.g. significant events)
are pushed to the application/user each time the streaming data satisfies a predefined
query predicate.
The discussion of the architecture above identifies the most important requirements for security verification in big data stream processing. In summary: (a) the security verification needs to be performed in real time (on-the-fly); (b) the framework has to deal with a high volume of data arriving at high velocity; (c) the data items should be read once, in the prescribed sequence; and (d) the original data is not available for comparison, unlike in a store-and-process batch processing paradigm. These requirements must be met by a big data stream processing framework in addition to the end-to-end data security requirements stated in the last chapter.
Figure 4-1: High level of architecture from source sensing device to big data processing centre.
Based on the above requirements of big data stream processing, we categorise existing data security methods into two classes: communication security [131 - 132] and server-side data security [13 - 14]. To address the online security verification problem, we propose a distributed and scalable model for big data stream security verification.
The Data Encryption Standard (DES) had been the standard symmetric key algorithm since 1977; however, it can now be cracked quickly and inexpensively. In 2000, the Advanced Encryption Standard (AES) [38] replaced DES to meet the ever-increasing requirements of data security. The Rijndael algorithm, i.e. AES, is a symmetric block cipher that encrypts data blocks of 128 bits using symmetric keys of different sizes, such as 128, 192 or 256 bits [38 - 39, 132]. AES was introduced to replace the Triple DES (3DES) algorithm, which had been used universally for a significant time. Hence, we compare the proposed solution against AES.
4.3 Research Challenges and Research Motivation
This section presents the research challenges and motivations in detail. Here we
have highlighted the challenges for the proposed approach followed by motivations
to the research problem.
4.3.1 Research Challenges
As discussed earlier, symmetric key cryptography is one of the best ways to protect big data streams in a lightweight manner. Current symmetric key cryptographic security solutions use a static shared key that is controlled in a centralised manner, and much of their cost goes to security processing time, i.e. both encryption and decryption time. As the volume and velocity of big data streams are very high, the security verification should run in near real-time and keep pace with the stream processing engine. Another major concern is the communication overhead of shared key initialisation between the DSM and the source sensors. A big data stream is continuous in nature and huge in size, so initialising and distributing the shared key is difficult to perform in real-time through a centralised process. The efficiency of the security verification process at the DSM can instead be increased by initialising key generation at the source end.
A common problem in the data flow between sensors and the DSM is that attackers may read the data while it is in transit. Existing solutions to this problem are based on symmetric key algorithms. However, the periodic key update messages in symmetric key algorithms may disclose secret information, which may allow an intruder to learn about the encryption process. Even when a nonce is used in the periodic packet, an intruder still learns when the server is about to change the key, which increases the chances of future attacks. In the proposed model, key exchange happens only once, as described before, but the shared key is updated periodically at equal time intervals. Synchronisation between a source and the DSM is important in dynamic symmetric key updating; otherwise security verification will fail.
Buffer size for security verification is another major issue because of the volume and velocity of big data streams. Given the features of big data streams (i.e. the 4Vs), we cannot hold the data for long before security verification; doing so would require a bigger buffer and could reduce the performance of SPEs. Buffer size reduction is therefore one of the major challenges for big data streams, and a proposed security solution should be able to work with a smaller buffer.
The proposed model is as follows. We use a common shared key for both the sensors and the DSM. The shared key is updated dynamically using dynamic prime numbers, without any further communication between them; the key is changed not only over time but also in length. Our synchronisation method also adopts a unique neighbour authentication technique through which a sensor can recover lost synchronisation properties from its neighbours. Communication is required only at the beginning, for the initial key establishment; there is no further communication between the source sensor and the DSM after handshaking, which increases the efficiency of the proposed solution. Based on the shared key properties, individual source sensors update their dynamic key independently. This method performs security verification faster, with minimum delay, and also reduces buffer usage.
4.3.2 Research Motivation
The four most important features of big data streams from the point of view of security verification were stated in the last chapter. In light of these features and properties, we classified existing security systems into two classes, communication security and server-side data security, and proposed the DPBSV model to deal with big data streams in the last chapter. However, we need a faster, more efficient and more lightweight security solution to protect big data streams.
Communication security protects data while it is in transit. There are two types of communication attacks: external and internal. To avoid such attacks, security solutions have to exist for every individual TCP/IP layer. Several security solutions exist to counter these communication threats, but they are not suitable given the properties of big data streams stated above.
4.4 DLSeF Lightweight Security Protocol
This security model is motivated by the concept of moving target defence. The basic idea is that the keys are the targets of attacks by adversaries; if we keep moving the keys in the spatial (dynamic key size) and temporal (same key size, but different key) dimensions, we can achieve the required efficiency without compromising security. The proposed model, the Dynamic Key Length Based Security Framework (DLSeF), provides a robust security solution by changing both the key and the key length dynamically. In this security model, even if an intruder/attacker eventually recovers a key, the key lifetime is selected in such a way that he/she cannot predict the key or its length for the next session. We argue that it is very difficult for an intruder to guess the appropriate key and its length, as the model dynamically changes both across sessions. Though the proposed model has weak confidentiality (eventually an intruder may be able to detect the keys if he/she has sufficient processing and storage capability), it provides sufficient confidentiality for the duration of online real-time processing. Hence, such a weak confidentiality model is sufficient for a disaster management application scenario. It is important to
note that no compromise is made on the authenticity and integrity of the data, which
are important for making decisions from the data.
Similar to any secret key-based symmetric cryptography, the DLSeF model consists of four independent components and related processes: system setup, handshaking, rekeying, and security verification. Stream processing is expected to be performed in near real-time, and the end-to-end delay is an important QoS parameter for measuring the performance of sensor networks [133]. Since we collect data from sensor nodes to process for emergency situations, the data needs to reach the DSM in real time; we therefore assume that there is not much delay in data arrival at the DSM. Table 4-1 lists the notations used in the model, which we describe next.
4.4.1 DLSeF System setup
We have made various sensible and practical assumptions while characterising this model. First, we assume that the DSM knows all deployed sensors' identities (IDs) and secret keys at the time of deployment, because the network is fully untrusted. We increase the number of key exchanges between the sensors and the DSM during the initial session key establishment process to achieve better security; our aim is to make this session more secure because all the secret information of KeyGen is transmitted to the individual source sensors. Second, we assume that each sensor node Si knows the identity of its DSM and that both maintain the same secret key, i.e. K1, for the initial authentication process.
Sensing devices and the DSM implement some common primitives, such as a hash function (H( )) and a common key (K1), which are used during the initial identification and system setup steps.
The proposed authentication process includes five different steps: the first three are for the sensing device and DSM authentication process, and the final two are for the session key generation process, as shown in Figure 4-2. The shared key is utilised during the handshaking process.
Table 4-1: Notations used in this model

Acronym            Description
Si                 ith source sensing device's ID
Ki                 ith source sensing device's secret key
Ksi                ith source sensing device's session key
kl                 Key length
K1, K2, K3, K4     Initial keys for authentication
KSH                Secret shared key calculated by the sensing device and DSM
K′SH               Previous secret shared key maintained at DSM
P1, P2, P3, P4     Communicated formats during authentication
r                  Random number generated by the sensing devices
t                  Interval time to generate the prime number
j                  Integrity checking interval
T                  Timestamp added with data blocks
Pi                 Random prime number
Kd                 Secret key of the DSM
DATA′              Encrypted data for integrity check
AUTH               Encrypted tag for authenticity check
E( )               Encryption function
H( )               One-way hash function
Prime( )           Random prime number generation function
KeyGen             Key generation procedure
Key-Length ( )     Key length selection procedure
⊕                  X-OR operation
‖                  Concatenation operation
DATA               Fresh data at sensing device before encryption
T′                 Current time
T′′                Time to start the process
RQA                Authentication request message
RPA                Authentication response message
Step 1:
A sensing device (Si) generates a pseudorandom number (r) and encrypts it along with its own secret key Ki. The encryption uses the common shared key (K1), which is initialised during deployment. The output of the encryption, P1 ← EK1(r ‖ Ki), is then sent to the DSM. Si → DSM: P1.
Step 2:
Upon receiving the message, the DSM decrypts P1 (i.e. DK1(P1)) and retrieves the corresponding source ID from the secret key (Ki). If the source sensor's ID is found in the database, it accepts; otherwise it rejects. The DSM computes the hash of the key to generate another key for encryption, K2 ← H(K1). The DSM then encrypts the pseudorandom number (r) with the newly generated key as P2 ← EK2(r) and sends it to the source sensing device for DSM authentication. Si ← DSM: P2.
Step 3:
The corresponding sensing device receives the encrypted pseudorandom number and decrypts it to authenticate the DSM, i.e. r′ ← DK2(P2). It calculates the current secret shared key using the hash of the existing shared key, i.e. K2 ← H(K1). If the received random number is the same as the one the sensor sent (i.e. r = r′), the sensing device sends an acknowledgement (ACK) to the DSM. The ACK is encrypted with the new key, which is computed using the hash of the current key (K3 ← H(K2)). The encrypted ACK, denoted P3 ← EK3(ACK), is sent to the DSM. Si → DSM: P3.
Step 4:
The DSM decrypts the ACK (i.e. ACK ← DK3(P3)) to confirm that the sensor is now ready to establish the session. The current secret key is updated using the hash of the existing secret key, i.e. K3 ← H(K2). After confirming the ACK, the DSM generates a random session key, Ksi ← randomKey(), for handshaking. The generated session key (Ksi) is encrypted with the hash of the current key (K4 ← H(K3)) and then sent to the individual sensor. Si ← DSM: {P4}, where P4 ← EK4(Ksi).
Step 5:
The sensor decrypts P4 and extracts the session key for handshaking (Ksi ← DK4(P4)). It follows the same procedure as before, i.e. the current shared key is updated with the hash of the existing shared key (K4 ← H(K3)). We update the shared key in every transaction to ensure the strength of security for handshaking. The complete authentication process works as shown in Figure 4-2.
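The five-step setup amounts to a hash-chained key schedule. The Java sketch below mirrors Steps 1-5 with both roles in one process for readability; SHA-256 as H( ), AES as E( ), and the key-truncation rule are assumptions made for this illustration, not the thesis's exact primitives.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.spec.SecretKeySpec;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;

public class SetupSketch {
    static byte[] h(byte[] k) throws Exception {             // H( ): one-way hash, assumed SHA-256
        return MessageDigest.getInstance("SHA-256").digest(k);
    }
    static byte[] crypt(int mode, byte[] key, byte[] in) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(mode, new SecretKeySpec(Arrays.copyOf(key, 16), "AES"));
        return c.doFinal(in);
    }
    public static void main(String[] args) throws Exception {
        byte[] k1 = new byte[16];                             // pre-deployed common key K1
        byte[] r = new byte[16];
        new SecureRandom().nextBytes(r);                      // Step 1: sensor picks random r
        byte[] p1 = crypt(Cipher.ENCRYPT_MODE, k1, r);        // P1 = E_K1(r) (Ki omitted for brevity)
        byte[] k2 = h(k1);                                    // Step 2: DSM derives K2 = H(K1)
        byte[] p2 = crypt(Cipher.ENCRYPT_MODE, k2, crypt(Cipher.DECRYPT_MODE, k1, p1));
        byte[] rBack = crypt(Cipher.DECRYPT_MODE, h(k1), p2); // Step 3: sensor checks r' = r
        System.out.println("DSM authenticated: " + Arrays.equals(r, rBack));
        byte[] k3 = h(k2);                                    // Steps 3/4: next keys in the chain
        byte[] k4 = h(k3);
        byte[] ksi = KeyGenerator.getInstance("AES").generateKey().getEncoded();
        byte[] p4 = crypt(Cipher.ENCRYPT_MODE, k4, ksi);      // Step 4: DSM sends session key Ksi
        byte[] ksiAtSensor = crypt(Cipher.DECRYPT_MODE, k4, p4); // Step 5: sensor recovers Ksi
        System.out.println("Session key agreed: " + Arrays.equals(ksi, ksiAtSensor));
    }
}

Because each transaction key is the hash of the previous one, both sides advance the chain locally and never transmit a key in the clear, which is the property the five steps are designed to achieve.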
4.4.2 DLSeF Handshaking
In the handshaking process, the DSM sends the key generation and
synchronisation properties to sensors based on their individual session key (Ksi)
established earlier. Generally, a larger prime number is used to strengthen the
security process. However, a larger prime number requires greater computation time.
In order to make the rekeying process efficient (lighter and faster), we recommend
reducing the prime number size. The challenge is how to maintain security while
avoiding large prime number sizes. We achieve this by dynamically changing the key
size as described next.
The dynamic prime number generation function is defined in Algorithm 3-2. This algorithm computes the relative prime number, which always depends on the previous prime number. This relation between the current and the previous prime number helps to synchronise the newly generated prime number. We have given mathematical proofs for Algorithm 3-2 showing that the generated number is always a prime number and is synchronised between the source device and the DSM. We calculate the prime number and the shared key at both the sensing source and DSM ends to reduce communication overhead and minimise the chance of disclosing the shared key. The computed shared keys have multiple lengths (32, 64 and 128 bits), which are varied across the sessions. The initial key length is set to 64 bits and is dynamically updated according to the logic depicted in Algorithm 4-1. This algorithm selects the key length and the associated time interval used to generate the shared key. The key and key length selection process is based on the time taken to find all possible keys in the key domain, following Table 3-2. In Table 3-2, we computed the key domain size and the time required to find all possible keys for different key lengths (i.e. 8, 16, 32, 64 and 128 bits) using the most advanced Intel i7 processor. Algorithm 4-1 therefore uses the properties from Table 3-2 to initialise the rekeying time interval according to the key length. After the time interval, the next shared key is generated by applying Algorithm 3-2, where the key size is determined by Algorithm 4-1, as follows: Prime (Pi) periodically computes the relative prime number at both the sensor and DSM ends after a time interval t, which is updated based on the function Key-Length( ). The shared secret key (KSH) generation process needs Pi and Kd. In the handshaking process, the DSM transmits all the properties required to generate a shared key to the sensors as follows: Si ← DSM: {EKsi(Pi, Prime( ), KeyGen, Key-Length( ), Kd, t)}.
All of the transferred information outlined above is stored in the trusted part of the source (e.g. TPM) for future rekeying processes [134].
Figure 4-2: Secure authentication of sensor and DSM.
4.4.3 DLSeF Rekeying
We propose a novel rekeying concept in which prime numbers are calculated dynamically at both the source sensors and the DSM. Figure 4-2 shows the synchronisation of the shared key. In this model, a smaller key makes the security verification faster, but we change the key very frequently in the DLSeF rekeying process to ensure that the protocol remains secure. If any kind of damage happens at the source, the corresponding sensor becomes desynchronised from the DSM. The source sensor then follows Step 3 to reinitialise and synchronise with the DSM. According to our assumptions, all the secret information is stored in a trusted part of the sensor, so the sensor can reinitialise synchronisation by sending its own identity to the DSM. Once the DSM authenticates the source sensor, it sends the current key and the time of key generation. Authenticated sensors can then update the next key by using the key generation process from the secure module of the sensor (TPM).
ALGORITHM 4-1. Synchronisation of Dynamic Key Length Generation Key-Length ( )
1: kl ← 64 (for first iteration)
2: t ← 168 hours (for first iteration)
3: i ← Pi % 3
4: If i = 0 then
5:   Set kl ← 128
6:   t ← 720 hours (1 month)
7:   j ← no checking
8: Else If i = 1 then
9:   Set kl ← 64
10:  t ← 168 hours (1 week)
11:  j ← Pi % 9
12: Else
13:  Set kl ← 32
14:  t ← 24 hours (1 day)
15:  j ← Pi % 5
16: End If
17: End If
Return (kl, t, j) // used to initialise the next iteration.
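In code, the selection logic of Algorithm 4-1 reduces to a three-way switch driven by the current prime; deriving the selector as i = Pi mod 3 is our assumption for the sketch. A minimal Java version:

public final class KeyLengthPolicy {
    /** Holds the outputs of Algorithm 4-1: key length kl, rekey interval t, integrity interval j. */
    public static final class Choice {
        final int keyLengthBits;
        final int rekeyHours;
        final int integrityInterval;   // -1 means no integrity checking (128-bit case)
        Choice(int kl, int t, int j) { keyLengthBits = kl; rekeyHours = t; integrityInterval = j; }
        @Override public String toString() {
            return keyLengthBits + "-bit key, t = " + rekeyHours + " h, j = " + integrityInterval;
        }
    }

    /** Selects key length, rekey interval and integrity interval from the current prime Pi. */
    public static Choice select(long pi) {
        int i = (int) (pi % 3);        // assumed selector over the three cases of Algorithm 4-1
        if (i == 0) return new Choice(128, 720, -1);             // 1 month, no integrity checks
        if (i == 1) return new Choice(64, 168, (int) (pi % 9));  // 1 week
        return new Choice(32, 24, (int) (pi % 5));               // 1 day
    }

    public static void main(String[] args) {
        System.out.println(select(67)); // 67 % 3 = 1 -> 64-bit key, weekly rekeying
    }
}

The inverse relation between key length and rekey interval is the moving-target idea in miniature: the shorter the key, the sooner it is replaced.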
The proposed model not only calculates the dynamic prime number to update the shared key without further communication after handshaking, but also introduces a novel way of dynamically changing the key length at the source and the DSM, according to the steps described in Algorithm 4-1. We change the key periodically in the DLSeF rekeying process to ensure that the protocol remains secure. If any kind of key or data compromise happens at a source, the corresponding sensor is instantly desynchronised from the DSM; the source sensor then needs to reinitialise and synchronise with the DSM as described above. We assume that the secret information is stored in the trusted part of the sensor (e.g. TPM) and that it is sent by the sensor to the DSM for synchronisation. According to the properties of the TPM, no one can access the information stored inside it; only the sensor can access the TPM properties. Even if the sensor is destroyed, an adversary cannot extract the information from the trusted module of the sensor (i.e. the TPM). In some cases, a data packet may arrive at the DSM after the shared key has been updated; such data packets are encrypted using the previous shared key. We therefore add a time stamp field to individual data packets to identify which shared key was used for encryption. If the data was encrypted using the previous key, the DSM uses the previous key (K′SH) for the security verification; otherwise, it follows the normal process.
The DLSeF handshaking process defined above makes the sensors aware of Prime (Pi), Key-Length( ) and KeyGen. We now describe the complete secure data transmission and verification process using those functions and keys. As mentioned above, our model uses the synchronised dynamic prime number generation Prime (Pi) on both sides, i.e. the sensors and the DSM, as shown in Figure 3-3. At the end of the handshaking process, the sensors have their own secret keys, the initial prime number and the initial shared key generated by the DSM. The next prime generation is based on the current prime number and the time interval, as described in Algorithm 3-2. The prime number generation process (Algorithm 3-2) always calls Algorithm 4-1 to fetch the shared key length information and the associated time interval. Sensors generate the shared key KSH = KeyGen(H(Pi, Kd)) using the prime number Pi and the DSM's secret key Kd. We use the secret key of the DSM to improve the robustness of the security verification process. We fixed the initial key length at 64 bits, with 168 hours as the initial time interval for rekeying. Each data block is associated with an authentication and an integrity tag and contains two different parts: one is the DATA encrypted with the shared key for integrity checking (i.e. DATA′ = EKSH(DATA ‖ MAC)), and the other is for authenticity checking (i.e. AUTH = EKSH(Si ‖ Ki)). The resulting data block is sent to the DSM as follows: Si → DSM: {(DATA′ ‖ (AUTH ‖ T))}. The time stamp, which indicates which shared key was used for encryption, is always associated with the authentication part; we prefer to add the time stamp to the authentication part because the DSM can then easily identify a data block that was encrypted with the previous shared key. More details about the time stamp are given in the following subsection, and the complete procedure of the key generation (rekeying) process is shown in Algorithm 4-2. This algorithm takes information from Algorithm 3-2 and Algorithm 4-1 in order to perform the rekeying process: from Algorithm 3-2 it takes the dynamic prime number (Pi) to compute a shared key, and from Algorithm 4-1 it takes the key size and the time interval for the rekeying process.
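Putting Algorithm 3-2 and Algorithm 4-1 together, the rekeying step can be sketched as below. Deriving KSH by hashing Pi together with Kd and truncating to the selected length follows the KSH = KeyGen(H(Pi, Kd)) form used above; the exact KeyGen construction (SHA-256 plus truncation) is our assumption for the sketch.

import java.security.MessageDigest;
import java.util.Arrays;

public class Rekeying {
    /** KeyGen sketch: K_SH = first (kl/8) bytes of H(Pi || Kd); the truncation rule is assumed. */
    public static byte[] nextSharedKey(long pi, byte[] kd, int keyLengthBits) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");   // H( ), assumed SHA-256
        sha.update(Long.toString(pi).getBytes("UTF-8"));            // current synchronised prime Pi
        sha.update(kd);                                             // DSM secret key Kd
        return Arrays.copyOf(sha.digest(), keyLengthBits / 8);      // truncate to kl bits
    }

    public static void main(String[] args) throws Exception {
        byte[] kd = "dsm-secret-key-demo".getBytes("UTF-8");        // placeholder Kd
        long pi = 67;                                               // current prime from Algorithm 3-2
        // Sensor and DSM run the same function on the same inputs, so no key is ever transmitted.
        byte[] atSensor = nextSharedKey(pi, kd, 64);
        byte[] atDsm = nextSharedKey(pi, kd, 64);
        System.out.println(Arrays.equals(atSensor, atDsm));         // true: keys stay synchronised
    }
}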
4.4.4 DLSeF Key Synchronisation
Synchronisation is one of the major issues during the rekeying process between the sensors and the DSM, as they do not interact after the handshaking process. Shared key synchronisation is based on the initial key generation process, followed by rekeying; the purpose of the initial key synchronisation is to establish a common time at which to start the key generation process. In this model, the DSM works as a centralised controller and therefore initiates the key generation process. As defined before, during the handshaking process the DSM sends the source (Si) a time stamp T′′ to initialise the key generation process.
There are potentially two cases: (i) the sensor starts the process on time and maintains synchronisation; (ii) the sensor misses the time stamp or receives the key generation properties after the time stamp. In the second case, the source sensor sends a request to get the next time stamp for the key generation process.
There are several reasons for sensors to fall out of sync, such as the inability of the source node to generate the shared key because of computational overhead, a natural disaster, or malicious activity. Even if a sensor misses the synchronisation, it does not lose the key generation properties, because of the TPM features [134]. In such cases, the source sensor (Si) obtains the synchronisation properties from its neighbours. Given the source network structure, sensors do not have neighbour information, so it is a challenging task to identify the neighbours and obtain the key synchronisation properties. The procedure for obtaining the shared key properties from unknown neighbours is given below.
4.4.4.1 Initial Setup
Let us assume that a sensor (Si) has missed the synchronisation. The sensor (Si) computes a pseudorandom number, i.e. PRN(r), and uses the current prime number (Pi) and the shared key (KSH) to generate the authentication request message (RQA), i.e. RQA ← H(EKSH(r ‖ Pi ‖ Kd)). The resultant RQA, the DSM ID (Di) and the time stamp (T) are then encrypted with the mutual key K4 from the system setup steps (EK4(RQA ‖ T ‖ Di)) (refer to Figure 4-2). We use this key for encryption because all authenticated nodes received it from the DSM during the system setup phase.
4.4.4.2 Synchronisation Phase
The out-of-sync sensor (Si) broadcasts this message to its one-hop neighbours. When a neighbour sensor receives the message, it decrypts it with the mutual key K4 (DK4(RQA ‖ T ‖ Di)). It compares the received time stamp (T) with its current time (T′) to check data freshness and avoid replay attacks (T − T′ ≤ ΔT). If the time difference is less than ΔT, it accepts the data packet; otherwise the packet is discarded. Here ΔT is the average time required to transmit data packets between a source and the DSM.
The neighbour node (denoted Sj) compares the received DSM ID with its own DSM ID to validate the source as authenticated. To make the authentication process stronger, we perform two-layer encryption of the request (RQA). Sensor (Sj) performs the hash and decrypts the second layer with the shared key (KSH), i.e. H(DKSH(r ‖ Pi ‖ Kd)); it uses the previous shared key if the shared key has been updated in the meantime, and compares the DSM ID by retrieving it using the DSM secret key (Di ← retrieveKey(Kd)).
Figure 4-3: Neighbour node discovery to obtain the current state of the key generation properties.
After the authentication process, Sj prepares an authentication response message (RPA) that includes its own ID, the DSM ID and the pseudorandom number r (RPA ← EKSH(Sj ‖ Di ‖ r)). It then encrypts the RPA along with the DSM key and the time stamp using the same key K4 (EK4(RPA ‖ Kd ‖ T)).
Once Si receives the RPA, it processes it in the same way to authenticate the node Sj (DK4(RPA ‖ Kd ‖ T)). First it compares the time to avoid replay attacks (T − T′ ≤ ΔT), and then compares the DSM ID (Di ← retrieveKey(Kd)) and the value of r to perform the authentication. Here the desynchronised source node (Si) may encounter three different types of neighbour: a malicious node, a desynchronised authenticated node, and a synchronised authenticated node, as shown in Figure 4-4. Malicious neighbours cannot decrypt Si's request because it is encrypted with the secret key, but a desynchronised authenticated node can read the request. Once such a node learns that the source (Si) is seeking the key synchronisation properties, it sends a response with a desynchronisation indication, and the source discards the RPAs received from such nodes. If the source node receives an RPA from an authenticated synchronised neighbour, Si chooses that node by sending an ACK in order to get the key synchronisation properties (EKSH(ACK ‖ Si ‖ T)).
This acknowledgement message (ACK) confirms the mutual authentication between the source and the synchronised neighbour for obtaining the key synchronisation properties (DKSH(ACK ‖ Si ‖ T)). After receiving the acknowledgement message, the authenticated neighbour obtains the source node ID and sends the shared key properties (Pi, KSH, t) to the source node as EKSH(Pi, KSH, t, T).
When the desynchronised source receives the shared key synchronisation properties (DKSH(Pi, KSH, t, T)), it can generate the shared key by itself, because it has the prime number (Pi), the shared key (KSH) and the time of the next key change (t). At every step we check the time interval in order to avoid replay and DoS attacks. The stepwise representation of the neighbour authentication used to obtain the shared key properties is shown in Figure 4-3.
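The freshness rule used throughout this exchange is easy to state in code. The sketch below checks T − T′ ≤ ΔT on a received message; representing timestamps as epoch milliseconds and the particular value of ΔT are assumptions made for illustration.

public class Freshness {
    static final long DELTA_T_MS = 500;    // assumed average source-to-DSM transmission time

    /** Accept a message only if its timestamp is recent enough to rule out replay. */
    static boolean isFresh(long sentAtMs, long nowMs) {
        return nowMs >= sentAtMs && nowMs - sentAtMs <= DELTA_T_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isFresh(now - 100, now));   // true: within ΔT
        System.out.println(isFresh(now - 5000, now));  // false: likely a replayed RQA/RPA
    }
}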
4.4.4.3 New Node Synchronisation
If a new source node joins the network, it starts the authentication process with the DSM to get the key generation properties. After receiving the key generation properties from the DSM, the node (n) either starts the process or authenticates with its neighbour nodes to compare the synchronisation properties.
ALGORITHM 4-2. Key Generation (Rekeying) Process at Sensor (Si) and DSM (D)
1. Session key (Ksi) obtained from the handshake in Figure 4-3.
2. Dynamic prime number (Pi) computed from Algorithm 3-2.
3. Time interval (T) computed.
   3.1 T = {t1, t2, t3, …}, where t1, t2, t3, … are the time intervals of key generation.
   3.2 Sensor (Si) and DSM (D) update the key after each time interval.
4. As stated before, sensor and DSM share the properties H( ) and E( ). The new key generation: KSH ← H(Pi, KSH).
5. The encryption process at the sensor happens in two steps:
   5.1 C1 = EKSH(DATA || MAC)
   5.2 C2 = EKSH(Si || T)
6. Si → DSM: {C1 || C2} = {EKSH(DATA || MAC) || EKSH(Si || T)}
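The following Python sketch illustrates Algorithm 4-2's core idea: sensor and DSM advance the shared key independently from the dynamic prime, with the key length itself varying. The hash (SHA-256), the length-selection rule and the sample primes are illustrative assumptions standing in for Algorithm 3-2, not its actual definition.

```python
import hashlib

def next_key_length(prime: int) -> int:
    """Dynamic key length in bits, derived from the prime. This selection
    rule is an illustrative assumption, not the thesis's Algorithm 3-2."""
    return (32, 64, 128)[prime % 3]

def rekey(prime: int, current_key: bytes) -> bytes:
    """KSH <- H(Pi || KSH), truncated to the dynamic length; runs
    independently at sensor and DSM, so no key ever travels."""
    digest = hashlib.sha256(prime.to_bytes(8, "big") + current_key).digest()
    return digest[: next_key_length(prime) // 8]

key = b"\x01" * 16                        # seeded from the session handshake
for prime in (104729, 130363, 94849):     # sample dynamic primes
    key = rekey(prime, key)
    print(len(key) * 8, "bit:", key.hex())
```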
Figure 4-4: Neighbour discovery to get the key synchronisation properties with all possible conditions. (a) Node Si sends the RQA message to all its one-hop neighbours; (b) the sender receives the RPA for each RQA; (c) Si sends an ACK only to authenticated synchronised neighbours; (d) node Si receives the synchronisation properties.
4.4.5 DLSeF Security Verification
In this step, the DSM first checks the authenticity of each individual data block and then checks integrity on randomly selected data blocks. The random value j is calculated from the corresponding prime number: j = Pi % 5 when the key length is 32 bits; j = Pi % 9 when the key length is 64 bits; and no integrity verification is performed when the key length is 128 bits. We vary the integrity verification interval for the individual key lengths, checking more frequently when the key length is shorter, because a shorter key gives an attacker a better chance to read or modify the data. As a 128-bit key is computationally hard to break and can last a long time, we skip integrity verification for it and simply update the shared key before an attack becomes feasible. The DSM also checks the timestamp of each individual data block to determine which shared key was used for encryption. For the authenticity check, the DSM decrypts the authentication part with the shared key, i.e. DKSH(EKSH(Si || T)). Once Si is obtained, the DSM checks its source database and extracts the corresponding secret key (Ki ← retrieveKey(Si)). In the integrity check, the DSM decrypts the selected data blocks, i.e. DKSH(EKSH(DATA || MAC)), to get the original data and checks the MAC for data integrity.
Each data block is divided into two parts, one for the authenticity check and one for the integrity check. Along with the authenticity part, we add a timestamp (T) to establish data freshness and avoid replay attacks. The data block arriving at the DSM for security verification is represented as {EKSH(DATA || MAC) || EKSH(Si || T)}. The DSM first processes the authentication part to extract the timestamp and compares its own timestamp with the received one, i.e. T - T′ ≤ ΔT. If the time interval is less than or equal to the predefined time ΔT, it accepts the data; otherwise the data is rejected. This maintains data freshness and avoids replay attacks, and the initial time check together with the authenticated source check also helps to avoid DoS (denial of service) attacks. Another important advantage of the timestamp (T) is that it identifies the shared key used in the encryption process: if the shared key was updated after the data block was encrypted, the DSM uses the previous shared key (K′SH) instead of the current key (KSH) for decryption.
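A minimal sketch of the DSM-side verification loop described above: authenticity is asserted for every block, while the MAC is verified only at the prime-derived interval j. HMAC-SHA256 stands in for the thesis's MAC, and the source database is reduced to a set of IDs; both are assumptions for illustration.

```python
import hashlib, hmac

def integrity_interval(prime: int, key_bits: int):
    """Integrity-check interval j from the prime (Section 4.4.5):
    j = Pi % 5 for 32-bit keys, j = Pi % 9 for 64-bit, none for 128-bit."""
    if key_bits == 32:
        return prime % 5 or 1          # guard against a zero interval
    if key_bits == 64:
        return prime % 9 or 1
    return None                        # 128-bit: no integrity verification

def verify_stream(blocks, prime, key_bits, key, known_sources):
    j = integrity_interval(prime, key_bits)
    for n, (data, mac, sid) in enumerate(blocks, 1):
        assert sid in known_sources                 # authenticity: every block
        if j and n % j == 0:                        # integrity: selected blocks
            expect = hmac.new(key, data, hashlib.sha256).digest()
            assert hmac.compare_digest(mac, expect), "integrity failure"

key = b"k" * 8
blocks = [(b"reading-%d" % i,
           hmac.new(key, b"reading-%d" % i, hashlib.sha256).digest(),
           "S1") for i in range(10)]
verify_stream(blocks, prime=104729, key_bits=32, key=key,
              known_sources={"S1", "S2"})
print("all blocks verified")
```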
The complete mechanism, from source and DSM authentication through handshaking to security verification, is presented stepwise in Algorithm 4-3.
Algorithm 4-3. Lightweight Security Protocol for Big Sensing Data Streams
Description: Based on prime number generation at both the sensor and DSM ends, the proposed dynamic key length based security framework for big data streams works more efficiently than before without compromising security.
Input: the prime generation process Prime(Pi), the key length generation process Key-Length(Pi), the key generation process KeyGen, and the session key Ksi.
Output: Successful security verification without detecting any malicious attacks.
Step 1 DLSeF System setup
1.5 Si → DSM: {EK1(r || Ki)}; the ith sensor sends the random number together with its identity, encrypted with the common shared key K1.
1.6 Si ← DSM: {EK2(r)}; the DSM identifies the sensor and generates a new key as the hash of the current key, K2 ← H(K1). The DSM then encrypts the random number and sends it back to the ith sensor.
1.7 Si → DSM: {EK3(ACK)}; the ith sensor identifies the DSM by decrypting the packet. If the sender is authenticated, it hashes the current key (K3 ← H(K2)) to get a new encryption key and sends back the acknowledgement.
1.8 Si ← DSM: {P4 = EK4(Ksi)}; the DSM authenticates the last transaction and replies to the ith sensor in this format. The DSM generates a session key Ksi ← randomKey() and encrypts it with the newly generated key K4 ← H(K3), the hash of the current key.
1.9 The sensor authenticates the packet and obtains the session key for handshaking (Ksi ← DK4(P4)).
Step 2 DLSeF Handshaking
The DSM sends its properties to the individual sensors, encrypted under their individual session keys. These include the prime number generation process and the time interval for key generation.
2.2 DSM → Si: {EKsi(Prime(Pi) || KeyGen || t)}
Step 3 DLSeF Rekeying
Keys are updated at both the source sensor and the DSM, and both are aware of the prime (Pi) and KeyGen. Sensors generate the shared key KSH ← H(Pi, KSH), and each data block is composed of two parts: one carries the encrypted data, i.e. EKSH(DATA || MAC), and the other is for the authenticity check, i.e. EKSH(Si || T).
3.2 Si → DSM: {EKSH(DATA || MAC) || EKSH(Si || T)}; these blocks serve the authentication, integrity and confidentiality checks, with the timestamp used for synchronisation.
Step 4 DLSeF Synchronisation
A desynchronised node obtains the synchronisation properties from its neighbours.
4.3 Sensor (Si) gets the synchronisation properties from its neighbours (see Figures 4-3 and 4-4).
Step 5 DLSeF Security Verification
The DSM checks the authenticity of each data block and checks integrity on data blocks selected at a random interval, where the random value is calculated from the corresponding prime number.
5.1 The DSM checks the timestamp (T) of every packet to determine the decryption key. If the timestamp is not current, it decrypts with the previous shared key K′SH.
5.2 For the authenticity check, the DSM extracts the source ID. Once Si is obtained, the DSM checks the source database and extracts the corresponding secret key for the integrity check according to the value of j.
5.3 The DSM decrypts the selected data and checks the MAC for integrity.
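The hash-chained key schedule of Step 1 can be condensed into a few lines of Python: each side derives K2, K3 and K4 by hashing, and only the session key travels, encrypted under K4. SHA-256 truncated to 128 bits and the one-time XOR used for E_K4 are stand-ins for illustration, not the thesis's concrete primitives.

```python
import hashlib, os

def H(key: bytes) -> bytes:
    """Hash-based key derivation, truncated to 128 bits (an assumption)."""
    return hashlib.sha256(key).digest()[:16]

k1 = b"\x42" * 16            # pre-shared common key K1
k2 = H(k1)                   # step 1.6: K2 = H(K1)
k3 = H(k2)                   # step 1.7: K3 = H(K2)
k4 = H(k3)                   # step 1.8: K4 = H(K3)

ksi = os.urandom(16)                                # DSM: Ksi = randomKey()
p4 = bytes(a ^ b for a, b in zip(ksi, k4))          # toy E_K4(Ksi)
assert bytes(a ^ b for a, b in zip(p4, k4)) == ksi  # sensor: Ksi = D_K4(P4)
print("session key agreed:", ksi.hex())
```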
4.5 Security Analysis
This section provides a theoretical analysis of the security model. We make the following assumptions: (a) no participant in our security model can decrypt data encrypted by the DLSeF algorithm unless it holds the shared key used to encrypt the data; (b) as the DSM is located at the big data processing system side, we assume that the DSM is fully trusted and no one can attack it; and (c) a sensor's secret key, prime (Pi) and secret key calculation procedures reside inside the trusted part of the sensor (such as the TPM), so that they are not accessible to intruders.
Similar to most security analyses of communication protocols, we now define the
attack models for the purpose of verifying confidentiality, authenticity and integrity.
4.5.1 Security Proof
Definition 1 (attack on authentication). A malicious attacker Ma can attack the
authenticity if it is capable of monitoring, intercepting, and introducing itself as an
authenticated source node to send data in the data stream.
Definition 2 (attack on integrity). A malicious attacker Mi can attack the integrity if
it is an adversary capable of monitoring the data stream regularly and trying to
access and modify a data block before it reaches the DSM.
Definition 3 (attack on partial confidentiality). A malicious attacker Mc is an unauthorised party which has the ability to access or view the data stream without authorisation before it reaches the DSM (within the time bound).
Similar to most cryptographic analyses, we define the threat model over the shared key properties as follows:
Theorem 1: The security is not compromised by changing the size of the shared key (KSH).
Proof: Refer to Theorem 1 in Chapter 3.
Theorem 2. According to the proposed synchronisation method, the shared key (KSH)
is always synchronised between Source sensor (Si) and DSM.
Proof: According to DLSeF properties, the dynamic shared key length varies
between 32 bit, 64 bit, and 128 bit; these keys are updated at both source and DSM
ends. The shared key is updated without further communications between Si and
DSM after handshaking. A variation in key length introduces a complexity to the
attackers to predict the next shared key. The ECRYPT II recommendations on key
length say that a 128-bit symmetric key provides the same strength of protection as a
3,248-bit asymmetric key. An advanced processor (an Intel i7) takes about 1.7 nanoseconds to try one key from one block. At this speed, it would take about 1.3 × 10¹² times the age of the universe to check all the keys in the possible key set [35]. All the relevant key domains and the times required to search them with an Intel i7 processor are listed in Table 3-2.
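As a quick sanity check of the figure quoted above, the following short calculation reproduces the order of magnitude, assuming 1.7 ns per key trial, a 128-bit key space, and an age of the universe of roughly 13.8 billion years:

```python
trials = 2 ** 128                       # 128-bit key space
seconds = trials * 1.7e-9               # 1.7 ns per key trial
years = seconds / (365.25 * 24 * 3600)
universe_ages = years / 13.8e9          # age of the universe in years
print(f"{years:.2e} years ~ {universe_ages:.2e} x age of the universe")
# -> ~1.83e22 years, i.e. on the order of 1.3e12 universe ages, as stated.
```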
Here, we highlight the synchronisation in two places: (i) source sensor with DSM during the initial key generation process, and (ii) while obtaining the synchronisation properties from a neighbour. In the first case (during the handshaking process), the DSM sends the key generation properties to Si along with a timestamp (T′′) to set the key generation time. Then both the DSM and Si generate the shared key with dynamic length and interval as in the DLSeF method, meaning the shared key is synchronised at both ends. In the second case (obtaining the synchronisation properties from neighbours), if any source is desynchronised, it initiates the neighbour authentication process to discover authenticated synchronised neighbours (see Figure 4-3). After authentication, the neighbour sends the key generation properties EKSH(Pi, KSH, t, T), where T ensures data freshness and t is the start of the key generation process. Source Si can then use the current key and these properties to update the next key (i.e. K′SH ← H(Pi, KSH)) after time t. Now source Si is synchronised with the other sources and the DSM.
Theorem 3: An attacker Ma cannot read the secret information from a sensor node
(Si) or introduce itself as an authenticated node in DLSeF.
Proof: Following Definition 1 and considering the computational hardness of a secure module (such as a TPM), we know that Ma cannot obtain the secret information for Pi generation, Ki or KeyGen. So there is no possibility for a malicious node to trap the sensor, but Ma may try to introduce itself as an authenticated node to send its information. In this model, a sensor (Si) sends {EKSH(DATA || MAC) || EKSH(Si || T)}, where the second part of the data block, EKSH(Si || T), is used for the authentication check. The DSM decrypts this part of the data block, retrieves Si, and matches it against its database. If the retrieved Si matches an entry in the DSM database, the node is accepted; otherwise it is rejected as not being an authenticated sensor node. Hence, we conclude that an attacker Ma cannot attack the big data stream.
Theorem 4: An attacker Mc cannot access or view the data stream without authorisation in the proposed DLSeF within the time bound.
Proof: Following Algorithm 3-2, it is clear that the prime numbers are generated at the sensors and the DSM dynamically, without any further communication, and the shared secret key (KSH) is calculated from the generated prime number. Considering the computational hardness of secure modules (such as a TPM), we know that Mc cannot obtain the secret information such as Pi generation, Ki and KeyGen within the time frame. Following Definition 3, we know that an attacker Mc may attempt to recover the shared key but holds no other information. In this model, the source sensor (Si) sends data blocks in the format {EKSH(DATA || MAC) || EKSH(Si || T)}, where the first part of the data block contains the original data. Obtaining the original data from this is infeasible because Mc has neither the key nor the other information, and the shared key is meanwhile updated dynamically at intervals of time t. If Mc has sufficient processing and storage capabilities, it may eventually recover a shared key, but by then the shared key will already have been changed; in such a case Mc can read only an outdated message. This does not affect the applications we focus on (e.g. disaster management) based on stream data processing. So our DLSeF model provides weak confidentiality in the sense that confidentiality is not broken in real time.
Theorem 5: An attacker Mi cannot recover the shared key (KSH) within the time interval t in the DLSeF model.
Proof: Following Definition 2, we know that an attacker Mi has full access to the network and can attempt to recover the shared key, but Mi cannot obtain the secret information used to derive KSH. Considering the method described in Theorem 1, we know that Mi cannot recover the currently used KSH within the time interval t (see Table 3-2), because the proposed model calculates Pi randomly after time t and then uses the value Pi to generate KSH, as described in Theorems 1 and 2.
Theorem 6: The proposed DLSeF requires a comparatively smaller buffer size than standard symmetric key solutions for security verification.
Proof: Following Algorithm 4-2, it is clear that the proposed DLSeF is a lightweight model for security verification. We decrypt the identity of the sensing device for the authentication check in every data block, whereas only selected data blocks are decrypted for the integrity check. Another important mechanism is the key length used for encryption/decryption: as we use shorter key lengths to encrypt the data blocks, security verification also becomes faster. These two processes make the security verification much faster than other security mechanisms. The required buffer size is inversely proportional to the speed of security verification: the faster blocks are verified, the fewer blocks need to be buffered. We therefore conclude that the proposed DLSeF model needs a comparatively smaller buffer for security verification. The experimental evidence is given in the following section.
Theorem 7: Neighbour synchronisation is also protected against attacks on authentication, integrity and partial confidentiality.
Proof: By the TPM properties, an attacker cannot obtain the secret information (Pi, Ki, KSH) or the key generation properties (KeyGen). During the neighbourhood authentication process, a sensor (Si) shares the synchronisation properties only after authentication, and obtains the DSM ID and the secret key (see Figure 4-3). So there is no possibility for malicious nodes to trap authenticated sensors into revealing the shared key generation properties. Following the neighbour synchronisation properties, malicious nodes cannot interfere, because neighbours identify each other through the DSM ID (retrieved via Kd) and the encryption process uses the secret key (EK4); these properties are not known to malicious nodes. We also know that an intruder cannot recover the currently used KSH within the time interval t (see Table 3-2), because the proposed method calculates Pi randomly after time interval t and then uses Pi to generate KSH. An attacker may still attempt to introduce itself as an authenticated node to send packets, but without these properties its packets fail the verification described above.
From the above, we conclude that an attacker cannot obtain the shared key information during neighbour synchronisation.
Theorem 8: With synchronisation applied, the security verification model also avoids replay attacks.
Proof: There are two potential places for replay attacks: (i) during neighbour authentication; and (ii) during security verification at the DSM. In both cases we add a timestamp T to the packets. During security verification, the DSM checks data freshness by comparing the interval between the sent and received times of a data block, i.e. T - T′ ≤ ΔT. If the interval is less than or equal to ΔT, the data block is accepted; otherwise it is rejected. Applying this rule rejects delayed data packets, maintains data freshness and avoids replay attacks. Through the time interval (ΔT), the DSM can also easily determine which shared key was used for encryption (K′SH or KSH). We follow the same method to avoid replay attacks during neighbour authentication. This method also makes the model more resistant to DoS attacks.
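A small Python sketch of the freshness rule and key selection used in both places: packets older than ΔT are rejected, and the timestamp decides whether the current key KSH or the previous key K′SH applies. The epoch list and the ΔT value are illustrative assumptions.

```python
import time

DELTA_T = 1.5   # assumed maximum acceptable transit delay (seconds)

def select_key(recv_time, sent_time, key_epochs):
    """Reject stale packets (replay defence), then pick the shared key that
    was current when the packet was generated (K'SH after a mid-flight rekey)."""
    if recv_time - sent_time > DELTA_T:
        return None                                # stale: reject
    for epoch_start, key in reversed(key_epochs):  # newest epoch first
        if sent_time >= epoch_start:
            return key
    return None

now = time.time()
epochs = [(now - 60, b"K_prev"), (now - 1.0, b"K_curr")]  # (start, key) pairs
print(select_key(now, now - 0.2, epochs))  # b'K_curr'
print(select_key(now, now - 1.2, epochs))  # rekeyed mid-flight -> b'K_prev'
print(select_key(now, now - 10, epochs))   # too old -> None (replay rejected)
```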
4.6 Experiment and Evaluation
The proposed DLSeF security model, though deployed in a big sensor data
stream in this chapter, is a generic approach and can be used in other application
domains. In order to evaluate the efficiency and effectiveness of the proposed
architecture and protocol, even under adverse conditions, we experimented with
different approaches in multiple simulation environments. We first measure the
performance of sensor nodes by using a COOJA simulator in Contiki OS [118];
second, we verify the proposed security approach using Scyther [119]; third, we
measure the performance of the approach using JCE (Java Cryptographic
Environment) [120]; finally, we compute the minimum buffer size required to
process the proposed approach by using MATLAB [121] in order to measure the
efficiency of the proposed model.
4.6.1 Sensor Node Performance
We tested the performance of sensors in the COOJA simulator under Contiki OS while running the proposed security verification model. We took two of the most common sensor types, Z1 and TmoteSky, for the experiment and performance measurement, as shown in Figure 3-4. In this experiment, we measured the performance of the sensors while computing or updating the shared key, and the highest possible number of shared key generations for a specified energy level. Initially, all sensor nodes have the same energy level of 1.6 joules [135].
Z1 sensor nodes, produced by Zolertia, are low-power WSN modules designed as a general-purpose development platform for sensor network researchers. Much of the WSN community prefers this platform because it supports the most widely used open-source operating systems, such as Contiki. COOJA is a network simulator for Contiki which provides realistic sensor node features for simulation. The Z1 sensor is equipped with the low-power microcontroller MSP430F2617,
Figure 4-5: Performance computation of two different sensors. (a) Estimated power consumption during the key generation process; (b) possible number of key generations with an initial 1.6 J of sensor energy.
Figure 4-6: Energy consumption measured with COOJA in Contiki OS. (a) Energy for neighbour authentication; (b) energy for security verification.
which features a powerful 16-bit RISC CPU at a 16 MHz clock speed, 8 KB of RAM, built-in clock factory calibration, and 92 KB of flash memory. The Z1 hardware selection guarantees robustness and maximum efficiency at low energy cost. Similarly, TmoteSky is an ultra-low-power sensor equipped with the low-power microcontroller MSP430F1611, which has built-in clock factory calibration, 10 KB of RAM and 48 KB of flash memory.
Given the features of these two sensor types, we established in the COOJA simulator that the key generation process works in both the Z1 and TmoteSky sensors, so these sensors can easily support the security model. The energy consumption during the key generation process is shown in Figure 4-5 (a), and the maximum number of possible key generations in Figure 4-5 (b). On average, these sensors can generate the shared key around 280 times, which allows a sensor to support the security mechanism for over a year. From this experiment, we conclude that the proposed security verification approach DLSeF is supported by the most common types of sensors (tested with Z1 and TmoteSky) and is feasible for big sensing data streams over long periods.
In the experiment on neighbour node authentication for obtaining synchronisation properties, we measured the performance of sensors while they transmit/receive information from neighbours or dynamically update the shared key for the security verification process. Figure 4-6 (a) shows the energy required by sensors while transmitting/receiving synchronisation properties from neighbours, and Figure 4-6 (b) shows the power consumption behaviour of the key generation process. From these experiments, we conclude that the proposed model is lightweight, as both the synchronisation mechanism and the security verification model consume very little sensor battery power.
4.6.2 Security Verification
The protocols in the proposed model are written for the Scyther simulation environment in the Security Protocol Description Language (.spdl). Following the conventions of Scyther, we define the roles S and D, where S is the sender (i.e. sensor
nodes) and D is the recipient (i.e. the DSM). In this scenario, S and D hold all the information exchanged during the handshake process, which enables S and D to update their own shared keys. S sends data packets to D, and D performs the security verification. In the simulation, we introduce three types of adversary attack. In the first (integrity attack), a malicious attacker changes the data while they are transmitted from S to D through intermediaries. In the second (authentication attack), an adversary acquires the properties of S and sends data packets to D pretending they are from S. In the third (confidentiality attack), an adversary captures a data block and tries to read the data within the time bound. We performed 100 runs for each claim and found no attacks at D, as shown in Figure 4-7.
Experiment model: In practice, attacks may be more sophisticated and efficient than brute-force attacks. However, this does not affect the validity of the proposed DLSeF model, as we are interested in efficient security verification without periodic key exchanges and without successful attacks. Here, we model the process as described in the previous section and vary the key size between 32, 64 and 128 bits (see Table 3-2). We used Scyther, an automatic security protocol verification tool, to verify the proposed model.
Figure 4-7: Scyther simulation environment with parameters and result page
of successful security verification of DLSeF protocol.
Results: We ran the simulation with a different number of data blocks in each run, ranging from 10 to 100 instances in steps of 10. We checked authentication for each data block, whereas the integrity check was performed on selected data blocks. As the key generation process is stored in the trusted part of the sensors, no one can access that information except the corresponding sensor; hence, we did not find any authentication attacks. For integrity attacks, it is hard to obtain the shared key (KSH), as we frequently change the shared key and its length based on the dynamic prime number at both the source sensor (Si) and the DSM. In the experiment, we did not encounter any integrity attacks. Figure 4-7 shows the results of the security verification experiments in the Scyther environment. This shows
Figure 4-8: Security verification results of Scyther during neighbour authentication and synchronisation. (a) Secure authentication results; (b) security verification results at DSM.
that the security model is secure against integrity and authentication attacks.
During the neighbour authentication to obtain the synchronisation properties, sensors Si and Sj authenticate each other while hiding the DSM ID and secret key. In the experiment, we did not encounter any attacks able to compromise the security properties of the big data streams. The results in Figure 4-8 (a) validate this hypothesis and show the neighbour authentication in the Scyther environment. We then performed the security verification at the DSM, following the same concept while adding the new key synchronisation process; Figure 4-8 (b) shows the results of the security verification at the DSM after combining the synchronisation method with DLSeF.
4.6.3 Performance Comparison
Experiment model: The actual efficiency improvement brought by the security model depends strongly on the size of the key and on rekeying without further communication between sensor and DSM. We performed experiments with different sizes of data block; the results are given below. We compare the performance of the proposed DLSeF model with the standard symmetric key encryption algorithm AES (Advanced Encryption Standard) [38 - 39] and with our previously proposed model for big sensing data streams, DPBSV. The security model is compared for efficiency against DPBSV and two standard symmetric key configurations, 128-bit AES and 256-bit AES. This performance comparison experiment was carried out in JCE (Java Cryptographic Environment), comparing the processing time for different data block sizes. The comparison is based on the features of JCE in the 64-bit Java virtual machine version 1.6. JCE is the standard extension to the Java platform providing a framework implementation for cryptographic methods. We experimented with many-to-one communication: all sensor nodes communicate with a single node (the DSM). All sensors have similar properties, whereas the destination node (the DSM) has more power to initialise the process. The rekeying process executes at all nodes without any intercommunication, and the processing time of data verification is measured at the DSM node. The experimental results are shown in Figure 4-9.
Results: The performance of the security model is better than the standard AES algorithm across different data block sizes. Figure 4-9 shows the processing time of the DLSeF model in comparison with baseline 128-bit AES and 256-bit AES for different sizes of data block. The performance comparison shows that the proposed model is more efficient and faster than the baseline AES protocols. From the above two experiments, we conclude that the proposed DLSeF model is secure (against both authenticity and integrity attacks) and efficient (compared to DPBSV and standard symmetric algorithms such as 128-bit AES and 256-bit AES).
4.6.4 Required Buffer Size
Experiment model: We examined the DSM buffer requirements using MATLAB as the simulation tool [121]. This evaluation builds on the processing time performance reported above. We again compared our scheme with DPBSV and with standard 128-bit AES and 256-bit AES, as in the processing time comparison. The minimum buffer size required to process security verification at the DSM was measured at data rates from 50 to 250
Figure 4-9: Performance comparison of the scheme with DPBSV and the standard AES algorithm, i.e. 128-bit AES and 256-bit AES.
MB/s in 50 MB/s steps. Here we compare the efficiency of the proposed model (DLSeF).
Results: The performance of the security model is better than the standard AES algorithm across different data rates. Figure 4-10 shows the minimum buffer size required to process security at the DSM, comparing the proposed DLSeF scheme with DPBSV and baseline symmetric key solutions (128-bit AES and 256-bit AES). The comparison shows that the proposed model is efficient and requires less buffer for security processing than the previous protocols. From all the above experiments, we conclude that the proposed DLSeF model is secure (against authenticity, confidentiality and integrity attacks) and efficient (compared to standard symmetric algorithms such as 128-bit AES and 256-bit AES, and to DPBSV). We also show that the proposed model needs a smaller buffer during security verification.
Figure 4-10: Efficiency comparison of minimum buffer size required to
process the security verification with various data rates to DSM.
4.7 Summary
This chapter proposed a novel authenticated key exchange protocol, the Dynamic Key Length Based Security Framework (DLSeF), which aims to provide a real-time security verification model for big sensing data streams. The security model is based on symmetric key cryptography with a dynamic key length to make security verification of big sensing data streams more efficient. The model is designed around two-dimensional security: not only a dynamic key but also a dynamic key length. Through theoretical analyses and experimental evaluations, we showed that the DLSeF model provides a significant improvement in security processing time and prevents malicious attacks on authenticity, integrity and weak confidentiality. The model decreases the communication and computation overhead by performing dynamic key initialisation, with a dynamic key size, at both the source sensing devices and the DSM, which eliminates the need for rekeying messages and decreases the communication overhead.
The proposed security verification model is implemented before stream data processing (i.e. at the DSM), as shown in the architecture diagram. Several applications, such as disaster management and event detection, need to filter out modified and corrupted data before stream data processing, since they require only original, unmodified data for analysis to detect events. The proposed DLSeF model performs security verification in near real time so as to keep pace with the stream processing engine; the major concern is not to degrade the performance of stream processing while performing security verification in near real time. Although the efficiency of big data stream security verification benefits greatly from a scheme such as DLSeF, which outperforms AES and DPBSV, it is still not fast enough when verifying data blocks while maintaining as much data security and privacy as possible.
Chapter 5
Selective Encryption Method to Ensure
Confidentiality for Big Sensing Data
Streams
Chapter 4 addressed big data stream security with a lightweight solution that protects data against attacks on authenticity, integrity and partial confidentiality. Another major concern is to maintain the privacy of sensitive data and to protect against attacks on data confidentiality. To ensure the confidentiality of collected data, sensed data are associated with different sensitivity levels based on the sensitivity of the emerging application, the sensed data type, or the sensing device. Providing multilevel data confidentiality along with data integrity for big sensing data streams in the context of near real-time analytics is a challenging problem. This chapter proposes a Selective Encryption (SEEN) method to secure big sensing data streams that satisfies the desired multiple levels of confidentiality and integrity. The method is based on two key concepts: common shared keys that are initialised and updated by the DSM without requiring retransmission, and a seamless key refreshment process that does not interrupt data stream encryption/decryption. Theoretical analyses and experimental results show that the SEEN method can significantly improve efficiency and buffer usage at the DSM without compromising the confidentiality and integrity of the data streams.
5.1 Introduction
A large number of mission-critical applications, such as disaster management, cyber-physical infrastructure systems and SCADA, are building IoT applications by deploying numerous smart sensing devices in heterogeneous environments. Data produced by a large variety of sources using sensing devices are streamed towards the DSM for processing and decision making. This trend gives birth to an area called big data streams [5, 42]. The variety of applications and data sources creates a need for data dependability, so that only trustworthy and dependable information is used in decision-making processes. Data security (more specifically, ensuring data integrity and confidentiality) is an efficient and effective way to assure data trustworthiness and dependability: the DSM processes the data streams in near real time and performs the data analytics, and the appropriate actions are taken based on the analytics results. It is thus important that data trustworthiness is assured throughout the lifecycle of big data stream processing. Recent research [136-137] highlighted key contributions on lightweight security provenance for data both in transit and at rest.
The lifetime of a big data stream is very short because the stream is continuous in nature (i.e. the data can be accessed only once) [5, 29]. Such data streams in critical applications have high volume and velocity, yet stream processing has to be done in near real time; it cannot follow the traditional store-and-process batch computing model [24]. To address this challenge, stream processing engines (such as Spark, Storm and S4) have emerged to provide the capability for big data processing in real time [29, 128]. Stream processing engines (SPEs) offer two important advantages: (i) there is no need to store large volumes of data, and (ii) they support the real-time computation needed by emerging applications. As important decisions are made in critical applications by analysing data streams in near real time, it is essential that such data are not accessed or tampered with by malicious adversaries. This raises one of the key open research problems in big data streams: how to ensure end-to-end security for stream data processing, including guaranteeing the data security properties (integrity, confidentiality, authenticity and freshness) [4 – 5, 137].
Different emerging critical applications have different security requirements. Consider applications such as disaster management, terrestrial monitoring, military monitoring, healthcare, cyber-physical infrastructure systems and SCADA, which are sources of big data streams [4 – 5, 138 – 139]. Some applications, including terrestrial monitoring and disaster management, need data integrity so that the system has high confidence in the events detected from stream data processing; confidentiality is not that important in such applications [138, 140 - 141]. Other applications, such as military applications, healthcare and SCADA, need data confidentiality along with data integrity. The confidentiality of data depends not only on the application, but also on the data type. For example, some applications need data confidentiality forever (strong confidentiality), whereas others need to maintain data confidentiality only in real time (partial confidentiality). In healthcare applications, personal health data need to be protected from outsiders, so strong confidentiality is required [138], whereas in SCADA applications data need to be protected in real time, until the DSM detects the event [136]. Several applications, including military monitoring, need different levels of data confidentiality [137, 140]: in such systems, there is no need for confidentiality of normally sensed data, but it is needed for highly sensitive data such as movement on the battlefield or detection of enemy activities. We classify the security threats and adversary models in the following sections. This chapter addresses the issue specified above by designing a novel security method for big sensing data streams.
The common approach to data security is to apply a cryptographic model. If the encryption keys are managed properly, data encryption using a cryptographic method is the most widely recognised and secure way to transmit data. There are two basic kinds of cryptographic encryption: asymmetric and symmetric. It has been shown that symmetric key cryptography is about 1000 times faster than asymmetric key cryptography [34 - 35]. We therefore focus on symmetric key cryptography to design a new security method for big data streams that ensures data confidentiality and integrity.
To address the aforementioned challenge, we have designed and developed a selective encryption method (SEEN) to secure big data streams and maintain their confidentiality according to the sensitivity levels of the data. The method is based on a common shared key that is initialised and updated by the DSM without requiring retransmission. Furthermore, the proposed security method can recover from lost keys by detecting them, and performs seamless key refreshment without interrupting ongoing data stream encryption/decryption. SEEN maintains different levels of data confidentiality along with data integrity. The main contributions of this chapter can be summarised as follows:
- We have designed and developed a novel selective encryption method (SEEN) to secure big sensing data streams and maintain their confidentiality according to different data sensitivity levels. The method is based on common shared keys that are initialised and updated by the DSM without requiring retransmission, and it performs seamless refreshing of the shared keys without disrupting ongoing data encryption or decryption.
- The proposed model adopts different keys for the three levels of data confidentiality (no confidentiality, partial confidentiality and strong confidentiality) based on the data sensitivity levels.
- We validate the proposed method by theoretical analyses and experimental results.
- We compare the SEEN method with a standard symmetric key solution (AES-128), DPBSV and DLSeF in order to evaluate its efficiency.
The rest of this chapter is organised as follows. Section 5.2 gives a brief overview
of related work. Section 5.3 introduces the proposed system and the corresponding
security method. Section 5.4 provides a detailed description of the security method,
followed by its security analysis and performance evaluation in Sections 5.5 and 5.6,
respectively. Finally, Section 5.7 summarises the contributions in this chapter.
5.2 Design Consideration
This section gives the system architecture, possible threats and attack models in
different stages of the data stream flow.
5.2.1 System Architecture
The overall architecture of a big sensing data stream, including the security model, is shown in Figure 5-1. The architecture comprises source sensing devices that transmit data to the DSM through wireless networks, with the security model (SEEN) in place. We follow [4 - 5] to design a DSM capable of handling high-volume, high-variety data streams from multiple sources. In addition, the DSM is responsible for performing the security verification of the incoming data streams in near real time, in order to keep pace with the processing speed of the SPE (Spark Streaming, Apache Storm, Apache S4, etc.). For further information on stream data processing in a data centre, refer to [128].
In addition, we consider both the source sensors and the cloud data centre to be deployed with Intrusion Detection Systems (IDS). A sensor-based IDS monitors a sensor's behaviour and generates alerts on potentially malicious on-board activities and network traffic [142]. An IDS can be set inline, attached to a spanning port of a sensor; the idea is to give the IDS access to all the packets we wish it to monitor. LEoNIDS (a low-latency and energy-efficient network IDS) addresses the energy-latency trade-off by providing both lower power utilisation and lower detection latency at the same time [141]. For cloud-based IDS, Lee et al. [143] proposed an intrusion detection system in which learning agents persistently process data and supply updated detection models to the detection agents for efficient learning and real-time detection. It computes inter- and intra-audit-record patterns, which can guide the data gathering process and simplify feature extraction from audit data. Xie et al. [144] proposed a novel technique to analyse system (sensor) vulnerabilities and attack sources quickly and accurately.
In this architecture, the data streams always arrive at the DSM in encrypted form. The idea is that while encrypting the data packets at the source, we attach the sensitivity level of the data to each individual data packet. In the SEEN method, we apply different keys to encrypt data packets of different sensitivity levels, with the aim of providing different confidentiality levels based on the application as well as the sensitivity level of the data. In a generic representation, if we need n levels of data security then n-1 keys (k1, k2, …, kn-1) are required for encryption/decryption. In this chapter, we consider three levels of data confidentiality: strong confidentiality, partial confidentiality, and no confidentiality; and two keys (k1, k2) for the encryption methods. The strong encryption method uses k1 to provide strong confidentiality, and the weak encryption method uses k2 to support partial confidentiality. Note that data packets requiring no confidentiality are not encrypted.
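The level-to-key mapping can be sketched as follows; the numeric SL tags, key sizes and the toy keystream cipher are assumptions for illustration, standing in for the SEEN method's E()/D():

```python
import hashlib, os

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for the symmetric E()/D() (NOT secure)."""
    ks, i = b"", 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, ks))

K1, K2 = os.urandom(16), os.urandom(8)   # 128-bit strong key, 64-bit weak key

def protect(packet: bytes, sl: int) -> bytes:
    """SL 2: strong confidentiality (k1); SL 1: partial (k2); SL 0: none.
    The SL tag travels with the packet so the DSM can pick the right key."""
    if sl == 2:
        return bytes([sl]) + xor_cipher(K1, packet)
    if sl == 1:
        return bytes([sl]) + xor_cipher(K2, packet)
    return bytes([sl]) + packet          # open-access data: no encryption

for sl in (2, 1, 0):
    print(sl, protect(b"sensor reading", sl).hex())
```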
Data packets can be transmitted to the DSM with encryption applied in two different ways: (i) encrypting the data stream, or (ii) encrypting individual data packets in the stream. In both cases, we apply the encryption method (strong or weak) based on the data sensitivity or confidentiality level. Stream-level encryption applies to sensors deployed with a fixed sensitivity level, whereas packet-level encryption applies to sensors that produce data of different sensitivity levels.
Here, we follow a three-step process at the DSM: data collection, security verification, and stream query processing, as highlighted in Figure 5-1. The focus is on performing the security verification at the DSM to provide end-to-end security for big sensing data streams. It is also important to perform the security verification of a data stream before stream query processing, in order to preserve the originality of the data for the SPE. The security verification needs to be done on the fly (i.e. in near real time) with a small buffer size. The queries, including security verification, can be defined as a directed acyclic graph in which each node is an operator and the edges are data flows between the nodes. The above system architecture and the security requirements of big data streams [4 - 5] lead to the following two important features:
- Data packets need to maintain confidentiality based on their sensitivity level.
- The buffer size at the DSM prior to stream query processing needs to be optimised.
Motivated by this problem, this chapter aims to address the challenge of data integrity and multilevel confidentiality for real-time massive data streams.
5.2.2 Adversary Model
We assume that a large number of sensor nodes are the sources of big sensing data streams, fully connected and able to communicate with the DSM through wireless networks. We assume that the DSM is aware of the network topology and of the initially deployed nodes. We assume that an IDS is positioned at each source device and at the DSM, so that the source sensors and DSM are capable of detecting packet-loss attacks and data modifications [137]. The DSM is treated as fully secured and protected in this model, as it resides at the cloud data centre.
An attacker has several ways of attacking big sensing data streams:
Figure 5-1: High level architectural diagram of big sensing data streams, DSM and stream data processing system for SEEN security model.
- After deployment, nodes may be captured by the attacker, who will then be able to access the data stored in these nodes, as well as reprogram them and control their actions. The attacker could therefore make nodes refuse to forward some of the packets (Selective Forwarding attack) or even all of them (Blackhole attack).
- The attacker may capture data packets in transit to extract information from them and modify their content. The attacker can thereby cause the loss of confidentiality of sensitive information (confidentiality attack) and of data integrity (integrity attack).
- A replay attack (also known as a playback attack) is a network-based attack in which a data stream is maliciously delayed or fraudulently repeated.
From a high-level perspective, an attacker can disrupt data transmission through a packet-loss attack in two ways: by compromising a node so that it drops packets, or by introducing interference in the network to access or tamper with the data. For this reason, the adversarial model covers many different attacks that aim to cause packet losses. The other type of attack is to capture sensitive data packets and analyse them to break data confidentiality.
Each node whose IDS detects a packet-loss attack will investigate the loss; we assume the investigating source device to be trustworthy and not to report any false response. This assumption is particularly important for the Majority Voting algorithm adopted as part of this approach. However, we will also present a variant of this algorithm that relaxes this constraint and can thus tolerate up to a certain number of colluding investigating source nodes.
5.2.3 Attack Model
There are three main approaches to threat modelling: attack-centric, software-centric and asset-centric. An attack-centric threat model always starts with an attacker, whereas a software-centric threat model starts with the system design. An asset-centric threat model starts from the information collected and the assets entrusted to the system, so the proposed method uses an asset-centric threat model.
We assume that multiple attacks can be carried out simultaneously at various parts of the network. In fact, a strength of the approach is that multiple investigations can also be carried out simultaneously.
The integrity of a big data stream ensures that a message sent from sources to the
data centre (DSM) is not modified by malicious intermediates. Authentication of big
data streams ensures that the data are from legitimate sources to maintain end-to-end
security services.
Data confidentiality (privacy) is a set of guidelines that restricts access to, or places limitations on, specific data streams. It guarantees that the data cannot be understood by anyone other than the intended recipients, whether the data are in transit or at rest.
Data confidentiality is measured by the impact of a successful exploitation of a vulnerability on the target system, as follows.
Strong confidentiality: only the intended recipients can read the information.
Partial confidentiality: there is considerable information disclosure in some situations.
No confidentiality: a total compromise of critical system information.
5.3 Research Challenges and Research Motivation
This section presents the research challenges of the proposed approach, followed by the motivations for the research problem.
5.3.1 Research Challenges
As the two solutions proposed in the last two chapters show, symmetric cryptography is the best way to protect data quickly while saving buffer space and computational overhead. Multiple levels of data streams, however, need different keys to perform the encryption and decryption.
In the security frameworks of the last two chapters, the source sensors initialise the shared key by themselves. This is simply not workable when multiple shared keys must be used for the multiple sensitivity levels of data in big data streams, and it becomes even more complex to reinitialise lost or desynchronised keys. Existing security solutions are therefore not suitable for multilevel security for big data streams, and a new solution is needed for the problems specified above.
Existing symmetric cryptographic security solutions rely on either a static shared key or a centralised dynamic key. With a static shared key, a long key is needed to defend against a potential attacker. The length of the key is proportional both to the security verification time and to the strength of the security, and the confidentiality level of big data streams depends on the strength of the encryption; it follows that the length of the shared key used for encryption is directly proportional to the confidentiality level. This chapter divides big sensing data streams into three levels based on data sensitivity: high-sensitivity, low-sensitivity and open-access data, which require strong confidentiality, partial confidentiality and no confidentiality, respectively.
From the required features of big data streams specified above, it is clear that security verification should be performed in real time, so we need a way to generate and initialise the multiple shared keys seamlessly. A big data stream is continuous in nature and huge in size, which makes it impossible to halt the data for rekeying, key distribution to the sources, and synchronisation with the DSM. To address this problem, we propose a scheme for big data stream security verification with seamless shared key initialisation at the source sensors. This process needs to be driven by the DSM, because the DSM is located at the cloud data centre and, by our assumptions, is fully secured.
A common problem in the data flow between sensors and the DSM is that attackers may read the data while they are still in transit. The DSM also needs to identify the sensitivity level of each data packet before security verification, so that it can apply the appropriate key to decrypt each level of data packet, since the sources applied separate keys to encrypt the packets according to their sensitivity levels. In the proposed model, key exchange happens once, during the handshaking process, but the DSM pushes updated shared keys to individual sources before the current shared keys expire. Synchronisation between a source and the DSM is critical during key updates; otherwise, security verification will fail.
Buffer size for security verification is another major issue because of the volume and velocity of big data streams. Given the features of big data streams (the 4Vs), we cannot hold the data for long before security verification; doing so would require a larger buffer and could reduce the performance of the SPEs. Reducing the buffer size is therefore one of the major challenges for big data streams, and any proposed security solution should achieve a smaller buffer size.
5.3.2 Research Motivation
As stated in previous chapters, we cannot always apply strong encryption techniques to maintain the data confidentiality of big data streams. Security processing time (encryption/decryption/security verification) is directly proportional to the length of the key, and the length of the key used for encryption is directly proportional to the strength of security. We conclude that we cannot always apply longer/stronger keys to protect data confidentiality and integrity. We therefore divide the complete data stream into three classes: high-sensitivity data, low-sensitivity data and open-access data, and provide strong confidentiality for high-sensitivity data, partial confidentiality for low-sensitivity data and no confidentiality for open-access data. Strong confidentiality prevents an adversary from reading the data stream during its entire lifetime, whereas partial confidentiality protects the data stream from disclosing information in real time. Accordingly, we apply strong encryption for strong confidentiality, weak encryption for partial confidentiality, and no encryption for open-access data.
Another major motivation is to perform the security verification in near real time in order to keep pace with the processing speed of the SPEs [82]. Stream data analysis performance should not degrade because of security processing time, as several applications need to perform data analysis in real time. The DSM also needs to identify each data packet's sensitivity level in order to apply the appropriate shared key(s) for decryption or security verification. A lightweight multilevel security mechanism is therefore essential to perform security verification in near real time with a reduced buffer size.
5.4 Selective Encryption Method for Big Data Streams
This chapter proposes a selective encryption method for big data streams (SEEN), which provides key renewability and balances security, performance and resource utilisation. The SEEN security method's salient features are as follows:
- efficient key broadcasting without retransmission;
- the ability to recover lost keys through proper detection;
- seamless key refreshment without interrupting data streams; and
- maintenance of data confidentiality based on data sensitivity level.
Table 5-1 Notations
Acronym       Description
Si            ith source sensing device's ID
Ki            ith source sensing device's secret key
Kd            DSM secret key
k             Initial secret key
KSH           Initial shared key generated by DSM
KSH(1)        Shared key for strong encryption
KSH(0)        Shared key for weak encryption
T             Time of packet generation
T′            Time packet is received at DSM
PNR           Pseudorandom number
CAC           Centralised authentication code
SL            Data sensitivity level
MAC           Message authentication code
E( ) / D( )   Encryption/Decryption function
H( )          One-way hash function
⊕             X-OR operation
||            Concatenation operation
We describe the proposed security method for big sensing data streams in terms of four independent components: system setup, rekeying, new node authentication, and encryption/decryption. We refer readers to Table 5-1 for the notation used in describing the security scheme. We make a number of sensible and practical assumptions to characterise the proposed security method, and we describe these assumptions where necessary. We next describe the independent components in detail.
5.4.1 Initial System setup
We follow a symmetric key method for the initial system setup because of the limited resources available at the source sensors [145]. In symmetric key encryption, hash functions need 5.9 μJ and encryption operations 1.62 μJ, whereas with asymmetric keys, RSA-1024 needs 304 mJ to sign and 11.9 mJ to verify, and ECDSA-160 needs 22.82 mJ to sign and 45 mJ to verify [145]. In the system setup process, the DSM always starts the process of identifying authenticated sources. After successful authentication, the DSM shares the secret shared keys with the source sensors for encryption. The initial shared key setup phase is as follows:
The DSM generates a pseudorandom number (PNR) and hashes it together with its own secret key to generate a unique secret shared key, KSH ← H(PNR || Kd). It then encrypts the shared key using the pre-deployed secret key (k) to form the CAC (Centralised Authentication Code), CAC ← Ek(KSH), and broadcasts the CAC to all the source sensors (1, …, n).
Once the sensors receive the broadcast CAC from the DSM, each decrypts it using the pre-deployed secret key (k), i.e. KSH ← Dk(CAC). Here we show the operation for a single sensor (the ith sensor). The sensor then sends an encrypted response CAC back to the DSM, CACi ← EKSH(Si || NR || T), containing the source ID, a random number used as a nonce, and a timestamp to avoid replay attacks.
Once the CAC is received at the DSM, it decrypts it, checks the source ID (Si) for authentication, and retrieves the corresponding sensor secret key from its database (Ki ← retrieveKey(Si)). It also checks the timestamp to avoid replay attacks. The complete procedure for authentication and replay attack avoidance is shown below.
(Si || NR || T) ← DKSH(CACi)
Ki ← retrieveKey(Si) // for source authentication
T - T′ ≤ ΔT (T: packet generation time; T′: packet received time)
The DSM compares the received timestamp (T) with its current time (T′) to check data freshness and avoid a replay attack (T - T′ ≤ ΔT). If the time difference is less than ΔT, the DSM accepts the data packet; otherwise the packet is discarded.
The DSM then generates a new key by X-ORing the existing shared key with the sensor's secret key, KSH(1) ← KSH ⊕ Ki. The DSM uses this shared key to encrypt the nonce and sends it back to the corresponding sensor for handshaking, along with the weak encryption shared key, i.e. EKSH(1)(NR || KSH(0)).
After the sensor (Si) receives the data packet, it performs the same operation as the DSM to derive the new shared keys for encrypting data packets. It compares the decrypted nonce (NR′) with the nonce it holds (NR); if both are the same it accepts, otherwise it rejects and starts a new authentication process. The received KSH(0) is a 64-bit key used for weak encryption, and KSH(1) is a 128-bit key used for strong encryption.
If NR′ = NR, the sensor accepts; otherwise the process starts from the beginning. The complete authentication process is shown in Figure 5-2, where we show the stepwise process with the information flow.
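The derivation steps above can be condensed into a short Python sketch. SHA-256 stands in for H( ), and the CAC broadcast/verification steps are elided; the point is that after the exchange both ends hold KSH(1) = KSH ⊕ Ki without the strong key ever being transmitted.

```python
import hashlib, os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:16]

def xor16(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# DSM side: derive the initial shared key from PNR and its secret key.
k_dsm = os.urandom(16)            # DSM secret key (Kd)
pnr = os.urandom(16)              # pseudorandom number (PNR)
k_sh = H(pnr + k_dsm)             # KSH = H(PNR || Kd), broadcast inside the CAC

# Strong key derivation: both ends X-OR KSH with the sensor's secret key Ki.
k_i = os.urandom(16)              # sensor Si's secret key (also in the DSM db)
dsm_strong = xor16(k_sh, k_i)     # DSM:    KSH(1) = KSH xor Ki
sensor_strong = xor16(k_sh, k_i)  # sensor: same computation, no key sent
assert dsm_strong == sensor_strong
print("strong key agreed:", dsm_strong.hex())
```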
5.4.2 Re-keying
After the initial key setup phase, the DSM has shared the secret keys with the sensors for encryption. For the rekeying process, we follow the LiSP protocol [146], modified to make SEEN data-centric rather than communication-centric. SEEN uses a key server (KS) at the DSM, which manages the security keys for both strong and weak encryption: a 128-bit symmetric shared key for strong encryption and a 64-bit symmetric key for weak encryption. Shared keys from the KS are always chosen to perform the rekeying operation. Along with the shared key, individual sensors are able to perform the hash function.
In order to make the system more secure, the shared key distribution for rekeying
must be secure and fault tolerant; where “secure” means to maintain the
confidentiality and authenticity and “fault tolerant” implies the capacity to restore
the lost shared key ( ). In the SEEN method, we always use two kinds of control
packets i.e. UpdateKey and RequestKey. UpdateKey is for periodically updating the
Figure 5-2: Initial authentication methods with four step process.
139
shared key used by DSM, whereas RequestKey is used by sensors when they missed
the shared key during the rekeying process.
We follow PRESENT [147] to generate the shared key at the DSM and distribute the
key before it is used for encryption at the source sensors. The sensors have two
buffer slots for each key, which means four buffer slots are required to save the keys,
as shown in Figure 5-3. The front two shared keys are always used for encryption
and the back buffers hold the next shared keys before the current keys expire.
To ensure secure shared key distribution, the DSM initiates the distribution by
encrypting the control packet (UpdateKey) using the current shared key (KSH(i−1))
to distribute the next shared key (KSH(i)). The UpdateKey is always of the format
EKSH(i−1)(KSHi(1) ‖ KSHi(0)), where KSH(i−1) is the current shared key held by all
authenticated sensors. Let us assume the time at which the shared key changes is t;
this means the DSM needs to initiate distribution of the new shared key by an earlier
time t′. If a sensor has not received the new shared key within the interval δt before t
expires (t − t′ = δt), it sends a RequestKey. The RequestKey always has the format
EKSH(Si ‖ ti), where the control packet is encrypted with the current shared key and
contains the sensor ID (Si) for authentication and the time slot (ti) for which the
sensor needs the shared key for encryption. In such situations, the DSM sends an
UpdateKey message to the corresponding sensors. Algorithm 5-1 shows the stepwise
procedure for rekeying.
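A minimal sketch of the double-buffered key slots follows; `KeySlots`, `apply_update_key` and `rotate` are our own illustrative names, assuming the decrypted UpdateKey payload yields the next strong/weak key pair.

```python
from dataclasses import dataclass

@dataclass
class KeySlots:
    """Front slots hold the keys in use; back slots hold the next keys."""
    strong: bytes          # front K_SH(1), used for strong encryption
    weak: bytes            # front K_SH(0), used for weak encryption
    next_strong: bytes = b""
    next_weak: bytes = b""

    def apply_update_key(self, next_strong: bytes, next_weak: bytes) -> None:
        # Store the next key pair received via UpdateKey in the back buffers.
        self.next_strong, self.next_weak = next_strong, next_weak

    def rotate(self) -> None:
        # At time t the current keys expire; the back buffers move to front.
        if not (self.next_strong and self.next_weak):
            raise RuntimeError("missing next keys: send RequestKey(Si, ti)")
        self.strong, self.weak = self.next_strong, self.next_weak
        self.next_strong = self.next_weak = b""
```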
5.4.3 New Node Authentication
Joining new nodes to the network is a common property of sensor networks. We
assume that a new source node is initialised by the DSM during initial deployment
[148]. In such cases, source sensors always start the process of authenticating with
the DSM to obtain the current shared key. Sensors use a control packet (i.e. InitKey)
to start the process. InitKey contains the source ID encrypted with the initially
deployed secret key, i.e. Ek(Si). Once the DSM receives the control packet, it checks
its authenticity. If the DSM succeeds in the authentication process, it follows the
initial key setup phase (from Figure 5-1) to share the current shared key. The DSM
uses the current shared key (KSH) instead of generating a new key. At the final stage
of sharing the shared key, the DSM shares the keys along with a time stamp (ti) with
the source sensors (KSH(1) ‖ KSH(0) ‖ ti). For robust handling of clock skew and
the shared information details, the source sensor can get the information from its
neighbours [146].
Figure 5-3: Key Selection.
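The InitKey exchange can be sketched as follows; this is an illustration under the same toy-cipher assumption as the earlier handshake sketch, with `new_node_request` and `handle_init_key` as hypothetical names.

```python
import hashlib, os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Same toy XOR-keystream stand-in as in the earlier sketch.
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

def new_node_request(pre_deployed_key: bytes, sensor_id: bytes) -> bytes:
    """Sensor side: InitKey = E_k(Si)."""
    return toy_cipher(pre_deployed_key, sensor_id)

def handle_init_key(pre_deployed_key: bytes, init_key: bytes,
                    registry: set, current_keys: tuple):
    """DSM side: authenticate the node, then reuse the current key pair."""
    sensor_id = toy_cipher(pre_deployed_key, init_key)
    if sensor_id not in registry:     # unknown node: reject the request
        return None
    return current_keys               # (K_SH(1), K_SH(0)) -- no fresh keys

k = os.urandom(16)
keys = (os.urandom(16), os.urandom(8))            # current strong/weak keys
print(handle_init_key(k, new_node_request(k, b"S42"), {b"S42"}, keys) == keys)
```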
Algorithm 5-1. Rekeying process
t – time at which rekeying takes effect
t′ – time at which the DSM starts the shared key distribution
δt – small time interval before t expires
1. At time t′: the DSM broadcasts (UpdateKey)
   UpdateKey ← EKSH(KSH(1) ‖ KSH(0))
2. Sensors use the current shared key (KSH) to recover the next shared keys
   DKSH(KSH(1) ‖ KSH(0))
3. At time δt: if any sensor does not have the next shared key,
   it unicasts to the DSM (RequestKey)
   RequestKey ← EKSH(Si ‖ ti)
4. After authentication, the DSM unicasts (UpdateKey)
   UpdateKey ← EKSH(KSH(1) ‖ KSH(0))
Figure 5-4: Shared key management for robust clock skew.
5.4.4 Reconfiguration
The DSM will reconfigure the shared key at the time of the next rekeying process if:
(1) any of the source sensors has been compromised; (2) any of the shared keys has
been revealed; (3) a source node has explicitly requested the shared key; or (4) a new
source has joined to participate in the data stream. The first condition forces all
source devices to be reconfigured, whereas the final two cases only require the
requesting source to be configured. The actions required for the cases highlighted
above are summarised as follows:
(I) The DSM withdraws compromised nodes from the set of authenticated sources;
if KSH(i) has been disclosed, all earlier shared keys may also be exposed.
(II) The DSM computes new shared keys for both strong and weak encryption and
unicasts them with control packets.
(III) The DSM replies to the requesting source with the current configuration.
(IV) The DSM follows the authentication process and, if successful, responds to the
source by initialising an InitKey control packet.
Figure 5-5: Method to select the encryption method based on the data sensitivity level.
5.4.5 Encryption/Decryption
The process defined above makes both shared keys (KSH(1), KSH(0)) available at
the sensors. Note that KSH(1) is always used for strong encryption, whereas KSH(0)
is always used for weak encryption. Each data block generated at a sensor is a
combination of two parts. The first part is for integrity checking and maintaining the
confidentiality level, whereas the other part is for source authentication, i.e.
EKSH(1)(Si ‖ T ‖ FV). The authentication part is always encrypted using the strong
encryption key; it contains the source ID for authentication, a time stamp (T) to
avoid replay attacks, and a flag value (FV) of 1/0, where 1 denotes strong encryption
of the body part (highly sensitive data) and 0 denotes weak encryption of the body
part (low sensitivity data). In order to encrypt the data part of the packet, every
sensor XORs the current shared key KSH(1/0) with its own secret key (Ki), i.e.
KSH(1)′ = KSH(1) ⊕ Ki and KSH(0)′ = KSH(0) ⊕ Ki, and then uses the newly
generated key to encrypt the data packets. Key KSH(1)′ is always used for strong
encryption, whereas key KSH(0)′ is always used for weak encryption of
(DATA ‖ MAC). The data block encryption specified above is always based on the
data sensitivity level and provides data integrity and confidentiality.
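A sensor-side sketch of this selective encryption is given below, reusing the toy `xor_cipher` helper from the handshake sketch above. The 3-byte sensor ID, the 12-byte header layout and the HMAC-SHA256 MAC are assumptions of the sketch, since the thesis does not fix a concrete packet layout or MAC construction here.

```python
import hmac, hashlib, struct, time

def encrypt_packet(sensor_id: bytes, ki: bytes, k_sh1: bytes, k_sh0: bytes,
                   data: bytes, sensitive: bool) -> bytes:
    flag = 1 if sensitive else 0
    # Header E_KSH(1)(Si || T || FV): always encrypted under the strong key.
    header = xor_cipher(k_sh1,
                        sensor_id + struct.pack(">QB", int(time.time()), flag))
    # Per-sensor body key: K_SH(1/0)' = K_SH(1/0) XOR Ki, chosen by the flag.
    base = k_sh1 if sensitive else k_sh0
    k_prime = bytes(a ^ b for a, b in zip(base, ki))
    # Body E_K'(DATA || MAC): the MAC gives the DSM its integrity check.
    mac = hmac.new(k_prime, data, hashlib.sha256).digest()
    return header + xor_cipher(k_prime, data + mac)
```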
The Capture layer of an IoT system (i.e. the physical layer of sensor networks) is
responsible for obtaining the context of data from the deployed environment using
source sensors. This layer is also accompanied by classification methods, which
mostly follow unsupervised neural network methods such as the KSOM (Kohonen
Self-Organising Map) used to categorise real-time sensed data [149]. The word
"sensor" not only signifies a sensing device but also applies to each data source that
may deliver functional context information [150]. Ganesan et al. [151] proposed a
similar kind of system, named DIMENSIONS, where the authors extended sensors
for computation, storage and system performance. We follow the KSOM technique
to classify the sensed data at the sensors to define the sensitivity level. KSOM uses
data mining and classification techniques to extract the data sensitivity level. A few
sensors are also pre-deployed with a high sensitivity level, in which case all
generated data packets are sent to the DSM with a high sensitivity level. The steps to
select the encryption method and shared key are shown in Figure 5-5. The strong
encryption method always uses the shared key KSH(1) for highly sensitive data,
whereas the weak encryption method always uses the shared key KSH(0) for low
sensitivity data.
After data are received at the DSM, it always checks the authentication block first,
applying the strong encryption shared key to recover the authentication information,
i.e. DKSH(1)(Si ‖ T ‖ FV). Once it obtains the source sensor ID, it checks its own
database to find a match and confirm that the data packets are from an authenticated
source. After successful authentication, the DSM compares the packet generation
time (T) with its current time (T′) to check the data freshness and avoid a replay
attack (T′ − T ≤ ΔT). After successfully checking for a replay attack, the DSM
retrieves the corresponding secret key of the sensor, i.e. Ki ← retrieveKey(Si), and
checks the data sensitivity level to determine the shared key used for encryption. If
the data sensitivity level is 1, it computes KSH(1)′ = KSH(1) ⊕ Ki, else
KSH(0)′ = KSH(0) ⊕ Ki. The newly computed key is used for data decryption, i.e.
DKSH(1)′(DATA ‖ MAC) for high sensitivity data and DKSH(0)′(DATA ‖ MAC)
for low sensitivity data. After data decryption, the DSM compares the MAC as an
integrity check. The DSM always keeps the previous shared key KSH(i−1) while
KSH(i) is in use, because there is always the possibility of late arrival of data packets
at the DSM over the untrusted wireless communication medium.
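The matching DSM-side verification, continuing the sensor-side sketch above (and again reusing its toy `xor_cipher`), might look as follows; `registry` stands in for the DSM's Si → Ki database and the 12-byte header layout remains an assumption of these sketches.

```python
def verify_and_decrypt(packet: bytes, k_sh1: bytes, k_sh0: bytes,
                       registry: dict, delta_t: float = 5.0) -> bytes:
    # Header is always under the strong key: D_KSH(1)(Si || T || FV).
    hdr = xor_cipher(k_sh1, packet[:12])
    sid, (t, flag) = hdr[:3], struct.unpack(">QB", hdr[3:12])
    if sid not in registry:
        raise ValueError("unauthenticated source")
    if time.time() - t > delta_t:             # freshness: T' - T <= dT
        raise ValueError("stale packet: possible replay attack")
    ki = registry[sid]                        # Ki <- retrieveKey(Si)
    base = k_sh1 if flag == 1 else k_sh0      # strong or weak key by flag
    k_prime = bytes(a ^ b for a, b in zip(base, ki))
    body = xor_cipher(k_prime, packet[12:])
    data, mac = body[:-32], body[-32:]
    if not hmac.compare_digest(mac,
                               hmac.new(k_prime, data, hashlib.sha256).digest()):
        raise ValueError("MAC mismatch: integrity check failed")
    return data
```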
5.4.6 Tradeoffs
The communication overhead of the proposed security method depends on the size
of the source network. Each group of keys has a greater chance of being
compromised as the network size increases [146]; this in turn increases the
frequency of reconfigurations. The SEEN method achieves a significant
improvement over a traditional rekeying approach by broadcasting shared keys
without retransmissions. Less frequent reconfiguration means better performance,
and a larger network benefits more from efficient broadcasting. Therefore, we can
trade off energy consumption against security to maximise the overall performance.
The proposed shared key management is made robust by synchronising clocks
among neighbours. As an example, Figure 5-4 shows the time domain and key slots
of three sensors, A, B and C, among which there is clock skew. Every UpdateKey
control packet contains the time stamp (ti) at which to switch the shared key.
5.4.7 Required Resources for SEEN
5.4.7.1 Resources at Sensors for SEEN
We follow [140] to define the communication overhead and power consumption
theoretically. The following equations define the communication overhead and
power consumption:
CO = (Nc / Σi Ni) × 100% (5-1)
PC1 = 3(CSE + CSD) (5-2)
PC2 = (TNSP × CSE) + (TNRP × CSD) (5-3)
PC3 = CSE + 2 × CSD (5-4)
CO – communication overhead
PC1 – power consumption during node authentication (initial phase)
PC2 – power consumption by a node during data transmission
PC3 – power consumption during rekeying
Nc – total number of connections
Ni – number of packets transferred by node Si
CSE – computational power required by symmetric key encryption
CSD – computational power required by symmetric key decryption
TNSP – total number of sent packets
TNRP – total number of received packets
The communication overhead is computed as a percentage and considers the total
number of connections and the number of packets transferred by each sensor (see
equation 5-1). The total control packet size is 74.125 bytes, whereas the data packet
size is 30 bytes. The power consumption for initial node authentication is three times
that required by the encryption and decryption processes together (see equation 5-2).
Each node needs a certain amount of power to participate in data transmission and
data packet encryption; the normalised form of this power consumption is shown in
equation 5-3. Sensors need power to decrypt the UpdateKey and also to initiate a
RequestKey during the rekeying process, and sensors always need more power to
perform encryption and decryption. Equation 5-4 shows the formulation of power
consumption during rekeying.
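As a quick numeric illustration of equations 5-1 to 5-4 (the per-operation energy costs below are placeholders based on the 1.62 μJ figure quoted earlier; the real values depend on the radio and cipher used):

```python
def comm_overhead(nc: int, packets_per_node: list[int]) -> float:
    # Equation 5-1: overhead as a percentage of all transferred packets.
    return 100.0 * nc / sum(packets_per_node)

C_SE = C_SD = 1.62e-6              # J per symmetric encrypt/decrypt (illustrative)

pc1 = 3 * (C_SE + C_SD)            # Eq. 5-2: initial node authentication
pc2 = 500 * C_SE + 480 * C_SD      # Eq. 5-3: e.g. 500 packets sent, 480 received
pc3 = C_SE + 2 * C_SD              # Eq. 5-4: one RequestKey, two UpdateKey decrypts

print(comm_overhead(10, [30] * 50))  # e.g. 10 connections, 50 nodes x 30 packets
```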
5.4.7.2 Resources at DSM for SEEN
The buffer utilisation needs to be optimised at the DSM, as the security mechanism
for a big data stream must run in near real time because of the big data stream
features [4 - 5]. Here we present a procedure to compute the halting time of a data
block in a buffer before the stream data analysis is done. Let there be n sensors, each
sending m data packets. We assume that security verification at the DSM succeeds
with probability p, or is delayed with probability q = 1 − p. We can compute the
acquisition probability A following [152]. Based on the value of A, we can measure
the resting time of each individual data block; the resting time, represented as w, is
inversely proportional to the value of A and the security verification time of the
DSM.
Algorithm 5-2. Selective encryption method for big sensor data streams
Description: Applying different encryption (strong/weak) to different sensitivity
levels of data.
Input: The encryption method selected for each data packet based on its sensitivity
level.
Output: A protected big data stream maintaining different levels of confidentiality
based on data sensitivity.
Step 1 SEEN System setup
1.10 DSM → Si: {Ek(KSH)}, the DSM performs the centralised authentication.
1.11 Si → DSM: {EKSH(Si ‖ Ni ‖ T)}, the ith sensor authenticates the DSM and
sends this packet to the DSM for registration.
1.12 DSM → Si: {EKSH′(Ni ‖ KSH(0))}, where KSH′ = KSH ⊕ Ki; the DSM
generates the shared keys for both strong and weak encryption and shares them
with the corresponding source sensors.
Step 2 SEEN Rekeying
SEEN uses two control packets, i.e. UpdateKey and RequestKey. UpdateKey
always uses EKSH(i−1)(KSHi(1) ‖ KSHi(0)), where KSH(i−1) is the current shared
key.
DSM → Si: {EKSH(i−1)(KSHi(1) ‖ KSHi(0))}
If a sensor did not get the next shared key before δt, it uses RequestKey to obtain
the shared key from the DSM.
Si → DSM: {EKSH(Si ‖ ti)}
Step 3 SEEN Node authentication
A new node in the sensing network uses a control packet named InitKey, i.e.
Ek(Si), and sends this request packet to the DSM. The DSM then follows Step 1
using the current shared key.
Si → DSM: {Ek(Si)}, followed by Step 1 after this initial step.
Step 4 SEEN Encryption/Decryption
4.4 Encryption: Every sensor has the strong/weak encryption keys KSH(1) and
KSH(0). The sensor applies KSH(1) to the authentication header and generates the
new key KSH(1/0)′ = KSH(1/0) ⊕ Ki to encrypt the data packets.
EKSH(1)′ / EKSH(0)′ (DATA ‖ MAC)
4.5 Decryption: The DSM uses the strong shared key (KSH(1)) to decrypt the
header. Based on the flag value (1/0), the DSM selects the shared key for
decryption and then performs the decryption process.
KSH(1/0)′ = KSH(1/0) ⊕ Ki
DKSH(1/0)′ (DATA ‖ MAC)
5.5 Theoretical Analysis
This section provides a theoretical analysis of the security scheme to show that it is
safe against attacks on authenticity, confidentiality and integrity.
5.5.1 Security Proof
We follow [4 - 5, 42 - 43] to define the following attacks and their properties. Based
on these attack definitions, we prove the following theorems.
Definition 1 (attack on authentication): An attacker Ma attacks authenticity if it is an
adversary capable of monitoring, intercepting, and introducing itself as an
authenticated node in order to participate in the data stream.
Definition 2 (attack on integrity): An attacker Mi attacks integrity if it is capable of
monitoring the data stream and trying to access and/or modify a data block before it
reaches the DSM.
Definition 3 (attack on confidentiality): A malicious attacker Mc is an unauthorised
party that has the ability to access or view big data streams without authorisation.
Definition 4 (replay attack): A malicious attacker Mr is an unauthorised party that
has the ability to intercept data packets and forward them later. This may cause
missed event detection during stream data analysis.
Theorem 1: Strong encryption (128-bit) is always safer than weak encryption
(64-bit) in the SEEN security model, although it takes more computational power
and time.
Proof: ECRYPT II reports that a 128-bit symmetric key provides the same strength
of protection as a 3,248-bit asymmetric key [34 - 35]. Symmetric key cryptography
therefore becomes a natural choice for this purpose; it has been shown to be
approximately 1000 times faster than strong public key ciphers [34]. From
[4 - 5, 34], it is comparatively easy for an attacker to read/modify packets that are
encrypted with a smaller key length. The Crypto++ benchmarks [153] also confirm
that a smaller key length always takes less time to break, i.e. to find the shared key.
From the above, we conclude that the key domain size grows exponentially with the
key length, and with it the time required to enumerate all possible keys (see
Table 3-2). This means an attacker needs more computational time and resources to
break a 128-bit key than a 64-bit key.
Theorem 2: The DSM can easily identify delayed data packets that have been
intercepted by a replay attacker (Mr) under the SEEN security method.
Proof: A replay attack, also widely known as a playback attack, is one where an
attacker (Mr) intercepts the data packet(s) of a data stream and forwards them later.
The attacker may also repeatedly send the data packets to block the DSM. This is
carried out either through a compromised source sensor or a man-in-the-middle
attack.
In the SEEN security method, during encryption the source sensor always adds a
time stamp T (sending time/packet generation time) to the header part of the data
packet. This header has the format EKSH(1)(Si ‖ T ‖ FV) and is always used for
authentication and data freshness. For every data packet, the DSM compares the
packet generation time (T) with its current time (T′) to check the data freshness and
to avoid a replay attack (T′ − T ≤ ΔT). If the time difference is at most ΔT, the data
packet is accepted; otherwise it is discarded. Here ΔT is the maximum time taken to
transmit data between the source sensors and the DSM. After successfully checking
for a replay attack, the DSM follows the data decryption process.
Theorem 3: In the SEEN security method, an unauthorised attacker Mc cannot
access or view a high sensitivity data stream at all, and cannot read a low sensitivity
data stream in real time.
Proof: From Figure 5-5 and Algorithm 5-2, it is clear that every data stream, or data
packet within a stream, is transmitted with a sensitivity level. In the SEEN method
we consider two sensitivity levels, i.e. '1' for high and '0' for low. For highly
sensitive data, sensors use the 128-bit shared key KSH(1). The computational
hardness of this shared key is shown in Table 3-2: even the most advanced processor
(an Intel i7) takes decades to enumerate all possible shared keys to perform the
decryption operation. Attacker Mc therefore cannot read high sensitivity data (strong
confidentiality).
For lower sensitivity data in the SEEN model, sensors use the 64-bit shared key
KSH(0). Attacker Mc still needs years to enumerate all the possible keys to decrypt
the data packets (see Table 3-2). The maximum time before the shared key is
updated (i.e. t) is always less than the time required to exhaust the 64-bit key space
in Table 3-2. So attacker Mc may eventually read low sensitivity data, but not in real
time.
From the above, we can confirm that the SEEN model maintains confidentiality for
sensitive data and partial confidentiality for low sensitivity data.
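A back-of-the-envelope calculation makes the gap between the two key lengths concrete; the trial rate of 10^9 keys per second is an assumption for illustration, not a figure taken from Table 3-2.

```python
# Expected exhaustive-search times for 64-bit vs 128-bit keys.
rate = 1e9                      # assumed key trials per second
year = 365.25 * 24 * 3600
for bits in (64, 128):
    seconds = 2 ** bits / rate  # worst case: try the whole key space
    print(f"{bits}-bit: {seconds / year:.3g} years")
# 64-bit: ~585 years at this rate; 128-bit: ~1.1e22 years -- which is why
# SEEN only needs the 64-bit key to outlive its rekeying interval t.
```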
Theorem 4: Under the proposed SEEN security method, an attacker Ma cannot
impersonate a source to introduce itself as an authenticated source, and an attacker
Mi cannot obtain the shared key KSH to break data integrity.
Proof: The IDS at a sensor always monitors the sensor's behaviour [141 - 142] and
reports to the DSM if the sensor is captured by an attacker Ma. In this situation, the
DSM ignores the specific source sensor and does not consider data packets from that
sensor for data analytics. The DSM checks the authentication of each individual data
packet, where data packets arrive in the format
EKSH(1)(Si ‖ T ‖ FV) ‖ EKSH(FV)′(DATA ‖ MAC). The DSM always applies the
strong encryption shared key (i.e. KSH(1)) to decrypt the header and check
authentication. After decryption, the DSM compares Si with its own database to
authenticate the source.
After a source device is authenticated, the DSM retrieves the corresponding secret
key of the sensor, i.e. Ki ← retrieveKey(Si), and checks the data sensitivity level to
select the shared key used for encryption. Based on the data sensitivity level, the
DSM computes KSH(1/0)′ = KSH(1/0) ⊕ Ki. The newly computed shared key is
used for data decryption, i.e. DKSH(1/0)′(DATA ‖ MAC). After data decryption, the
DSM compares the MAC as an integrity check; through the MAC check, the DSM
confirms that the integrity of the data is intact.
The major drawback (which applies equally to all other security models) is that the
confidentiality and integrity checks can, in principle, be broken by a brute force
attack.
Theorem 5: The proposed SEEN method requires a comparatively small buffer size
at the DSM before stream query processing, compared to standard symmetric key
solutions (i.e. AES-128).
Proof: From Algorithm 5-2, it is clear that the proposed SEEN security method
provides a high level of data confidentiality for sensitive data, while providing
partial confidentiality for low sensitivity data. We decrypt only the header part for
authentication (see Theorem 4) and data freshness (see Theorem 2); after successful
authentication, we decrypt the data block for integrity checks. Another important
mechanism is the use of different keys, with different key lengths, for
encryption/decryption. From Theorem 1, it is clear that key length is directly
proportional to security verification time, and security verification speed is inversely
proportional to the buffer required for security verification. Combining the above,
we conclude that the proposed SEEN security method needs a comparatively small
buffer size. The experimental evidence is given in the following section.
5.5.2 Forward Secrecy
Following a standard symmetric key cryptography procedure, the shared keys used
for encrypting data packets are used only until they expire (i.e. for the time period t).
Thus, previously used shared keys are worthless to an intruder even when such a key
becomes known to the attacker. This is one of the major advantages of frequently
changing the shared key, and it is the reason we use symmetric key cryptography
over asymmetric key cryptography. However, if an intruder continuously monitors
the data stream for a long period of time, he/she can break the confidentiality of the
low sensitivity data, but not of the high sensitivity data (see Table 3-2). In order to
maintain different levels of confidentiality, the SEEN security method uses two
different keys for different levels of data sensitivity; at the same time, data integrity
is always maintained.
5.6 Experiment and Evaluation
In order to evaluate the security strength and efficiency of the SEEN security method
under the adverse situations specified above, we experimented in multiple
simulation environments. The experiments were conducted using in-house
simulators on an Intel(R) Core(TM) i5-6300 CPU @ 2.40 GHz with 8 GB RAM
running Microsoft Windows 7 Enterprise. We first verified the proposed security
approach using Scyther [119]; second, we measured the performance of the
approach using JCE (Java Cryptographic Environment) [120]; third, we computed
the buffer size required by the proposed approach using MATLAB [121] to measure
the efficiency of the security method; finally, we used the COOJA simulator in
Contiki OS [118] to obtain the network performance of SEEN.
5.6.1 Security Verification
The SEEN security protocol is simulated in the Scyther simulation environment
using the underlying Security Protocol Description Language (.spdl). Scyther is an
automatic security protocol verification tool that can be used to check the
correctness of security protocols. In the Scyther model, we defined the roles S and
D, where S is a sensing device and D is the receiver (i.e. the DSM). In this scenario,
S and D hold all the information for encryption/decryption that is initialised in the
system setup and rekeying phases. In this simulation environment, S sends encrypted
data packets to D for security verification. We introduced three types of attacks. In
the first, an attacker changes a data packet while it is in the network. In the second,
an adversary steals the credentials of the source (S) and forwards data packets to D
pretending to be S. In the third, an adversary obtains a data block, tries to read the
data, and replays the data packets. We experimented with 100 runs, in intervals of 10
runs, for each individual claim, with results as shown in Figure 5-6. Here we
modelled the security method following the previous section and used different key
sizes (i.e. 64 bits and 128 bits) in random data packets, updating the different keys
according to the SEEN method (see Table 3-2).
Results: This experiment ranges from 0 to 100 instances in intervals of 10, using
different numbers of data blocks. We checked the data integrity and confidentiality
after data packet authentication. As the key generation and distribution process is
handled by the DSM, we assume that no intruder has the shared secret key. We use
two different keys for the encryption process, i.e. K(0) for weak encryption and K(1)
for strong encryption; this also makes it harder for an intruder to guess the key.
During this experiment, we did not come across any potential attack at the DSM that
compromised the shared key, so the scheme is secure in terms of confidentiality and
integrity. Figure 5-6 shows the results of the security verification conducted in the
Scyther simulation environment. We conclude that the proposed model is secure
against confidentiality and integrity attacks.
(a) Scyther simulation result page of successful security
verification.
(b) Scyther simulation result page of successful security at DSM.
Figure 5-6: Scyther simulation result page of security verification.
In practice, attacks may be more sophisticated and efficient than brute force attacks.
The efficacy of the proposed security method shows up at two important points:
(i) during security verification at the DSM, and (ii) during the neighbour
authentication process.
5.6.2 Performance Comparison
We used JCE (Java Cryptographic Environment) to evaluate the performance of the
SEEN method experimentally. JCE is the standard extension to the Java platform
that provides an implementation context for cryptographic methods. The experiment
is based on the features of JCE in a 64-bit Java virtual machine, version 1.6. The
security verification time is measured at the DSM. The experimental outcomes for
security verification are shown in Figure 5-7. We performed experiments and
compared the security verification time for different data packet sizes. We compare
the performance of SEEN with the Advanced Encryption Standard (AES-128,
AES-192), LSec, and the previously proposed models for big sensing data streams,
i.e. DPBSV and DLSeF [4 - 5, 140].
Figure 5-7: Performance comparison of the SEEN method with AES-128, AES-192,
LSec, DPBSV, and DLSeF.
Results: The experimental results show that the SEEN security method performs
better than the AES-128, AES-192 and LSec algorithms for different data packet
sizes, as shown in Figure 5-7. SEEN does not rely on a trusted component at the
sensor (i.e. a TPM) and avoids confidentiality attacks, in contrast to DPBSV and
DLSeF (see Table 5-2). So even though the performance of SEEN is not as good as
DPBSV and DLSeF, it is acceptable for typical sensor network applications. The
results show that SEEN is more efficient and faster than the AES-128 and AES-192
protocols while providing the same level of security, and that it removes some of the
unrealistic assumptions of DPBSV and DLSeF.
5.6.3 Required Buffer Size
This experiment on the required buffer size at the DSM was carried out using the
MATLAB simulation tool. The buffer size is derived from the security verification
time at the DSM (from Figure 5-7) with respect to different velocities of big data
streams. This performance is based on the verification time, calculated as shown in
Figure 5-8. Here we compared the SEEN security method with standard AES-128,
AES-192, LSec, DPBSV and DLSeF (see Figure 5-8). The velocity of the big data
streams ranges from 50 to 300 MB/s in 50 MB/s intervals. The buffer size required
by SEEN is always smaller than that of the AES-128 algorithm for the different rates
of incoming data. Figure 5-8 shows the minimum buffer size required at the DSM
for the SEEN method in comparison with AES-128, AES-192, LSec, DPBSV and
DLSeF. The performance comparison shows that the SEEN method requires less
buffer space and performs security verification efficiently without compromising
any security properties.
Figure 5-8: Efficiency comparison of the required buffer size at the DSM for
security processing.
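The buffer requirement can be approximated as stream velocity multiplied by verification latency. A minimal sketch, with illustrative timings rather than the measured values behind Figure 5-8:

```python
def required_buffer_mb(velocity_mb_s: float, verify_time_s: float) -> float:
    # Data arriving while one batch is being verified must be buffered.
    return velocity_mb_s * verify_time_s

# Illustrative only: assume SEEN verifies a batch in 40 ms vs 90 ms for AES-128.
for v in range(50, 301, 50):                      # 50..300 MB/s, 50 MB/s steps
    print(v, required_buffer_mb(v, 0.040), required_buffer_mb(v, 0.090))
```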
5.6.4 Network Performance
We tested the SEEN protocol using the COOJA simulator in Contiki OS to obtain
the network performance (i.e. communication overhead and power consumption)
[118]. We used the two most common types of sensor (i.e. Z1 and TmoteSky) for the
network simulation. In this experiment, we checked the performance while
computing and distributing the shared key.
For the network simulation, we deployed 51 nodes (i.e. 50 sensors and 1 DSM) over
a random area in the COOJA simulation environment. We set the initial battery
power of an individual sensor node to 1×10^6 J, the power consumption for
transmission to 1.6 W and the power consumption for reception to 1.2 W. Apart
from these, we followed the default properties of the Z1 and TmoteSky sensors. We
assume that the size of each data packet is 30 bytes, the nonce is 23 bits, the secret
key is 64/128 bits and the token is 4 bytes for the simulation [140].
Figure 5-9: Energy Consumption.
In order to measure the communication overhead, the simulation used data packets
of 30 bytes at continuous intervals. We follow equation 5-1 to obtain the
communication overhead, computed as a percentage (%) with respect to the number
of data packets, as shown in Table 5-3. According to the network properties, the
communication overhead is inversely proportional to the number of packets in the
network, and the simulation results in Table 5-3 show this behaviour.
For every connection, SEEN exchanges control packets for source/DSM
authentication and shared key distribution based on the packet sizes specified above.
This is an acceptable tradeoff between energy and security for the sensor node. The
simulation results for energy consumption are shown in Figure 5-9. The SEEN
protocol requires extra battery power for the network authentication, but the
difference is very small, and the energy consumption of the SEEN protocol remains
almost the same as the network size increases. We simulated the scenario with up to
50 nodes, in intervals of 10 nodes, as shown in Figure 5-9.
Table 5-2 Performance and Properties of Security Solutions

                              AES     DPBSV   DLSeF   SEEN
Authenticity                  Yes     Yes     Yes     Yes
Integrity                     Yes     Yes     Yes     Yes
Confidentiality               Yes     P       P       Yes
Trust on Sensor Node (TPM)    No      Yes     Yes     No
Computation                   HIGH    LOW     LOW     LOW
(P = partial confidentiality)

Table 5-3 Communication overhead of SEEN protocol

NP      10    20    30     40    50    60     70    80
CO (%)  25    23    12.8   11    8     6.8    6     6
From all the above security analyses and experiments, we conclude that the
proposed security method (i.e. SEEN) is secure against multi-level confidentiality
and integrity attacks, and efficient in terms of security verification speed and
required buffer size at the DSM (compared to AES-128, AES-192 and LSec).
Table 5-2 compares the security properties of SEEN with AES-128 and the existing
DPBSV and DLSeF. It clearly shows that the proposed method provides the same
level of security as AES-128 while reducing the computational overhead.
5.7 Summary
This chapter proposed a Selective Encryption (SEEN) method to maintain multiple
confidentiality levels of big sensing data streams together with data integrity. In
SEEN, a DSM independently maintains intrusion detection and shared key
management as its two major components. The method is designed around a
symmetric key block cipher and the use of multiple shared keys for encryption. By
employing the cryptographic function with selective encryption, the DSM rekeys
efficiently without retransmissions, and the rekeying process never disrupts ongoing
data streams or encryption/decryption. SEEN supports source node authentication
and shared key recovery without incurring additional overhead. We evaluated the
performance of SEEN through security analyses and experimental evaluations,
finding that it provides significant improvements in processing time and buffer
requirements while protecting data confidentiality and integrity from malicious
attackers.
Chapter 6
Access Control Framework for Big
Sensing Data Streams
Chapter 5 addressed the important step of security verification by providing
multilevel security based on data sensitivity levels in big data streams. Another
important step is to control information leakage after the security verification of big
data streams. We refer to this as an access control, or information flow control,
problem over big sensing data streams. To address this problem, we propose
lattice-based information flow control over big sensing data streams. We initialise
two static lattices, i.e. a sensor lattice for the source sensors and a user lattice for
users and query processors. We use static lattices so that the information flow model
can be processed quickly, because we are dealing with big data streams, i.e. data
streams of high volume and velocity. The experimental results for the proposed
information flow model show that it can handle incoming big data streams with low
latency and a small buffer requirement.
6.1 Introduction
Data Stream Management Systems have been increasingly used to support a wide
range of real-time applications (e.g. military applications, network monitoring,
battlefield monitoring, sensor networks, health monitoring, and financial monitoring)
[115]. Most of the above applications need to protect sensitive data from
unauthorised access. For example, in battlefield monitoring, the positions of soldiers
should only be accessible to the battleground commanders. Even if data are not
sensitive, it may still be of commercial value to restrict access, so the types of
data/sources accessible to end users need to be classified. As another example, in a
financial monitoring service, stock prices are delivered to paying clients based on
the stocks they have subscribed to. Hence, there is a need to integrate access control
mechanisms into a stream manager. All the above applications also deal with stream
data. As a first step in this direction, [154] presented a role-based access control
model specifically tailored to the protection of data streams. The objects to be
protected are essentially views (or rather queries) over data streams. The model
supports two types of privileges: a read privilege for operations such as selection,
projection, and join, and aggregate privileges for operations such as min, max,
count, avg, and sum. Another important issue to be addressed is access control
enforcement. This issue is complicated by the fact that access control mechanisms
must process data in real time, at high volume and velocity. Nonetheless, one of our
goals is to develop a framework that is as lightweight as possible and independent of
the target stream engine.
One of the key decisions when developing an access control mechanism is the
strategy adopted to enforce access control. In this respect, three main solutions can
be adopted: preprocessing, postprocessing, and query rewriting. Preprocessing is a
naïve way to enforce access control in which streams are pruned of unauthorised
tuples before entering the user query. The main drawback of this simple strategy is
that it works well only for very simple access control models, which, unlike ours, do
not support policies that apply to views. We believe that this is an essential feature
to support, because it allows the specification of very useful access control policies.
For instance, if preprocessing is adopted, it is not possible to enforce a policy
authorising a captain to access the average heartbeats of his/her soldiers only during
the time of a certain action, and/or only for those soldiers positioned in a given
region. In contrast, postprocessing first executes the original user query, and then
prunes the unauthorised tuples from the result before delivering the resulting stream
to the user. Like preprocessing, this strategy has the drawback that it does not
support access control policies defined over portions of combined streams. On top
of the lattice structure, we design novel secure operators (namely, Secure Read,
Secure View, Secure Join, and Secure Aggregate) that filter out from the results of
the corresponding (non-secure) operators those data instances that are not accessible
according to the specified access control policies.
The first work on access control over data streams supports a very expressive access
control model while remaining, as far as possible, independent of the target DSMS
[115]. In order to address the aforementioned challenge, we propose an information
flow control model using static lattices for source sensors and for users/query
processors. Our method is designed to be lightweight in order to handle big data
streams. The main contributions of this chapter can be summarised as follows:
We have designed and developed a novel information flow model to control access
to big data streams using a lattice structure.
Our proposed model uses two static lattice structures to map the data quickly. Both
lattice structures are divided into three levels of data from the SEEN method (from
the last chapter).
We validate our proposed method by theoretical analyses and experimental results.
We evaluated the performance of the proposed model on a real-time Kafka cluster.
The remainder of this chapter is organised as follows: background studies are
reviewed in the next section; Section 6.3 presents the system design considerations,
including definitions and QoS requirements; Section 6.4 describes the access control
mechanism over big data streams; Section 6.5 evaluates the performance and
efficiency of the model through experimental results; and Section 6.6 summarises
the contributions of this chapter.
6.2 Background Studies
In 2005, Stonebraker et al. [24] first highlighted the eight requirements of real-time
stream processing that make stream processing research more challenging than, and
different from, batch processing. In 2009, Nehme et al. [82] proposed a spotlight
architecture to highlight the need for security in data streams, differentiating the
security requirements on the data side (called data security punctuations) from the
query-side security policies (called query security punctuations).
6.2.1 Stream Processing
The Data Stream Management System known as the STanford stREam data
Manager (STREAM) was initially developed by Arasu et al. in 2003 [23]. STREAM
is designed to deal with high-velocity data rates and substantial numbers of
continuous queries through careful resource allocation. Most of the work carried out
on Data Stream Management Systems addresses issues ranging from theoretical
modelling and analysis to building comprehensive systems that deal with high-speed
data streams and respond in real time (or near real time). Representative systems
include STREAM [23], Aurora [155], and Borealis [26]. In data stream management
systems like STREAM, Aurora, and Borealis, queries issued by the same client at
the same time can share Seq-window operators.
In the STREAM framework, Seq-window operators are reused by queries over
identical streams. Rather than developing the sharing of components between query
plans, the Aurora research focuses on delivering good performance over vast
numbers of queries; Aurora achieves this by clustering operators as a basic
scheduling unit. In Borealis, the input information from query processing can be
shared and modified by newly arriving queries. StreamCloud is a large-scale,
reliable streaming system for handling large-scale data streams on clouds [139].
StreamCloud utilises a new parallelisation strategy that splits input queries into
subqueries assigned to independent sets of nodes, reducing the distribution
overhead. Even though numerous methodologies focus on scheduling and rewriting
for QoS, sharing execution and computation across queries issued by the same user
at different times, or by different users at the same time, is not supported in stream
processing engines. Beyond sharing common source Seq-windows as in a DSMS,
sharing intermediate computation results is a superior approach to improving
performance.
The focus of this research was on the performance of query processing, not on the
security issues in data streams. Nehme et al. [82] highlighted the security aspects of
data streams; the following subsection describes the security issues in detail.
6.2.2 Stream Security
There have been several recent works on securing data streams
[82][156][157][158][154][159][160][161], focusing on query security punctuations,
i.e. access control over data streams. Although these frameworks support secure
processing, they cannot prevent illegitimate data streams or ensure data security.
Punctuation-based enforcement of access control on streaming data is proposed in
[161]: access control policies are retransmitted each time, using one or more security
punctuations, before the real data are transmitted, and both kinds of punctuation are
processed by StreamShield (a special filter) in the query plan. Secure query
processing in a shared manner is proposed in [156]: building on the StreamShield
concept, the authors present a three-phase system to enforce access control without
introducing any special operators, rewriting queries, or affecting QoS. Supporting
role-based access control through query rewriting techniques is proposed in
[154][158]: query plans are reorganised and policies are mapped to a set of map and
filter operations to enforce the access control policies. The architecture in [159]
utilises a post-query filter to enforce access control policies at the stream level; the
filter applies security policies after query processing but before a client receives the
results from the SPE. Designing SPEs that enforce multilevel security constraints
has been addressed in [157]. Xie et al. [160] adopt a Chinese Wall policy to protect
against the disclosure of sensitive data at the DSMS.
The focus of this body of research was on query security punctuation; however, data
security punctuation, i.e. end-to-end security between the source and the SPE, is still
missing. The following subsection describes details of end-to-end security.
6.2.3 Chinese Wall Policy
Brewer and Nash [162] first demonstrated how the Chinese Wall policy can be used
to prevent consultants from accessing information belonging to multiple companies
in the same conflict-of-interest class. However, the authors did not distinguish
between human users and subjects, i.e. processes running on behalf of users.
Consequently, the proposed model is very restrictive, as it allows a consultant to
work for one company only. Sandhu [114] improves upon this model by making a
clear distinction between users, principals, and subjects, defines a lattice-based
security structure, and shows how the Chinese Wall policy complies with the
Bell-LaPadula model [163].
6.3 Design Consideration
In this section we present the system architecture, access control definitions, QoS
parameters and adversary model for information flow control over big data streams.
6.3.1 System architecture
This section presents our example application, which motivates the need for secure
stream processing in adaptive computing environments. We have a security
verification model that aims to prevent and detect attacks in real time at the DSM
before data reach the cloud. Such a service (i.e. security verification) provides
warnings about various types of attacks, often involving multiple sources or data
streams in transit.
Figure 6-1 shows a multi-tier architecture for big data stream access control using a
lattice model. The architecture includes source sensing devices that transmit data to
the DSM through wireless networks, together with a lattice model that controls data
stream access using an information flow control model. Several applications, such as
military monitoring and healthcare, need to protect data against unauthorised
disclosure [138][140]. Various types of auditing may take place in the data centre.
The first level is information flow control after security verification at the DSM,
represented by access control over data streams for specific users or query
processors; in this phase, access to data streams is analysed in isolation. The next
level is data processing in the SPE and stream query processing. The stream data
processing is shown with connecting dark arrows, which depict the internal
communication within the cloud. For further information on stream data processing
in the data centre, refer to [128].
Along with this, we follow the SEEN method (from the last chapter) to deploy
Intrusion Detection Systems (IDS) at the source and at the cloud data centre. A
sensor-based IDS monitors a sensor's behaviour and generates alerts on potentially
malicious onboard activities and network traffic [142]. The IDS on the source side
can be set inline, attached to a spanning port of a sensor [142]; the idea is to give the
IDS access to all the packets we wish it to monitor. The IDS in a cloud data centre
generally computes inter- and intra-audit-record patterns, which can guide the data
gathering process and simplify feature extraction from audit data [143]. This
technique analyses system vulnerabilities quickly and accurately.
In our architecture, the data streams always arrive at the DSM in encrypted form.
The DSM performs the security verification using the shared keys (initiated and
distributed by the DSM under the SEEN method); the shared key selection is based
on the flag value (FV) associated with each incoming data packet. After security
verification, the DSM sends the data to the stream query processor and end users. As
data packets arrive with different sensitivity levels (flag values), access to the data
stream must be restricted accordingly.
Due to the characteristics of streaming data, a number of inherent challenges make
continuous access control enforcement difficult. First, a common characteristic of
data streams is their high volume and velocity. It is not feasible to store all streaming
data tuples with all their security restrictions (which may be numerous and fine
grained, given the large number of users and their various security preferences) and
perform random accesses as in traditional databases; one scan of the data and its
security restrictions, with compact memory usage, is required. Second, because of
the large data volume and velocity, access control enforcement algorithms must be
extremely fast to keep up with the incoming data. Third, given that continuous
queries are typically long running, and due to users' mobility, wireless connectivity
and changing preferences, the data and their "sensitivity" are likely to change during
the query execution lifetime; the access control mechanism must therefore adapt to
frequently changing and quite complex security policies. The foremost challenge is
the speed of enforcement: security policies must take effect immediately and with
the correct precision, both to prevent any information leaks that may occur when
access is no longer authorised, and to ensure that access to data is not denied when
an access privilege has in fact been granted, especially when it is crucial to view the
data (e.g. in an emergency). Finally, since results in streaming environments are
expected to be produced in near real time, and security-related processing is an
added "overhead" compared to traditional continuous query processing, the cost of
the access control enforcement mechanism must be as low as possible so as not to
reduce the utility of the DSM.
Considering the above features and limitations of big data streams, we defined our
architecture as follows. Figure 6-1 gives an overview of the access control
architecture using a lattice model. We divide the architecture into three parts when
describing the access control. First, a set of sensors is deployed in the source area to
sense data and send them back to the cloud data centre for analysis and decision
making. Every sensor is associated with a static lattice (we call it the sensor lattice
(A1, B1, C1)) with a predefined sensitivity level, and every sensed data packet is
sent to the cloud data centre with an associated FV. Second, the DSM performs
security verification before the data reach the data centre/query processor; it verifies
the big data streams to ensure data confidentiality, integrity and authenticity. Once
the data are verified as genuine, the DSM pushes the data streams to the query
processor. Finally, user access to the big data stream is controlled using the lattice
model. At this step the DSM pushes data to a set of users and query processors. We
also maintain a static lattice (we call it the user lattice (A2, B2, C2)) with predefined
sensitivity levels for users and query processors. This user and query processor
classification is based on access to the different sensitivity levels of data streams.
The lattice mapping always satisfies a partial order relation, whereby a class of
users/query processors can access data of the same sensitivity level or of lower
sensitivity, but is not allowed to access data of higher sensitivity.
6.3.2 Definition
In the following, we present an information flow model for big data streams to
protect against improper leakage and disclosure. The model is adapted from the
lattice structure for the Chinese Wall proposed by Sandhu [114]. We follow [160] in
stating the following definitions related to access control over big data streams using
a lattice model.
We have a set of data classes that provide access identification. These classes are
partitioned into conflict of interest classes based on the data access level. Classes
provide access to the same or lower level classes. Consequently, it is important to
protect against disclosure of sensitive information to unauthorised users/query
processors. We begin by defining how the conflict of interest classes are
represented.
Definition 1. [Conflict of Interest Class Representation:]
The set of companies providing service to the cloud is partitioned into a set of n
conflict of interest classes, which we denote by COI1, COI2, …, COIn. In our setting
with n = 3, the conflict of interest classes are ordered as COI1 ≥ COI2 ≥ COI3.
Figure 6-1: Overview of access control of big data streams using a lattice model.
We next define the security structure of our model. Each data stream, as well as the
individual tuples constituting it, is associated with a security level that captures its
sensitivity. The security level associated with a data stream dictates which entities
can access it. An input data stream generated by sensors offering some service has a
security level that captures the organisational information. Input streams may be
processed by the stream processor to generate derived streams. Before describing
how to assign security levels to derived data streams, we show how security levels
are represented.
Definition 2. [Security Level Representation:]
A security level is represented as an n-element vector [i1, i2, . . . , in], where
ij ∈ COIj ∪ {⊥} ∪ {T} and 1 ≤ j ≤ n. ij = ⊥ signifies that the data stream does not
contain information from any sensors in COIj; ij ∈ COIj denotes that the data stream
contains information from the corresponding source sensor in COIj.
Consider the case where we have three COI classes, namely COI1, COI2, and COI3.
The stream generated by a sensor in COI1 has security level {1}. Similarly, the
stream generated by source sensors in COI2 has security level {2}. Finally, the
stream from sensors in COI3 has security level {3}. We next define the dominance
relation between security levels.
Definition 3. [Dominance Relation:]
Let L be the set of security levels, and let L1, L2 and L3 be three security levels,
where L1, L2, L3 ∈ L. We say security level L3 is dominated by L2, and similarly
L2 is dominated by L1; the dominance relation is a partial order on L.
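A minimal sketch of the dominance check over such security-level vectors follows; the encoding (⊥ as None, T as the string "T") and the function names are our own illustrative choices.

```python
BOTTOM, TOP = None, "T"   # our encoding of the lattice's bottom/top elements

def element_dominates(a, b) -> bool:
    """Per-COI-class ordering: bottom <= any class value <= top."""
    return b is BOTTOM or a == TOP or a == b

def dominates(l_high, l_low) -> bool:
    """l_high dominates l_low iff it dominates in every COI position."""
    return all(element_dominates(a, b) for a, b in zip(l_high, l_low))

# A user cleared for [1, "T", None] may read a stream labelled [1, 2, None],
# because information is only permitted to flow upwards in the lattice.
print(dominates([1, "T", BOTTOM], [1, 2, BOTTOM]))    # True
print(dominates([1, BOTTOM, BOTTOM], [1, 2, BOTTOM])) # False
```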
6.3.3 QoS Requirements
Constraint 1: Latency
Continuous big data streams, and their mapping to end users or query processors,
have to satisfy certain QoS requirements with regard to performance, memory
usage, and accuracy. In this work we only consider the QoS performance metric of
latency. Latency is the amount of time it takes for a data instance to be processed
through the lattice structure, including any wait time incurred. Thus, it is the
duration from the time a data instance arrives at the leaf node of the operator tree
until it reaches the output buffer of the root node. Note that data instance latencies
apply only to data that are used in the output computation. We follow [164] in
designing the latencies in our work, which are computed as follows.
Let pi = <vi1, vi2, . . . , vin> be one such path in an operator tree, where vij, 1 ≤ j ≤ n,
denotes a vertex in the operator tree. The latency of vij, denoted latency(vij), depends
on the operator type, the specific algorithm for computing the operator, the waiting
time encountered in the queues, and the window size for a blocking operator. The
latency of an edge (vik, vi(k+1)), denoted latency(vik, vi(k+1)), depends on the size of
the results sent from vertex vik to vertex vi(k+1) and on the bandwidth of the channel
connecting vik to vi(k+1). The tuple latency along the path pi, denoted latency(pi),
depends on the processing latency at each vertex and the communication latency at
each link in this path:
latency(pi) = Σj latency(vij) + Σk latency(vik, vi(k+1))
Let m be the number of operator paths in the set of operator trees. The total tuple
latency of the system, denoted system_latency and adapted from [165], is computed
as:
system_latency = Σi=1..m wi × latency(pi)
where wi is the ratio of the number of output tuples along path pi to the number of
output tuples from the set of operator trees. The QoS requirement may be that the
system latency be less than some given threshold value, denoted threshold. Thus,
system_latency ≤ threshold.
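A small sketch of this weighted-latency computation (the per-path weights and latencies are invented for illustration):

```python
def system_latency(paths: list[tuple[float, float]]) -> float:
    """paths: (w_i, latency_i) pairs; returns sum of w_i * latency(p_i)."""
    return sum(w * lat for w, lat in paths)

# Three operator paths producing 50%, 30% and 20% of the output tuples.
paths = [(0.5, 12.0), (0.3, 20.0), (0.2, 35.0)]   # latencies in ms
threshold = 25.0
assert system_latency(paths) <= threshold          # QoS constraint holds (19 ms)
```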
Constraint 2: Processing/Storage Capacity of Nodes
Since stream processing nodes are resource constrained, it is important to calculate
the resource utilisation of the nodes during information flow through lattice
generation. We use the notation nij to signify that the operator tree vertex vj is
executing at node i. Recall that cpu(i) and memory(i) indicate the available CPU and
memory of node i. Let proc(i) and mem(i) denote the processing cost and storage
cost, respectively, incurred at node i due to the execution of the various operators;
we follow [164] to define them as the sums of the processing and storage costs of
the operators assigned to node i. Note that proc(i) ≤ cpu(i) and mem(i) ≤ memory(i)
must hold at any given point in time. In the context of our example, node N8 may
not have the capacity to perform multiple select operations.
6.3.4 Adversary Model
In our architecture from Figure 6-1, we assume that a large number of sensor nodes
are sources of big data streams. These sensors are fully connected and communicate
to DSM through a wireless medium. We assume that the DSM is aware of the
network topology and initially deployed node from the SEEN method (from the
previous chapter). We also assume that IDS is positioned at each source device and
at the cloud data centre so that source sensors and cloud data centre are capable of
detecting packet-loss attacks and data modifications [137]. The DSM is treated as
fully secured and protected in our model as it resides at the cloud data centre, where
we are performing security verification over big data streams. After successful
security verification, we implemented the access control mechanism over big data
streams to protect against information leakage and unauthorised access.
There are several ways of attacking access to big sensing data streams:
In several applications, such as health monitoring, the patient may not want any
unauthorised user or query processor to access his/her health data. Here, privacy
protection of personal health data is crucial.
We have created three sensitivity levels for both data and users (i.e. high
sensitivity, low sensitivity, open access). Data should not be accessible to lower-level
users; rather, it should be accessible only to users at the same and higher levels.
The data level in data streams should not be modified in transit between the
source sensor and the DSM.
Each node whose IDS detects a packet-loss attack will investigate the loss; we
assume the investigating source device to be trustworthy and not to report any false
response. This assumption is particularly important for the Majority Voting
algorithm adopted as part of our approach. However, we will also present a variant
of this algorithm that relaxes this constraint and is thus able to tolerate up to a
bounded number of colluding investigating source nodes. This is addressed using the
SEEN method before the access control mechanism is implemented.
6.4 Access Control Model
According to Sandhu's definition of lattice-based access control, users are
defined as humans, subjects are processes and objects are files [114]. We define our
system in the same way, where users are humans, subjects are query processors (QP)
and objects are data blocks after security verification at the DSM. We use a standard
five-stage process for the information flow control model. The five stages are as
follows:
Stage 0: structure module;
Stage 1: information flow between the levels;
Stage 2: recursive lattice construction;
Stage 3: conflict of interest;
Stage 4: decision over data access.
By following the above stages, information flow control policies specify under
which conditions information may be exchanged or accessed by users and query
processors.
From the previous chapter's description of security verification at the DSM, sensors
always generate data packets in the format {DATA; 1/0; Si, Si/DSM}, where DATA
is the encrypted data packet, 1/0 is the flag value (FV) that defines the data
sensitivity level, Si is the source of the data, and Si/DSM indicates who has the
authority to modify the data packet. As the shared keys are generated and
distributed by the DSM, the DSM always has the authority to access and modify the
data packets. The source sensor (Si) also has the authority to access and modify the
data packets, as it generates them and encrypts them using authenticated shared
keys. After security verification at the DSM, we check the flow model (FM) to define
the access control. This flow model controls access and information flow, and opens
data packets only to authenticated users and query processors. We kept the flow
model simple and defined static lattices for lightweight processing over big data
streams. There are three different ways of flow management, namely no
management, centralised management and distributed management. We follow
centralised management at the DSM after security verification. We define our flow
model as follows:
FM = <S, O, SC, →>
where: S = subjects
O = objects
SC = security classes
→ = can-flow relation on SC
Here we did not include an operations component in our FM, because our focus is
only on reading or accessing the data stream rather than writing. We define a static
lattice for sensors, which labels incoming data streams, and a static lattice for users,
which defines the access class for both users and query processors. The lattice
structure with the access policy is shown in Figure 6-2. The lattice is a Directed
Acyclic Graph (DAG) with a single source, and information is permitted to flow from
a lower class to an upper class. We have divided our lattice into three classes, i.e.
{A, B, C}, where the suffix 1 denotes the user lattice, i.e. {A1, B1, C1}, and the
suffix 2 denotes the sensor lattice, i.e. {A2, B2, C2}. We define A as the highest class
(for high sensitivity information), B as the middle class (for low sensitivity data) and
C as the lowest class (for open access information).

Figure 6-2: Lattice model for data access
From the previous chapter, the source sensor (Si) uses two different shared keys, i.e.
KSH(1) and KSH(0), for data packet encryption. Sensors use KSH(1) for strong
encryption to protect high sensitivity data and append an FV of 1 to the data packet
to indicate the data sensitivity level and the shared key used for encryption, whereas
sensors use KSH(0) for weak encryption and append an FV of 0 for low sensitivity
data. Finally, sensors do not encrypt the data packets for open access data. The DSM
identifies a data packet by its associated FV, which determines the shared key used
to decrypt the packet and the data sensitivity class for access control. The FV may
be modified while data is in transit between the source sensor and the DSM. In such
cases, the DSM cannot decrypt the data packets using the current shared key and
drops them, as they have already been modified before reaching the DSM.
Consequently, these data will not appear in the sensor lattice structure to be mapped
into the user lattice.
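To make the FV-driven handling at the DSM concrete, here is a minimal sketch; the key objects (kSh1, kSh0), the decrypt helper, and the use of AES as a stand-in block cipher are illustrative assumptions, not the thesis's actual code:

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

// Hedged sketch of FV-based packet handling at the DSM.
class FvPacketHandler {
    private final SecretKey kSh1;  // shared key for strong encryption (FV = 1)
    private final SecretKey kSh0;  // shared key for weak encryption (FV = 0)

    FvPacketHandler(SecretKey kSh1, SecretKey kSh0) {
        this.kSh1 = kSh1;
        this.kSh0 = kSh0;
    }

    // fv < 0 marks unencrypted open access data here (an assumption; the
    // thesis packet format only specifies FV values 1 and 0).
    byte[] handle(int fv, byte[] data) throws Exception {
        if (fv == 1) return decrypt(data, kSh1);  // high sensitivity
        if (fv == 0) return decrypt(data, kSh0);  // low sensitivity
        return data;                              // open access: no decryption
    }

    private byte[] decrypt(byte[] data, SecretKey key) throws Exception {
        // AES stands in for the symmetric block cipher of the scheme.
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(data);  // throws if the packet was modified in transit
    }
}
```

A modified FV thus causes decryption to fail, which matches the drop behaviour described above.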
Figure 6-2 shows the access policy, where a class of the user lattice has access to the
same and lower-level classes of the sensor lattice. We follow Sandhu's lattice-based
formulation of the Chinese Wall model for information flow control [114] to define
the conflict of interest between the classes. This access policy always satisfies the
reflexive, antisymmetric and transitive properties (i.e. it is a partial order). A partial
ordering ≤ on a set L is a relation where:
∀a ∈ L, a ≤ a holds (reflexive)
∀a, b ∈ L, if a ≤ b and b ≤ a, then a = b (antisymmetric)
∀a, b, c ∈ L, if a ≤ b and b ≤ c, then a ≤ c (transitive)
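A minimal sketch of this static lattice and its partial order follows; the enum below is illustrative (not the thesis's code) and assumes the three classes named above, with reads permitted at the same or lower data level:

```java
// Hedged sketch: the static three-class lattice as a partial order.
// A = high sensitivity, B = low sensitivity, C = open access.
enum SecurityClass {
    C(0), B(1), A(2);

    private final int rank;
    SecurityClass(int rank) { this.rank = rank; }

    // Can-flow relation: information may flow from a lower class to an upper class.
    boolean canFlowTo(SecurityClass target) {
        return this.rank <= target.rank;
    }

    // Access policy from Figure 6-2: a user class may read data of the
    // same or lower sensitivity class.
    boolean canRead(SecurityClass dataClass) {
        return dataClass.rank <= this.rank;
    }
}
```

For example, SecurityClass.B.canRead(SecurityClass.C) holds, while SecurityClass.B.canRead(SecurityClass.A) does not.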
We follow this partial order relation between the classes of the lattice to define
access control over big data streams. This leads to the query security punctuations
(qsps) of data streams.

Figure 6-3: Experiment Setups
6.5 Experimental Evaluation
6.5.1 System Setup
This section presents the experimental evaluation of our access control
mechanism. The specification of the machine used in the benchmarking is given in
Table 6-1. Since Kafka currently runs as a single-machine cluster, the benchmarking
utilised only a single CPU core of the machine. The experiments were conducted
using a Java application, i.e. a producer in the Kafka cluster, that we term the
Dataset Reader. The Dataset Reader reads and separates text files and turns them
into a data stream using the Producer API. We use two datasets, i.e. (1) the HT
sensor dataset (from home activity sensors) and (2) the twin gas sensor dataset
[166]. The HT sensor dataset is approximately 190 MB in size and contains 1 million
data instances from 100 sensors. The twin gas sensor dataset includes the recordings
of five replicates of an 8-sensor array. Each unit holds 8 MOX sensors and integrates
custom-designed electronics for sensor operating temperature control and signal
acquisition. The twin gas sensor dataset is approximately 2500 MB in size and
contains around 2.8 million data instances. The datasets are described in Table 6-2.
We conducted a set of performance experiments on the content-based broker, but
did not assess the runtime information of the actual filtering engine.

Figure 6-4: Mapping time for HT Sensor Dataset
Table 6-1: Machine Specifications

Hardware     Description
CPU          Intel(R) Core(TM) i5-6300U CPU @ 2.60 GHz 2.50 GHz
CPU Cores    8
RAM          8 GB
OS           Ubuntu 15.10, Linux kernel 4.2.0-16
Kafka        kafka_2.11-0.10.1.0
Java VM      OpenJDK 64-Bit Server Java SE 8.0 65
JDK          1.8.0_111
Table 6-2: Dataset Information

Dataset                 Count       Size
HT Sensor Data          1,000,000   190 MB
Twin Gas Sensor Data    2,800,000   2500 MB
Figure 6-5: Mapping time for twin gas sensor dataset
This complete architecture is implemented in a Kafka cluster, which provides a
distributed real-time streaming platform, as shown in Figure 6-3. We feed the
datasets (from Table 6-2) in through the Producer API. The architecture uses four
APIs, i.e. the Producer API, Consumer API, Streams API and Connector API. The
Producer API inputs datasets as a stream to a Kafka topic, and the Connector API
builds and runs reusable producers and consumers that connect to predefined Kafka
topics of data streams. We kept the Consumer and Streams APIs in our
implementation environment without any modification. We modified the broker to
associate the mapping of data instances with three levels (groups) of consumers.
The broker maps the incoming data streams (instances) to the appropriate level of
consumer group. The data mapping uses a queue and follows a first-in-first-out
(FIFO) model. In our implementation architecture, ZooKeeper works as a controller
without any modification.
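As a minimal sketch of the Dataset Reader described above, the following Java producer reads a dataset file line by line and publishes each line as one data instance; the file path ht_sensor.dat and the topic sensor-stream are illustrative assumptions, not the thesis's actual configuration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hedged sketch of the Dataset Reader: a Kafka producer that turns a
// text-file dataset into a data stream, one record per line.
public class DatasetReader {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // single-machine cluster
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(new FileReader("ht_sensor.dat"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // each line is one data instance; "sensor-stream" is hypothetical
                producer.send(new ProducerRecord<>("sensor-stream", line));
            }
        }
    }
}
```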
6.5.2 Results Discussion
We tested the proposed access control mechanism over big data streams in the
Kafka cluster system described in the last subsection. The experiment was conducted
using two datasets, i.e. the HT sensor dataset and the twin gas sensor dataset from
Table 6-2. These datasets are mapped to the static lattices from our model
description to measure the performance. First, we tested the performance of our
proposed information flow control mechanism using the HT sensor dataset. The
performance is measured in terms of mapping time, i.e. the time taken to assign data
instances to a specific level of user and/or query processor based on their FV. This
is also termed latency. The mapping time for the HT sensor dataset is shown in
Figure 6-4. This figure shows the data mapping time for one million data instances,
where we found that our static information flow control model (three levels of data)
takes around 70 seconds for one million data instances. In the same figure, we also
measured the time taken for a single level, where around 55 seconds are needed to
map one million data instances. In the same way, we also evaluated the twin gas
sensor dataset, which is around three times larger than the HT sensor dataset. The
results for the twin gas sensor dataset are shown in Figure 6-5. From this figure, we
found that around 200 seconds are required to process around three million data
instances with three-level data sensitivity, while 150 seconds are needed for a single
level of data. From the above two experiments, we conclude that our three levels of
data add only modest overhead compared to a single level of data (around 15
seconds per million instances on the HT sensor dataset, i.e. roughly 27%). The
proposed method is therefore applicable to big sensing data streams.
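To complete the picture of the setup in Section 6.5.1, the following is a minimal sketch of a consumer in one of the three level groups; it assumes a topic-per-level layout with hypothetical topic and group names, which is one plausible realisation of the broker mapping rather than the thesis's exact implementation:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hedged sketch: a consumer in one of the three level (sensitivity) groups.
public class LevelConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "level-A-consumers");  // one group per sensitivity level
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("level-A"));  // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());  // deliver to the level-A user/QP
                }
            }
        }
    }
}
```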
6.6 Summary
This chapter proposed an information flow control method to control access
over big sensing data streams, which aims to provide a real-time processing
infrastructure. The model has been designed based on a static lattice model to
enable faster processing and to deal with the high volume and velocity of big sensing
data streams. A static lattice is initialised for a source sensor with three levels of
data sensitivity and, in the same way, a static lattice is initialised for users with
three levels of data access. This lattice comparison occurs just after the security
verification of data packets at the DSM. This access control mechanism protects
against information leakage and unauthorised access. Several applications, such as
battlefield monitoring and health monitoring, need stream processing and access
control over big sensing data streams. The proposed information flow control model
performs in near real time to control unauthorised access over big sensing data
streams. Its performance using a lattice structure for big sensing data streams was
evaluated in a real-time Kafka cluster, and the results show that it can be applied to
big sensing data streams.
Chapter 7
Conclusion and Future Work
So far, the main modules of the proposed security solutions for big sensing data
streams have been presented in the last four chapters (Chapter 3 to Chapter 6).
This chapter concludes the thesis by recalling the research contributions of each
chapter individually. Following that, we point out some major and promising
directions for future work, based on these contributions, that deserve further
exploration. The conclusions are drawn in Section 7.1, and the future work is
presented in Section 7.2.
7.1 Conclusions
Big data stream analytics and batch processing in cloud infrastructure have
significantly impacted the current IT industry and computer science research. Data
security and access control have become among the most significant emerging issues
in supporting decision-making systems built on these two trends. One of the
problems is that traditional and existing data security techniques are neither
scalable nor efficient, due to the 4Vs properties of big sensing data streams. These
issues prompted us to propose a new security framework for big sensing data
streams. In this thesis, we have proposed an end-to-end and efficient security
framework that performs real-time security verification and employs a smaller
buffer size without compromising security. In the context of big data, the framework
provides a holistic conceptual foundation for a security solution over big data
streams and enables authenticated users and query processors to access data
without any kind of interruption. We framed the big sensing data stream format
and highlighted its security issues in detail in Chapter 2. The modules of the
framework have been elaborated with the proposed security solutions from Chapter
3 to Chapter 6. We conclude these chapters as follows:
In Chapter 2, we covered the background of big data streams, cloud data
centres and the related security issues. As we move towards the IoT, the
number of smart sensing devices deployed around the world is growing at a
rapid pace. The communication medium between the source sensing device
and the cloud data centre is wireless, which is a highly untrusted
transmission medium. As security will be a fundamental enabling factor of
most IoT applications and big data streams, mechanisms must also be
designed to protect communications enabled by such technologies. This
chapter analysed existing protocols and mechanisms to secure IoT-generated
big data streams, as well as open research issues. Along with the big data
stream security survey, we also highlighted the layer-wise IoT security
threats, because our survey took the IoT as the source of big data streams.
We analysed the existing approaches to ensuring fundamental security
requirements and protecting IoT-generated big data streams, together with
the open challenges and strategies for future research work in the area. We
classified big data stream security based on the CIA triad features.
In Chapter 3, we proposed a novel authenticated key exchange scheme,
namely Dynamic Prime-Number Based Security Verification (DPBSV),
which aims to provide an efficient and fast (on-the-fly) security verification
scheme for big sensing data streams. The proposed scheme has been
designed based on a symmetric key block cipher. We update our shared key
at both the source sensor and the DSM independently, using a random
prime number generation method. In the DPBSV scheme, we decrease the
communication and computation overhead by dynamic key initialisation at
both the sensor and DSM ends, which in effect eliminates the need for
rekeying. We evaluated the proposed security scheme through both
theoretical analysis and experimental evaluation, and showed that our
DPBSV scheme provides significant improvement in processing time,
requires a smaller buffer for processing, and prevents malicious attacks on
authenticity, integrity and partial confidentiality. The DSM implementation
appears just before stream data processing, as shown in our main
architecture diagram in Chapter 1. The proposed security verification
scheme (i.e. DPBSV) performs in near real time to keep pace with the
stream processing engine; our main aim is not to degrade the performance
of stream processing engines such as Hadoop, S4 and Spark.
In Chapter 4, we investigated a novel authenticated key exchange protocol,
namely the Dynamic Key Length Based Security Framework (DLSeF),
which aims to provide a real-time security verification model for big sensing
data streams. The DLSeF protocol is designed based on symmetric key
cryptography and a dynamic key length to provide more efficient security
verification of big sensing data streams. This security model is designed
with two-dimensional security, i.e. not only a dynamic key but also a
dynamic key length. We further proposed a synchronisation technique that
obtains the key generation properties from neighbouring sources, so that
source sensors are not required to communicate with the DSM in the event
of desynchronisation in shared key generation. In our model, we decrease
the communication and computation overhead by performing dynamic key
initialisation, along with dynamic key sizing, at both the source sensing
devices and the DSM, which in effect eliminates the need for rekeying. The
proposed DLSeF model performs security verification in near real time to
keep pace with the stream processing engine; our major concern is not to
degrade the performance of stream processing. We demonstrated the
proposed DLSeF security model through both theoretical and experimental
evaluations, and showed that it provides significant improvement in
security processing time and prevents malicious attacks on authenticity,
integrity and partial confidentiality.
In Chapter 5, we proposed a Selective Encryption (SEEN) method to
maintain the confidentiality levels of big sensing data streams together with
data integrity. In SEEN, the DSM independently maintains intrusion
detection and shared key management as the two major components. Our
method has been designed based on a symmetric key block cipher and the
use of multiple shared keys for encryption. By employing the cryptographic
function with selective encryption, the DSM rekeys efficiently without
retransmissions. SEEN uses two different keys for encryption/decryption
(i.e. strong encryption for strong confidentiality and weak encryption for
partial confidentiality), with no encryption for open access data. The
rekeying process never disrupts ongoing data streams or
encryption/decryption processes. SEEN supports source node
authentication and shared key recovery without incurring additional
overhead. We evaluated the performance of SEEN through security analysis
and experimental evaluation, and found that it provides significant
improvement in processing time and buffer requirement, and protects data
confidentiality and integrity from malicious attackers.
In Chapter 6, we investigated an information flow control method to
control access over big sensing data streams after security verification at
the DSM. Our aim is to provide a real-time processing infrastructure that
keeps pace with the stream processing engine. The model has been designed
based on a static lattice structure to enable faster processing and to deal
with the 4Vs properties of big sensing data streams. A static lattice is
initialised for a source sensor with three levels of data sensitivity and, in the
same way, a static lattice is initialised for users with three levels of data
access. This lattice comparison occurs just after the security verification of
data packets at the DSM. This access control mechanism protects against
information leakage and unauthorised access. Several applications, such as
battlefield monitoring and health monitoring, need stream processing and
access control over big sensing data streams. The proposed information
flow control model performs in near real time to control unauthorised
access over big sensing data streams. Its performance using a lattice
structure was evaluated in a real-time Kafka cluster. We conclude from the
experiments that the proposed model can be applied to big sensing data
streams without degrading the performance of stream processing engines.
7.2 Future Work
Based on the roadmap in Figure 1-1 in Chapter 1, our research mainly focused on
the security aspects of big sensing data streams. In this section, based on the
contributions of this thesis, we point out several issues still worth investigating in
the future.
One direction is to further investigate security verification over big sensing
data streams from the perspective of efficient and lightweight security
solutions. We have proposed a symmetric key block cipher for security
verification at the DSM; our solution increases performance by decreasing
communication and computational overhead, and we compared it with the
AES technique. The foremost next step is to perform a comparative study
of our work with other symmetric key cryptographic techniques such as
RC4 and RC6. We will further investigate new strategies to improve the
efficiency of symmetric-key encryption towards more efficient security-
aware big data streams. We are also planning to investigate using the
technique to develop a moving target defence strategy for the Internet of
Things.
It is promising to extend our ideas and methods to complex and hybrid
data streams. In this thesis, we mainly investigated security solutions for
data streams from one type of source, i.e. sensors. But in the big data era,
more and more types of sources are involved, e.g. sensors, mobile devices,
cameras, social network data, etc. Privacy and security concerns exist in
such data streams as well. It is a challenging task to apply a single security
solution to different types of data streams. It will also be challenging to
obtain the synchronisation properties from neighbours, as stated in our
previous chapters, in the hybrid sensing setting. In such cases, we need to
apply different security solutions to different types of big data streams, or a
hybrid security structure to support hybrid big sensing data streams.
We are also planning to further investigate the proposed selective
encryption technique to improve the efficiency of symmetric-key
encryption towards more efficient security-aware big sensing data streams.
We also plan to extend the access control model over big sensing data
streams, which grants access to the end user or query processor based on
the data level. Our proposed solution for information flow control uses a
static lattice to keep the flow model lightweight enough to deal with big
sensing data streams. However, there are several situations and
applications in which the data sensitivity level changes based on time or
situation. For example, in battlefield sensing some sensors may send very
low sensitivity data throughout the day but suddenly switch to high
sensitivity mode once they detect abnormal activity in the area. In such
situations, we need to implement a dynamic lattice structure that deals
with the data stream sensitivity level based on the situation, while keeping
the solution lightweight for big sensing data streams.
With the contributions of this thesis, we are planning to further investigate
the security architecture of big sensing data streams in real-time cloud data
centres. We will integrate the security solution with recent advances in
stream processing engines, such as Apache Storm, while collecting data
from source sensors. We have already developed a testbed for data
collection, end-to-end security and information flow control to control
access to big data streams. Finally, we need to integrate all these setups in
a cloud to evaluate the overall performance.
Bibliography
[1] S. Tsuchiya, Y. Sakamoto, Y. Tsuchimoto and V. Lee, "Big Data Processing
in Cloud Environments," FUJITSU Science and Technology Journal, vol. 48,
no. 2, pp. 159-168, 2012.
[2] “Big data: science in the petabyte era: Community cleverness Required,”
Nature, vol. 455 no. 7209, pp. 1, 2008.
[3] The Big Data Big Bang, https://en.wikipedia.org/wiki/Exabyte, accessed on
December 28, 2015.
[4] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "A Dynamic Prime Number
Based Efficient Security Mechanism for Big Sensing Data Streams." Journal
of Computer and System Sciences, vol. 83, no. 1, pp. 22- 42, 2017.
[5] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "DLSeF: A Dynamic Key
Length based Efficient Real-Time Security Verification Model for Big Data
Stream." ACM Transactions on Embedded Computing Systems, vol. 16, no. 2,
pp. 51:1-51:24, 2016.
[6] Big Data Definitions: http://dx.doi.org/10.6028/NIST.SP.1500-1
[7] Big Data Use Cases and
Requirements: http://dx.doi.org/10.6028/NIST.SP.1500-3
[8] Big Data Security and Privacy: http://dx.doi.org/10.6028/NIST.SP.1500-4
[9] Big Data Reference Architecture: http://dx.doi.org/10.6028/NIST.SP.1500-6
[10] D. Puthal and B. Sahoo, "Secure Data Collection & Critical Data
Transmission in Mobile Sink WSN: Secure and Energy efficient data
collection technique." LAP Lambert Academic Publishing: Germany, 2012.
ISBN: 978-3-659-16846-8.
[11] J. Deng, R. Han and S. Mishra. "INSENS: Intrusion-tolerant routing for
wireless sensor networks." Computer Communications, vol. 29, no. 2, pp.
216-230, 2006.
[12] M. A. Jan, P. Nanda, X. He, Z. Tan and R. P. Liu, "A robust authentication
scheme for observing resources in the internet of things environment. " In
13th International Conference on Trust, Security and Privacy in Computing
and Communications (TrustCom), pp. 205-211, 2014.
[13] D. Zissis and D. Lekkas, "Addressing cloud computing security issues,"
Future Generation Computer Systems, vol. 28, no. 3, pp. 583-592, 2012.
[14] C. Liu, X. Zhang, C. Yang and J. Chen, "CCBKE-Session key negotiation
for fast and secure scheduling of scientific applications in cloud computing."
Future Generation Computer Systems, vol. 29, no 5, pp. 1300-1308, 2013.
[15] M. Benantar, R. Jr and M. Rathi, "Method and system for maintaining client
server security associations in a distributed computing system." U.S. Patent
6,141,758, issued October 31, 2000.
[16] B. Kandukuri, V. Paturi and A. Rakshit, "Cloud security issues." IEEE
International Conference on Services Computing, (SCC'09), pp. 517-520,
2009.
[17] V. Borkar, M.J. Carey and C. Li, “Inside "Big Data Management": Ogres,
Onions, or Parfaits?,” In 15th International Conference on Extending
Database Technology (EDBT'12), pp. 3-14, 2012.
[18] S. Chaudhuri, "What Next?: A Half-Dozen Data Management Research
Goals for Big Data and the Cloud." In 31st Symposium on Principles of
Database Systems (PODS'12), pp. 1-4, 2012.
[19] A. Labrinidis and H. Jagadish, "Challenges and Opportunities with Big
Data." Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2032-2033,
2012.
[20] J. Granjal, E. Monteiro and J. Sá Silva, "Security for the Internet of Things:
A Survey of Existing Protocols and Open Research Issues." IEEE
Communications Surveys & Tutorials, vol. 17, no. 3, pp. 1294-1312, 2015.
[21] J. Dean and S. Ghemawat, "MapReduce: A Flexible Data Processing Tool,"
Communications of the ACM, vol. 53, no. 1, pp. 72-77, 2010.
[22] P. Mell and T. Grance, The Nist Definition of Cloud Computing (Version 15),
U.S. National Institute of Standards and Technology, Information
Technology Laboratory, 2009. URL: http://www.nist.gov/itl/cloud/upload/
cloud-def-v15.pdf, accessed on: 01 April, 2014.
[23] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein
and J. Widom. "STREAM: the stanford stream data manager (demonstration
description)." In ACM SIGMOD international conference on Management
of data, pp. 665-665, 2003.
[24] M. Stonebraker, U. Çetintemel and S. B. Zdonik, "The 8 Requirements of
Real-Time Stream Processing." SIGMOD Record, vol. 34, no. 4, pp. 42-47,
2005.
[25] D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M.
Stonebraker, N. Tatbul and S. Zdonik, "Monitoring streams: a new class of
data management applications." 28th international conference on Very Large
Data Bases, pp. 215-226, 2002.
[26] D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M.
Stonebraker, N. Tatbul and S. Zdonik, "Aurora: a new model and
architecture for data stream management." The VLDB Journal—The
International Journal on Very Large Data Bases, vol. 12, no. 2, pp. 120-139,
2003.
[27] S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. M.
Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss and M. A.
Shah, "TelegraphCQ: continuous dataflow processing." In ACM SIGMOD
international conference on Management of data, pp. 668-668, 2003
[28] B. Albert, "Mining big data in real time." Informatica (Slovenia), vol. 37, no.
1, pp. 15-20, 2013.
[29] M. Dayarathna and S. Toyotaro, "Automatic optimization of stream
programs via source program operator graph transformations." Distributed
and Parallel Databases, vol. 31, no. 4, pp. 543-599, 2013.
[30] D. Puthal, B. Sahoo, S. Mishra and S. Swain, "Cloud Computing Features,
Issues and Challenges: A Big Picture." In International Conference on
Computational Intelligence & Networks (CINE), pp. 116-123, 2015.
[31] H. Demirkan and D. Delen, "Leveraging the capabilities of service-oriented
decision support systems: Putting analytics and big data in cloud." Decision
Support Systems, vol. 55, no. 1, pp. 412-421, 2013.
[32] J. Lu and D. Li, "Bias correction in a small sample from big data." IEEE
Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp.
2658-2663, 2013.
[33] J. M. Tien, "Big data: unleashing information." Journal of Systems Science
and Systems Engineering, vol. 22, no.2, pp. 127-151, 2013.
[34] J. Burke, J. McDonald and T. Austin, "Architectural support for fast
symmetric-key cryptography." ACM SIGOPS Operating Systems Review,
vol. 34, no. 5, pp. 178-189, 2000.
[35] www.cloudflare.com (accessed on: 04.08.2014)
[36] A. Boldyreva, M. Fischlin, A. Palacio and B. Warinschi, "A closer look at
PKI: Security and efficiency." In 10th international conference on Practice
and theory in public-key cryptography (PKC '07), pp. 458-475, 2007.
[37] K. Park, S. Lim and K. Park, "Computationally efficient PKI-based single
sign-on protocol, PKASSO for mobile devices." IEEE Transactions
on Computers, vol. 57, no. 6, pp. 821-834, 2008.
[38] PUB, NIST FIPS, 197: Advanced encryption standard (AES), Federal
Information Processing Standards Publication 197 (2001): 441-0311.
[39] S. Heron, "Advanced Encryption Standard (AES)." Network Security, vol.
2009, no. 12, pp. 8-12, 2009.
[40] J. Daemen and V. Rijmen, "The design of Rijndael: AES - the advanced
encryption standard." Springer Science & Business Media, 2013.
[41] H. Hu, Y. Wen, T. Chua and X. Li, "Towards Scalable Systems for Big
Data Analytics: A Technology Tutorial." IEEE Access, vol. 2, pp. 652-687,
2014.
[42] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "A Dynamic Key Length based
Approach for Real-Time Security Verification of Big Sensing Data Stream."
in 16th International Conference on Web Information System Engineering ,
pp. 93-108, 2015.
[43] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "DPBSV- An Efficient and
Secure Scheme for Big Sensing Data Stream." in 14th IEEE International
Conference on Trust, Security and Privacy in Computing and
Communications (IEEE TrustCom-15), pp. 246-253, 2015.
[44] C. A. Ardagna, R. Asal, E. Damiani and Q. Hieu Vu, "From security to
assurance in the cloud: a survey." ACM Computing Surveys (CSUR), vol. 48,
no. 1, 2015.
[45] Z. Xiao and Y. Xiao, "Security and privacy in cloud computing." IEEE
Communications Surveys & Tutorials, vol. 15, no. 2, pp. 843-859, 2013.
[46] S. Subashini and V. Kavitha, "A survey on security issues in service delivery
models of cloud computing." Journal of network and computer
applications, vol. 34, no. 1, pp. 1-11, 2011.
[47] C. Modi, D. Patel, B. Borisaniya, A. Patel and M. Rajarajan, "A survey on
security issues and solutions at different layers of Cloud computing." The
Journal of Supercomputing, vol. 63, no. 2, pp. 561-592, 2013.
[48] C. Rong, S. T. Nguyen and M. G. Jaatun, "Beyond lightning: A survey on
security challenges in cloud computing." Computers & Electrical
Engineering, vol. 39, no. 1, pp.47-54, 2013.
[49] W. Huang, A. Ganjali, B. H. Kim, S. Oh and D. Lie, "The State of Public
Infrastructure-as-a-Service Cloud Security." ACM Computing Surveys
(CSUR), vol. 47, no. 4, 2015.
[50] C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Patel and M. Rajarajan, "A
survey of intrusion detection techniques in cloud." Journal of Network and
Computer Applications, vol. 36, no. 1, pp. 42-57, 2013.
[51] Niroshinie Fernando, Seng W. Loke and Wenny Rahayu, "Mobile cloud
computing: A survey." Future Generation Computer Systems, vol. 29, no. 1,
pp. 84-106, 2013.
[52] X. Chen, K. Makki, K. Yen and N. Pissinou, "Sensor network security: a
survey." IEEE Communications Surveys & Tutorials, vol. 11, no. 2, pp. 52-
73, 2009.
[53] Y. Zhou, Y. Fang and Y. Zhang, "Securing wireless sensor networks: a
survey." IEEE Communications Surveys & Tutorials, vol. 10, no. 3, pp. 6-28,
2008.
[54] T. Winkler and B. Rinner, "Security and privacy protection in visual sensor
networks: A survey." ACM Computing Surveys (CSUR), vol. 47, no. 1, 2014.
[55] D. Djenouri, L. Khelladi and N. Badache, "A survey of security issues in
mobile ad hoc networks." IEEE communications surveys, vol. 7, no. 4, pp. 2-
28, 2005.
[56] L. Abusalah, A. Khokhar and M. Guizani, "A survey of secure mobile ad hoc
routing protocols." IEEE Communications Surveys & Tutorials, vol. 10, no.
4, pp. 78-93, 2008.
[57] D. Chopra, H. Schulzrinne, E. Marocco and E. Ivov, "Peer-to-peer overlays
for real-time communication: security issues and solutions." IEEE
Communications Surveys & Tutorials, vol. 11, no. 1, pp. 4-12, 2009.
[58] E. G. AbdAllah, H. S. Hassanein and M. Zulkernine, "A Survey of Security
Attacks in Information-Centric Networking." IEEE Communications Surveys
& Tutorials, vol. 17, no. 3, pp. 1441-1454, 2015.
[59] J. Cao, M. Ma, H. Li, Y. Zhang and Z. Luo, "A survey on security aspects
for LTE and LTE-A networks." IEEE Communications Surveys & Tutorials,
vol. 16, no. 1, pp. 283-302, 2014.
[60] C. M. Medaglia and A. Serbanati, "An overview of privacy and security
issues in the internet of things." In The Internet of Things, pp. 389-395.
Springer New York, 2010.
[61] R. Weber, "Internet of Things–New security and privacy challenges."
Computer Law & Security Review, vol. 26, no. 1, pp. 23-30, 2010.
[62] Kai Zhao and Lina Ge, "A survey on the internet of things security." In 9th
International Conference on Computational Intelligence and Security (CIS),
pp. 663-667. IEEE, 2013.
[63] J. Sun and C. Chen, "Initial Study on IOT Security." Communications
Technology, vol. 7, 2012.
[64] H. Kopetz, "Internet of things." In Real-time systems, pp. 307-323. Springer
US, 2011.
[65] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari and M. Ayyash,
"Internet of things: A survey on enabling technologies, protocols, and
applications." Communications Surveys & Tutorials, vol. 17, no. 4, pp.
2347-2376, 2015.
[66] S. Li, L. Xu and S. Zhao, "The internet of things: a survey." Information
Systems Frontiers, vol. 17, no. 2, pp. 243-259, 2015.
[67] L. Xu, W. He and S. Li, "Internet of things in industries: a survey." IEEE
Transactions on Industrial Informatics, vol. 10, no. 4, pp. 2233-2243, 2014.
[68] E. Ilie-Zudor, Z. Kemény, F. Blommestein, L. Monostori and A. Meulen, "A
survey of applications and requirements of unique identification systems and
RFID techniques." Computers in Industry, vol. 62, no. 3, pp. 227-252, 2011.
[69] Y. Wang, G. Attebury and B. Ramamurthy, "A survey of security issues in
wireless sensor networks." IEEE Communications Surveys & Tutorials, vol.
8, no. 2, pp. 2-23, 2006.
[70] IEEE Standard for Local and Metropolitan Area Networks—Part 15.4: Low-
Rate Wireless Personal Area Networks (LR-WPANs) Amendment 1: MAC
Sublayer, IEEE Std. 802.15.4e-2012 (Amendment to IEEE Std. 802.15.4-
2011), (2011) 1-225, 2012.
[71] P. Thubert, "Objective function zero for the routing protocol for low-power
and lossy networks (RPL)." RFC 6552, 2012.
[72] C. Bormann, A. P. Castellani and Z. Shelby, "Coap: An application protocol
for billions of tiny internet nodes." IEEE Internet Computing, vol. 16, no. 2,
2012.
[73] T. Zheng, A. Ayadi and X. Jiang, "TCP over 6LoWPAN for industrial
applications: An experimental study." In 4th IFIP International Conference
on New Technologies, Mobility and Security (NTMS), pp. 1-4. IEEE, 2011.
[74] D. Conzon, T. Bolognesi, P. Brizzi, A. Lotito, R. Tomasi and M. A. Spirito,
"The virtus middleware: An xmpp based architecture for secure iot
communications." In 21st International Conference on Computer
Communications and Networks (ICCCN), pp. 1-6. IEEE, 2012.
[75] S. Sicari, A. Rizzardi, L. Grieco and A. Coen-Porisini, "Security, privacy and
trust in Internet of Things: The road ahead." Computer Networks, vol. 76, pp.
146-164, 2015.
[76] S. Bandyopadhyay, M. Sengupta, S. Maiti and S. Dutta, "A survey of
middleware for internet of things." In Recent Trends in Wireless and Mobile
Networks, pp. 288-296. Springer Berlin Heidelberg, 2011.
[77] A. Gómez-Goiri, P. Orduña, J. Diego and D. López-de-Ipiña, "Otsopack:
lightweight semantic framework for interoperable ambient intelligence
applications." Computers in Human Behavior, vol. 30, pp. 460-467, 2014.
[78] M. Isa, N. Mohamed, H. Hashim, S. Adnan, J. A. Manan and R. Mahmod,
"A lightweight and secure TFTP protocol for smart environment." In IEEE
Symposium on Computer Applications and Industrial Electronics (ISCAIE),
pp. 302-306. IEEE, 2012.
[79] C. Liu, B. Yang and T. Liu, "Efficient naming, addressing and profile
services in Internet-of-Things sensory environments." Ad Hoc Networks, vol.
18, pp. 85-101, 2014.
[80] G. Colistra, V. Pilloni and L. Atzori, "The problem of task allocation in the
internet of things and the consensus-based approach." Computer Networks,
vol. 73, pp. 98-111, 2014.
[81] G. Fox, H. Gadgil, S. Pallickara, M. Pierce, R. L. Grossman, Y. Gu, D.
Hanley and X. Hong, "High performance data streaming in service
architecture." Technical Report, Indiana University and University of Illinois
at Chicago, 2004.
[82] R. V. Nehme, H. Lim, E. Bertino and E. Rundensteiner, "StreamShield: a
stream-centric approach towards security and privacy in data stream
environments." In ACM SIGMOD International Conference on Management
of data, pp. 1027-1030. ACM, 2009.
[83] P. Chen, X. Wang, Y. Wu, J. Su and H. Zhou, "POSTER: iPKI: Identity-
based Private Key Infrastructure for Securing BGP Protocol." In 22nd ACM
SIGSAC Conference on Computer and Communications Security, pp. 1632-
1634. ACM, 2015.
[84] S. Laury and S. Wallace, "Confidentiality and taxpayer compliance."
National Tax Journal, pp. 427-438, 2005.
[85] G. Bella and L. Paulson, "Kerberos version IV: Inductive analysis of the
secrecy goals." In Computer Security—ESORICS 98, pp. 361-375. Springer
Berlin Heidelberg, 1998.
[86] G. Padmavathi and D. Shanmugapriya, "A survey of attacks, security
mechanisms and challenges in wireless sensor networks." International
Journal of Computer Science and Information Security, vol. 4, no. 1&2, pp.
1-9, 2009.
[87] W. Lou, W. Liu and Y. Fang, "SPREAD: Enhancing data confidentiality in
mobile ad hoc networks." In INFOCOM 2004, Twenty-third Annual Joint
Conference of the IEEE Computer and Communications Societies, vol. 4, pp.
2404-2413. IEEE, 2004.
[88] Y. Jian, S. Chen, Z. Zhang and L. Zhang, "Protecting receiver-location
privacy in wireless sensor networks." In INFOCOM 2007. 26th IEEE
International Conference on Computer Communications, pp. 1955-1963.
IEEE, 2007.
[89] Y. Zhang, W. Liu, W. Lou and Y. Fang, "MASK: anonymous on-demand
routing in mobile ad hoc networks." IEEE Transactions on Wireless
Communications, vol. 5, no. 9, pp. 2376-2385, 2006.
[90] M. Saini, P. Atrey, S. Mehrotra and M. Kankanhalli, "Adaptive
transformation for robust privacy protection in video surveillance." Advances
in Multimedia, 2012.
[91] A. Perrig, R. Szewczyk, J. Tygar, V. Wen and D. E. Culler, "SPINS:
Security protocols for sensor networks." Wireless networks, vol. 8, no. 5, pp.
521-534, 2002.
[92] J. Luo, P. Papadimitratos and J. Hubaux, "GossiCrypt: wireless sensor
network data confidentiality against parasitic adversaries." In 5th Annual
IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc
Communications and Networks, 2008. SECON'08., pp. 441-450. IEEE, 2008.
[93] K. Chan and S. Chan, "Key management approaches to offer data
confidentiality for secure multicast." IEEE Network, vol. 17, no. 5, pp. 30-
39, 2003.
[94] S. Jiang, N. Vaidya and W. Zhao, "Preventing traffic analysis in packet radio
networks." In DARPA Information Survivability Conference &
Exposition II, 2001. DISCEX'01. Proceedings, vol. 2, pp. 153-158. IEEE,
2001.
[95] C. Karlof and D. Wagner, "Secure routing in wireless sensor networks:
Attacks and countermeasures." Ad hoc networks, vol 1, no. 2, pp. 293-315,
2003.
[96] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, "A survey on
sensor networks." IEEE Communications Magazine, vol. 40, no. 8, pp. 102-
114, 2002.
[97] J. Newsome, E. Shi, D. Song and A. Perrig, "The sybil attack in sensor
networks: analysis & defenses." In 3rd international symposium on
Information processing in sensor networks, pp. 259-268. ACM, 2004.
[98] A. Laszka, M. Felegyhazi and L. Buttyan, "A Survey of Interdependent
Information Security Games." ACM Computing Surveys (CSUR), vol. 47, no.
2, 2014.
[99] A. Boukerche and D. Turgut, "Secure time synchronization protocols for
wireless sensor networks." IEEE Wireless Communications, vol. 14, no. 5,
pp. 64-69, 2007.
[100] X. Du and H. Chen, "Security in wireless sensor networks." IEEE Wireless
Communications, vol. 15, no. 4, pp. 60-66, 2008.
[101] M. Tubaishat, J. Yin, B. Panja and S. Madria, "A secure hierarchical model
for sensor network." ACM Sigmod Record , vol. 33, no. 1, pp. 7-13, 2004.
[102] H. Deng, W. Li and D. Agrawal, "Routing security in wireless ad hoc
networks." IEEE Communications Magazine, vol. 40, no. 10, pp. 70-75,
2002.
[103] S. Demurjian, H. Wang and L. Yan, "Implementation of Mandatory Access
Control in Role-based Security System with Oracle Snapshot Skill." (2001).
[104] S. Osborn, R. Sandhu and Q. Munawer, "Configuring role-based access
control to enforce mandatory and discretionary access control policies."
ACM Transactions on Information and System Security (TISSEC), vol. 3, no.
2, pp. 85–106, 2000.
[105] E. Bertino, P. Bonatti and E. Ferrari, "TRBAC: A temporal role-based access
control model." ACM Transactions on Information and System Security
(TISSEC), vol. 4, no. 3, pp. 191-233, 2001.
[106] G. Brose, "Access control management in distributed object systems." PhD
diss., Freie Universität Berlin, 2001.
[107] S. Oh and S. Park, "Task–role-based access control model." Information
systems, vol. 28, no. 6, pp 533-562, 2003.
[108] M. I. Sarfraz, M. Nabeel, J. Cao and E. Bertino, "DBMask: fine-grained
access control on encrypted relational databases." In 5th ACM Conference on
Data and Application Security and Privacy, pp. 1-11. ACM, 2015.
[109] A. Gupta, M. Kirkpatrick and E. Bertino, "A formal proximity model for
RBAC systems." Computers & Security, vol. 41, pp. 52-67, 2014.
[110] M. Nabeel and E. Bertino, "Fine-grained encryption-based access control for
big data." In 13th Annual Information Security Symposium, p. 3. CERIAS-
Purdue University, 2012.
[111] A. Kamra and E. Bertino, "Privilege states based access control for fine-
grained intrusion response." In Recent Advances in Intrusion Detection, pp.
402-421. Springer Berlin Heidelberg, 2010.
[112] Q. Ni, E. Bertino and J. Lobo, "Risk-based access control systems built on
fuzzy inferences." In 5th ACM Symposium on Information, Computer and
Communications Security, pp. 250-260. ACM, 2010.
[113] E. Bertino, C. Bettini and P. Samarati, "A discretionary access control model
with temporal authorizations." In workshop on New security paradigms, pp.
102-107. IEEE Computer Society Press, 1994.
[114] R. Sandhu, "Lattice-based enforcement of chinese walls." Computers &
Security, vol. 11, no. 8, pp. 753-763, 1992.
[115] B. Carminati, E. Ferrari, J. Cao and K. Tan, "A framework to enforce access
control over data streams." ACM Transactions on Information and System
Security (TISSEC), vol. 13, no. 3, 2010.
[116] J. Deng, R. Han and S. Mishra, "INSENS: Intrusion-tolerant routing for
wireless sensor networks." Computer Communications, vol. 29, no. 2, pp.
216-230, 2006.
[117] I. Kaddoura and S. Abdul-Nabi, "On formula to compute primes and the nth
prime." Applied Mathematical Sciences, vol. 6, no. 76, pp. 3751-3757, 2012.
[118] Contiki operating system official website, http://www.contiki-os.org/
[119] Scyther, [Online] http://www.cs.ox.ac.uk/people/cas.cremers/scyther/
[120] M. Pistoia, N. Nagaratnam, L. Koved and A. Nadalin, "Enterprise Java 2
Security: Building Secure and Robust J2EE Applications." Addison Wesley
Longman Publishing Co., Inc., 2004.
[121] Matlab, [Online] http://au.mathworks.com/products/matlab/
[122] N. Penchalaiah and R. Seshadri, "Effective Comparison and evaluation of
DES and Rijndael Algorithm (AES)." International Journal of Computer
Science and Engineering, vol. 2, no. 5, pp. 1641-1645, 2010.
[123] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh and A.
Byers. Big data: The next frontier for innovation, competition, and
productivity, 2011.
[124] M. Bahrami and M. Singhal, "The Role of Cloud Computing Architecture in
Big Data." In Information Granularity, Big Data, and Computational
Intelligence, Springer International Publishing, pp. 275-295, 2015.
[125] A. McAfee, E. Brynjolfsson, T. Davenport, D. Patil and D. Barton, "Big
data: the management revolution." Harvard Business Review, vol. 90, no. 10,
pp. 61-67, 2012.
[126] A. Deshpande, Z. Ives and V. Raman, "Adaptive query processing."
Foundations and Trends in Databases, vol. 1, no. 1, pp. 1-140, 2007.
[127] T. Sutherland, B. Liu, M. Jbantova and E. Rundensteiner, "D-cape:
distributed and self-tuned continuous query processing." In 14th ACM
international conference on Information and knowledge management, ACM,
pp. 217-218, 2005.
[128] R. Ranjan, "Streaming Big Data Processing in Datacenter Clouds." IEEE
Cloud Computing. Vol. 1, no. 1, pp. 78-83, 2014.
[129] J. Walters, Z. Liang, W. Shi and V. Chaudhary, "Wireless sensor network
security: A survey." Security in distributed, grid, mobile, and pervasive
computing. Vol. 2007, no. 1, 2007.
[130] D. Carman, P. Kruus and B. Matt, "Constraints and approaches for
distributed sensor network security." Technical Report 00-010, NAI Labs,
Network Associates, Inc., Glenwood, MD, 2000.
[131] L. Eschenauer and V. Gligor, "A key-management scheme for distributed
sensor networks." In 9th ACM conference on Computer and communications
security, ACM, pp. 41-47, 2002.
[132] J. Daemen and V. Rijmen, "AES the advanced encryption standard." In The
Design of Rijndael, 2002.
[133] K. Akkaya and M. Younis, "An energy-aware QoS routing protocol for
wireless sensor networks." In 23rd International Conference on
Proceedings Distributed Computing Systems Workshops, pp. 710-715, 2003.
[134] S. Nepal, J. Zic, D. Liu and J. Jang, "A mobile and portable trusted
computing platform." In EURASIP Journal on Wireless Communications
and Networking, vol. 1, pp. 1-19, 2011.
[135] J. Kulik, W. Heinzelman and H. Balakrishnan, "Negotiation-based protocols
for disseminating information in wireless sensor networks." In Wireless
networks, vol. 8. No. 2/3, pp. 169-185, 2002.
[136] H-S. Lim, Y-S. Moon and E. Bertino, "Provenance-based trustworthiness
assessment in sensor networks." In Seventh International Workshop on Data
Management for Sensor Networks, pp. 2-7, 2010.
[137] S. Sultana, G. Ghinita, E. Bertino and M. Shehab, "A lightweight secure
provenance scheme for wireless sensor networks." In 18th International
Conference on Parallel and Distributed Systems (ICPADS), pp. 101-108,
2012.
[138] G. Selimis, L. Huang, F. Massé, I. Tsekoura, M. Ashouei, F. Catthoor, J.
Huisken, J. Stuyt, G. Dolmans, J. Penders, and H. De Groot. "A lightweight
security scheme for wireless body area networks: design, energy evaluation
and proposed microprocessor design." Journal of medical systems, vol. 35,
no. 5, pp. 1289-1298, 2011.
[139] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente and P.
Valduriez, "Streamcloud: An elastic and scalable data streaming
system." IEEE Transactions on Parallel and Distributed Systems, vol. 23, no.
12, pp. 2351-2365, 2012.
[140] R. A. Shaikh, S. Lee, M. A. U. Khan and Y. J. Song, "LSec: lightweight
security protocol for distributed wireless sensor network." In IFIP
International Conference on Personal Wireless Communications, pp. 367-
377. Springer Berlin Heidelberg, 2006.
[141] N. Tsikoudis, A. Papadogiannakis and E. P. Markatos, "LEoNIDS: a Low-
latency and Energy-efficient Network-level Intrusion Detection
System." IEEE Transactions on Emerging Topics in Computing, vol. 4, no. 1,
pp.142-155, 2016.
[142] M. Roesch, "Snort: Lightweight Intrusion Detection for Networks." LISA, vol.
99, no. 1, pp. 229-238. 1999.
[143] W. Lee and S. J. Stolfo, "Data Mining Approaches for Intrusion Detection."
In Usenix security. 1998.
[144] Y. Xie, D. Feng, Z. Tan and J. Zhou, "Unifying intrusion detection and
forensic analysis via provenance awareness." Future Generation Computer
Systems, vol. 61, pp. 26-36, 2016.
[145] A. S. Wander, N. Gura, H. Eberle, V. Gupta and S. C. Shantz, "Energy
analysis of public-key cryptography for wireless sensor networks." In Third
IEEE international conference on pervasive computing and communications,
pp. 324-328, 2005.
[146] T. Park and K. G. Shin, "LiSP: A lightweight security protocol for wireless
sensor networks." ACM Transactions on Embedded Computing Systems
(TECS), vol. 3, no. 3, pp. 634-660, 2004.
[147] A. Bogdanov, L. Knudsen, G. Leander, C. Paar, A. Poschmann, M. Robshaw,
Y. Seurin, and C. Vikkelsoe, "PRESENT: An ultra-lightweight block
cipher." In International Workshop on Cryptographic Hardware and
Embedded Systems, pp. 450-466, 2007.
[148] T. A. Zia and A. Y. Zomaya, "A Lightweight Security Framework for
Wireless Sensor Networks." JoWUA, vol. 2, no. 3, pp. 53-73, 2011.
[149] K. Van Laerhoven, "Combining the self-organizing map and k-means
clustering for on-line classification of sensor data." In International
Conference on Artificial Neural Networks, pp. 464-469. Springer Berlin
Heidelberg, 2001.
[150] P. Ferreira and P. Alves, Distributed context-aware systems. Springer, 2014.
DOI 10.1007/978-3-319-04882-6
[151] D. Ganesan, D. Estrin and J. Heidemann, "DIMENSIONS: Why do we need
a new data handling architecture for sensor networks?." ACM SIGCOMM
Computer Communication Review, vol. 33, no. 1. pp. 143-148, 2003.
[152] R. M. Metcalfe and D. R. Boggs, "Ethernet: distributed packet switching for
local computer networks." Communications of the ACM, vol. 19, no. 7, pp.
395-404, 1976.
[153] Crypto++ Benchmarks. Available:
http://www.cryptopp.com/benchmarks.html (accessed on: 30.07.2016)
[154] B. Carminati, E. Ferrari and K. Tan, "Specifying access control policies on
data streams." In 12th International Conference on Database Systems for
Advanced Applications (DASFAA ’07). Springer, Berlin, pp. 410–421, 2007.
[155] H. Balakrishnan, M. Balazinska, D. Carney, U. Çetintemel, M. Cherniack, C.
Convey, E. Galvez, J. Salz, M. Stonebraker, N. Tatbul and R. Tibbetts,
"Retrospective on aurora." The VLDB Journal, vol. 13, no. 4, pp. 370-383,
2004.
[156] R. Adaikkalavan and T. Perez, "Secure shared continuous query processing."
In ACM Symposium on Applied Computing, pp. 1000-1005, 2011.
[157] R. Adaikkalavan, X. Xie and I. Ray, "Multilevel secure data stream
processing: Architecture and implementation." Journal of Computer
Security, vol. 20, no. 5, pp. 547-581, 2012.
[158] J. Cao, B. Carminati, E. Ferrari and K.-L. Tan, "Acstream: Enforcing access
control over data streams." In IEEE 25th International Conference on Data
Engineering, pp. 1495-1498, 2009.
[159] W. Lindner and J. Meier, "Securing the borealis data stream engine." In 10th
International Database Engineering and Applications Symposium
(IDEAS'06), pp. 137-147, 2006.
[160] X. Xie, I. Ray, R. Adaikkalavan and R. Gamble, "Information flow control
for stream processing in clouds." In 18th ACM symposium on Access control
models and technologies, pp. 89-100. ACM, 2013.
[161] R. V. Nehme, E. A. Rundensteiner and E. Bertino, "A security punctuation
framework for enforcing access control on streaming data." In IEEE 24th
International Conference on Data Engineering (ICDE), pp. 406-415, 2008.
[162] D. F. C. Brewer and M. J. Nash, "The Chinese Wall Security Policy." In
IEEE Symposium on Security and Privacy (S & P), pp. 206–214, 1989.
[163] D. E. Bell and L. J. LaPadula, "Secure Computer System: Unified Exposition
and MULTICS Interpretation." Technical Report MTR-2997 Rev. 1 and
ESD-TR-75-306, rev. 1, The MITRE Corporation, Bedford, MA 01730,
1976.
[164] I. Ray, S. Madria, and M. Linderman, "Query Plan Execution in a
Heterogeneous Stream Management System for Situational Awareness."
In IEEE 31st Symposium on Reliable Distributed Systems (SRDS), 2012, pp.
424-429. IEEE, 2012.
[165] S. Chakravarthy and Q. Jiang, "Stream data processing: a quality of service
perspective: modeling, scheduling, load shedding, and complex event
processing." Springer Science & Business Media, Vol. 36, 2009.
[166] UCI Machine Learning Repository, [Online]. Available:
http://archive.ics.uci.edu/ml/datasets/
[167] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "A Secure Big Data Stream
Analytics Framework for Disaster Management on the Cloud." in 18th IEEE
International Conferences on High Performance Computing and
Communications (HPCC 2016), pp. 1218-1225, 2016.