ABBA: A quasi-deterministic Intrusion Detection System for ...

8
1 ABBA: A quasi-deterministic Intrusion Detection System for the Internet of Things Dr. Raoul Guiazon Faculty of Engineering and Physical Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom. Abstract—An increasing amount of processes are becoming automated for increased efficiency and safety. Common examples are in automotive, industrial control systems or healthcare. Automation usually relies on a network of sensors to provide key data to control systems. One potential risk to these automated processes comes from fraudulent data injected in the network by malicious actors. In this article we propose a new mechanism of data tampering detection that does not depend on secret crypto- graphic keys - that can be lost or stolen - or accurate modelling of the network as is the case with existing machine learning based techniques. We define and analyse the mathematical structure of the proposed technique called ABBA and propose an algorithm for implementation. Index terms— Internet of Things, IoT, Cyber security, Intrusion detection system, Autonomous systems. I. I NTRODUCTION A vast amount of our modern infrastructure relies on a net- work of sensors and actuators to automatically perform various tasks. The purpose of this automation is generally to increase efficiency and decrease waste and human risk. This has typically been the case for industrial control systems (ICS) that often form the core of critical national infrastructures (CNI). In recent years the topology of the networks underpinning these infrastructures has changed, moving from somewhat isolated networks using dedicated and often proprietary technologies to transport and process data to more standard internet connected devices that often rely on cloud processing to make critical decisions (figure I). Although this new topology increases the capabilities and flexibility of these networks by allowing more interoperability between systems and a better understanding and monitoring of assets and processes, it also opens up more vulnerabilities for cyber criminals to enter networks. Cyber attacks on CNIs are not theoretical, in 2015 the world witnessed the first known power outage caused by a malicious cyber attack that happened when utility companies in Ukraine were hit by the BlackEnergy malware. In February 2021 a water treatment facility in Florida was attacked, the attacker remotely increased the levels of sodium hydroxide content from 100 parts per million to 11,100ppm putting at risk 15000 Thanks to the Royal Academy of Engineering and the Office of the Chief Science Adviser for National Security for supporting this work under the UK Intelligence Community Postdoctoral Fellowship Programme. Thank you also to Dr. Daphne Tuncer for her help in writing this article. Contact: [email protected] people relying on this plant for clean water. More recently on May 7 2021, the US issued emergency legislation after Colonial Pipeline which carries almost half of the East Coast supply of diesel petrol and jet fuel was hit by a ransomware cyber-attack. In the Siemens report ”Caught in the Crosshairs: Are utilities Keeping Up with the Industrial Cyber Threat?” it is found that 30% of attacks on OT (Operational Technologies) are not detected. Given the complexity of the systems utilised to automate our infrastructures, cars and cities with often millions of lines of codes running on top of various hardware, it is impossible to guarantee that no vulnerabilities will ever be found and be exploited by adversaries. Cyber criminals exploit multiple routes into their target sys- tems, from phishing attacks to software vulnerabilities. Bad practices from a supplier also can have important repercus- sions further down the chain. For example, hard-coded admin credentials on a device or a key server breach at a device manufacturer can enable an attack on the end user’s network. The weakest link will be the point of entry into the network. The purpose of this work is to help secure the fleet of small devices that often relay sensing data to a network controller to be processed an relied upon for critical decision making, this could be for the Advance driver-assistance system (ADAS) of a vehicle or an ICS. We consider that the aim of the attacker is to modify the behaviour of the automated system under attack by feeding fraudulent data to the decision making unit. In essence, we are interested in developing a method to detect such an attack even in the case where the attacker has a copy of the secret key used by a legitimate device to authenticate with the network. Multiple approaches are often utilised to protect digital networks from cyber threats and all participate in making those infrastructures safer. The most common layer of protection is often based on cryptographic techniques to ensure confidentiality, integrity or authenticity of the traffic [1]. This layer requires the generation and distribution of secret keys and the management of these keys over the life of a device. Often enough this first layer is inexistent, poorly implemented [2] or become obsolete due to new vulnerabilities found in key protocols [3]. Other methods focus on the physical layer of the communication stack using channel state estimation to generate secret keys [4] or using jamming to reduce the signal quality of an eavesdropper [5, 6]. These methods are not adapted for situations where arXiv:2108.03942v1 [cs.CR] 9 Aug 2021

Transcript of ABBA: A quasi-deterministic Intrusion Detection System for ...

Page 1: ABBA: A quasi-deterministic Intrusion Detection System for ...

1

ABBA: A quasi-deterministic Intrusion DetectionSystem for the Internet of Things

Dr. Raoul GuiazonFaculty of Engineering and Physical Sciences,

University of Leeds,Leeds LS2 9JT, United Kingdom.

Abstract—An increasing amount of processes are becomingautomated for increased efficiency and safety. Common examplesare in automotive, industrial control systems or healthcare.Automation usually relies on a network of sensors to provide keydata to control systems. One potential risk to these automatedprocesses comes from fraudulent data injected in the network bymalicious actors. In this article we propose a new mechanism ofdata tampering detection that does not depend on secret crypto-graphic keys - that can be lost or stolen - or accurate modellingof the network as is the case with existing machine learning basedtechniques. We define and analyse the mathematical structure ofthe proposed technique called ABBA and propose an algorithmfor implementation.

Index terms— Internet of Things, IoT, Cyber security,Intrusion detection system, Autonomous systems.

I. INTRODUCTION

A vast amount of our modern infrastructure relies on a net-work of sensors and actuators to automatically perform varioustasks. The purpose of this automation is generally to increaseefficiency and decrease waste and human risk. This hastypically been the case for industrial control systems (ICS) thatoften form the core of critical national infrastructures (CNI). Inrecent years the topology of the networks underpinning theseinfrastructures has changed, moving from somewhat isolatednetworks using dedicated and often proprietary technologies totransport and process data to more standard internet connecteddevices that often rely on cloud processing to make criticaldecisions (figure I).Although this new topology increases the capabilities andflexibility of these networks by allowing more interoperabilitybetween systems and a better understanding and monitoringof assets and processes, it also opens up more vulnerabilitiesfor cyber criminals to enter networks.Cyber attacks on CNIs are not theoretical, in 2015 the worldwitnessed the first known power outage caused by a maliciouscyber attack that happened when utility companies in Ukrainewere hit by the BlackEnergy malware. In February 2021 awater treatment facility in Florida was attacked, the attackerremotely increased the levels of sodium hydroxide contentfrom 100 parts per million to 11,100ppm putting at risk 15000

Thanks to the Royal Academy of Engineering and the Office of the ChiefScience Adviser for National Security for supporting this work under the UKIntelligence Community Postdoctoral Fellowship Programme.

Thank you also to Dr. Daphne Tuncer for her help in writing this article.Contact: [email protected]

people relying on this plant for clean water. More recentlyon May 7 2021, the US issued emergency legislation afterColonial Pipeline which carries almost half of the East Coastsupply of diesel petrol and jet fuel was hit by a ransomwarecyber-attack.In the Siemens report ”Caught in the Crosshairs: Are utilitiesKeeping Up with the Industrial Cyber Threat?” it is foundthat 30% of attacks on OT (Operational Technologies) arenot detected. Given the complexity of the systems utilised toautomate our infrastructures, cars and cities with often millionsof lines of codes running on top of various hardware, it isimpossible to guarantee that no vulnerabilities will ever befound and be exploited by adversaries.Cyber criminals exploit multiple routes into their target sys-tems, from phishing attacks to software vulnerabilities. Badpractices from a supplier also can have important repercus-sions further down the chain. For example, hard-coded admincredentials on a device or a key server breach at a devicemanufacturer can enable an attack on the end user’s network.The weakest link will be the point of entry into the network.The purpose of this work is to help secure the fleet of smalldevices that often relay sensing data to a network controller tobe processed an relied upon for critical decision making, thiscould be for the Advance driver-assistance system (ADAS) ofa vehicle or an ICS. We consider that the aim of the attackeris to modify the behaviour of the automated system underattack by feeding fraudulent data to the decision making unit.In essence, we are interested in developing a method to detectsuch an attack even in the case where the attacker has a copyof the secret key used by a legitimate device to authenticatewith the network.

Multiple approaches are often utilised to protect digitalnetworks from cyber threats and all participate in makingthose infrastructures safer. The most common layer ofprotection is often based on cryptographic techniques toensure confidentiality, integrity or authenticity of the traffic[1]. This layer requires the generation and distribution ofsecret keys and the management of these keys over thelife of a device. Often enough this first layer is inexistent,poorly implemented [2] or become obsolete due to newvulnerabilities found in key protocols [3]. Other methodsfocus on the physical layer of the communication stack usingchannel state estimation to generate secret keys [4] or usingjamming to reduce the signal quality of an eavesdropper[5, 6]. These methods are not adapted for situations where

arX

iv:2

108.

0394

2v1

[cs

.CR

] 9

Aug

202

1

Page 2: ABBA: A quasi-deterministic Intrusion Detection System for ...

2

Fig. 1. Modern network topology

channel variations are limited or where jamming could affectneighbouring networks.To complement the techniques mentioned before, somesystems also implement intrusion detection systems (IDS).IDS often function at a higher level of abstraction, monitoringaccess permissions and traffic patterns which they compareto a baseline behaviour considered normal for a specificnetwork. Current IDS techniques are often built usingmachine learning techniques or rules based methods [7].With machine learning, the IDS needs reliable training datato learn the “normal patterns” on the sensor network, in thiscase the problem is that defining what “normal” means isa challenge. Any bias in the training data will increase therisk of false positives and false negatives. With rule-basedmethods, attack signatures are listed in the IDS memory toenable it to identify similar attacks in the future. This methodfails against zero-day attacks and requires the IDS to be keptup to date when new attacks are discovered. Moreover, IDSsystems are often checking the traffic from the internet intothe internal network and less so the traffic coming from thesensors.The technique devised in this paper does not rely on traininga model, maintaining an attack signature list or managingsecure keys although these can be complementary. Wecall this method Artificial Behaviour Based Authentication(ABBA), it is a mechanism by which a device or multipledevices generate a pattern in their network to facilitate thedetection of anomalies and intruders on the network. Thepattern created is the artificial behaviour of the networkwhich is built using a time code defined in this document.

In the following sections we build the theoretical frameworkfor a robust Intrusion Detection System (IDS) that monitorsthe communication link from the sensors to the core networkfor signs of cyber attacks and enables early detection ofcyber incidents. We put in place the framework to assess thedetection probability of such IDS and provide an algorithmto implement this technique on a physical system. In thiswork we purposely do not dive into the exact implementationof the technique as this depends on multiple parametersthat can be optimised for each individual application.One such implementation will be described in our github

repository https://github.com/abbaiot. Here we focus ondescribing ABBA and demonstrating its ability to reliablydetect anomalies such as data losses or data injections andtampering due to a third party.The structure of this paper is as follows, section II lays outthe structure of ABBA and the different key elements thatare required to make it work. Section III describes the theorybehind the time encoder that generates the signatures usedto authenticate a device. Section IV introduces the encodingused for ABBA and describes its properties. In section Vwe put together all the pieces of the IDS and describe itsbehaviour under different types of attacks. Section VI presentsthe practical algorithm behind ABBA and finally, section VIIconcludes this paper.

NotationsN is the set of natural numbers.R+ is the set of positive real numbers.{·} represents a set.(h) represents a sequence with elements hn with n ∈ N.(x, y) is a pair.((x, y)) is a sequence of pairs.

II. SYSTEM MODEL

We consider an information source S that produces eventsfrom a finite alphabet X = {x1, ..., xN} according to theprobability distribution PX and two parties Alice and Bobwhere Alice observes the events produced by the informationsource and communicates that information to Bob over an un-trusted communication channel. As is conventional in securityresearch we also refer Eve as the eavesdropper or attacker onthe network.We are not concerned with distortions or losses during trans-missions from Alice to Bob therefore we consider the com-munication channel perfectly error corrected. Only the actionsof Eve are of concern to us.We devise a mechanism with either of the following twodesired properties,

1) Bob can detect malicious actions by Eve with a com-putable detection probability.

2) Alice and Bob can trap Eve in the network for agiven duration during which she has to spend computingresources to remain undetected whilst increasing chancesof her being detected using other techniques.

We want that detection probability and duration to only dependon the specific parameters chosen by Alice and Bob as theyimplement the intrusion detection scheme.

The basic principle of the IDS proposed is described onfigure II, where Alice sends messages to Bob over 2 channels.Channel 1 is the ”high bandwidth” communication channelthat carries the payload and Channel 2 is the ”Low bandwidth”intrusion detection channel (IDC). The reason for this archi-tecture is to decouple the message from Alice to Bob to thesignatures that allow Bob to verify the source of the message.This could be understood as - although not completely accurate- sending a message authentication code for a payload into adifferent channel than the payload. The aim would be for Bob

Page 3: ABBA: A quasi-deterministic Intrusion Detection System for ...

3

Fig. 2. Top level principle

to detect Eve’s actions by matching the signature received onthe IDC with the payload received.The model described above cannot in itself prevent an attackerto infiltrate the network undetected. As they can tamper withboth channels in a way that remains consistent to the receiver.To solve this issue, we design an IDS that relies on only twoassumptions for security that we argue are simple to guaranteeor verify in any specific application.

1) Eve cannot prevent messages on the IDC from reachingBob.

2) Eve cannot delay or speed-up messages transiting on theIDC.

For example, assumption 1 can be easily checked in awireless link by monitoring the noise levels for jammingin the intrusion detection channel between Alice and Bob.A higher level of noise would decrease confidence in theIDS in a way reminiscent of what is used in Quantum keydistribution systems. Assumption 2 depends on the topologyof the network, for example, a wireless link between Alice andBob doesn’t allow for Eve to manipulate signals flight time.This is also applicable to some wired networks if the threatmodelling discounts a Man-in-The-Middle type attack.

The figure II illustrates the internal structure of the systemat both Alice and Bob. transmitter and receiver both rely on alocal clock to implement the intrusion detection algorithm. Wewill assume that both clocks measure time at the same rate -compensating for local drifts would be required in practice butis not the focus of this paper. The key component of interest tous is the ”Time encoder” that we will design in the followingsections of this article. The time encoder produces pulses thatare sent to Bob using a modulation as simple as ON/OFFkeying (OOK) [8, 9] requiring low bandwidth and accessibleto all devices.

III. TIME ENCODER

To describe how the time encoder used in the IDS works,we need an appropriate formalism which we develop in thissection. We will start by defining how we represent informa-tion sources, then describe how one could devise a way tocommunicate the entire information content of an informationsource using an encoding solely based on the passage of time.We will then use our new formalism to define the time encoderthat is core to the intrusion detection system proposed.

A. Complete description of Information Sources

In information theory, an information source S is usuallydescribed using a random variable X with values in a given

set of the same name X . In this work we will limit ourselvesto discrete sets, meaning that there exists a one to onemapping between X and N the set of natural numbers.In a typical experiment with an information source, asequence of events from the set X is produced, for example(x1, ..., xp) where the index of each element represents theposition of the event in the sequence. Each event xi could bethe outcome of a coin toss.Because all our experiments are done with time evolvingin the background, that sequence can always be implicitlyredefined as ((t1, x1), ..., (tp, xp)) where ti are the timeinstants at which these events occurred with the origin oftime that can be set arbitrarily before the experiment. Oftenthis information about time is not useful, for example whenwriting a document, the time at which a key was presseddoesn’t matter only the order and value of the keys do. Toprovide a complete physical description of the source it ishowever important to include time.Once observed, the sequence (t) = (t1, ..., tp) is an increasingfunction from N to R+ the positive real number line andthe sequence (x) = (x1, ..., xp) is a function from N to Xwith no particular properties besides that the frequencies ofelements in the sequence will converge to their probabilitiesas defined by the random variable X when p increases.The sequence ((t, x)) is thus a function from N to R+ ×X .

If we were able to observe the entire sequence of eventsproduced by our information source, we would build a po-tentially infinite sequence (t, x) = ((t1, x1), ..., (tp, xp), ...) atwhich point the information source would have been describedentirely. In this document we define the information source tobe that sequence S = ((t, x)).

B. A time-based encoding

The foundation of modern digital communications is theencoding of information as binary digits. Hence, before in-formation from our source S is transmitted, it is translatedinto a sequence of bits by a data encoder that are then usedto modulate a carrier signal and sent onto the communicationchannel.Our objective is to build an intrusion detection mechanism thatrequires minimum bandwidth utilisation. To achieve this goalwe define a new encoding based on the passage of time atboth transmitter and receiver. The code generated using thatencoding can be used to modulate a carrier using ON-OFFkeying to transmit data to Bob.The main idea behind our time encoding is to define charactersas time intervals of a given duration starting from a fixedorigin in time. As an analogy, let’s say we could representtime intervals as pieces of strings of various lengths, wecould then use a substitution mechanism as described in figure4 to encode text written in english. However, this simplesubstitution mechanism wouldn’t work with time intervals ifa single origin of time was defined for all letters becauserepetitions such as with the letter ”l” in the word ”Hello”would not be possible to represent and information about theorder of these letters would be lost too. In the example shown

Page 4: ABBA: A quasi-deterministic Intrusion Detection System for ...

4

Fig. 3. Internal architecture of the IDS.

Fig. 4. This figure shows how the word ”Hello” can be written in pieces ofstrings of various length

Fig. 5. Projection of the different words onto a single dimension

in figure 4 we represent every letter independently using astring with variable length for every letter of the word ”Hello”.Using a single dimension and marking the different lengths onthe one string, the same word would look like figure 5 whereinformation about the order of the strings and their count islost.

In the next section we show the existence of a time-codethat preserves all information about the message transmitted.

C. Information preserving time-code

Before we take a look at our information source, let’s firstwork out how we could define a natural time encoding of anypositive real number as a time interval.If we denote T the set of all time intervals in time units. Theencoding of any positive real number is given by the one toone mapping φ : R+ → T . We will denote elements of Twith the greek letter τ .This means that for example, the number 10.234 is representedby a segment of time τ10.234 of a fixed duration in time units(whatever unit is chosen).With this encoding one can represent numbers using chunksof time, for example, numbers from 1 to 5 are τ1, τ2, τ3, τ4, τ5.Now let’s go back to the complete description of our informa-tion source by the sequence (t, x) = ((t1, x1), ..., (tp, xp), ...),and discuss why this sequence can be mapped to a unique

sequence of time intervals (τ) = (τ1, ..., τp, ...).The space R+ × N can be mapped one to one with R+ andwe have shown a one to one mapping φ between R+ and T .For example the following map f : R+×N→ R+ defined by

f(t, n) = n+1

1 + expt(1)

With the inverse being

f−1(y) =

(ln

(1

y − floor(y)− 1

), f loor(y)

)(2)

We denote Θ the one to one mapping Θ : (R+×X)→ T . Inwhich case

(τ) = Θ ((t, x)) (3)

Now we realise that the sequence ((t, x)) =((t1, x1), ..., (tp, xp), ...) doesn’t need to be written asan ordered list as the time information is already containedwithin each element. Instead, it can be represented as aset S = {(t1, x1), ..., (tp, xp), ...} containing every element.Similarly, the time sequence (τ) = (τ1, ..., τp, ...) can berepresented by the set T = {τ1, ..., τp, ...}. The set T isa codeword of time intervals that uniquely describes theinformation source S.

Note that we can define a superset S of sets like S whereeach element represents a possible sequence of events withelements in X . S is a source of information sources andgenerates all possible sequences of events. Θ can then be usedto map each element of S to elements of the superset Ψ oftime sequences.The mapping Θ is the dictionary used to describe elements ofS in terms of codewords in Ψ.We can now describe an information source as an element ofS or equivalently as a codeword in Ψ.

IV. COMMUNICATION USING TIME-CODE

Communicating using the dictionary Θ described in pre-vious sections is not practical because it requires Alice toknow the entire history of the information source S to devisea codeword T that she can transmit to Bob. If she was able todo that then, a simple thing for Alice to do would be to useOOK modulation to transmit a pulse at the end of every timeinterval in the set T - counting from a time t0 when Aliceand Bob synchronised their clocks. If Bob recorded the timeof detection of pulses from time t0 then he would be able toreconstruct T after a possibly infinite amount of time. Then

Page 5: ABBA: A quasi-deterministic Intrusion Detection System for ...

5

using Θ−1 he would be able to translate from Ψ back to S.Because Alice and Bob can only move forward in time andusually have a limited lifetime, they need a coding schemewhere the instants at which pulses are produced only dependon events that happened in their past and decoding can bedone on the fly.In the next section, we will build a causal time-code thatAlice can use to convert her information source output intotime intervals in a way that doesn’t require her to know thefuture states of the source. The cost of that will be Bob’suncertainty about the information sent by Alice. It is becauseof that uncertainty that the IDS we propose is probabilistic innature although, with a detection probability that approaches1 in some cases as time increases. This will be discussed insection V.

A. A causal time-code

The aim of this section is to build a practical encodingthat both Alice and Bob can use to communicate enoughinformation about the source to detect tampering using theintrusion detection channel.Let’s define a family of sequences (gs) (with values in R+

and s ∈ R+) and the family of functions Ox : R+ → R+

with x ∈ X .

The encoding starts with two initial parameters, a numbers0 and time origin t0. Then the set T is created based on((t, x)) = ((t1, x1), ..., (tp, xp), ...) as follows.

1) Define the sequence (s) with first term s0 and ∀n >0, sn = Oxn(sn−1).

2) Define the sequences (τk), s.t τk,0 = tk + gsk,0 andτk,n+1 = τk,n + gsk,n,∀k, n ≥ 0.

3) Now the set T = {τk,n; τk,n < tk+1,∀k, n} is the code-word describing the source S represented by ((t, x)).

It is easy to check that with this encoding, any two differentcodewords T1 and T2 would necessarily come from differentsources S1 and S2. This property is critical for our intrusiondetection system.It is also true that two sources S1 and S2 could be mapped tothe same output T . However, the parameters used in the encod-ing can be tuned to reduce the likelihood of such collisions andmitigate the risk of this being used by a potential attacker. Theproperties needed for these parameters are defined in appendix.In summary the properties of this encoding are:

1) Unlike any general encoding over the space Ψ, eventsproduced by a source at time t do not influence any letterτ of the corresponding codeword such that τ < t.Thismeans that this encoding can be generated by Alice andBob.

2) Neither Alice, Bob nor Eve can predict the next lettersof the codeword sequence (τ) that encodes the sourceS within a big enough time window if the entropy ofthe source is non zero.

3) Transmitting or receiving the sequence (τ) only requiresa simple physical device that can measure time andgenerate or measure a pulse.

4) The pulses produced by this encoding will never collidein time with an event generated by the original source.

5) This encoding is not one to one in both directions.Two different sources can be translated into the samecodeword.

V. INTRUSION DETECTION SYSTEM

In this section we put together the time encoding that wehave developed in the previous section together with the modeldescribed in the system model to define the protocol of ourintrusion detection system. We also describe the different typesof attacks that can be perpetrated by Eve and discuss how thosecan be detected by Bob using the IDS based on the proofs andanalyses given in appendix A and B.

A. IDS protocol

The intrusion detection mechanism that Alice and Bobutilise works as follow:

1) Alice and Bob agree on an initial s0 and t0. In thegeneral case, these do not need to be kept secret.

2) Alice sends messages x ∈ X to Bob using the commu-nication channel and the corresponding τ ∈ T throughthe intrusion detection channel.

3) Bob receives x̂ and τ̂ assumed to be from Alice. He thencomputes τ̃ and compares it to τ̂ .

4) If τ̃ 6= τ̂ then he knows there is an anomaly in the streamcoming from Alice.

In the following sections we look at different types ofattacks that Eve can perform on the communication channeland discuss their implications.

B. Message injection by Eve

In this section, we consider the case where Eve injectsmessages and pulses on both the communication channel andthe IDC.Corollary 1 in appendix A shows that with well chosenparameters for the encoder, if Eve performs this attack, shecan only remain undetected if she continues to send messagesafter the latest transmission of Alice for a time duration fixeddepending solely on the encoding. This traps Eve in a positionwhere she needs to correct her influence on the data streamby adjusting for future messages that Alice could transmit onboth channels, otherwise Eve will be detected.The consequence is that after her first data injection, Evedoesn’t have full control over the messages she is sending toBob to conceal her presence as she needs to account for pastand future messages from Alice. Because Eve will have to sendadditional payload to conceal her presence in the network, thesequence of messages received by Bob could look abnormalbased on some ”normal” metric defined for his expectedmessages. For example if Bob was expecting a sentence, theperturbations of Eve could generate non-sensical sentences atBob’s end. Additionally in this case, the attacker is required tocontinuously spend energy to maintain their presence on thenetwork hidden to the IDS. This is an additional constraint puton them compared to other IDS.

Page 6: ABBA: A quasi-deterministic Intrusion Detection System for ...

6

C. Message tampering and deletion by Eve

Eve could attempt to change the value of a message x onits way to Bob. In this case, Proposition 3 shows us that shewill not be detected only if the new value x′ that she wantsto transmit is so that given the current value of s used byAlice and Bob, Ox(s) = Ox′(s). If the information source ofAlice has non zero entropy then Eve cannot know ahead oftime which x′ to send to match the value x sent by Alice. Thismeans that the detection of Eve depends on the probability thatshe selects a compatible x′ at random. This probability is alsoinfluenced by the functions and sequences used to build thetime-encoder and can be computed given the specific choiceof parameters for the encoder.Eve could also avoid detection if her tampering with the datais randomly covered by further data sent by Alice. We studythis case in Appendix B and show that the parameters of theencoding can be chosen so that the probability of Eve avoidingdetection becomes vanishingly small as time goes on. Thismeans that the actions of Eve will be detected given enoughtime.Another possible attack by Eve can be to remove a messagesent by Alice from the communication channel. If this happensthen corrolary 2 shows that this would also be detected ifthe functions Ox are chosen not to have a fixed point. Andsimilarly to the previous attack, further transmissions by Alicemight delay the detection of Eve however with a probabilitythat can be made to vanish with time (see appendix B).

VI. PRACTICAL IMPLEMENTATION

To implement the IDS proposed in section V, we need analgorithm to build the codewords T out of the sequence ofmessages produced by the source. We create such an algorithmthat we refer to as Time Encoder in the following section.

A. Time Encoder Algorithm

1) Initialisation• Random value → s• 0→ count• 0→ rnd time• Create an empty list τ• False→ interrupt• Create a variable x to store a source message• Set a timer current time counting up from 0• start function ”Generate interrupt”• Go to next step

2) Generate random time• gs(count) + current time→ rnd time # Gener-

ating a candidate τ value.• count+ 1→ count• Go to next step

3) Selecting τ• While (interrupt == False ANDcurrent time < rnd time){Wait}

• IF interrupt == FalseTHEN(τ, rnd time)→ τ # Adding rnd time to the list

τ .Go to step 2ELSEGo to next step

4) Generating s (new seed).• Ox(s)→ s #Note that Ox depends on the messagex.

• 0→ count # Resetting count.• False→ interrupt # Resetting interrupt flag.• Go to step 2

Function : Generate interruptWhile (1) {• Listen to the Source S• IF a new message msg is detected

THENTrue→ interrupt # Changing interrupt flag.msg → x # Storing the message.

}

B. Remarks

The implementation described above is a generic algorithmthat can be implemented choosing specific functions that areadapted to the device and the information source.Moreover, different applications might have different require-ments such as, the speed of detection of an intrusion or theenergy consumption of the encoding that will affect the choiceof the functions and parameters chosen for the encoding.

VII. CONCLUSION

In this work we have developed a new theoreticalframework for an intrusion detection system that detects datatampering and data injection on a communication channel.This framework allows to build a detection mechanism thatdoesn’t rely on prior knowledge of the network behaviouror knowledge of possible attack signatures. It is alsoadvantageous compared to message authentication codesin the quality that it doesn’t require the prior exchangeor management of secret keys for the receiver to detecttampering with the received data. We also detail an algorithmto adapt the IDS defined in this paper to any device.

APPENDIX ACHARACTERISATION OF THE TIME ENCODING

Proposition 1. Let’s consider 2 sequences of pairs ((t, x))and ((t′, x′)) of finite length p1 and p2 respectively. Let’sdenote s and s′ the last terms of the respective sequences(s) and (s′).Assuming that tp1 ≥ tp2 , we show that (τ) = (τ ′)=⇒ ∃P ∈ N s.t ∀n ≥ 1, gs(n) = gs′(n + P ) andtp1

= t′p2+∑P+1

k=0 gs′,k − gs,0

Page 7: ABBA: A quasi-deterministic Intrusion Detection System for ...

7

Proof. (τ) = (τ ′) and tp1≥ tp2

=⇒ ∃M ∈ N such that∀n ≥ 0, τp1,n = τ ′p2,n+M which means

τp1,0 − τ ′p2,M = 0

τp1,1 − τ ′p2,M+1 = 0

· · · = · · ·τp1,M − τ ′p2,2M = 0

τp1,M+1 − τ ′p2,2M+1 = 0

· · · = · · ·

Which we rewrite as

tp1+ gs,0 − τ ′p2,M = 0

tp1+ gs,0 + gs,1 − (τ ′p2,M + gs′,M ) = 0

tp1 + gs,0 + gs,1 + gs,2 − (τ ′p2,M + gs′,M + gs′,M+1) = 0

· · · = · · ·

tp1 +

M−1∑k=0

gs,k − (τ ′p2,M +

2M−1∑k=M

gs′,k) = 0

tp1 +

M∑k=0

gs,k − (τ ′p2,M +

2M∑k=M

gs′,k) = 0

· · · = · · ·

From consecutive pairs of equations listed above, we getthat

gs,1 = gs′,M

gs,2 = gs′,M+1

· · · = · · ·gs,M = gs′,2M

· · · = · · ·

This means ∃P ∈ N s.t ∀n ≥ 1, gs(n) = gs′(n + P ) withP = M − 1.For the second partτ ′p2,M

= t′p2+∑M

k=0 gs′,k and τp1,0 = tp1 + gs,0Knowing that τp1,0 = τ ′p2,M

leads to

tp1= t′p2

+

M∑k=0

gs′,k − gs,0 (4)

Proposition 2. Let’s consider 2 sequences ((t, x)) and((t′, x′)) of length p1 and p2 respectively. Let’s assumethat the last terms s and s′ of the respective sequences(s) and (s′) are equal s = s′.We show that if tp1

> t′p2then (τ) = (τ ′) =⇒

(gs)n>0 is a periodic sequence with period P > 0 andtp1 = t′p2

+∑P+1

k=1 gs,k.

Proof. The proof is similar to that of proposition 1, with thedifference that the equivalent set of equation yields

gs,1 = gs,M

gs,2 = gs,M+1

· · · = · · ·gs,M = gs,2M

· · · = · · ·

We also show below that M > 1.By construction,τp1,0 = tp1 + gs,0τp1,1 = tp1

+ gs,0 + gs,1τ ′p2,0 = t′p2

+ gs,0τ ′p2,1 = t′p2

+ gs,0 + gs,1Because tp1

6= t′p2and gs,0, gs,1 ≥ 0 we know that τp1,0 6=

τ ′p2,0 and τp1,0 6= τ ′p2,1

Similarly, τ ′p2,0 6= τp1,0 and τ ′p2,0 6= τp1,1. This means that theshift between those sequences is strictly greater than 1.This completes the proof that (gs)n>0 is periodic with periodP = M − 1 > 0.And finally, from proposition 1 we get the result

tp1= t′p2

+

P+1∑k=1

gs,k (5)

Definition 1 (Non-maximally correlated). A family of infinitesequence {(gs), s ∈ R+} is said to be non-maximally corre-lated if ∀(sa, sb) with sa 6= sb, @P ∈ N, s.t ∀n > 0, gsa,n =gsb,n+P .

Corollary 1. If the family of sequence (gs) chosen for theencoding is non-maximally correlated then two sequences((t, x)) and ((t′, x′)) of length p1 and p2 respectively suchthat tp1 6= t′p2

, can be mapped to the same sequence (τ) iffs = s′ (with s and s′ the last terms of (s) and (s′)). In thiscase tp1

= t′p2+∑P+1

k=1 gs,k (assuming tp1> t′p2

).

Proof. This result derives from proposition 1 and 2.

Corollary 2. Consider that the functions in the familyOx, x ∈ X do not have a fixed point sx such thatOx(sx) = sx and the family of sequences (gs) are non-maximally correlated.Let’s denote (τ ′) the time code corresponding to the sequence((t′, x′)) constructed by deleting the last term of ((t, x)) and(τ ) the time code corresponding to ((t, x)). We show that(τ) 6= (τ ′).

Proof. Let’s consider the last 2 elements sp−1 and sp of thesequence (s) corresponding to ((t, x)) with its last element(tp, xp).By construction sp = Oxp(sp−1) and because Oxp doesn’thave a fixed point we find that sp−1 6= sp. Corrolary 1 then

Page 8: ABBA: A quasi-deterministic Intrusion Detection System for ...

8

tells us that in this circumstance (τ) 6= (τ ′).

Proposition 3. Consider that the family of sequences (gs)are non-maximally correlated.Let’s denote (τ ′) the time code corresponding to the sequence((t′, x′)) constructed by changing the last term of ((t, x)) from(tp, xp) to (tp, x′p) and (τ ) the time sequence correspondingto ((t, x)). (τ) = (τ ′) implies Oxp

(sp−1) = Ox′p(sp−1).

Proof. We can easily show the negation of that implication ((τ) = (τ ′) and Oxp

(sp−1) 6= Ox′p(sp−1) ) is false.

Oxp(sp−1) 6= Ox′p(sp−1) is sp 6= s′p and means that (gsp) 6=

(gs′p) and due to the property of the family of sequence (gs)we have that (τ) 6= (τ ′).

APPENDIX BDETECTION PROBABILITY OF EVE

Characterisation of the detection probability in the case ofan infinite transmission sequence.Consider an infinite sequence ((t,x)). Let’s assume that thedth element has been changed from (td, xd) to (td, x

′d) or

(td, ∅) (deleted). We want to characterise the probability ofthis event not being detected by the receiver. We’ll focus onthe case where xd is changed to x′d but the case of deletioncould be treated similarly adjusting indices accordingly.

We have shown previously that if we truncate the originaland the modified sequences after the dth element, then theywould both be mapped onto different time codes (τ) 6= (τ ′).Even though (τ) 6= (τ ′) it is possible that ∃M such thatτi = τ ′i , ∀i ≤M and τM > td. Meaning that both time codesstill match beyond the time when the message was tamperedwith. If a new message is received from Alice before τM thenit is possible that the newly received message could erase theinfluence of Eve on the time code. In the following we lookat the probability of that happening.

Given the structure of the encoding, the change made byEve would not be detected by the time td+1 when xd+1 isreceived if

∀k s.tk∑

i=0

td +gsd,i < td+1; gsd,i = gs′d,i for i = 0 to k (6)

This means that the corresponding elements of (τ) and (τ ′)with values between td and td+1 are equal.Similarly, the change isn’t detected by the time td+2 if itwasn’t detected beforehand and

Oxd+1(sd) = Oxd+1

(s′d), or (7)

∀k s.tk∑

i=0

td+1 + gsd+1,i < td+2; gsd+1,i = gs′d+1,ifor i = 0 to k

(8)

The events described by the statements 6, 7 and 8 areprobabilistic by nature as they depend not only on the coding

parameters but also on the information received from the infor-mation source. We define the list of probabilities (P0, P1, ...)that statements of type (6, 8,...) are true and (Q1, Q2...) thatstatements of type 7 are true.

We can build the following probability tree to determinethe probability of Eve avoiding detection.

(td, xd)

notdetected

byt = td+1

undetectable after td+1

Q1

notdetectedby td+2

undetectable after td+2

Q2

not detected by td+3...P2

P1

P0

Note that if the functions Ox are invertible, then Qi = 0,∀i. In this case if Pi < 1 then the probability of Eve avoidingdetection will become vanishingly small with time.

REFERENCES

[1] K.-A. Shim, “A survey of public-key cryptographic primi-tives in wireless sensor networks,” IEEE CommunicationsSurveys Tutorials, vol. 18, no. 1, pp. 577–601, 2016.

[2] G. Nebbione and M. C. Calzarossa, “Security of iotapplication layer protocols: Challenges and findings,”Future Internet, vol. 12, no. 3, 2020. [Online]. Available:https://www.mdpi.com/1999-5903/12/3/55

[3] Y. Liu, Z. Jin, and Y. Wang, “Survey on security schemeand attacking methods of wpa/wpa2,” in 2010 6th Interna-tional Conference on Wireless Communications Network-ing and Mobile Computing (WiCOM), 2010, pp. 1–4.

[4] F. Rottenberg, T.-H. Nguyen, J.-M. Dricot, F. Horlin, andJ. Louveaux, “Csi-based versus rss-based secret-key gener-ation under correlated eavesdropping,” IEEE Transactionson Communications, vol. 69, no. 3, pp. 1868–1881, 2021.

[5] S. Iellamo, R. Guiazon, M. Coupechoux, and K. Wong,“Securing iot uplink communications against eavesdrop-ping,” in 2018 3rd Cloudification of the Internet of Things(CIoT), 2018, pp. 1–8.

[6] Z. Liu, J. Liu, N. Kato, J. Ma, and Q. Huang, “Divide-and-conquer based cooperative jamming: Addressing multipleeavesdroppers in close proximity,” in IEEE INFOCOM2016 - The 35th Annual IEEE International Conferenceon Computer Communications, 2016, pp. 1–9.

[7] A. Borkar, A. Donode, and A. Kumari, “A survey onintrusion detection system (ids) and internal intrusiondetection and protection system (iidps),” in 2017 Interna-tional Conference on Inventive Computing and Informatics(ICICI), 2017, pp. 949–953.

[8] D. Zhang, Y. Zhu, and Y. Zhang, “Multi-led phase-shiftedook modulation based visible light communication sys-tems,” IEEE Photonics Technology Letters, vol. 25, no. 23,pp. 2251–2254, 2013.

[9] Mi Li, Tingguang Li, Xuping Zhang, Y. Song, and YangLiu, “Bit error rate analysis of ook modulation schemeunder non-coherent demodulation for space uplink opticalcommunication systems,” in 2015 IEEE 16th InternationalConference on Communication Technology (ICCT), 2015,pp. 695–698.