
Computer Networks 71 (2014) 63–83


A game theoretic approach to detect and co-exist with malicious nodes in wireless networks

http://dx.doi.org/10.1016/j.comnet.2014.06.008
1389-1286/© 2014 Elsevier B.V. All rights reserved.

Approved for Public Release; Distribution Unlimited: 88ABW-2014-2735, dated 05 June 2014.
Corresponding author. Tel.: +1 4078235793.

E-mail addresses: [email protected] (W. Wang), [email protected] (M. Chatterjee), [email protected] (K. Kwiat), [email protected] (Q. Li).

Wenjing Wang (a), Mainak Chatterjee (b, corresponding author), Kevin Kwiat (c), Qing Li (a)

(a) Blue Coat Systems, Inc., Sunnyvale, CA, United States
(b) EECS, University of Central Florida, Orlando, FL, United States
(c) Air Force Research Laboratory, Rome, NY, United States

Article info

Article history: Received 12 February 2013; Received in revised form 19 May 2014; Accepted 14 June 2014; Available online 26 June 2014

Keywords: Malicious node; Game theory; Coexistence; Bayesian games; Markov Bayes–Nash Equilibrium

Abstract

Identification and isolation of malicious nodes in a distributed system is a challenging problem. This problem is further aggravated in a wireless network because the unreliable channel hides the actions of each node from the others. Therefore, a regular node can only construct a belief about a malicious node through monitoring and observation. In this paper, we use game theory to study the interactions between regular and malicious nodes in a wireless network. We model the malicious node detection process as a Bayesian game with imperfect information and show that a mixed strategy perfect Bayesian Nash Equilibrium (also a sequential equilibrium) is attainable. While the equilibrium in the detection game ensures the identification of the malicious nodes, we argue that it might not be profitable to isolate the malicious nodes upon detection. In fact, malicious nodes can co-exist with regular nodes as long as the destruction they bring is less than the contribution they make. To show how the malicious nodes can be utilized, a post-detection game between the malicious and regular nodes is formalized. The solution to this game shows the existence of a subgame perfect Nash Equilibrium and reveals the conditions necessary to achieve it. Further, we show how a malicious node can construct a belief about the belief held by a regular node. By employing this belief-about-belief system, a Markov Perfect Bayes–Nash Equilibrium is reached, and this equilibrium postpones the detection of the malicious node. Simulation results and their discussions are provided to illustrate the properties of the derived equilibria. The integration of the detection game and the post-detection game is also studied, and it is shown that the former can transition into the latter when the malicious node actively adjusts its strategies.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In a distributed wireless system where multiple nodes work towards individual or common goals, cooperative behavior among the nodes (such as controlling the transmit power level, reducing interference for each other, revealing private information, and adhering to network policies) is highly desired for increasing system capacity. Though this desirable assumption makes a system easier to analyze due to state space reduction, in reality it might be too strong. For example, there might be entities in the network (also called nodes) that act in a selfish manner. These selfish nodes, governed by their utility functions, care about their own payoffs and



choose corresponding strategies to maximize them. Usually, the payoffs are the benefits a node can derive from other nodes or the network. However, it is possible that there are some nodes whose objective is to cause harm and bring disorder to the network. These nodes, referred to as malicious nodes, do not reveal their identities while disrupting network services. The objective of such malicious nodes is to maximize the damage they cause before they are detected and isolated. They are also rational, and their payoff is determined by the amount of damage they cause to the network.

In order to minimize the impact of the malicious nodes, detection mechanisms need to be in place. Thus, a regular node should monitor its surroundings and distinguish a malicious node from a regular one. However, the detection process has challenges. First, active monitoring can be costly. To identify malicious behaviors, a regular node has to listen to the channel and/or process the information sent by the nodes being monitored. These monitoring activities consume resources; hence, an "always on" monitoring scheme is not efficient even if plausible. Second, the malicious node can disguise itself. To reduce the probability of being detected, a malicious node can behave like a regular node and choose longer intervals between attacks. Third, the randomness and unreliability of the wireless channel bring more uncertainty to the monitoring and detection process.

In spite of the above challenges, mechanisms to detect malicious nodes can always be designed. However, the important question is 'what should the regular node do upon detecting a malicious node?' Though the reasonable response would be to immediately isolate the malicious node, there might be situations where malicious nodes can be kept and made use of. The most straightforward reason for the coexistence is that a malicious node has no idea whether it has been identified or not, and it will continue to operate like a regular node to avoid detection. During this time, i.e., when the malicious node cooperates in disguise, it can be exploited for normal network operations. This "involuntary" help from the malicious node may be valuable, especially when network resources are limited. Moreover, from the perspective of the malicious nodes, coexistence gives them a longer lifetime in the network and the opportunity to launch future attacks. As far as the regular nodes are concerned, they have a criterion to evaluate the benefit obtained from the malicious nodes. This criterion also determines when to terminate the coexistence and isolate the malicious nodes.

To make the process of detection even more difficult, the malicious nodes do not act passively and wait to be detected. Instead, they also study the interactions they have with the rest of the network and adjust their subsequent actions accordingly. It is also possible that a malicious node is wise enough to learn and predict the actions of the regular nodes to assist itself in making its own decisions on how to behave. The options available to the malicious nodes complicate the solution space, and most traditional control theoretic approaches fail to find the equilibrium strategies for both the regular and malicious nodes. In particular, these problems fall more appropriately in the domain of static and dynamic distributed

games, and thus the application of game theory is an elegant way to tackle such problems. It is important that solution concepts from game theory are used to guide the protocol design process such that nodes working in a distributed manner can co-exist, even with different intents.

Game theory [7,24] has been successfully applied to solve various problems in wireless networks, including cooperation enforcement [5,6,11,21,26], routing protocols [10,22,30,34], and other system design issues [2,13,17,20,25,32,33]. Recently, much work has been done that investigates the interactions between regular and malicious nodes using game theory. Kodialam et al. formally propose a game theoretic framework to model how a service provider detects an intruder [14]. However, their assumptions of a zero-sum game and complete, perfect knowledge have limitations. Agah et al. study the non-zero-sum intrusion detection game in [1]; their results infer the optimal strategies in a one-stage static game with complete information. In [19], Liu et al. propose a Bayesian hybrid detection approach to detect intrusion in wireless ad hoc networks. They design an energy efficient detection procedure while improving the overall detection power. The intrusion detection game with networked devices is investigated in [35], where Zhu et al. introduce an N-person non-cooperative game to study the incentive compatibility of collaborative detection. The work in [18] models the intention and strategies of a malicious attacker through an incentive-based approach. The importance of the topology on the payoffs of the malicious nodes is investigated in [28]. An interesting flee option for the malicious node is proposed in [16]. In that analysis, a malicious node decides to flee when it believes it is too risky to stay in the network. While the approach focuses on how the flee action affects the result of the game, it does not consider the noise in observation.

There has been some recent research that focuses on the effects of imperfect and/or incomplete information in networking and communications security. In [23], the attacker–defender game is modeled as a fictitious play (FP) game, and the authors study the effect of observation errors on the convergence to Nash Equilibrium when the error probability in the channel is unknown. They showed that in a stochastic FP game, the attacker can conceal its true strategy by including an entropy term in the payoff functions. The authors in [8] propose an interesting application of a physical layer security game, where the source node pays the surrounding friendly jammer nodes to interfere with the eavesdropper, so that the eavesdropper can be masked. The focus is on how to apply game theory to set the price charged by the friendly jammers. The research in [3] deals with malicious jammers when the user does not know with certainty how the jamming efforts are distributed among sub-carriers, or the fading gains. The equilibrium strategies are derived in closed form, and the range of sub-carriers where the transmitter can expect the jamming attack is specified. The jamming game in multi-band covert timing networks is considered in [25], where the camouflaging resources in the covert timing network introduce uncertainty. In their modeling, a sensing game is played so that the covert timing network can



determine the amount of camouflaging resources to be deployed. In another, subsequent game, the malicious attacker finds the optimal transmit powers on each spectral band it chooses to attack. The existence of Nash equilibria in both games leads to a more effective defense mechanism against the attacker.

The research presented in this paper differs from the existing literature in two aspects. First, our research presents a systematic analysis not only of the malicious node detection process, but also of the interactions among nodes after detection. Unlike [19,35], our game theoretic analysis does not stop when the malicious node is detected. Instead, we propose the notion of co-existence with malicious nodes and extend the games after detection. Second, we empower the malicious node with countermeasures to learn from the games. In our game, the malicious node is intelligent enough to learn from the outcomes of the games and adjust its strategies accordingly. We integrate the learning process of the malicious node into the detection and post-detection games.

In this paper, we use game theory to model and analyze the interactions between a malicious node and a regular node. In particular, the malicious node is the active node (e.g., sending packets) and the regular node is the observer node (e.g., receiving packets). We formalize the interactions as two cascaded games. The first game, namely the malicious node detection game, is a Bayesian game with imperfect information. The type information is hidden because the malicious node can disguise itself as a regular node, and the actions are hidden due to noise and imperfect observation. The second game, called the post-detection game, is played when the regular node knows confidently that its opponent is a malicious node. In the latter game, the regular node observes and evaluates the actions of the malicious node, and decides whether to keep it or isolate it. For both games, we show the existence of equilibria and derive the conditions that achieve them. To address the possible countermeasures the malicious node might take, we propose a belief-about-the-belief model. In this model, the malicious node learns from its private observations and predicts whether the regular node has accumulated enough information to make the detection. Associated with this belief, we show that a Markov Perfect Bayes–Nash Equilibrium emerges in the detection game. We also provide a simulation study to support the efficiency and other properties of the equilibria.

The main contributions in this paper can be categorizedinto three parts.

• We model the malicious node detection game under unreliable channels as a Bayesian game with imperfect monitoring and show that a mixed strategy perfect Bayesian Nash Equilibrium is attainable. The strategy profile is also shown to give a sequential equilibrium solution. As a special case, the equilibrium is applicable in a multihop fashion if no consecutive nodes along the route are malicious. Results show how the equilibrium strategy profiles are affected by parameters like channel noise, successful attack rate, successful detection rate, attack gain, detection gain, and false alarm rate.

• We propose the notion of coexistence after detection in order to utilize the malicious node. A coexistence index is designed to evaluate the helpfulness of a malicious node. We derive the conditions under which a subgame perfect Nash Equilibrium is achieved. Through simulation, we also show how the malicious node can be used to improve the network throughput and extend the network lifetime.

• We introduce a novel belief-about-belief model employed by the malicious node. A Markov Perfect Bayes–Nash Equilibrium is induced when both nodes constantly update their beliefs. This equilibrium is shown to delay the detection of the malicious node and to help the malicious node actively adjust its strategy to avoid detection. This model also helps to integrate the detection and post-detection games with an effective transition.

The rest of the paper is organized as follows. In Section 2, we introduce and solve the Bayesian game of malicious node detection. Section 3 presents the post-detection game and discusses how malicious and regular nodes can coexist after detection. Section 4 explores the countermeasures available to the malicious node. Simulation results that illustrate our findings on the detection and post-detection games, as well as a discussion of how to integrate the two games, are presented in Section 5. The last section concludes the paper.

2. Malicious node detection game

2.1. Network model

We consider a wireless network consisting of regular and malicious nodes. By regular node, we mean a node that works towards the common goal of the network. It is rational, and its actions are governed by an underlying utility function. On the other hand, a malicious node aims to hamper, disturb, and even attack the network. Although the actions of a malicious node are also determined by certain utility functions, such functions are designed to bring damage to the network. In addition, regular nodes are willing to cooperatively detect the malicious nodes in the network, even if the detection process might consume their own resources. On the contrary, malicious nodes do not work with regular nodes to detect other malicious ones.

Despite there being two types of nodes, the identity (type) of a malicious node is not directly revealed to others. Instead, the types can only be estimated or conjectured through observing actions. To identify the attacks and malicious nodes in the network, a regular node can monitor the actions of others. However, such monitoring is costly (e.g., it consumes the receiver's own resources), and a node cannot afford to monitor all the time. Moreover, the observations might not be accurate because of noise, e.g., wireless channel loss. Thus, the regular nodes do not monitor the network all the time, and during those times, attacks cannot be identified.

Our research focuses on a two-node interaction process in a wireless network. In particular, we are interested in the packet forwarding process. In this model, we consider



a single hop between a source and a destination, or a part (one hop) of a packet forwarding chain (e.g., in ad hoc networks). We assume that node i, or the sender node, has a packet to send to the next-hop node. Such a packet can be generated by node i itself, or relayed to node i from another node. If the sender node's type is regular, it only takes the action "Forward".¹ If the sender node is malicious, it can choose to "Attack" with a risk of being identified, or "Forward" (not attack) to disguise itself. The action Attack refers to a general set of actions that harm the network and disrupt normal network operation, e.g., intentional dropping of packets or altering the payload of packets. It is noted that unlike other research that tries to exploit the techniques of the Attack option, we generalize our approach such that it can be applied to a number of different kinds of attack, regardless of the unique features that an attack might have. Discussions on the applicability and attack types will be presented in the last section. The opponent of node i is node j, which has the capacity to monitor the actions taken by node i. In the context of wireless networks, this refers to listening on the same channel node i is on. If node j chooses Monitor, it turns on its radio and listens to what node i has to send. It is noted that for now we assume node j is not the intended recipient of node i's packet. We will relax this assumption later in Section 2.5. We further assume that time is divided into slots and nodes take their actions within each slot, i.e., one hop of packet forwarding takes one slot.

There are a number of well-known attacks that can be abstracted by our network model. A Byzantine attack [31] is launched by a set of malicious or compromised nodes that behave arbitrarily to disrupt the network. In a Byzantine attack, malicious nodes can selectively drop packets, which results in disruption or degradation of the routing services [4]. Byzantine attacks are hard to detect because malicious nodes drop packets selectively. Our model can also be directly applied to analyze the packet dropping in another attack called the black hole attack [27]. In this attack, the malicious node attracts the packets from neighboring nodes, intercepts them, and consumes them without any forwarding. In addition, the malicious node can selectively forward the packets or even modify them. Although attracting packets by advertising false routes is not covered in this research, our model can nevertheless characterize the packet forwarding process, where the attack action is a packet failure/drop and the selectiveness of not launching attacks is packet forwarding.

2 It’s the receiver of the game signal, not the packet.3 We could have defined monitor success rate (w) to show the probability

2.2. Game model

To abstract the interactions among the nodes, we consider a two-player non-zero-sum game played by the nodes i and j. The types of these nodes, θi and θj, are private information. Since the type of each player is hidden, and the observation is not accurate, this is a Bayesian game with imperfect information [24].

¹ Forward only applies to the case when the sender node is forwarding others' packets. If the sender node is the origin of a packet, such an action is called Send. For simplicity of presentation, we use Forward to refer to both cases.

To model the process of detecting the malicious nodes in the network, we apply a special category of Bayesian game called the signaling game. A signaling game is played between a sender and a receiver. The sender has a certain type and a set M of available messages to be sent. Based on its knowledge of its own type, the sender chooses a message from M and sends it to the receiver.² However, the receiver does not know the type of the sender and can only observe the message but not the type. Through observation, the receiver then takes an action in response to the message it observed. In the malicious node detection game, the sender, node i, can be either regular (θi = 0) or malicious (θi = 1). The receiver node can also be regular or malicious. However, because the receiver node is the observer, a malicious node acting as receiver will not help to detect other malicious nodes in the network. Therefore, for now, we assume the receiver, node j, is always a regular node, i.e., θj = 0. We will revisit this assumption and relax it in Section 2.5.

The action profiles ai available to node i are based on its type. For θi = 0, ai ∈ {Forward}. For θi = 1, ai ∈ {Attack, Forward}. The receiver node j has the option to monitor whether node i is attacking or not; thus aj ∈ {Monitor, Idle}.

To further construct the game, we define the following values. Let gA be the payoff of a malicious node if it successfully attacks. The cost associated with such an attack is cA. For the receiver node j, the cost of monitoring is cM, and 0 if it is idle. Hence, for the action profile (ai, aj) = (Attack, Idle), the net utility for a successfully attacking node i is gA - cA, and the loss for node j is -gA due to the attack. Similarly, if the action profile is (ai, aj) = (Attack, Monitor), the attacking malicious node i loses gA + cA, and the net gain for node j is gA - cM. However, if a malicious node chooses not to attack, the cost to forward a packet is cF, which is the same cost as for a regular sender node. We require -gA - cA < -cF so that the malicious node has an incentive to forward packets when it is under monitoring. Based on the types of node i and node j, the payoff matrices are presented in Table 1. For quick reference, the notations used in this paper are tabulated in Table A.1 in Appendix A.

In addition, in our model, we introduce pe as the channel loss rate. The channel unreliability implies that monitoring is accurate with probability 1 - pe. We also denote by γ the attack success rate of a malicious node; γ represents the rate at which an attack can be successfully launched when a malicious node intends to, and it also accounts for the physical limitations of the malicious node.³ It is noted that with this model, unsuccessful attacks can be accurately monitored over a reliable channel. When the channel is unreliable, the monitoring node cannot tell Forward apart from unsuccessful attacks.

³ We could have defined a monitor success rate (ψ) to show the probability of a regular node successfully monitoring an attack, considering its physical limitations. However, ψ has a similar effect as pe, and we omit this variable for the sake of simplicity. If both ψ and pe need to be considered, we can define a nominal channel loss rate p'_e = 1 - (1 - pe)ψ to replace pe in our model. Such a linear transform will not alter the current form of our analysis and results.
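The channel model above admits a simple executable sketch. The function and parameter names below (nominal_loss, observe_slot, psi) are our own illustrative choices, not the paper's; the logic merely restates the footnote's linear transform and the rule that an attack is visible to the monitor only when the channel delivers the observation.

```python
import random

def nominal_loss(pe, psi):
    # Effective loss rate combining the channel loss pe with a monitor
    # success rate psi, per the footnote: p'_e = 1 - (1 - pe) * psi.
    return 1.0 - (1.0 - pe) * psi

def observe_slot(attacked, pe, rng=random):
    # One monitoring slot: an Attack is seen only if the observation
    # survives the channel (probability 1 - pe); otherwise the monitor
    # cannot distinguish it from a Forward.
    if attacked and rng.random() < 1.0 - pe:
        return "Attack"
    return "Forward"
```

For example, pe = 0.1 and psi = 0.9 yield a nominal loss rate of 0.19, which would replace pe everywhere in the analysis without changing its form.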


Table 1. Payoff matrix of the two-player malicious node detection game. Each entry lists (node i's payoff, node j's payoff).

(a) θi = 1, malicious sender:
                 Node j: Monitor        Node j: Idle
Node i: Attack   (-gA - cA, gA - cM)    (gA - cA, -gA)
Node i: Forward  (-cF, -cM)             (-cF, 0)

(b) θi = 0, regular sender:
                 Node j: Monitor        Node j: Idle
Node i: Forward  (-cF, -cM)             (-cF, 0)
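Table 1 can be encoded directly for experimentation. This is a minimal sketch of the noise-free payoff entries only (the function name payoffs is ours); pe and γ enter later through the expected-payoff calculations, not here.

```python
def payoffs(gA, cA, cM, cF):
    # (u_i, u_j) for each action profile (a_i, a_j) of Table 1.
    # gA: attack gain, cA: attack cost, cM: monitoring cost,
    # cF: forwarding cost.
    malicious = {  # theta_i = 1
        ("Attack", "Monitor"):  (-gA - cA, gA - cM),
        ("Attack", "Idle"):     (gA - cA, -gA),
        ("Forward", "Monitor"): (-cF, -cM),
        ("Forward", "Idle"):    (-cF, 0.0),
    }
    regular = {    # theta_i = 0
        ("Forward", "Monitor"): (-cF, -cM),
        ("Forward", "Idle"):    (-cF, 0.0),
    }
    return malicious, regular
```

Note that the incentive condition -gA - cA < -cF from the text is a constraint on the chosen parameters, not something the table enforces.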

[Fig. 1. Stage malicious node detection game tree. Nature first draws node i's type: θi = 1 (malicious) with probability φ, θi = 0 (regular) with probability 1 - φ. Node i then plays Attack or Forward, node j plays Monitor or Idle, and the payoff pairs of Table 1 result at the leaves.]


2.3. Equilibrium analysis for the stage game

We begin our analysis of the malicious node detection game from the extensive form of the static Bayesian game, as illustrated in Fig. 1. To solve this game, we are interested in finding the possible Bayesian Nash Equilibria (BNE). In a static Bayesian game, a BNE is a Nash Equilibrium given the beliefs of both nodes. In our case, node i knows for sure that for node j, θj = 0. However, node j is not sure of node i's type. Although for node i its own type is deterministic and not probabilistic, to capture both games of Table 1 and the fact that node i's identity is hidden from node j, Fig. 1 illustrates node j's belief that node i's type is θi = 1 with probability φ.

First, let us consider pure strategies only. The pure strategies (σi) available to node i depend on its type. If θi = 1, node i can play either Attack or Forward. However, when θi = 0, node i can only play Forward. To categorize the actions available based on node type, we summarize the strategies available to node i as σi ∈ {(Attack if malicious, Forward if regular), Always Forward}. For node j, the strategy set is σj ∈ {Monitor, Idle}. To find the BNE, we let σi and σj play against each other and derive the conditions under which neither node can increase its utility by unilaterally changing its strategy.

Lemma 1. In the malicious node detection game, there is a malice belief threshold φ0 such that no pure strategy BNE exists if φ > φ0.

Proof. We start by eliminating a trivial pure strategy pair (σi, σj) = (Always Forward, Monitor). From Table 1(a), we know that both nodes can improve their payoffs by deviating from this strategy pair. We further analyze the following two cases.

Case 1: σi = (Attack if malicious, Forward if regular). For node j, if σj = Monitor, the expected payoff is

u_j(Monitor) = φ(1 - pe)γ(gA - cM) + φ(1 - pe)(1 - γ)(gA - cM) + φ pe γ(-gA - cM) + φ pe (1 - γ)(-cM) - (1 - φ)cM,   (1)

where the terms represent, respectively, perfect monitoring of a successful attack, perfect monitoring of an unsuccessful attack, failing to monitor a successful attack, failing to monitor an unsuccessful attack, and node i being regular. If σj = Idle, the expected payoff is

u_j(Idle) = -φγgA.   (2)

If (2) > (1), the dominant strategy for node j is Idle. Correspondingly, for node i, the best response is (Attack if malicious, Forward if regular). Thus (σi, σj) = {(Attack if malicious, Forward if regular), Idle} is a BNE under the condition φ < cM / [(1 - pe)(1 + γ)gA]. If (2) < (1), i.e., φ > cM / [(1 - pe)(1 + γ)gA], the dominant strategy for node j is Monitor; however, the best response to Monitor for node i is Always Forward. Hence (σi, σj) = {(Attack if malicious, Forward if regular), Monitor} is not a BNE when φ > cM / [(1 - pe)(1 + γ)gA].

Case 2: (σi, σj) = {Always Forward, Idle}. If node j chooses not to monitor, the best response for node i is to Attack if malicious. This leads back to the previous case when φ < cM / [(1 - pe)(1 + γ)gA]. Therefore, there is no BNE with (σi, σj) = {Always Forward, Idle}.

To sum up, a pure strategy BNE exists if and only if φ < cM / [(1 - pe)(1 + γ)gA], with equilibrium strategy profile (σi, σj) = {(Attack if malicious, Forward if regular), Idle}. In other words, we can set φ0 = cM / [(1 - pe)(1 + γ)gA] such that no pure strategy BNE exists if φ > φ0. □
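The threshold in Lemma 1 can be checked numerically. The sketch below (function names ours) evaluates Eqs. (1) and (2) term by term and the closed-form threshold φ0; at φ = φ0 the two payoffs coincide, and above it Monitor dominates.

```python
def uj_monitor(phi, pe, gamma, gA, cM):
    # Eq. (1): node j's expected payoff for Monitor when a malicious
    # node i always attacks.
    return (phi * (1 - pe) * gamma * (gA - cM)
            + phi * (1 - pe) * (1 - gamma) * (gA - cM)
            + phi * pe * gamma * (-gA - cM)
            + phi * pe * (1 - gamma) * (-cM)
            - (1 - phi) * cM)

def uj_idle(phi, gamma, gA):
    # Eq. (2): node j's expected payoff for Idle.
    return -phi * gamma * gA

def phi0(pe, gamma, gA, cM):
    # Malice belief threshold of Lemma 1.
    return cM / ((1 - pe) * (1 + gamma) * gA)
```

Algebraically, Eq. (1) collapses to φ[1 - pe(1 + γ)]gA - cM, so the difference Monitor minus Idle equals φ(1 + γ)(1 - pe)gA - cM, which is zero exactly at φ0.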

Although a pure strategy BNE exists, it is not practical because the equilibrium requires node j to be Idle at all times, so the malicious nodes can never be detected. It is also called a Pooling Equilibrium [24], in which the receiver gains no clue about the sender's type. Therefore, it is desirable to seek a mixed-strategy BNE, and such a BNE exists when φ > φ0.

Let p denote the probability with which node i of type θ_i = 1 plays Attack, and q the probability with which node j plays Monitor. To find the mixed strategy BNE of this game, we need to find the values of p and q such that neither node i nor node j can increase its payoff by altering its actions.

Lemma 2. The malicious node detection game has a mixed strategy BNE with (σ_i, σ_j) = {(Attack with probability c_M/[φ(1−p_e)(1+γ)g_A] if θ_i = 1, Forward if θ_i = 0), Monitor with probability (γg_A − c_A + c_F)/[(1−p_e)(1+γ)g_A]}.

Proof. For the mixed strategy played by node i, the payoff of node j playing Monitor is

u_j(Monitor) = φp[(1−p_e)γ(g_A − c_M) + (1−p_e)(1−γ)(g_A − c_M) − p_e(1−γ)c_M − p_eγ(g_A + c_M)] − φ(1−p)c_M − (1−φ)c_M = φ[1 − p_e(1+γ)]p g_A − c_M.   (3)

If node j plays Idle,


68 W. Wang et al. / Computer Networks 71 (2014) 63–83

u_j(Idle) = −φγp g_A.   (4)

In the mixed strategy BNE, u_j(Monitor) = u_j(Idle); thus p = c_M/[φ(1−p_e)(1+γ)g_A]. Similarly, when node j plays the mixed strategy, the payoff of node i playing Attack is

u_i(Attack) = −(1−p_e)q(g_A + c_A) + γ(1−q)(g_A − c_A) + p_eγq(g_A − c_A) − p_e(1−γ)q c_A − (1−γ)(1−q)c_A = (p_e − 1)(1+γ)q g_A + γg_A − c_A.   (5)

When node i plays Forward,

u_i(Forward) = −c_F.   (6)

Hence, to obtain q, we set u_i(Attack) = u_i(Forward), which gives q = (γg_A − c_A + c_F)/[(1−p_e)(1+γ)g_A]. □

Lemmas 1 and 2 provide the conditions under which a BNE can be attained. One of the conditions is the belief-of-malice threshold φ_0. As suggested in Lemma 1, this threshold is related to the channel reliability (1−p_e), the attack success rate (γ) and the detection gain (g_A/c_M). In the pure strategy BNE, node i always attacks if it is malicious. Node j's belief about node i's malice is very low, since the detection gain is usually very large and p_e, γ ∈ [0,1]. However, when the belief grows and eventually exceeds the threshold, the mixed strategy BNE requires node i to be less aggressive in attacking. In other words, the equilibrium implies that node i should know node j's belief when making its decision. When node j is absolutely sure about node i's type, node i's equilibrium attack probability drops to the value of the belief threshold.
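The indifference conditions behind Lemma 2 can be verified numerically: at the mixed strategy (p, q), each node is exactly indifferent between its two actions. The following sketch uses illustrative parameter values of our own choosing.

```python
# Numeric check of Lemma 2: at (p, q) each node is indifferent between
# its two actions. All parameter values are illustrative assumptions.

def p_star(phi, pe, gamma, gA, cM):
    """Node i's equilibrium attack probability (Lemma 2)."""
    return cM / (phi * (1 - pe) * (1 + gamma) * gA)

def q_star(pe, gamma, gA, cA, cF):
    """Node j's equilibrium monitoring probability (Lemma 2)."""
    return (gamma * gA - cA + cF) / ((1 - pe) * (1 + gamma) * gA)

pe, gamma, gA, cM, cA, cF = 0.1, 0.5, 10.0, 1.0, 2.0, 0.5
phi = 0.3
p = p_star(phi, pe, gamma, gA, cM)
q = q_star(pe, gamma, gA, cA, cF)

# Node j's indifference: Eq. (3) equals Eq. (4) at p.
uj_mon = phi * (1 - pe * (1 + gamma)) * p * gA - cM
uj_idle = -phi * gamma * p * gA
assert abs(uj_mon - uj_idle) < 1e-9

# Node i's indifference: Eq. (5) equals Eq. (6) at q.
ui_att = (pe - 1) * (1 + gamma) * q * gA + gamma * gA - cA
ui_fwd = -cF
assert abs(ui_att - ui_fwd) < 1e-9
print(f"p = {p:.4f}, q = {q:.4f}")
```

Both assertions pass, confirming that deviating unilaterally from (p, q) yields no gain.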

4 An alternative way to represent this process is naive Bayes estimation. Although it is computationally simpler, the representation is less intuitive and less illustrative of the dynamics of the subgames.

5 An information set is a set of all the possible moves that could have taken place in the game so far, for a particular player, given what that player has observed. In an imperfect information game, an information set contains all possible states in the history; e.g., in Fig. 1, the dotted lines show the information set available to node j.

2.4. Belief update and dynamic Bayesian games

So far, the analysis of the malicious node detection stage game has shown that the equilibrium is associated with node j's belief about node i's type. However, the difficulty lies in the assignment of this belief as a priori information available to node j. Thus, it is desirable that this belief be accurately represented and dynamically updated. We apply dynamic Bayesian game theory to discuss how the belief is updated.

We assume that the static malicious node detection game is played repeatedly in every time slot, and we consider the infinitely repeated game without discounting (i.e., payoffs in every stage/slot have equal weight). In addition to the notation defined for the stage game, we introduce μ_j^(t)(θ_i) as the belief node j holds about node i's type θ_i at the t-th stage of the subgame. Since node j is always a regular node, μ_i^(t)(θ_j = 0) = 1 for all t > 0. We further define a_i(t) as the action node i plays at the t-th stage. Node j may monitor node i's actions through the observed signal ā_i(t). The reasons for the discrepancy between a_i(t) and ā_i(t) are the observation error caused by channel unreliability and the false alarms (at rate α) caused by the inaccuracy and limitations of node j's detection.

Based on Bayes' theorem, we construct our belief update rule. If node j is continuously monitoring, its belief about θ_i can be calculated from the belief it held at the immediately previous stage and the actions it observed. We write the belief at the (t+1)-th stage as:

μ_j^(t+1)(θ_i) = μ_j^(t)(θ_i) P(ā_i(t)|θ_i) / Σ_{θ̃_i∈Θ} μ_j^(t)(θ̃_i) P(ā_i(t)|θ̃_i),   (7)

where Θ is the space of all possible values θ_i can take; in our case Θ = {0, 1}.4

For each of the terms in Eq. (7), we have the following:

P(ā_i(t) = Attack | θ_i = 1) = (1−p_e)p + α(1−p),   (8)
P(ā_i(t) = Attack | θ_i = 0) = α,   (9)
P(ā_i(t) = Forward | θ_i = 1) = p_e p + (1−α)(1−p),   (10)
P(ā_i(t) = Forward | θ_i = 0) = 1 − α.   (11)

Node j does not monitor node i's actions at every stage, but rather with probability q. When node j is not monitoring, its belief remains the same in the next stage. Thus, Eq. (7) is revised as:

μ_j^(t+1)(θ_i) = q · μ_j^(t)(θ_i) P(ā_i(t)|θ_i) / Σ_{θ̃_i∈Θ} μ_j^(t)(θ̃_i) P(ā_i(t)|θ̃_i) + (1−q) μ_j^(t)(θ_i).   (12)
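The update rule of Eqs. (7)–(12) can be simulated directly. The following Monte-Carlo sketch tracks node j's belief about a node that is actually malicious; all parameter values (p_e, α, p, q, and the prior) are illustrative assumptions.

```python
import random

def observation_likelihood(obs, theta, p, pe, alpha):
    """Eqs. (8)-(11): probability of the observed action given the sender type."""
    if theta == 1:                              # malicious sender
        p_attack = (1 - pe) * p + alpha * (1 - p)
    else:                                       # regular sender
        p_attack = alpha
    return p_attack if obs == "Attack" else 1.0 - p_attack

def update_belief(mu, obs, p, pe, alpha):
    """Eq. (7): Bayes update of node j's belief that node i is malicious."""
    num = mu * observation_likelihood(obs, 1, p, pe, alpha)
    den = num + (1 - mu) * observation_likelihood(obs, 0, p, pe, alpha)
    return num / den

random.seed(1)
pe, alpha, p, q = 0.1, 0.05, 0.3, 0.5
mu = 0.1                                        # prior; node i is actually malicious
for t in range(200):
    if random.random() >= q:                    # Eq. (12): no monitoring, belief unchanged
        continue
    attacked = random.random() < p
    observed = "Attack" if (attacked and random.random() > pe) or \
        (not attacked and random.random() < alpha) else "Forward"
    mu = update_belief(mu, observed, p, pe, alpha)
print(f"belief after 200 stages: {mu:.3f}")
```

Despite channel errors and false alarms, the belief drifts toward 1 because the likelihoods in Eqs. (8) and (9) differ whenever p > 0.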

The concept of a belief system is hence introduced to describe the aforementioned belief building and updating process. A belief system is a function that assigns to each information set5 a probability distribution over the histories (i.e., past moves and states) in that information set [24]. Although in the discussion above we did not explicitly state how history is accounted for in Eqs. (7) and (12), it is easy to observe that every updated belief is determined by the action node j observes in the current stage and the belief it currently holds. The beliefs are in turn determined by the actions in the previous stages, and this can be backtracked to the initial belief and the subsequent actions. Thus, the current belief and observed action fully represent the histories in the information sets, and those information sets can be reached with positive probability if the strategies are carefully designed.

With the belief system, the games are played in a sequential manner. These games are independent of each other, and as the game evolves, neither node can stick to the very same strategy at every stage and still maximize its payoff. Thus, the best response strategies depend on the current beliefs held by the nodes. Perfect Bayesian Equilibrium (PBE) can be applied to characterize this dependency. In a PBE, the belief system is updated by Bayes' rule; a PBE also demands optimality of subsequent play given the beliefs. Next, we show how to construct a PBE in the dynamic malicious node detection game.


We first show the existence of a mixed strategy equilibrium and then argue the infeasibility of a pure strategy equilibrium. Consider an arbitrary stage k of the game; we denote by p^(k) the probability that node i of type θ_i = 1 plays Attack, and by q^(k) the probability that node j plays Monitor. In equilibrium, u_i^(k)(Attack) = u_i^(k)(Forward) and u_j^(k)(Monitor) = u_j^(k)(Idle). The analysis takes a similar form as the proof of Lemma 2. In particular,

u_i^(k)(a_i^(k) = Attack, a_j^(k) = Monitor) = −(1−p_e)q^(k)(g_A + c_A) + γ(1−q^(k))(g_A − c_A) + γq^(k)p_e(g_A − c_A) − p_e(1−γ)q^(k)c_A − (1−γ)(1−q^(k))c_A,   (13)

u_i^(k)(a_i^(k) = Forward, a_j^(k) = Monitor) = −c_F,   (14)

u_j^(k)(a_j^(k) = Monitor, a_i^(k) = Attack) = μ_j^(k)(θ_i = 1)p^(k)[(1−p_e)γ(g_A − c_M) + (1−p_e)(1−γ)(g_A − c_M) − p_e(1−γ)c_M − p_eγ(g_A + c_M)] − μ_j^(k)(θ_i = 1)(1−p^(k))c_M − μ_j^(k)(θ_i = 0)c_M,   (15)

u_j^(k)(a_j^(k) = Idle, a_i^(k) = Attack) = −μ_j^(k)(θ_i = 1)γp^(k)g_A.   (16)

The solutions to the above equations are

p^(k) = c_M / [μ_j^(k)(θ_i = 1)(1−p_e)(1+γ)g_A],   (17)

q^(k) = (γg_A − c_A + c_F) / [(1−p_e)(1+γ)g_A].   (18)

The values p^(k) and q^(k) define an equilibrium profile (σ_i^(k), σ_j^(k)). This profile exhibits sequential rationality [7,24]; that is, each node's strategy is optimal whenever it has to move, given its belief and the other node's strategy. In other words, at any stage k, for any alternative strategies σ_i'^(k) and σ_j'^(k),

u_i^(k)((σ_i^(k), σ_j^(k)) | θ_i, ā_i(t), μ_j^(k)(θ_i)) ≥ u_i^(k)((σ_i'^(k), σ_j^(k)) | θ_i, ā_i(t), μ_j^(k)(θ_i)),   (19)

u_j^(k)((σ_i^(k), σ_j^(k)) | θ_i, ā_i(t), μ_j^(k)(θ_i)) ≥ u_j^(k)((σ_i^(k), σ_j'^(k)) | θ_i, ā_i(t), μ_j^(k)(θ_i)).   (20)

Besides sequential rationality, a PBE also demands that the belief system satisfy the Bayesian conditions [7].

Definition 1 ([7], pp. 331–332). The Bayesian conditions defined for PBE are:

B(i): Posterior beliefs are independent: for history h^(k), μ_i(θ_{−i} | θ_i, h^(k)) = Π_{j≠i} μ_i(θ_j | h^(k)).
B(ii): Bayes' rule is used to update beliefs whenever possible.
B(iii): Nodes do not signal what they do not know.
B(iv): Posterior beliefs are consistent for all nodes with a common joint distribution on θ given h^(k).

Our proposed belief system satisfies the Bayesian conditions. B(i) is satisfied because θ_j = 0 at all times. Eq. (7) is derived from Bayes' rule, and hence B(ii) is also satisfied. B(iii) is fulfilled because node i's signal is determined by its action: if ā_i(k) = a_i(k), then μ_j(θ_i | ā_i(k), h_j^(k)) = μ_j(θ_i | a_i(k), h_j^(k)). B(iv) is trivial in our game because no third player exists.

The analysis of the Bayesian conditions and sequential rationality serves as the proof of the following theorem.

Theorem 1. The dynamic malicious node detection game has a perfect Bayesian equilibrium, attained with the strategy profile (σ_i^(k), σ_j^(k)) = (p^(k), q^(k)).

Remark 1. The infeasibility of a pure strategy PBE is proved as follows. If node i attacks, the best response for node j is Monitor, which makes it unprofitable for node i to play Attack. If node i plays Forward, p^(k) = 0 and the best response for node j is Idle (i.e., q^(k) = 0). However, sequential rationality requires q^(k) ≥ (γg_A − c_A + c_F)/[(1−p_e)(1+γ)g_A], which leads to a contradiction. Therefore, no pure strategy PBE exists in the dynamic malicious node detection game. Note that the infeasibility of a pure strategy PBE in the dynamic setting should not be confused with the existence of a pure strategy BNE in the static game, because the pure strategy BNE in a static game is always an artifact.

Remark 2. The proved PBE can be further refined to a Sequential Equilibrium [15]. In a sequential equilibrium, the Bayesian conditions are extended to belief sensibility and consistency. Belief sensibility requires that the information sets can be reached with positive probability (μ) given the strategy profile σ. Consistency demands that an assessment (σ, μ) be a limit point of a sequence of mixed strategies and associated sensible beliefs, i.e., (σ, μ) = lim_{n→∞}(σ_n, μ_n). In our game, belief sensibility is satisfied because our proposed belief system updates the beliefs according to Bayes' rule and assigns a positive probability to each information set. Theorem 8.2 in [7] states that in incomplete information multi-stage games, if neither player has more than two types, the Bayesian conditions are equivalent to the belief consistency requirement. In our game, θ_i ∈ {0,1} and θ_j = 0, and hence consistency is fulfilled. Together with sequential rationality, the PBE in our game is therefore also a sequential equilibrium. Since every finite extensive-form game has at least one sequential equilibrium, which is a refinement of PBE, this also implies the existence of a PBE in our game.
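The interplay between the belief update of Eq. (7) and the belief-dependent attack probability of Eq. (17) can be illustrated with a small simulation: as node j's belief grows, the equilibrium attack rate of node i falls. The sketch below is ours; parameter values are illustrative, and p^(k) is clipped to [0, 1] since Eq. (17) can exceed 1 when the belief is small.

```python
import random

# Sketch of one play-through of the dynamic detection game of Theorem 1.
# All parameter values are illustrative assumptions.
random.seed(7)
pe, alpha, gamma = 0.1, 0.05, 0.5
gA, cM, cA, cF = 10.0, 1.0, 2.0, 0.5
q = (gamma * gA - cA + cF) / ((1 - pe) * (1 + gamma) * gA)   # Eq. (18)

mu = 0.2   # node j's initial belief that node i is malicious
for k in range(50):
    p = min(1.0, cM / (mu * (1 - pe) * (1 + gamma) * gA))    # Eq. (17), clipped
    if random.random() >= q:
        continue                                             # node j idles; belief unchanged
    attacked = random.random() < p
    seen = "Attack" if (attacked and random.random() > pe) or \
        (not attacked and random.random() < alpha) else "Forward"
    like1 = (1 - pe) * p + alpha * (1 - p) if seen == "Attack" \
        else pe * p + (1 - alpha) * (1 - p)                  # Eqs. (8), (10)
    like0 = alpha if seen == "Attack" else 1 - alpha         # Eqs. (9), (11)
    mu = mu * like1 / (mu * like1 + (1 - mu) * like0)        # Eq. (7)
print(f"final belief: {mu:.3f}")
```

As the belief rises, p^(k) in Eq. (17) shrinks toward the threshold value, matching the discussion following Lemma 2.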

2.5. Detection game beyond one hop

We now extend our discussion beyond one hop. As an illustrative example, we consider a multi-hop packet forwarding chain that consists of multiple one-hop sending processes. A series of malicious node detection games takes place as the packet is relayed from one node to another along a pre-defined route. We describe malicious node detection in a multi-hop scenario as follows.


(1) The source node generates the packet and sends it to the first forwarding node. In the event that the source node is malicious, as long as it does not play Attack, we still regard the packet generated by a malicious node as useful and harmless.6

(2) An intermediate node is node j of the n-th hop game and becomes node i of the (n+1)-th hop game, as long as node i of the n-th hop does not successfully play Attack. The series of cascaded detection games terminates once an Attack is successful or the packet is lost due to channel unreliability.

(3) Once the packet reaches the destination node, no matter what type the destination is, the multi-hop packet forwarding is regarded as successful and complete. There is no further detection game, even if the destination node is malicious.

(4) Although multiple nodes in the game could have differentiated payoffs, for mathematical tractability we apply the same payoff structure to all nodes.

In multi-hop forwarding, the receiver at the n-th hop is the sender at the (n+1)-th hop. This requires us to relax the restriction on the types of nodes i and j in the single-shot detection game. In particular, there are four different combinations.

(1) Node i is malicious, node j is regular. This is the original case we discussed, with θ_i = 1.

(2) Both nodes i and j are regular. This is the case we already discussed, with θ_i = 0.

(3) Both nodes i and j are malicious. In this case, if node i does not forward the packet, the packet forwarding ends. Since no packets can be forwarded from either node, the rest of the network can treat nodes i and j as a wormhole [9].

(4) Node i is regular, node j is malicious. This is an interesting setup, as the game at this stage does not provide any useful results. When the malicious node is receiving, it cannot launch an attack, and the sender node cannot observe anything either. The implication of this case depends on the next hop game: if the next node is malicious, packet forwarding will end and create a wormhole; if the next node is regular, the detection game resumes at the next hop.

By studying the detection game at the n-th hop, we conclude that the analysis of the single hop detection game applies to each individual hop of multi-hop malicious node detection. The cascaded games can be played sequentially as long as no two malicious nodes play with each other at any hop. In addition, one node j can play the detection game with different nodes i on different packet forwarding paths. However, if only one packet forwarding path is considered, whether there is a detection game at the (n+1)-th hop is determined by whether the sending node at the n-th hop plays Attack. If Attack is played, the series of detection games on that path is terminated; nonetheless, detection games on other paths are not affected.

6 Technically, the malicious sender may choose to generate some packets with false or ill-intended data; in our modeling, such packets are regarded as Attack.

To abstract the dynamic malicious node detection games beyond one hop, and to clarify the applicability of Theorem 1 to such games, we provide the following conclusion.

Corollary 1. The dynamic malicious node detection game can be extended to a spatial cascade of two-player games over multiple hops. A perfect Bayesian equilibrium is attainable at each hop until an attack action is taken or observed.

3. Post-detection game and coexistence

In the previous section, we discussed how to update node j's belief system based on Bayes' rule. It is natural that through observation, although imperfect at every stage game, node j can accumulate an increasingly accurate estimate of θ_i. Eventually, after repeated monitoring, there will be a stage at which node j can predict with confidence whether node i is regular or malicious.

3.1. Game model

Traditionally, after node j has identified node i as a malicious node, it will try to report and isolate node i from the rest of the network immediately to prevent future attacks. However, there are also situations where "isolation" may not be a good choice. Consider a wireless network that operates on a limited resource budget. In order to prolong the lifetime of the network, every regular node has to be economical about packet forwarding. Hence, if a malicious node can be used to handle some of the traffic, it is beneficial not to isolate it.

Although the idea of "making a malicious node beneficial" might sound counter-intuitive, it is backed by the following reasoning. In the malicious node detection game, we explained that the malicious node needs to be cooperative and refrain from attacking in order to camouflage itself. Furthermore, the malicious node is not aware of the outcome of the detection process employed by the regular node. Therefore, the regular nodes can exploit the fact that the malicious node is sometimes involuntarily cooperative to avoid detection. In the context of packet forwarding as the underlying application, when a malicious node plays the cooperative strategy Forward, fewer packets are dropped than under the Attack action. Lower packet drop means the network output improves, and it comes from the "helpfulness" of the malicious node: even if its intention is to camouflage, the action is indeed useful.

However, there is a trade-off between how much benefit a malicious node can bring and how much damage it can do. We denote by n_F and n_A the number of successful forwarding actions and the number of attacks taken by a malicious node. Recall that the cost of forwarding is c_F and the loss due to an attack on the network is g_A.7 Thus, for a regular node,

7 Due to different settings of the network, g_A need not be a constant; it changes with time, topology, and the network traffic pattern. To keep the analysis tractable, we regard g_A as the average loss due to an attack on the network.


if it observes that the total saving due to forwarding (n_F c_F) that a malicious node contributes is greater than the total cost due to its attacks (n_A g_A), then keeping that node in the network is profitable. It is also worth mentioning that although the values of c_F and g_A vary from one application to another, for a given application the values are constant and measurable.

To further analyze the conditions under which a malicious node can be kept and can coexist with the regular ones, we formally define the post-detection game. The game has two players: node i and node j. Unlike the Bayesian detection game of Section 2, in the post-detection game both nodes know the type of their opponent; i.e., node j knows that node i is malicious but has not yet taken any action to isolate it. Thus, θ_i = 1, θ_j = 0. The actions available to node i are a_i ∈ {Attack, Forward}, while the actions for node j are a_j ∈ {Monitor, Idle}. When node j monitors, it keeps a record of what node i has done since the beginning of the game. It also calculates a coexistence index E_i = E_i^(0) + n_F c_F − n_A g_A for node i, where E_i^(0) is an initial value of the index, n_F is the observed number of forwarding actions and n_A is the observed number of attacks. If E_i falls below a certain threshold τ, node j will isolate node i and terminate the post-detection game, because keeping node i is no longer beneficial. If E_i ≥ τ, the game is played in a repeated manner. The payoff matrix for the post-detection game is the same as in the detection game for θ_i = 1, as shown in Table 1(a).

p*(k) = c_M / ({[1 − p_e(1+γ)]g_A − c_M} Pr(E_i ≥ τ) + (1−p_e)(g_A − c_M) Pr(E_i < τ) + c_M + γg_A).   (23)
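The coexistence-index bookkeeping above can be sketched as a simple loop: node j accumulates E_i from its (channel-impaired) observations and isolates node i once the index drops below τ. All numbers below (E_i^(0), τ, the attack rate, p_e, the stage cap) are illustrative assumptions of ours.

```python
import random

# Sketch of node j's coexistence-index bookkeeping (Section 3.1).
# Parameter values are illustrative assumptions, not from the paper.
random.seed(3)
cF, gA = 0.5, 10.0          # forwarding saving, loss per attack
E0, tau = 20.0, 0.0         # initial index and isolation threshold
p_attack, pe = 0.05, 0.1    # node i's attack rate, channel error rate

E, stage = E0, 0
while E >= tau and stage < 10_000:
    stage += 1
    if random.random() < pe:
        continue                 # action lost over the unreliable channel
    if random.random() < p_attack:
        E -= gA                  # observed attack: E_i drops by g_A
    else:
        E += cF                  # observed forward: E_i grows by c_F
outcome = "isolated" if E < tau else "still coexisting"
print(f"{outcome} after {stage} stages (E_i = {E:.1f})")
```

Whether the node is ultimately isolated depends on the sign of the drift (1−p_e)[(1−p)c_F − p g_A], which is exactly the trade-off between n_F c_F and n_A g_A discussed above.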

3.2. Searching for a coexistence equilibrium

Let us explore the strategies that both nodes can take to reach the equilibrium of coexistence. To avoid confusion, we denote by p*(t) and q*(t) the probabilities with which node i plays Attack and node j plays Monitor, respectively, over time; these probabilities are different from the ones obtained in Section 2.4. Also, since this game is no longer Bayesian, we are interested in obtaining a subgame perfect Nash Equilibrium.

q*(k) = (c_A − γg_A − c_F) / {−(1−p_e)(g_A + c_A) Pr(E_i < τ) + (1−p_e)(γg_A − c_A)[Pr(E_i ≥ τ) − 1]}.   (26)

We first derive the Nash Equilibrium using indifference conditions. Suppose the post-detection game is played at its k-th repetition, i.e., subgame k. The expected payoff for player j playing Monitor is

u_j^(k)(Monitor) = {p*(k)[(1−p_e)γ(g_A − c_M) + (1−p_e)(1−γ)(g_A − c_M) − p_e(1−γ)c_M − p_eγ(g_A + c_M)]} Pr(E_i ≥ τ) + (1−p_e)p*(k)(g_A − c_M) Pr(E_i < τ) − (1−p*(k))c_M
= {[1 − p_e(1+γ)]g_A − c_M} p*(k) Pr(E_i ≥ τ) + (1−p_e)p*(k)(g_A − c_M) Pr(E_i < τ) − (1−p*(k))c_M.   (21)

If node j plays Idle, the expected payoff is always

u_j^(k)(Idle) = −γp*(k)g_A.   (22)

Thus, the indifference condition requires u_j^(k)(Monitor) = u_j^(k)(Idle), and hence p*(k) is obtained as in Eq. (23).

Similarly, we can apply the indifference condition to node i:

u_i^(k)(Attack) = q*(k){−(1−p_e)(g_A + c_A) Pr(E_i < τ) + (1−p_e)[γ(g_A − c_A) − (1−γ)c_A] Pr(E_i ≥ τ) + p_eγ(g_A − c_A) − p_e(1−γ)c_A} − (1−q*(k))[(1−γ)c_A − γ(g_A − c_A)]
= q*(k){−(1−p_e)(g_A + c_A) Pr(E_i < τ) + (γg_A − c_A)[(1−p_e) Pr(E_i ≥ τ) + p_e]} + (1−q*(k))(γg_A − c_A),   (24)

u_i^(k)(Forward) = −c_F.   (25)

Therefore, q*(k) can be expressed as in Eq. (26).

The problem is then reduced to obtaining the probability distribution of E_i. Assume that at the beginning of the post-detection game E_i^(0) = c_0 ≥ τ. For the sake of discussion, we also assume that node j monitors constantly. Hence, if we consider l subgames, E_i is updated in each of the subgames.

We define the random variable y = E_i = c_0 + n_F c_F − n_A g_A. Since the mixed strategy profile requires node i to choose Attack with probability p*(t), n_F and n_A are binomially distributed:


Pr(n_F = N̂_F) = C_l^{N̂_F} [(1−p*(t))(1−p_e)]^{N̂_F} [1 − (1−p*(t))(1−p_e)]^{l−N̂_F},   (27)

Pr(n_A = N̂_A) = C_l^{N̂_A} [p*(t)(1−p_e)]^{N̂_A} [1 − p*(t)(1−p_e)]^{l−N̂_A}.   (28)

Since y = c_0 + n_F c_F − n_A g_A = c_0 + n_F c_F − (l − n_F)g_A = (c_F + g_A)n_F − l g_A + c_0, and l, c_F, g_A, c_0 are constants, to obtain the distribution of y we first obtain the distribution of w = y + l g_A − c_0.

We use the probability generating function (pgf). For a discrete random variable X, the pgf is defined as

G_X(z) = E[z^X] = Σ_{x=0}^{∞} z^x Pr(X = x).   (29)

The pgf of w is

G_W(z) = E[z^W] = E[z^{(c_F+g_A)n_F}]
= Σ_{n_f=0}^{l} z^{(c_F+g_A)n_f} C_l^{n_f} [(1−p*(t))(1−p_e)]^{n_f} [1 − (1−p*(t))(1−p_e)]^{l−n_f}
= {(1−p*(t))(1−p_e) z^{c_F+g_A} + 1 − (1−p*(t))(1−p_e)}^l.   (30)

Let f^(n)(x) = ∂^n f(x)/∂x^n. Then

Pr(w = x) = G_W^(x)(0) / x!.   (31)

Therefore, we can obtain the probability terms in Eqs. (23) and (26) as

Pr(y = E_i ≥ τ) = Pr(w ≥ l g_A + τ − c_0) = Σ_{n ≥ l g_A + τ − c_0} G_W^(n)(0)/n!,   (32)

Pr(E_i < τ) = 1 − Σ_{n ≥ l g_A + τ − c_0} G_W^(n)(0)/n!.   (33)
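Since w is a deterministic function of the binomial variable n_F, the tail probability in Eq. (32) can equivalently be computed by summing the binomial law of Eq. (27) directly, which is often easier than differentiating the pgf. The sketch below does this with illustrative parameter values of our own.

```python
from math import comb

def pr_coexist(l, p_star, pe, cF, gA, c0, tau):
    """Pr(E_i >= tau) after l monitored stages, using n_A = l - n_F as in the text."""
    ps = (1 - p_star) * (1 - pe)               # success prob. of an observed forward, Eq. (27)
    total = 0.0
    for nF in range(l + 1):
        if c0 + nF * cF - (l - nF) * gA >= tau:   # y = E_i >= tau
            total += comb(l, nF) * ps**nF * (1 - ps)**(l - nF)
    return total

# Illustrative parameters: l=50 stages, attack rate 0.05, pe=0.1,
# cF=0.5, gA=10, c0=20, tau=0.
print(f"Pr(E_i >= tau) = {pr_coexist(50, 0.05, 0.1, 0.5, 10.0, 20.0, 0.0):.4f}")
```

As expected, lowering the attack rate p* raises the coexistence probability, since more of the binomial mass lands above the threshold.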

To relax the assumption of node j's constant monitoring, the current stage t for the analysis is t = ⌈l/q*(t)⌉. We have thus obtained the equilibrium strategy parameters p*(t) and q*(t) for every subgame.

So far, we have shown that attaining a mixed strategy Nash Equilibrium is feasible; indeed, every finite game has a mixed strategy Nash Equilibrium. To further refine the equilibrium, we apply the One-Shot Deviation Property to derive the condition for a subgame perfect Nash Equilibrium. The property states:

Definition 2. One-Shot Deviation Property (OSDP) [24]: No player can increase her payoff by changing her action at the start of any subgame in which she is the first-mover, given the other player's strategies and the rest of her own strategy.

We take node j as an example and assume the repeated game has no discounting. In the previous equilibrium analysis using the indifference conditions, we proved that deviating from p*(t) or q*(t) does not increase the payoffs. Hence, in the following derivation, we show how the deviation strategy relates to E_i.

From Eqs. (21) and (22), we can express the expected payoff for node j as:

U_j = Σ_{t=0}^{T} [ q*(t){([1 − p_e(1+γ)]g_A − c_M) p*(t) Pr(E_i ≥ τ) + (1−p_e)p*(t)(g_A − c_M) Pr(E_i < τ) − (1−p*(t))c_M} − γ(1−q*(t))p*(t)g_A ].   (34)

Suppose node j deviates at the r-th stage, r ≤ T. The deviation can be either of the following two cases.

Case 1: Isolate node i while E_i ≥ τ. In this case, if node i attacks and is successfully observed, it will be isolated. The expected payoff at this stage for node j is

U_{j,dev,1}^(r) = {q*(r){(1−p_e)p*(r)(g_A − c_M) − p_eγp*(r)(g_A + c_M) − [p_e(1−γ)p*(r) + (1−p*(r))]c_M} − γ(1−q*(r))p*(r)g_A} Pr(E_i ≥ τ).   (35)

Case 2: Keep node i while E_i < τ. Since node j deviates for only one stage, node i will be isolated in the next stage. The expected payoff for node j at this stage is the same as above except for the last probability term:

U_{j,dev,2}^(r) = {q*(r){(1−p_e)p*(r)(g_A − c_M) − p_eγp*(r)(g_A + c_M) − [p_e(1−γ)p*(r) + (1−p*(r))]c_M} − γ(1−q*(r))p*(r)g_A} Pr(E_i < τ).   (36)

In this way, the total expected payoff for node j under deviation is

U_{j,dev} = Σ_{t=0}^{r−1} U_j^(t) + U_{j,dev,1}^(r) + U_{j,dev,2}^(r) + Σ_{t=r+1}^{T} U_j^(t).   (37)

OSDP requires U_{j,dev} ≤ U_j. After algebraic manipulation, we obtain

γg_A(q*(t)p_e + 1) + q*(t)p_e(γc_M + 1 − γ) ≥ γ(1−q*(t))g_A + q*(t)[γg_A Pr(E_i < τ) + p_e c_M Pr(E_i ≥ τ)],   (38)

or

γg_A[p_e + 1 − Pr(E_i < τ)] ≥ p_e[c_M Pr(E_i ≥ τ) + γ − 1 − γc_M].   (39)

To summarize the equilibrium of the post-detection game, we state the following theorem.

Theorem 2. The post-detection game has a mixed strategy Nash Equilibrium in which node i attacks with probability p*(t) and node j monitors with probability q*(t). This strategy is also subgame perfect if γg_A[p_e + 1 − Pr(E_i < τ)] ≥ p_e[c_M Pr(E_i ≥ τ) + γ − 1 − γc_M].

3.3. Convergence of the coexistence equilibrium

The post-detection game described above ends when E_i < τ. Since Pr(E_i < τ) > 0, the game has finitely many stages. In this subsection, we derive the expected length (number of stages) of the game.

We focus on the random variable E_i. As we mentioned earlier, E_i = c_0 + n_F c_F − n_A g_A. Again, we assume node j is


constantly monitoring. After one stage of the game, n_F increases by one with probability (1−p*(t))(1−p_e), and n_A increases by one with probability p*(t)(1−p_e). Thus, we model the evolution of E_i as a process similar to a 1-dimensional random walk, where the value of E_i increases by c_F with probability (1−p*(t))(1−p_e) and decreases by g_A with probability p*(t)(1−p_e); the 1−p_e factor comes from the unreliability of the channel. Obtaining the expected length of the post-detection game is equivalent to calculating the expected first hitting time of this random process with the absorbing boundary E_i = τ.
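The expected first hitting time can be estimated by simulating the random walk directly. The Monte-Carlo sketch below is ours; the parameter values are illustrative assumptions, and a stage cap guards against walks with positive drift.

```python
import random

def hitting_time(c0, tau, cF, gA, p_star, pe, cap=100_000):
    """First stage at which the index walk falls below tau (capped at `cap`)."""
    E, t = c0, 0
    while E >= tau and t < cap:
        t += 1
        u = random.random()
        if u < (1 - p_star) * (1 - pe):
            E += cF                      # observed forward
        elif u < 1 - pe:                 # next slice has width p_star * (1 - pe)
            E -= gA                      # observed attack
        # with probability pe the action is lost and E is unchanged
    return t

random.seed(11)
runs = [hitting_time(20.0, 0.0, 0.5, 10.0, 0.1, 0.1) for _ in range(200)]
print(f"mean game length over 200 runs: {sum(runs) / len(runs):.0f} stages")
```

With these numbers the per-stage drift is (1−p_e)[(1−p*)c_F − p* g_A] < 0, so absorption at the boundary is certain and the empirical mean approximates the expected game length.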

Theorem 3. The expected length of the post-detection game equals the expected first hitting time of the above random walk with absorbing boundary E_i = τ; the closed-form expression is derived in Appendix A.

Proof. Please refer to Appendix A. □

4. Countermeasures for the malicious node

Let us revisit the malicious node detection game. In our discussion so far, we have shown that it is feasible to design strategies that achieve the proposed PBE in the malicious node detection game. However, some issues must still be resolved before the equilibrium strategies can be applied and followed by practitioners. These issues fall into two categories. First, the PBE requires the malicious node to know perfectly the belief held by the regular node; in practice, this belief information is never shared. Second, the malicious node may not remain passive in the detection game; instead, it can also form a belief about the current status of the game and adjust its strategy accordingly.

It is natural that not only the regular node but also the malicious node (node i) studies the game through observation. In particular, node i understands that although the unreliable channel makes observations inaccurate, the more often it attacks, the more quickly node j forms a correct belief about its malicious type. Thus, node i should adopt different strategies under the different beliefs held by node j. These strategies (e.g., the PBE strategy in Eq. (17)) are Markovian when we view the beliefs as a set of states: the Markovian strategy adopted by node i is determined only by the current state of the belief, i.e., at the time the belief update takes place. Therefore, if we regard the strategy taken by node i as decision making, it is similar to a Markov Decision Process (MDP). However, the belief held by node j is its private information, and node i can by no means access it. Therefore, it is essential for node i to construct its own belief system: a belief about the belief node j holds towards node i. We call this belief developed by node i the belief about belief.

We denote by μ_i(μ_j(θ_i)) the belief node i holds about node j's belief about node i; i.e., μ_i(μ_j(θ_i)) is the belief about μ_j(θ_i). Note that for node j, its belief μ_j(θ_i) is used to determine whether node i is malicious and, accordingly, when to switch to the post-detection game; the belief does not otherwise change node j's strategy. For node i, the belief μ_i(μ_j(θ_i)) is used to initiate the countermeasure needed to reach the equilibrium strategy. For the game presented in Table 1(a), depending on the actions nodes i and j take, node i's payoff u_i can take one of three values: −g_A − c_A, g_A − c_A, or −c_F. While the observations of these payoffs are node i's private information, given a specific observation o_i, node i can predict the action taken by node j, though the prediction may be inaccurate. For example, when o_i = −g_A − c_A, node i knows for sure that a_j = Monitor. However, when o_i = −c_F, node i cannot tell what node j has done. Further, based on the prediction of the action node j takes, node i can update its belief μ_i(μ_j(θ_i)) about how node j's belief μ_j(θ_i) has changed due to a_j. Continuing with the same example, when o_i = −g_A − c_A, a_j = Monitor, so node j has observed the Attack launched by node i and will update μ_j(θ_i) according to Eq. (7). Similarly, when o_i = g_A − c_A, node i knows that node j was idle and μ_j(θ_i) will not change. The uncertainty arises when o_i = −c_F, where node i cannot accurately update its belief about μ_j(θ_i).

To construct the belief update system for node i, we employ Bayes' theorem. At stage t of the game, based on the observation o_i^(t), node i's belief is updated as:

μ_i^(t+1)(μ_j(θ_i)) = μ_i^(t)(μ_j(θ_i)) P(o_i^(t)|θ_i) / Σ_{θ̃_i∈Θ} μ_i^(t)(μ_j(θ̃_i)) P(o_i^(t)|θ̃_i),   (40)

where Θ = {0, 1}.

The conditional probabilities of observing o_i given node i's type θ_i can be calculated as follows. To distinguish from the strategy profiles used previously, we denote by p̃ the probability that node i launches attacks and by q̃ the probability that node j monitors. The probabilities that arise from the different observations and node i's type are:

P(o_i^(t) = −g_A − c_A | θ_i = 1) = (1−p_e)p̃q̃ + α(1−p̃)q̃,   (41)
P(o_i^(t) = −g_A − c_A | θ_i = 0) = αq̃,   (42)
P(o_i^(t) = g_A − c_A | θ_i = 1) = p̃[p_e q̃ + (1−q̃)],   (43)
P(o_i^(t) = g_A − c_A | θ_i = 0) = 0,   (44)
P(o_i^(t) = −c_F | θ_i = 1) = (1−p̃)[(1−α)q̃ + (1−q̃)],   (45)
P(o_i^(t) = −c_F | θ_i = 0) = (1−α)q̃.   (46)

With the above equations, for each observation o_i ∈ O, where O = {−g_A − c_A, g_A − c_A, −c_F}, μ_i^(t+1)(μ_j(θ_i)) is updated independently. Since the malicious node i knows its own type θ_i = 1, the overall belief is updated by weighting each possible observation by P(o_i^(t)|1), where P(o_i^(t)|1) denotes P(o_i^(t)|θ_i = 1):

μ_i^(t+1)(μ_j(θ_i)) = Σ_{o_i∈O} P(o_i^(t)|1) · μ_i^(t)(μ_j(θ_i)) P(o_i^(t)|θ_i) / Σ_{θ̃_i∈Θ} μ_i^(t)(μ_j(θ̃_i)) P(o_i^(t)|θ̃_i).   (47)
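The expectation in Eq. (47) can be sketched numerically: node i maintains an estimate (called nu below, a name of our choosing) of μ_j(θ_i = 1) and, at each stage, averages the Bayes step of Eq. (40) over the observation distribution of Eqs. (41)–(46). Parameter values are illustrative assumptions.

```python
def bayes(nu, like1, like0):
    """Eq. (40): one Bayes step on the estimated belief."""
    return nu * like1 / (nu * like1 + (1 - nu) * like0)

def update(nu, p_t, q_t, pe, alpha):
    """Eq. (47): average the Bayes step over the observation distribution."""
    # P(o | theta_i = 1): Eqs. (41), (43), (45)
    obs1 = {"caught":  (1 - pe) * p_t * q_t + alpha * (1 - p_t) * q_t,
            "success": p_t * (pe * q_t + (1 - q_t)),
            "forward": (1 - p_t) * ((1 - alpha) * q_t + (1 - q_t))}
    # P(o | theta_i = 0): Eqs. (42), (44), (46)
    obs0 = {"caught": alpha * q_t, "success": 0.0, "forward": (1 - alpha) * q_t}
    return sum(obs1[o] * bayes(nu, obs1[o], obs0[o])
               for o in obs1 if obs1[o] > 0)

pe, alpha = 0.1, 0.05
gamma, gA, cM, cA, cF = 0.5, 10.0, 1.0, 2.0, 0.5
nu = 0.2                                                       # initial estimate of mu_j
q_t = (gamma * gA - cA + cF) / ((1 - pe) * (1 + gamma) * gA)   # Eq. (49)
for t in range(30):
    p_t = min(1.0, cM / (nu * (1 - pe) * (1 + gamma) * gA))    # Eq. (48), clipped
    nu = update(nu, p_t, q_t, pe, alpha)
print(f"node i's estimate of node j's belief after 30 stages: {nu:.3f}")
```

Because the observation weights P(o|1) sum to one, the updated estimate is a convex combination of Bayes posteriors and therefore stays in (0, 1), as a belief must.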

Further, with the belief system of node i, the malicious node detection game can be solved again to obtain sequential rationality. The derivation process is similar to what we presented in Eqs. (13) and (16), except that \mu_i^{(t)}(\mu_j(\theta_i)) is now taken into account. The equilibrium strategy profiles that reach sequential rationality are:

\tilde{p}^{(t)} = \frac{c_M}{\mu_i^{(t)}(\mu_j(\theta_i = 1))\,(1 - p_e)(1 + \gamma)\,g_A},   (48)

\tilde{q}^{(t)} = \frac{g_A\gamma - c_A + c_F}{(1 - p_e)(1 + \gamma)\,g_A}.   (49)
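A minimal sketch of Eqs. (48) and (49); all names are ours:

```python
def mpbne_strategies(mu_i, p_e, gamma, g_A, c_A, c_F, c_M):
    """Stage-t MPBNE strategy profile from Eqs. (48)-(49).
    mu_i is node i's current belief mu_i(mu_j(theta_i = 1))."""
    denom = (1 - p_e) * (1 + gamma) * g_A
    p_t = c_M / (mu_i * denom)            # node i's attack probability
    q_t = (g_A * gamma - c_A + c_F) / denom  # node j's monitor probability
    return p_t, q_t
```

Note that \tilde{q}^{(t)} does not depend on the belief, while \tilde{p}^{(t)} decreases as node i believes itself more suspected.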

Moreover, it is easy to verify that the belief update process for node i also satisfies the Bayesian condition in Definition 1. In addition, Eq. (48) suggests that node i's strategy depends purely on the current belief it holds. Thus, we can further refine the PBE in the malicious node detection game.

Theorem 4. With the belief-about-belief system for node i, the dynamic malicious node detection game has a Markov Perfect Bayes–Nash Equilibrium (MPBNE) when the strategy profiles are (\sigma_i^{(t)}, \sigma_j^{(t)}) = (\tilde{p}^{(t)}, \tilde{q}^{(t)}).

[Plot: BNE monitor rate (q) vs. attack success probability (\gamma), for p_e = 0.001, 0.01, 0.1, 0.2.]

Fig. 2. Equilibrium strategy q vs. the attack success rate in the malicious node detection game.

The equilibrium is called Markov because the associated strategies are Markovian in the beliefs. Note that the PBE obtained in Theorem 1 is also an MPBNE; however, that strategy profile has limited applicability because the equilibrium profile for node i requires knowledge of node j's state (belief). In contrast, the profiles in Theorem 4 rely only on the private information available to the nodes themselves. Our analysis of node i's belief system can also be interpreted as an Interactive Partially Observable Markov Decision Process (I-POMDP) solution to a Partially Observable Stochastic Game (POSG) [12].

A special case of the strategy profile \sigma_i is "Always attack when \mu_i^{(t)}(\mu_j(\theta_i = 1)) < \bar{\mu} and forward otherwise, for a predefined threshold \bar{\mu} \in (0, 1)". In this strategy, when \mu_i^{(t)}(\mu_j(\theta_i = 1)) < \bar{\mu}, node i attacks with \tilde{p} = 1. In this way, node j progressively updates its belief whenever it monitors, because node i is always behaving maliciously. However, once the belief threshold is reached, node i refrains from launching attacks, and hence its payoff decreases. Clearly, this strategy deviates from the MPBNE because \tilde{p} does not adhere to the equilibrium. As a result, node i will be identified quickly and will remain dormant for the rest of the time. While this strategy is favorable to node j and the network, from node i's perspective it limits its attacks and is hence not desirable.
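The threshold policy can be traced with a toy simulation. The likelihood ratio of 2 per observed attack and the initial belief of 0.5 are illustrative assumptions of ours, not values from the paper:

```python
import random

def run_threshold_policy(mu_bar, steps, q=1.0, seed=0):
    """Toy trace of the 'always attack until suspected' policy.
    mu tracks node i's belief about node j's belief; each monitored
    attack multiplies the odds of 'malicious' by an assumed factor of 2."""
    random.seed(seed)
    mu, attacks = 0.5, 0
    for _ in range(steps):
        if mu >= mu_bar:            # threshold reached: go dormant
            break
        attacks += 1                # attack with probability 1
        if random.random() < q:     # node j monitors and observes the attack
            odds = (mu / (1 - mu)) * 2.0
            mu = odds / (1 + odds)
    return mu, attacks
```

Under constant monitoring (q = 1), the belief climbs after every attack and the attacker goes dormant after only a handful of stages, illustrating why this policy leads to quick identification.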

5. Simulation model and results

In this section, we study the properties of the perfect Bayesian Nash equilibrium in the malicious node detection game and of the post-detection subgame perfect Nash equilibrium through simulations. In our simulator, two players play the games repeatedly; the payoffs and strategy profiles for each of the subgames are recorded to analyze the properties of the equilibria. To set up the simulation environment, unless otherwise redefined, the default values of the parameters are p_e = 0.01, \gamma = 0.95 and \alpha = 0.01. The goal of the simulation is to show how a given parameter affects the properties of the equilibrium strategies with the rest of the parameters fixed.
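The repeated stage game can be sketched as follows. For brevity this toy version only reproduces the three payoff values of Table 1(a) for node i (it ignores false alarms and attack failures); parameter defaults follow the simulation setup where stated, and are our choices otherwise:

```python
import random

def simulate_detection_game(rounds, p, q, p_e=0.01,
                            g_A=10.0, c_A=1.0, c_F=1.0, seed=1):
    """Play the stage game repeatedly with fixed mixed strategies (p, q)
    and record node i's payoff stream. A minimal sketch, not the
    authors' simulator."""
    random.seed(seed)
    payoffs = []
    for _ in range(rounds):
        attack = random.random() < p
        monitor = random.random() < q
        if attack:
            # caught only if node j monitors and the channel does not err
            caught = monitor and random.random() >= p_e
            u_i = (-g_A - c_A) if caught else (g_A - c_A)
        else:
            u_i = -c_F
        payoffs.append(u_i)
    return payoffs
```

Recording such payoff streams for both players across subgames is all that is needed to study the empirical properties of the equilibria below.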

5.1. Malicious node detection game

We first present the simulation results for the malicious node detection game. In Fig. 2, we show how the monitoring probability in the PBE strategy increases with the malicious node's attack success rate. The plots show that the equilibrium requires node j to increase its monitoring frequency as the attack success rate increases. Also, as the channel becomes more unreliable, node j must play Monitor more frequently.

Fig. 3 compares the convergence of node j's belief system for different attack gains. In Fig. 3(a), we show how the belief system forms a correct belief about the type of node i when only Attack is observed. The convergence of the belief system under the PBE is illustrated in Fig. 3(b). The plots suggest that the lower the attack gain is, the quicker the belief system converges. This property can be explained as follows. A smaller attack gain requires node i to attack more often in order to obtain more payoff, and increasing the attack frequency also increases the risk of being successfully observed. With more observations, the belief is updated more frequently and accurately. The belief system converges more slowly in Fig. 3(b) than in Fig. 3(a) because in the PBE, instead of constantly monitoring, node j only monitors with probability q.

A more complete study of the convergence of the belief system is shown in Fig. 4. The plots in Fig. 4(a) indicate that the larger the disguise cost c_F/c_A is, the less time it takes to converge. This is because, with a larger disguise cost, it is unprofitable for node i to disguise itself by forwarding packets. Instead, it will launch more attacks, thus increasing the chance of being identified. Fig. 4(b) shows a quicker convergence of the belief system for a smaller detection gain, because node j needs to monitor more often to be profitable. Figs. 4(c) and (d) relate the convergence to errors and uncertainties in the system. As expected, with fewer errors and uncertainties (i.e., low channel loss, high attack success rate and low false alarm rate), the belief system converges quickly.

Finally, the parameters affecting the PBE attack probability p are investigated in Fig. 5. The attack gain is a very important factor in determining the value of p, as shown


[Plots: \mu_j^{(t)}(\theta_i = 1) vs. number of games played (t), for g_A/c_A = 10, 20, 50; panels (a) constant monitoring and (b) PBE.]

Fig. 3. Belief system update with different attack gains.

[Plots: \mu_j^{(t)}(\theta_i = 1) vs. number of games played (t); panels (a) disguise cost c_F/c_A, (b) detection gain g_A/c_M, (c) channel unreliability and attack success rate, (d) false alarm rate.]

Fig. 4. Effects of parameters on belief system update.

in Fig. 5(a). A large attack gain means more payoff from each attack, which implies fewer attacks are needed; hence p should be smaller. Figs. 5(b) and (c) indicate that node i should attack less frequently over a reliable channel, as every attack is more likely to be successfully observed. However, as suggested in Fig. 5(d), if the false alarm rate of the regular node is high, the malicious node can take advantage of it and attack more often.


[Plots: PBE attack probability p vs. number of games played (t); panels (a) attack gain g_A/c_A, (b) channel unreliability, (c) attack success rate, (d) false alarm rate.]

Fig. 5. Effects of parameters on the PBE strategy.

5.2. Post-detection game

After the belief system of node j converges (\mu_j(\theta_i = 1) \geq 0.99), we can safely conclude that node j has detected the malicious node. Therefore, the post-detection game starts. To show the continuity, at the beginning of the post-detection game, node i sticks to its PBE strategy.
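Node j's trigger for this transition can be sketched as a scan over its belief trace; the helper below is a hypothetical illustration using the 0.99 convergence threshold from the simulations:

```python
def post_detection_start(belief_trace, threshold=0.99):
    """Return the first stage t at which node j's belief mu_j(theta_i = 1)
    crosses the convergence threshold, i.e. the stage at which the
    post-detection game begins; None if it never converges."""
    for t, mu in enumerate(belief_trace):
        if mu >= threshold:
            return t
    return None
```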

Fig. 6 presents how the attack probability p*(t) evolves from the PBE to the SPNE strategy. It is clear from the plots that in the SPNE, node i should decrease its attack probability to avoid isolation. Fig. 6(a) shows that a larger detection gain corresponds to a smaller attack rate; thus, in equilibrium, the payoffs of node j will not increase due to the large detection gain. Fig. 6(b) states that if the channel is lossy, node i should attack more often. The reason behind this claim is that the more unreliable the channel is, the less probable it is that node j can accurately observe an attack. The plots in Fig. 6(c) are obtained with the detection gain equal to 5. This figure shows that the equilibrium is not sensitive to the initial value and threshold of the coexistence index E_i.

The expected length of the post-detection game is shown in Fig. 7. First, the figure states that the fewer the errors (i.e., less channel loss and more successful attacks) in the system, the longer the post-detection game can be played. Second, the length of the game grows with the attack gain. This interesting phenomenon can be explained in the following way. A larger attack gain enables the malicious node to attack less while keeping its payoff high. Thus, more often, the malicious node will play as a regular node to avoid isolation. This increases the time for the regular and malicious nodes to coexist. This property can be used to extend the lifetime of the network.

Last but not least, we show how the network throughput can benefit from coexistence in Fig. 8. We use the throughput without co-existence as the baseline and define the throughput gain as the ratio of the added network throughput over the baseline when the co-existence strategy is played. Observations similar to the game-length property can be made. With a larger attack gain, the malicious node decreases its attack rate and does more packet forwarding as a regular node. Therefore, the malicious nodes can be utilized to increase the throughput more as the attack gain grows. The throughput gain property illustrates clearly that malicious and regular nodes can coexist, and that the coexistence equilibria improve the throughput of the network.
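The throughput gain defined above can be illustrated with a toy model in which each regular node contributes a unit rate and a co-existing malicious node contributes only when it forwards. The model and all numbers are our assumptions, not the paper's:

```python
def coexistence_throughput(n_regular, n_malicious, p_attack, per_node_rate=1.0):
    """Toy throughput model: each regular node contributes per_node_rate;
    a co-existing malicious node contributes only when it forwards,
    i.e. with probability 1 - p_attack."""
    return (n_regular + n_malicious * (1 - p_attack)) * per_node_rate

def throughput_gain(p_attack, n_regular=10, n_malicious=2):
    """Added throughput from co-existence relative to the no-co-existence
    baseline, in which malicious nodes are isolated and contribute nothing."""
    baseline = coexistence_throughput(n_regular, 0, 0.0)
    coexist = coexistence_throughput(n_regular, n_malicious, p_attack)
    return (coexist - baseline) / baseline
```

In this toy model the gain shrinks as the attack probability grows, which is consistent with the trend in Fig. 8: larger attack gains push the malicious node toward forwarding and hence toward a larger throughput gain.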


[Plot: expected length of the post-detection game vs. attack gain (g_A/c_A), for (p_e, \gamma) = (0.01, 0.99), (0.1, 0.99), (0.01, 0.8).]

Fig. 7. Expected length of the post-detection game.

[Plot: throughput gain vs. attack gain (g_A/c_A), for (p_e, \gamma) = (0.01, 0.99), (0.1, 0.99), (0.1, 0.8).]

Fig. 8. Throughput gain.

[Plots: SPNE attack probability p* vs. number of games played (t); panels (a) detection gain g_A/c_M, (b) channel unreliability, (c) coexistence index.]

Fig. 6. Effects of parameters on the SPNE strategy.

5.3. Characteristics of MPBNE

We study the characteristics of the Markov Perfect Bayes–Nash Equilibrium. In particular, we are interested in the properties of node i's belief update system (i.e., belief about belief) and in how the introduction of node i's belief affects the results we obtained in Section 5.1.

In Fig. 9, we study node i's belief system in the MPBNE. To better show the properties of node i's belief system in the MPBNE, we also present node j's belief system. In


[Plots: beliefs \mu_j, \mu_j^*, \mu_i vs. number of games played (t); panels (a) disguise cost c_F/c_A, (b) detection gain g_A/c_M, (c) channel unreliability and attack success rate, (d) false alarm rate.]

Fig. 9. Node i's belief system update in the Markov Perfect Bayes–Nash Equilibrium.

particular, we plot \mu_i as node i's belief system in the MPBNE according to Eq. (47), \mu_j as node j's belief system in the PBE as stated in Eq. (12), and \mu_j^* as node j's belief system update in the MPBNE resulting from node i's actions in the belief-about-belief model. A common observation is that node i's belief \mu_i converges much faster than the belief \mu_j in the PBE, which means that node i holds a false belief that node j can identify its malice quicker than node j actually can. As a result of the inaccuracy in node i's belief, it takes a longer time for node j to form a belief about node i. This is evident from the plots, which show that \mu_j^* converges much more slowly than it does in the PBE, when node i does not employ any belief system.

In addition, Fig. 9 shows that node i's belief system has some properties similar to those observed in Fig. 4. For example, Fig. 9(b) indicates that a larger detection gain forces node i's belief system to converge quicker. Figs. 9(c) and (d) show that a reliable channel, a high attack success rate and accurate detection (low false alarm rate) also induce fast convergence of \mu_i. The only discrepancy is with the disguise cost: for node i, a high disguise cost makes the update of \mu_i slow, while for node j, a high disguise cost helps \mu_j converge faster. The reason lies in the inaccuracy of node i's belief system. From our previous discussion, when the observed payoff is -c_F, node i cannot predict what node j's action is. Thus, an internal error resides in node i's belief system, and this error is amplified when c_F is large (i.e., c_F takes a high weight in the payoff), which corresponds to a large disguise cost.

The properties of the MPBNE strategy are further investigated in Fig. 10. Once again, similarity is found between Figs. 10(a) and 5(a), as well as between Fig. 10(b) and Figs. 5(b)–(d). Both the MPBNE strategy attack probability, denoted as p_M, and the PBE strategy attack probability in Fig. 5 increase with a smaller attack gain and attack success rate, as well as with a larger channel error rate and false alarm rate. Moreover, it is noted that p_M is smaller than what node j believes it would be (denoted as p_{j*i} in Fig. 10(b)). In addition, p_M is larger than the PBE strategy attack probability p_PBE in the first several stage games; however, as the games repeat, p_M drops below p_PBE. This interesting observation implies that when node i implements the belief system, it attacks more aggressively (than without the belief-about-belief model, i.e., in the PBE) in the first several games, because it believes node j is far from reaching a successful


[Plots: attack probabilities p_PBE, p_M, p_{j*i} vs. number of games played (t); panels (a) attack gain g_A/c_A, (b) channel unreliability, attack success rate and false alarm rate.]

Fig. 10. Effect of parameters on the Markov Perfect Bayes–Nash Equilibrium strategy.

[Plot: p_SPNE, p_M, p_PBE vs. number of games played.]

Fig. 11. Comparison of node i's strategy profiles.

detection. As the game unfolds, node i adjusts its attack rate to avoid detection. The difference between p_M and p_PBE also explains why node j's belief system alters in the MPBNE, as shown in Fig. 9.

5.4. Transition from the detection game to the post-detection game

Our discussions above have focused on how the involvement of node i's belief system makes the MPBNE different from the PBE. However, since the detection of the malicious node is not the only aim of this research, we are also motivated to find the link between the MPBNE and/or PBE in the detection game and the SPNE in the post-detection game. Fig. 11 shows the equilibrium strategy profiles in terms of attack probability. It is clearly evident from the plot that although in the MPBNE node i attacks less often than in the PBE, in order to reach the SPNE, node i still needs to further lower its attack probability. As a matter of fact, the post-detection game is initiated by node j when its belief about node i's malice reaches a threshold value (\mu_j(\theta_i) > 0.99 in our setting). However, this information is never revealed to node i, so node i has no idea whether the post-detection game has started or not. When node i is also equipped with the belief system, it can predict when the post-detection game starts based on its belief about node j's belief. For example, if node i's belief \mu_i(\mu_j(\theta_i)) > 0.99, node i might assume that the post-detection game has begun and adjust its strategy profile accordingly.
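Node i's side of this transition can be sketched as a strategy selector; the helper name and sample probabilities are hypothetical:

```python
def node_i_strategy(mu_i_about_mu_j, p_mpbne, p_spne, threshold=0.99):
    """Pick node i's attack probability: play the MPBNE profile while it
    still believes itself undetected, and drop to the (lower) SPNE
    profile once mu_i(mu_j(theta_i)) crosses the threshold."""
    return p_spne if mu_i_about_mu_j > threshold else p_mpbne
```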

Fig. 12 examines the transition process through simulation. In our simulation, once \mu_i(\mu_j(\theta_i)) reaches 0.99, node i's attack probability is set to be the same as the probability in the SPNE (denoted as p_SPNE). Despite the change, node j still sticks to its criterion and does not start the post-detection game until its belief crosses the threshold. In other words, in this setting, node i deviates from the MPBNE and plays the SPNE strategy even while node j still plays the detection game. Fig. 12(a) shows that although node i deviates from the MPBNE, the attack probability node j believes node i takes (p_{j*i}) and p_PBE are very close. Furthermore, Fig. 12(b) indicates that when node i adheres to p_SPNE, node j's belief updates are slightly slower than in the PBE. The simulation results suggest that with node i's belief system, the malicious node detection game and the post-detection game can be integrated with an effective transition process.

6. Discussions and conclusions

6.1. Discussions

There are many types of attacks in wireless networks. Each attack may have its unique features and require different techniques to detect and defend against. One of the goals of this research is to present a general model and approach to analyze the detection process. In particular, in order to apply the detection game model, the monitoring node j must be able to observe and identify attacks. The ability to observe means the physical capability to sense the occurrence of potential attacks. In the context of wireless networks, such ability generally translates to locational proximity, i.e., falling within the communication range of the malicious node. However, being close to the attack does not necessarily lead to identification of the attack. In order to identify the attack, node j must be able to analyze


[Plots vs. number of games played: (a) node i's strategy profiles p_SPNE/p_M, p_PBE, p_{j*i}; (b) node j's belief update \mu_{j,M}(\theta_i = 1), \mu_{j,new}(\theta_i = 1).]

Fig. 12. Effects of the integration of the detection and post-detection games.

the observed signal (i.e., the potential attacking action) and deduce whether it is an attack or not. An example of analyzing the observed signal is observing data dropping and alteration in the context of data forwarding. In this regard, the proposed detection model cannot be applied directly to a number of attacks that are not identifiable at the scene, e.g., the Sybil attack, the man-in-the-middle attack and the traffic hijacking attack (advertising false routes). In order to use the model, node j should be furnished with appropriate mechanisms to verify such attacks.

As far as the post-detection game is concerned, it requires that the attacker has the ability to camouflage itself as a benign node. According to the game model, if every action the attacker takes harms the utility of its opponent, the post-detection game will terminate soon, because the attacker is not useful at all and cannot be exploited. Many attack types fall into this category and cannot use the post-detection game model, including the aforementioned Sybil attack, man-in-the-middle attack and traffic hijacking attack (advertising false routes).

6.2. Conclusions

In this paper, we apply game theory to study the coexistence of malicious and regular nodes in a wireless network with unreliable channels. We formulate a malicious node detection game and a post-detection game played by the regular and malicious nodes. While both games are of the imperfect information type, we show that the former has a mixed strategy perfect Bayesian Nash equilibrium and provide a solution to achieve that equilibrium. For the latter, a coexistence index is proposed. We prove that, by keeping the coexistence index above a threshold, the post-detection game has a subgame perfect Nash equilibrium, which is also the coexistence equilibrium for malicious and regular nodes. We also propose a belief-about-belief system that can be used by the malicious node to predict whether it has been detected, and prove the existence of a Markov Perfect Bayes–Nash Equilibrium when both nodes constantly update their beliefs. The properties of the equilibrium are studied, and it is shown how the detection of the malicious node can be delayed. Simulations are provided to illustrate the properties of the equilibria. In particular, we show how system parameters such as the attack gain, attack success rate, detection gain, and channel loss affect the convergence of the games and the equilibrium strategies. Simulation results also show that the coexistence equilibrium helps to extend the length of the games and improves the throughput of the network. With the help of the proposed belief-about-belief system, the malicious node is able to adjust its strategy in the game, and finally the detection game and post-detection game are integrated with an effective transition.

Acknowledgements

This research was sponsored by the Air Force Office of Scientific Research (AFOSR) under the Federal Grant No. FA9550-07-1-0023. A preliminary version of this work appeared in the International Conference on Game Theory for Networks (GameNets) 2009 [29]. Approved for Public Release; Distribution Unlimited: 88ABW-2014-2735, dated 05 June 2014.

Appendix A

Table A.1.

Theorem 3. The expected length of the post-detection game is

\sum_{\eta > 0} \eta\, \frac{\binom{\eta}{n_F} - \sum_{d=0}^{n_F - 1} \binom{\frac{(c_0 - \tau)/c_F + d}{g_A/c_F} + d}{d} \binom{\eta - \frac{(c_0 - \tau)/c_F + d}{g_A/c_F} - d}{n_F - d}}{2^{\eta}}.

Proof. Let \eta be a random variable representing the first hitting time. We assume that time is divided into slots and each slot represents a stage game. It is easy to see that \eta = n_F + n_A. At every slot, the random process has two possible evolution directions, i.e., n_F + 1 or n_A + 1. Therefore, over \eta slots, there are 2^{\eta} possible realizations.

We now calculate how many paths hit the boundary exactly at the \eta-th slot. The following notations are used. Let m = g_A/c_F and s = (c_0 - \tau)/c_F, where m and s are integers. In Fig. A.1,


Table A.1
Notations used.

Node i            (potential) malicious node, attacker
Node j            regular node, monitor
\theta_i          type of node i, 1 for malicious, 0 for regular
u_i, u_j          payoff of node i or j in the stage game
g_A               gain of a successful attack for node i
c_A               cost of any attack for node i
c_F               cost of forwarding (not attacking) for node i
c_M               cost of monitoring for node j
\phi              the belief of node i being malicious in the stage game
\gamma            attack success rate for node i
p_e               channel error rate
\alpha            false alarm rate for node j
a_i, a_j          action profile of node i or j
a_i(t)            node i's action observed by node j in stage t
o_i               node i's observation of its payoffs
\mu_j(\theta_i)   belief node j holds about node i's type in the dynamic game
\mu_i(\mu_j(\theta_i))  belief node i holds about node j's belief in the dynamic game (used to derive the MPBNE)
\sigma_i, \sigma_j  node i's or j's strategy profile
p, q              probability that node i attacks or node j monitors in the malicious node detection game
p*(t), q*(t)      probability that node i attacks or node j monitors in the post-detection game
\tilde{p}, \tilde{q}  probability that node i attacks or node j monitors in the malicious node detection game with node i's belief-about-belief model
C_i               coexistence index
\mu_j^*           node j's belief about node i's type when node i uses the belief-about-belief model
p_{j*i}           node j's belief about node i's attack probability when node i uses the belief-about-belief model
p_M               node i's attack probability in the MPBNE
p_PBE             node i's attack probability in the PBE
p_SPNE            node i's attack probability in the SPNE

Fig. A.1. Realizations of the random walk.


we interpret how to move on a grid according to a random process. Consider a random walk starting from the bottom-left point. If n_F increases, move one block right; if n_A increases, move m blocks up. Each block is a squarelet with side length c_F, so the width of the grid is n_F c_F and the height is g_A n_A; the diagonal line represents E_i = \tau. Each walk consists of \eta moves and must end on or beyond the upper-right corner. What we are interested in is the number of monotonically increasing paths that fall wholly under the diagonal line, because each such path is a realization of the random process that hits the boundary for the first time at the \eta-th slot.

While counting the number of realizations under the diagonal line might be difficult, we can instead count the realizations that do cross the line. Let the number of realizations crossing the line be M; the number of realizations under the line is then \binom{\eta}{n_F} - M, where the binomial coefficient \binom{\eta}{n_F} denotes the total number of possible realizations on the grid. Consider a sample realization crossing the line as shown in Fig. A.1. Let d be the number of horizontal steps taken in the path before hitting the diagonal line. To hit the line, at least \frac{s + d}{m} vertical steps should be taken, covering a total height of (d + s)c_F. The total number of such paths is \sum_d \binom{\frac{s + d}{m} + d}{d}. After hitting the line, the rest of the path should consist of n_F - d horizontal steps, and the total number of moves left is \eta - \frac{s + d}{m} - d. So, the total number of paths that cross the diagonal line is

M = \sum_{d=0}^{n_F - 1} \binom{\frac{s + d}{m} + d}{d} \binom{\eta - \frac{s + d}{m} - d}{n_F - d}.

To sum up, out of the 2^{\eta} realizations, \binom{\eta}{n_F} - \sum_{d=0}^{n_F - 1} \binom{\frac{s + d}{m} + d}{d} \binom{\eta - \frac{s + d}{m} - d}{n_F - d} realizations hit the diagonal line for the first time at the \eta-th move. The probability of the game length being \eta is then

\frac{\binom{\eta}{n_F} - \sum_{d=0}^{n_F - 1} \binom{\frac{s + d}{m} + d}{d} \binom{\eta - \frac{s + d}{m} - d}{n_F - d}}{2^{\eta}}.

Finally, we can express the expected length of the post-detection game as

E[\text{length}] = \sum_{\eta > 0} \eta\, \frac{\binom{\eta}{n_F} - \sum_{d=0}^{n_F - 1} \binom{\frac{s + d}{m} + d}{d} \binom{\eta - \frac{s + d}{m} - d}{n_F - d}}{2^{\eta}}. \qquad \square

References

[1] A. Agah, S.K. Das, K. Basu, M. Asadi, Intrusion detection in sensornetworks: a non-cooperative game approach, in: Proceedings of IEEENCA, 2004, pp. 343–346.

[2] E. Altman, A. Kumar, C. Singh, R. Sundaresan, Spatial SINR games combining base station placement and mobile association, in: Proceedings of IEEE Infocom, 2009.

[3] E. Altman, K. Avrachenkov, A. Garnaev, Jamming in wirelessnetworks under uncertainty, in: Proceedings of Modeling andOptimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT),2009, pp. 1–7.

[4] B. Awerbuch, R. Curtmola, D. Holmer, C. Nita-Rotaru, H. Rubens, ODSBR: an on-demand secure Byzantine resilient routing protocol for wireless ad hoc networks, ACM Trans. Inf. Syst. Secur. 10 (4) (2008), Article 6.

[5] L. Buttyán, J.P. Hubaux, Stimulating cooperation in self-organizingmobile ad-hoc networks, ACM/Kluwer Mobile Networks Appl. 8 (5)(2003) 579–592.

[6] J. Crowcroft, R. Gibbens, F. Kelly, S. Ostring, Modelling incentives forcollaboration in mobile ad hoc networks, Perform. Eval. 57 (4) (2004)427–439.

[7] D. Fudenberg, J. Tirole, Game Theory, MIT Press, Cambridge, MA, 1991.

[8] Z. Han, N. Marina, M. Debbah, A. Hjørungnes, Physical layer securitygame: interaction between source, eavesdropper, and friendlyjammer, EURASIP J. Wireless Commun. Networking 2009 (2009)(Article ID 452907).

[9] Y.C. Hu, A. Perrig, D. Johnson, Packet leashes: a defense againstwormhole attacks in wireless networks, in: Proceedings of IEEEINFOCOM, 2003, pp. 1976–1986.

[10] J.J. Jaramillo, R. Srikant, DARWIN: distributed and adaptivereputation mechanism for wireless ad-hoc networks, in:Proceedings of ACM MobiCom, 2007, pp. 87–97.

[11] Z. Ji, W. Yu, K.J.R. Liu, Cooperation enforcement in autonomousMANETs under noise and imperfect observation, in: Proceedings ofIEEE Secon, 2006, pp. 460–468.

[12] L.P. Kaelbling, M.L. Littman, A.R. Cassandra, Planning and acting inpartially observable stochastic domains, Artif. Intell. 101 (1998) 99–134.

[13] S. Kim, Multi-leader multi-follower Stackelberg model for cognitiveradio spectrum sharing scheme, Comput. Networks 56 (17) (2012)3682–3692.

[14] M. Kodialam, T.V. Lakshman, Detecting network intrusion via sampling: a game theoretic approach, in: Proceedings of IEEE Infocom, 2003, pp. 1880–1889.

[15] D.M. Kreps, R. Wilson, Sequential equilibria, Econometrica 50 (4) (1982) 863–894.

[16] F. Li, J. Wu, Hit and run: a Bayesian game between malicious and regular nodes in MANETs, in: Proceedings of IEEE SECON, 2008, pp. 432–440.

[17] X.-Y. Li, Y. Wu, P. Xu, G. Chen, M. Li, Hidden information and actions in multi-hop wireless ad hoc networks, in: Proceedings of ACM MobiHoc, 2008, pp. 283–292.

[18] P. Liu, W. Zhang, M. Yu, Incentive-based modeling and inference of attacker intent, objectives, and strategies, ACM Trans. Inf. Syst. Secur. 8 (1) (2005) 78–118.

[19] Y. Liu, C. Comaniciu, H. Man, A Bayesian game approach for intrusion detection in wireless ad hoc networks, in: Proceedings of ACM GameNets, 2006.

[20] A.B. MacKenzie, L.A. DaSilva, Game Theory for Wireless Engineers, Morgan & Claypool Publishers, San Rafael, CA, 2006.

[21] P. Michiardi, R. Molva, Analysis of coalition formation and cooperation strategies in mobile ad hoc networks, Ad Hoc Networks 3 (2005) 193–219.

[22] F. Milan, J.J. Jaramillo, R. Srikant, Achieving cooperation in multihop wireless networks of selfish nodes, in: Proceedings of ACM GameNets, 2006.

[23] K.C. Nguyen, T. Alpcan, T. Başar, Security games with incomplete information, in: Proceedings of IEEE ICC, 2009, pp. 1–6.

[24] M.J. Osborne, An Introduction to Game Theory, Oxford University Press, New York, NY, 2004.

[25] S. Sengupta, M. Chatterjee, K.A. Kwiat, A game theoretic framework for power control in wireless sensor networks, IEEE Trans. Comput. 59 (2) (2010) 231–242.

[26] V. Srinivasan, P. Nuggehalli, C.F. Chiasserini, R.R. Rao, Cooperation in wireless ad hoc networks, in: Proceedings of IEEE INFOCOM, 2003, pp. 807–817.

[27] B. Sun, Y. Guan, J. Chen, U.W. Pooch, Detecting black-hole attack in mobile ad hoc networks, in: IEEE European Personal Mobile Communications Conference, 2003, pp. 490–495.

[28] G. Theodorakopoulos, J.S. Baras, Malicious users in unstructured networks, in: Proceedings of IEEE INFOCOM, 2007, pp. 884–891.

[29] W. Wang, M. Chatterjee, K. Kwiat, Coexistence with malicious nodes: a game theoretic approach, in: Proceedings of GameNets, 2009.

[30] W. Wang, S. Eidenbenz, Y. Wang, X.-Y. Li, OURS: optimal unicast routing system in non-cooperative wireless networks, in: Proceedings of ACM MobiCom, 2006, pp. 402–413.

[31] B. Wu, J. Chen, J. Wu, M. Cardei, A survey on attacks and countermeasures in mobile ad hoc networks, in: Wireless Network Security, Springer, 2007, pp. 103–135.

[32] P. Xu, X.-Y. Li, S. Tang, Efficient and strategyproof spectrum allocations in multichannel wireless networks, IEEE Trans. Comput. 60 (4) (2011) 580–593.

[33] J. Zhang, Q. Zhang, Stackelberg game for utility-based cooperative cognitive radio networks, in: Proceedings of ACM MobiHoc, 2009.

[34] S. Zhong, L. Li, Y. Liu, Y. Yang, On designing incentive-compatible routing and forwarding protocols in wireless ad-hoc networks – an integrated approach using game theoretical and cryptographic techniques, in: Proceedings of ACM MobiCom, 2005, pp. 117–131.

[35] Q. Zhu, C. Fung, R. Boutaba, T. Başar, A game-theoretical approach to incentive design in collaborative intrusion detection networks, in: Proceedings of GameNets, 2009.

Wenjing Wang obtained his Ph.D. from the School of Electrical Engineering and Computer Science at the University of Central Florida (UCF). He holds an MS degree from UCF and a BS degree from the University of Science and Technology of China (USTC), all in Electrical Engineering. His research interests are in the broad areas of wireless communication and networking, with particular emphasis on network privacy and security, cognitive radio networks, and vehicular networks. He is currently with the Advanced and Emerging Technology Group, Blue Coat Systems, in Sunnyvale, California.

Mainak Chatterjee is an Associate Professor in the Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando. He received the BSc degree in physics (Hons.) from the University of Calcutta, the ME degree in electrical communication engineering from the Indian Institute of Science, Bangalore, and the PhD degree from the Department of Computer Science and Engineering at the University of Texas at Arlington. His research interests include economic issues in wireless networks, applied game theory, cognitive radio networks, dynamic spectrum access, and mobile video delivery. He has published over 125 conference and journal papers. He received Best Paper Awards at IEEE Globecom 2008 and IEEE PIMRC 2011. He is the recipient of the AFOSR-sponsored Young Investigator Program (YIP) Award. He co-founded the ACM Workshop on Mobile Video (MoVid). He serves on the editorial boards of Elsevier's Computer Communications and Pervasive and Mobile Computing journals. He has served as the TPC Co-Chair of several conferences including IEEE WoWMoM 2011, WONS 2010, IEEE MoVid 2009, the Cognitive Radio Networks Track of IEEE Globecom 2009, and ICCCN 2008. He also serves on the executive and technical program committees of several international conferences.

W. Wang et al. / Computer Networks 71 (2014) 63–83

Kevin A. Kwiat has been with the U.S. Air Force Research Laboratory (AFRL) in Rome, New York for over 30 years. He is currently assigned to the Cyber Assurance Branch. He received the BS in Computer Science and the BA in Mathematics from Utica College of Syracuse University, and the MS and Ph.D. in Computer Engineering from Syracuse University. He holds four patents. In addition to his duties with the Air Force, he is an adjunct professor of Computer Science at the State University of New York at Utica/Rome, an adjunct instructor of Computer Engineering at Syracuse University, and a Research Associate Professor with the University at Buffalo. He is an advisor for the National Research Council. He has been recognized by the AFRL Information Directorate with awards for best paper, excellence in technology teaming, and outstanding individual basic research. His main research interest is dependable computer design.

Qing Li, Chief Scientist and Vice President of Advanced Technologies, an industry veteran with over 20 years of experience, has spent the past 10 years designing and developing industry-leading technologies and products at Blue Coat Systems. Qing was fully responsible for the IPv6 secure proxy and WAN optimization technology and product lines at Blue Coat. He produced the industry's first IPv6 Secure Web Gateway product in 2009, and received the IPv6 Application Solution Pioneer Award from the IPv6 Forum in April 2010. Subsequently he produced the industry's first IPv6 WAN Optimization appliance in 2011, and in early 2012 he produced and released the industry's first IPv6 visibility solution. In March 2014 he led the effort that produced Blue Coat's first 10 Gbps visibility and QoS solution. He has been an active speaker at industry and academic conferences and is an active voice in the technology media around the world. In the past 3 years Qing's research has concentrated on emerging technologies including advanced application classification algorithms, mobile security, SSL interception, and data analytics. His innovations have transformed the Blue Coat technology and product landscape. Qing is a published author, most notably of the two-volume reference series on IPv6: Volume I, IPv6 Core Protocols Implementation, and Volume II, IPv6 Advanced Protocols Implementation, were published in October 2006 and April 2007, respectively, by Morgan Kaufmann Publishers. In 2003 Qing published the embedded systems development book Real-Time Concepts for Embedded Systems, which has served as a reference text in industry as well as in universities. Qing was also a contributing author to the first-of-its-kind book Handbook of Networked and Embedded Control Systems, published in June 2005 by Springer-Verlag. Qing holds 12 US patents, with many more pending, in the areas of security and networking.