
0018-9545 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVT.2015.2511071, IEEE Transactions on Vehicular Technology


Jamming-resistant Multi-radio Multi-channel Opportunistic Spectrum Access in Cognitive Radio Networks

Qian Wang, Member, IEEE, Kui Ren, Fellow, IEEE, Peng Ning, Member, IEEE and Shengshan Hu

Abstract: To achieve optimized spectrum usage, most existing opportunistic spectrum sensing and access protocols model the spectrum sensing and access problem as a partially observed Markov decision process (POMDP) by assuming that the information states and/or the primary users’ (PUs) traffic statistics are known a priori to the secondary users (SUs). While theoretically sound, the existing solutions may not be effective in practice due to two main concerns. First, the assumptions are not practical, as PUs’ traffic statistics may not be readily available to the SUs before the communication starts. Second, and more seriously, existing approaches are extremely vulnerable to malicious jamming attacks. By leveraging the same statistical information and stochastic dynamic decision-making process that the SUs would follow, a cognitive attacker with sensing capability can sense and jam the channels to be accessed by SUs without interfering with PUs.

To address the above concerns, we formulate the anti-jamming multi-channel access problem as a non-stochastic multi-armed bandit (NS-MAB) problem. By leveraging probabilistically-shared information between the sender and the receiver, our proposed protocol enables them to hop to the same set of channels with high probability while gaining resilience to jamming attacks without affecting PUs’ activities. We analytically show the convergence of the learning algorithms and derive the performance bound based on regret. We further discuss the problem of tracking the best adaptive strategy and characterize the performance bound based on a new regret. Extensive simulation results show that the probabilistic spectrum sensing and access protocol can overcome the limitations of existing solutions and is highly resilient to various jamming attacks even with jammed ACK information.

Index Terms: Anti-jamming, cognitive radio networks, multi-radio multi-channel, spectrum access.

I. INTRODUCTION

Cognitive radio is an emerging advanced radio technology in wireless access, with many promising benefits including dynamic spectrum sharing, robust cross-layer adaptation and collaborative networking. Opportunistic spectrum access (OSA), which is at the core of cognitive radio technologies, has recently received increasing attention due to its great potential to improve spectrum utilization efficiency and reliability [2]–[6]. The basic idea of OSA is that individual secondary users (SUs) dynamically search for and access spectrum vacancies to maximize spectrum utilization while introducing limited interference to the primary users (PUs). In the existing literature, the optimality of the channel sensing and access problem has been extensively studied, from the single-channel access setting to the multi-channel access setting and from perfect sensing to imperfect sensing, using various optimization tools. Most existing solutions, however, assume that traffic statistics are known to SUs in advance. In practice, such an assumption may not always hold and, more seriously, solutions based on this assumption are vulnerable to malicious jamming attacks. First, PUs’ traffic statistics (i.e., initial information states, transition probabilities and the order of transition probabilities) may not be readily available to the SUs prior to the start of sensing actions. Without a priori information on the traffic patterns, those opportunistic spectrum sensing and access protocols cannot work. Moreover, a cognitive jammer with sensing capabilities can choose channels to sense by leveraging the same statistical information and stochastic dynamic decision-making process. Based on the sensing results, the attacker then jams the idle channels potentially used by SUs without affecting the activities of PUs. This is possible because the structure of those sensing policies is fixed and the channel selection procedure that SUs follow is publicly known. Therefore, a jammer can predict which channels the SUs are going to use in each timeslot and prevent the spectrum from being utilized efficiently.

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

Qian Wang and Shengshan Hu are with the State Key Lab of Software Engineering, and School of Computer Science, Wuhan University, China. E-mail: {qianwang,hushengshan}@whu.edu.cn.

Kui Ren is with the Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY 14260, USA. E-mail: [email protected].

Peng Ning is with the Department of Computer Science, North Carolina State University, Raleigh, NC 27695, USA. E-mail: [email protected].

EDICS: COM-JAMM. A preliminary version [1] of this paper was presented at the 19th IEEE International Conference on Network Protocols (ICNP’11), Vancouver, BC, Canada, 2011.

Traditional anti-jamming schemes, including both frequency hopping spread spectrum (FHSS) and direct-sequence spread spectrum (DSSS) [7], commonly rely on pre-shared secrets (i.e., hopping sequences and spreading codes) to achieve jamming-resistant communication. However, they are not directly applicable to CRNs because pre-sharing secrets is not feasible in a dynamic SU network, where SUs may never meet each other before the start of communication. Recently, uncoordinated frequency hopping (UFH), uncoordinated direct-sequence spread spectrum (UDSSS) and their variations were proposed to eliminate the reliance on pre-shared secrets [8]–[13]. The major problem with UFH and UDSSS is that they are both very expensive. For UFH, it takes a long time for an SU sender to transmit a message to an SU receiver. This is not practical for CRNs, where SUs need to finish transmission quickly to yield the channel to PUs. On the other hand, UDSSS may take less time to deliver a message, but the message decoding process at the receiver side incurs a large cost. Moreover, applying UDSSS directly to the anti-jamming problem in CRNs is problematic. UDSSS is commonly used in a broadcast communication setting where the communication channel is publicly known and SUs use randomly selected spreading codes to defend against jamming. In CRNs, it will cause large interference to PUs when they are also active on the same communication channel. In [14], [15], the problem of defending against jamming attacks in cognitive radio networks was investigated using game-theoretic approaches. However, these works only explored the single-channel case and assumed that the SU receiver can always communicate with the SU sender (i.e., they are considered as a single player) and that sensing is perfect. In [16], the spectrum sensing problem was formulated under time-varying channels as an adversarial bandit problem. Similar to [14], [15], the authors only considered the case of a single sensing channel and treated the SU receiver and the SU sender as a single player. In [17], anti-jamming games were investigated in CRNs. However, the SU sender and the SU receiver are still considered as a single player, i.e., they are assumed to stay coordinated by being initialized with the same random seed.

To address the above limitations, in this paper we propose a decentralized and robust anti-jamming multi-channel spectrum access protocol for ad hoc CRNs, which can accommodate both the environment dynamics and the strategic behaviors of the jammers. Compared to existing UFH protocols, in a CRN setting, our protocol can adaptively choose the most likely “free” channels with high probability instead of randomly sensing and accessing channels. That is, the transceivers selectively sense channels with a high probability of non-occupancy by the jammer and the PUs, based on the history of sensing and access. However, if the SU sender detects the presence of a PU on a sensed channel, it remains silent and does not access that channel in the current timeslot. The sensing results, together with the immediately following access results, are fed back into the sensing actions in future timeslots. Therefore, communication efficiency can be significantly improved without affecting PUs’ activities. We show the robustness of our schemes even if the feedback information is randomly jammed by the adversary. Compared to the preliminary version [1], this paper makes substantial improvements in both the theoretical performance analysis and the experiments. New experimental results and full proofs of the performance bounds are provided. In addition, we also discuss the problem of tracking the best compound (adaptive) strategy and characterize the bounds on the new regret. The main contributions of this paper are:

We propose an online adaptive multi-channel jamming-resistant spectrum access protocol for ad hoc CRNs by formulating the anti-jamming problem as a non-stochastic MAB problem. We analytically show the convergence of the learning algorithms as $T$ goes to infinity, i.e., the time-averaged performance difference between the SU sender and the SU receiver’s optimal strategies is no more than $\frac{20k}{\sqrt{\varepsilon}}\sqrt{n \ln n}$, where $k = \max\{k_s^2, k_r\}$, $k_r$ and $k_s$ are the numbers of channels the receiver and the sender can access simultaneously in each timeslot, respectively, $\varepsilon$ is the probability of sensing and $n$ is the total number of channels. The proposed algorithm can be efficiently implemented in polynomial time.

We further consider the problem of tracking the best adaptive strategy and present an extension of our construction for anti-jamming spectrum access. We analyze the performance bound on the new regret defined based on the adaptive optimal strategy. We analytically show that the time-averaged performance difference between the SU sender and the SU receiver’s optimal strategies is upper bounded by $O(12k\sqrt{n \ln n})$, where $k = \max\{k_s, k_r\}$. Since $k_s$, $k_r$ and $n$ are pre-set system parameters, the performance bound is constant as $T$ goes to infinity. The extended algorithm for tracking the best compound strategy can also be implemented in polynomial time.

We present a thorough quantitative performance characterization of the proposed scheme. The performance is evaluated by analyzing a practical metric: the expected time for message delivery with high probability. We also perform an extensive simulation study to validate our theoretical results. It is shown that the proposed algorithm is efficient and highly effective against various jamming attacks even with jammed ACK information.

II. MODELS AND ASSUMPTIONS

A. System Model and Assumptions

In a typical cognitive radio network (CRN), there exist a primary user (PU) network and a secondary user (SU) network. To facilitate dynamic spectrum access, the spectrum is divided into $n$ channels, each of which evolves independently and has the same total bandwidth. Different from most existing works, in our model we assume the channel statistics are not necessarily the same for the $n$ channels. In the system, PUs occupy and vacate the spectrum following a discrete-time Markov process, where channel $i$ transits from the busy state (“0”) to the idle state (“1”) with probability $p^i_{01}$ and stays in the idle state (“1”) with probability $p^i_{11}$. In the SU network, SUs seek spectrum opportunities among the $n$ channels. Specifically, SUs reserve a sensing interval in each timeslot to detect the presence of a PU. Based on the outcomes of sensing, the SU senders decide whether or not to take the opportunity to access the currently idle channels, and vacate the spectrum whenever PUs reclaim them. At the end of a timeslot, the SU receiver sends a short acknowledgement to the SU sender on each channel where a packet transmission was successful.
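The PU occupancy model above can be sketched in a few lines of Python. This is only an illustrative simulation under the stated two-state Markov assumption; the function names, the `rng` argument and the helper that computes the long-run idle fraction are assumptions introduced here and are not part of the protocol.

```python
import random


def simulate_channel(p01, p11, steps, rng, start_state=0):
    """Simulate one PU channel as a two-state Markov chain.

    State 0 means busy and state 1 means idle. p01 is the probability of
    moving from busy to idle, p11 the probability of staying idle.
    Returns the list of states visited over `steps` timeslots.
    """
    state = start_state
    trace = []
    for _ in range(steps):
        p_idle = p11 if state == 1 else p01
        state = 1 if rng.random() < p_idle else 0
        trace.append(state)
    return trace


def stationary_idle_probability(p01, p11):
    """Long-run fraction of idle timeslots for the chain above.

    Solves pi1 = (1 - pi1) * p01 + pi1 * p11 for pi1. Undefined when
    p01 == 0 and p11 == 1 (the chain never leaves its start state).
    """
    return p01 / (1.0 + p01 - p11)


if __name__ == "__main__":
    rng = random.Random(7)
    trace = simulate_channel(p01=0.2, p11=0.8, steps=20000, rng=rng)
    print("empirical idle fraction:", sum(trace) / len(trace))
    print("stationary idle probability:", stationary_idle_probability(0.2, 0.8))
```

Because channels may have different statistics in this model, a full simulation would call `simulate_channel` once per channel with its own pair of transition probabilities.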

It is worth noting that we investigate the problem of robust spectrum sensing and access in an ad hoc SU network without a central controller for coordinating the SUs. Therefore, each autonomous SU aims to maximize its own performance by sensing and accessing the spectrum independently [2]. Different from most existing opportunistic spectrum access (OSA) protocols [2]–[6], where traffic statistics are known a priori, we consider a more general and practical scenario where traffic statistics are not available to SUs before the start of communication. For ease of exposition, in the following discussion we refer to one pair of communicating SUs as the sender and the receiver. In a multi-radio setting, the sender and the receiver are equipped with $k_s < n$ and $k_r < n$ radios, respectively, enabling them to access multiple channels simultaneously in each timeslot. Since SUs must not interfere with active PUs in each timeslot, an SU sender senses $k_s < n$ channels and accesses only $k_a \le k_s$ of them sequentially. At the receiver side, various efficient message verification schemes can be used for packet verification to defend against pollution attacks, and fragments that have passed integrity checks are reassembled to reconstruct the original message. To relax the strict synchronization between the sender and the receiver, we can let the hopping frequency of the receiver be much slower than that of the sender, so that packet losses caused by the lack of synchronization between the sender and the receiver can be neglected. Note that in our model, we do not consider node authentication and message privacy, which are orthogonal to the security problems this work addresses.

B. Threat Model and Assumptions

In CRNs, PUs such as TV users are licensed users (i.e., protected by law) and are usually well protected physically. From the jammer’s perspective, it is very difficult to launch effective attacks against them, and attackers face heavy penalties if detected [18]. Therefore, we assume the jammer does not have a strong incentive to attack PUs and put itself at risk by jamming the licensed bands when PUs are active. Instead, the jammer targets the secondary users (SUs), who are unlicensed users and are only permitted to access the spectrum when not interfering with PUs. The SUs’ access to the spectrum is opportunistic in nature and lacks clear legal protection. Besides, SU networks are usually dynamic ad hoc networks formed by randomly deployed self-organizing wireless devices, where it is difficult to implement effective security countermeasures. A stealthy attacker can choose to jam any targeted SUs and prevent the targets from using the spectrum for communication. Note that such attacks against SUs are stealthy and do not affect PUs’ communications. That is, the attacker utilizes the same sensing interval to detect (sense) the activity of the PUs and only jams the idle channels (which are potentially used by SUs) based on the sensing outcomes.

We assume the jammer has radio capabilities similar to those of the SUs. That is, in each timeslot, the jammer is capable of sensing and jamming $k_j$ ($k_j < n$) channels simultaneously. Assuming the jammer knows the whole spectrum access protocol, his objective is to prevent the spectrum from being utilized efficiently by the legitimate SUs. Specifically, we consider four types of jammers with different jamming strategies:

Static jammer. A static jammer is an oblivious attacker, who selects the same set of channels to sense in each timeslot. Based on the sensing results, he emits jamming signals on the sensed idle channels. Note that the jamming action is independent of the sensing history the jammer may have observed in the past.

Random jammer. A random jammer is also an oblivious attacker, who selects a set of channels to sense uniformly at random from the public set of $n$ channels in each timeslot. Based on the sensing results, he emits jamming signals on the sensed idle channels. Similar to the static jammer, the jamming action is independent of the sensing history he may have observed in the past.

Myopic jammer. A myopic jammer is a powerful cognitive attacker running the myopic algorithm, which is a well-known OSA strategy that achieves suboptimal performance (the principle of the myopic policy is described in Section III). Initially, the jammer selects $k_j$ channels to sense in each timeslot until all $n$ channels have been sufficiently sensed. He can then accurately estimate the traffic statistics from the sensing results, based on which he uses the myopic policy to predict PUs’ channel occupancy pattern and emits jamming signals on the most likely idle channels. In each timeslot, the jamming strategy is thus selected based on the sensing history and the estimated channel occupancy statistics.

Adaptive jammer. An adaptive jammer is also a cognitive attacker, running a multi-armed bandit (MAB) algorithm, which is an online learning protocol (the MAB-based learning protocol is described in Section IV). The jammer selects $k_j$ channels to sense in each timeslot and jams the sensed idle channels based on his sensing history and past observations.
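The two oblivious jammers described above can be sketched as follows. This is a minimal illustration only; the function names and the representation of channel states as a list of 0/1 values (1 meaning idle, i.e., no PU present) are assumptions made here for the example.

```python
import random


def static_jammer_choice(fixed_channels):
    """Static jammer: senses the same fixed set of channels every timeslot."""
    return list(fixed_channels)


def random_jammer_choice(n, k_j, rng):
    """Random jammer: senses k_j distinct channels drawn uniformly from n."""
    return rng.sample(range(n), k_j)


def jam(sensed_channels, channel_idle):
    """Jam only the sensed channels that are idle, so PUs are never disturbed.

    channel_idle[c] is 1 if channel c is idle (no PU) and 0 if it is busy.
    """
    return [c for c in sensed_channels if channel_idle[c]]


if __name__ == "__main__":
    rng = random.Random(3)
    channel_idle = [1, 0, 0, 1, 1, 0]
    print(jam(static_jammer_choice([0, 1, 2]), channel_idle))
    print(jam(random_jammer_choice(6, 3, rng), channel_idle))
```

Neither function looks at past observations, which is what makes these two jammers oblivious; the myopic and adaptive jammers would additionally maintain state across timeslots.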

Note that, in the adaptive jamming attack model, we assume the jammer can adjust his sensing and jamming strategies by leveraging the outcomes of jamming. In other words, we assume that the jammer knows whether he succeeded in jamming the transmitting channels (on which both the sender and the receiver reside in a timeslot) for all past timeslots. We emphasize that it is almost impossible to implement such a powerful jammer in practice. However, for the purpose of performance comparison, we show that SUs equipped with our anti-jamming spectrum sensing and access protocol are still resilient to such adaptive jamming attacks. Table I summarizes the notations used throughout the paper.

III. VULNERABILITY ANALYSIS OF MULTI-CHANNEL OPPORTUNISTIC SPECTRUM ACCESS PROTOCOLS

In this section, we analyze the weaknesses of the existing multi-channel opportunistic spectrum access protocols under jamming attacks. For ease of illustration, in the following we consider an SU network with a single sender-receiver pair, but the same ideas can also be applied and extended to a multi-user setting.

A. Opportunistic Spectrum Access with Known Channel Traffic Statistics

In the context of cognitive radio for opportunistic spectrum access, the single-channel access problem has been investigated within the POMDP framework, and myopic policies under both perfect and imperfect sensing have been studied in [2]–[6]. The main idea of these schemes is that the sender chooses a subset of the $n$ channels to sense based on its past observations and gains a fixed reward if a channel is sensed idle. The objective of the sender is then to maximize the rewards that it can gain over a (potentially infinite) number of timeslots. It has been shown that this optimization problem can be solved by a stochastic dynamic programming (SDP) approach to obtain optimal performance. To reduce the computational complexity of SDP caused by the expensive backward induction procedure, much research has focused on index policies, and the myopic policy, which maximizes the conditional expected reward acquired at timeslot $t$, was first proposed and explored in [2], [3]. By concentrating only on the present and completely ignoring the future, myopic approaches achieve suboptimal performance in general.

TABLE I: Summary of Notations

Symbol              Description
$k_r$/$k_s$         the number of channels the receiver/sender can access simultaneously in each timeslot
$S_s$/$S_r$         the sender/receiver’s sensing and access strategy space
$S_j$               the jamming strategy space for the jammer
$G_{i,t}$           the cumulative rewards up to timeslot $t$ of strategy $i$
$G_t$               the total rewards over all chosen strategies up to timeslot $t$
$N_x$               the total number of strategies at the sender or the receiver side, where $x \in \{s, r\}$
$w^x_{f,t}$         the channel $f$’s weight for the sender or the receiver at timeslot $t$, where $x \in \{s, r\}$
$w^x_{i,t}$         the strategy $i$’s weight for the sender or the receiver at timeslot $t$, where $x \in \{s, r\}$
$g^x_{f,t}$         the reward received on channel $f$ during timeslot $t$ at the sender or the receiver side, where $x \in \{s, r\}$
$g^x_{i,t}$         the total rewards received by using strategy $i$ during timeslot $t$ at the sender or the receiver side, where $x \in \{s, r\}$
$G^s_t$             the accumulated rewards at the sender side up to timeslot $t$
$G^r_t$             the accumulated rewards at the receiver side up to timeslot $t$
$G^{\max}_T(s)$     the rewards of the static optimal strategies for the sender up to $T$ timeslots
$G^{\max}_T(r)$     the rewards of the static optimal strategies for the receiver up to $T$ timeslots
$S(f, k)$           the strategy set in which each strategy chooses $k$ channels from channels $f, f+1, \ldots, n$
$\bar{S}(f, k)$     the strategy set in which each strategy chooses f channels from channel $1, 2, \ldots, f$

For the myopic policy, it has also been shown that a sufficient statistic, or information state, of the system for optimal decision making is the belief vector $\Omega(t) = [\omega_1(t), \omega_2(t), \ldots, \omega_n(t)]$, where $\omega_i(t)$ is the conditional probability that channel $i$ is idle in timeslot $t$. In timeslot $t$, a sensing action $a(t)$ denotes the $k_s$ channels to be sensed. Let $K_i(t) \in \{0, 1\}$ denote whether an ACK on channel $i$ is received or not in timeslot $t$. Given $a(t)$ and $K_i(t)$, the belief state in timeslot $t+1$ is given by [2]

\[
\omega_i(t+1) =
\begin{cases}
p^i_{11}, & i \in a(t),\ K_i(t) = 1 \\
p^i_{01}, & i \in a(t),\ K_i(t) = 0 \\
\omega_i(t)\,p^i_{11} + (1 - \omega_i(t))\,p^i_{01}, & i \notin a(t).
\end{cases}
\]

Assuming all channels have the same transmission rate $B_i$, the myopic policy under $\Omega$ is defined as $a(t) = \arg\max_{a(t)} \sum_{i \in a(t)} \omega_i(t) B_i$. Recently, the dynamic multi-channel access problem was studied as a special class of restless multi-armed bandit problems (RMBP) in [6], based on which an index policy called Whittle’s index policy has also been applied to dynamic spectrum access. Similar to the myopic policy, Whittle’s index policy enables the SU sender to choose the channels whose current states have the largest indices to sense and access. However, the strict constraint of activating exactly $m = k_s$ arms/channels at each time step may cause optimality to be lost; even so, Whittle’s index policy achieves near-optimal performance. Another interesting observation is that Whittle’s index policy has the same structure as the myopic policy when channels are stochastically identical.
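The belief update and the myopic channel selection described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names, the use of a dictionary `ack` mapping each sensed channel to its ACK indicator, and the tie-breaking by lowest channel index are all assumptions made here.

```python
def update_belief(omega, sensed, ack, p01, p11):
    """Return the belief vector for timeslot t+1.

    omega[i]  : probability that channel i is idle in timeslot t
    sensed    : set of channel indices in the sensing action a(t)
    ack[i]    : 1 if an ACK was received on sensed channel i, else 0
    p01, p11  : per-channel transition probabilities (busy->idle, idle->idle)
    """
    new_omega = []
    for i, w in enumerate(omega):
        if i in sensed:
            # Sensed channel: the belief collapses to a transition probability.
            new_omega.append(p11[i] if ack[i] == 1 else p01[i])
        else:
            # Unsensed channel: propagate the belief through the Markov chain.
            new_omega.append(w * p11[i] + (1.0 - w) * p01[i])
    return new_omega


def myopic_action(omega, rates, k_s):
    """Pick the k_s channels with the largest omega_i * B_i (ties: lowest index)."""
    scores = [w * b for w, b in zip(omega, rates)]
    order = sorted(range(len(omega)), key=lambda i: (-scores[i], i))
    return sorted(order[:k_s])


if __name__ == "__main__":
    omega = [0.5, 0.9, 0.1]
    p01 = [0.2, 0.3, 0.4]
    p11 = [0.8, 0.7, 0.6]
    action = myopic_action(omega, rates=[1.0, 1.0, 1.0], k_s=2)
    print("sensed channels:", action)
    print("next belief:", update_belief(omega, set(action), {0: 1, 1: 0}, p01, p11))
```

Note that both functions are deterministic given their inputs, so any party holding the same belief vector and statistics computes the same channel set.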

B. Analysis of OSA Under Malicious Jamming Attacks

Although theoretically sound, almost all the existing OSA protocols (including index-based policies) only work well in non-malicious environments. Among others, one key assumption made by the existing solutions is that the traffic statistics are known a priori. Taking index-based policies as an example, it is required that the initial belief vector $\Omega(0)$ and the order of the state transition probabilities (i.e., whether $p^i_{01}$ is greater or less than $p^i_{11}$) on all channels be known to SUs in advance. In practice, however, these statistics may not be readily available [5]. More seriously, due to the deterministic nature of the channel/frequency selection procedure, those OSA protocols are vulnerable to malicious jamming attacks. That is, an intelligent jammer, who knows the traffic statistics of all channels or learns them through sensing and estimation by observing all channels, can leverage such information to predict which channels will be used. Since the index policies always choose the first $k_s$ channels with the largest indices for sensing and accessing, the jammer can use the same dynamic decision process to perform effective jamming attacks. In the worst case, the communication can be completely jammed, as the jammer maintains the same update information for the channel “index” as the SUs in each timeslot. From a theoretical perspective, most OSA protocols are formulated as optimization problems with deterministic solutions. For example, the index policies are established based on the stochastic model of the channel statistics. Consider Whittle’s index policy developed under the restless multi-armed bandit problem (RMBP) [19]. Since the evolution of the information state (belief vector) is known, the players (the sender and the receiver) can compute ahead of time exactly what payoffs (rewards) will be received from each arm (channel). Based on the above analysis, it is necessary and important to develop probabilistic OSA protocols that are resistant to various jamming attacks and can accommodate the special characteristics of CRNs.

To enhance the robustness of OSA, the problem of defending against jamming attacks in cognitive radio networks was investigated using game-theoretic approaches [14], [15]. However, these works explored only the single-channel case and assumed that the SU receiver can always communicate with the secondary sender (i.e., the two are treated as a single player) and that sensing is perfect. In [16], the spectrum sensing problem under time-varying channels was formulated as an adversarial bandit problem. Similar to [14], [15], the authors considered only a single sensing channel and treated the SU receiver and the SU sender as a single player. In this paper, we consider a more practical model and make a step towards the development of robust multi-radio multi-channel OSA protocols for CRNs.


IV. JAMMING-RESISTANT MULTI-RADIO MULTI-CHANNEL OPPORTUNISTIC SPECTRUM ACCESS

A. Scheme Overview

Based on the above analysis, we can see that when an attacker launches malicious jamming attacks to disrupt legitimate communications in SU networks, the channel statistics (which are determined by the activities of PUs when there is no jamming) cannot correctly reflect the true state (idle or busy) of the channel. That is, the rewards (i.e., indications of successful packet receptions) associated with each channel cannot be modeled by a stationary distribution, and no statistical assumptions can be made about the transition of the information state or the generation of rewards. This is due to the dynamic behaviors of both PUs and jammers, i.e., PUs occasionally occupy and free the channels, and a jammer may adjust its sensing and jamming strategy to maximize the effect of jamming. These effects make the rewards on the channels change arbitrarily in each timeslot. Motivated by this observation, it is necessary to keep exploring the best possible set of channels for transmission in order to adapt to the dynamics of jammers and PUs. Meanwhile, it is also necessary to exploit the previously chosen favorable set of channels, as too much exploration will potentially underutilize them. The proposed anti-jamming problem is thus one of balancing exploitation and exploration, rather than pure optimization.

B. Problem Formulation: A Multi-player Game

In this paper, we consider a jamming and anti-jamming game among an SU sender, an SU receiver and a jammer under dynamic PU behaviors. To fully utilize the vacant spectrum, the objective of the SU sender-receiver pair is to choose the sensing and access/receiving actions in each timeslot to maximize the total expected rewards (i.e., successfully received packets) over $T$ timeslots. In contrast, the jammer's objective is to minimize the total expected rewards in order to disrupt the legitimate SU communications. Since channel states (idle or busy) are not directly observable before channel sensing, the sender chooses $k_s$ channels to sense during the sensing interval, where the sensing action is based on all past decisions and observations. Due to PUs' dynamic actions on a channel, the sender only chooses $k_a$ ($k_a \le k_s$) idle channels to access. At the receiver side, the receiver independently chooses $k_r$ channels to receive on, where the selection is also based on all past decisions and observations. During the same timeslot, the jammer chooses $k_j$ channels to sense and jams the sensed idle channels according to its chosen jamming strategy.

Note that, although we consider a single SU pair in our anti-jamming problem, the proposed scheme can be directly applied to an SU communication network with multiple SU sender-receiver pairs. This is because each SU, which is autonomous in an ad hoc SU network, can use our proposed scheme to maximize its own performance by treating interference/collisions caused by other SU pairs as jamming signals. It is easy to see that when the number of other SU pairs in the neighborhood of the receiver that use the same channels is much less than $n$, the impact of unintentional interference is negligible.

We next formalize the jamming and anti-jamming game using mathematical notation. We first number the channels/frequencies from 1 to $n$ and construct the vector space $\{0,1\}^n$. The sender's sensing and access strategy space and the receiver's receiving strategy space are denoted as $S_s \subseteq \{0,1\}^n$ of size $\binom{n}{k_s}$ and $S_r \subseteq \{0,1\}^n$ of size $\binom{n}{k_r}$, respectively. In an SU's strategy vector, the value of the $f$-th ($f \in \{1,\dots,n\}$) entry is 1 if the $f$-th channel is chosen for sending and access or for receiving, and 0 otherwise. Accordingly, the jamming strategy space for the jammer is denoted as $S_j \subseteq \{0,1\}^n$ of size $\binom{n}{k_j}$. Different from an SU's strategy, the value 0 in the $f$-th entry denotes that the jammer chooses the $f$-th channel to sense and jam, and the value is 1 otherwise. Different from the above three parties, PUs' activities on the channels are independent of the other parties' actions, and a PU's action/strategy can also be denoted as a vector $s_p \in \{0,1\}^n$, where the value 1 denotes that the channel is idle and the value 0 denotes that the channel is occupied.

During each timeslot, the sender, the receiver and the jammer choose their own strategies $s_s \in S_s$, $s_r \in S_r$ and $s_j \in S_j$, respectively. In each timeslot, assume the PU's strategy or activity is $s_p$. From the receiver's perspective, $s_s \wedge s_p \wedge s_j$ can be considered as a joint decision made by the sender, the PU and the jammer, where $\wedge$ denotes the bitwise "AND" operation. We say that in timeslot $t$ a reward $g_{f,t} = 1$ is introduced for channel $f$ if the $f$-th entry of $s_s \wedge s_p \wedge s_j$ is 1; otherwise no reward is received, i.e., $g_{f,t} = 0$. On the receiver side, the reception of a reward depends on the state of the channel $f$ the receiver has chosen for packet reception. In addition, we use erasure coding combined with short signatures to verify/authenticate the received packets, reassemble the message and defend against pollution-based DoS attacks [9]. Note that we do not differentiate between packet jamming and packet collisions, as they both cause interference to the legitimate packets, and packet coding can be used to recover bit errors in received packets.
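The reward rule above can be illustrated with a minimal sketch. It assumes strategies are stored as 0/1 lists and uses the conventions stated in the text (a 0 in the jammer's vector marks a jammed channel, a 1 in the PU's vector marks an idle channel). The function names and the example vectors are hypothetical and chosen only for illustration.

```python
from typing import List


def channel_rewards(s_s: List[int], s_p: List[int], s_j: List[int]) -> List[int]:
    """Entrywise AND of the sender, PU and jammer vectors.

    Conventions from the text: s_s[f] = 1 if the sender uses channel f,
    s_p[f] = 1 if the PU leaves f idle, s_j[f] = 0 if the jammer jams f.
    """
    return [a & b & c for a, b, c in zip(s_s, s_p, s_j)]


def revealed_to_receiver(s_r: List[int], g: List[int]) -> List[int]:
    """A reward is revealed only on channels the receiver listens on."""
    return [gf if rf == 1 else 0 for gf, rf in zip(g, s_r)]


# Hypothetical timeslot with n = 6 channels (list index 0 is channel 1).
s_s = [1, 1, 0, 1, 0, 0]  # sender accesses channels 1, 2 and 4
s_p = [1, 0, 1, 1, 1, 1]  # PU occupies channel 2
s_j = [1, 1, 1, 0, 1, 1]  # jammer hits channel 4
s_r = [1, 0, 0, 1, 1, 0]  # receiver listens on channels 1, 4 and 5

g = channel_rewards(s_s, s_p, s_j)
received = revealed_to_receiver(s_r, g)
print(g, received)
```

In this example only channel 1 carries a reward: channel 2 is blocked by the PU, channel 4 is jammed, and channel 5 was never used by the sender.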

After the receiver chooses a strategy $s_r$, a reward on channel $f$ is revealed to the receiver if and only if $f$ is chosen as a receiving channel. There are four possible cases:
Case 1: No packet is received on $f$. This is because $f$ has not been selected by the sender for transmission. In this case, reward 0 is obtained.
Case 2: A packet is received on $f$. If the received packet fails to pass the verification, reward 0 is obtained.
Case 3: A packet is received on $f$. Jammed or collided packets that cannot be recovered will be discarded, resulting in reward 0.
Case 4: A packet is received on $f$. If no jamming is detected, or packets corrupted by jamming can be recovered via packet coding, reward 1 is obtained.

Similarly, after the sender chooses a strategy $s_s$, a reward on channel $f$ is revealed to the sender if and only if $f$ is chosen as a sending channel. A reward 1 is obtained if an ACK is received on $f$; otherwise the reward is 0. Note that for packet reception, real experiments in [20] have shown that, by looking at the received signal strength during bit reception, packet errors caused by jamming can be accurately differentiated from those caused by weak links. Since our work focuses on the defense against jamming attacks, we consider packet errors due to jamming.

In this paper, we formulate the jamming-resistant spectrum sensing and access problem as a non-stochastic MAB problem (NS-MAB) [21]–[23], where each channel can be considered as an arm of a multi-armed bandit. Due to the jamming effect and the dynamic behaviors of PUs, each channel $f$ is associated with an arbitrary and unknown sequence of rewards, which can be obtained on a channel if the sender and the receiver choose $f$ for sending and receiving simultaneously.

For ease of analysis and presentation, we first define some important notation. In each timeslot $t \in \{1,\dots,T\}$, the sender (receiver) independently selects a strategy $I_t$ from its strategy set. We write $f \in i$ if channel $f$ is chosen in strategy $i$, i.e., the value of the $f$-th entry of $i$ is 1. Note that a strategy is a vector of dimension $n$, $I_t$ denotes the particular strategy chosen for timeslot $t$, and $i$ denotes a general strategy in the strategy set. The total reward of a strategy $i$ during timeslot $t$ is $g_{i,t} = \sum_{f \in i} g_{f,t}$, and the cumulative reward up to timeslot $t$ of each strategy $i$ is $G_{i,t} = \sum_{s=1}^{t} g_{i,s} = \sum_{f \in i}\sum_{s=1}^{t} g_{f,s}$. The total reward over all chosen strategies up to timeslot $t$ is thus $G_t = \sum_{s=1}^{t} g_{I_s,s} = \sum_{s=1}^{t}\sum_{f \in I_s} g_{f,s}$, where $I_s$ is chosen randomly according to a certain distribution over the strategy set. To quantify the performance, we use the following metric called regret:

\[ \max_{i \in S_x} G_{i,T} - G^x_T, \quad x \in \{s, r\}, \]

where the superscript is used to differentiate the sender from the receiver, and the maximum is taken over all strategies available to the sender or the receiver. The regret is defined as the difference in accumulated rewards (or successfully received packets) over $T$ timeslots between the proposed strategy and the optimal static one. The static optimal strategy denotes the best fixed solution for message reception in the presence of jamming (i.e., the set of channels that generates the largest total reward if it is used in every timeslot). Note that the sender and the receiver adaptively choose their own strategies in each timeslot based on the updated probability distributions over the strategy set. For the sender (receiver), the updates of the probability distribution are determined by the outcomes of the joint actions of the PU, the jammer and the receiver (sender). Thus, the accumulated rewards of the sender (receiver) over time depend on the actions of the other three parties in each timeslot.
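As a small illustration of the regret metric, the following sketch computes it for a given reward table. It assumes the rewards $g_{f,t}$ of all channels are available after the fact (which a real node never observes) and relies on the fact that, because strategy rewards are sums of channel rewards, the best fixed $k$-channel strategy is simply the $k$ channels with the largest cumulative rewards. All names and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_static_reward(g: List[List[int]], k: int) -> int:
    """max_i G_{i,T}: the k channels with the largest cumulative rewards."""
    n = len(g[0])
    totals = [sum(row[f] for row in g) for f in range(n)]
    return sum(sorted(totals, reverse=True)[:k])


def accumulated_reward(g: List[List[int]], chosen: Sequence[Tuple[int, ...]]) -> int:
    """G_T: reward collected by the strategies I_1, ..., I_T actually played."""
    return sum(g[t][f] for t, strategy in enumerate(chosen) for f in strategy)


def regret(g: List[List[int]], chosen: Sequence[Tuple[int, ...]], k: int) -> int:
    return best_static_reward(g, k) - accumulated_reward(g, chosen)


# Hypothetical example: T = 3 timeslots, n = 3 channels, k = 2 (0-based channels).
g_example = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
chosen_example = [(0, 1), (1, 2), (0, 2)]
print(regret(g_example, chosen_example, k=2))
```

Here the cumulative channel rewards are 3, 1 and 1, so the best fixed pair collects 4, while the played sequence collects 3.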

C. Our Construction

In this subsection, we present our jamming-resistant spectrum sensing and access protocol. Our algorithm is a probabilistic one that can accommodate the changes of channel status caused by a (potentially) malicious jammer. The dynamic property of the proposed solution lies in the trade-off between exploration and exploitation, both of which affect the system performance.

Algorithm 1 A Jamming-resistant Multi-radio Multi-channel Spectrum Sensing and Access Protocol.
Input: $n$, $k_r$, $k_s$, $T$, $\varepsilon \in (0,1]$, $\delta \in (0,1)$, $\beta_s, \beta_r \in (0,1]$, $\gamma_s, \gamma_r \in (0,1/2]$, $\eta_s, \eta_r > 0$.
Initialization: Initialize all system parameters, setting the channel weight $w^x_{f,0} = 1$ $\forall f \in [1,n]$, the strategy weight $w^x_{i,0} = 1$ $\forall i \in [1,N^x]$, and the total strategy weight $W^x_0 = N^x = \binom{n}{k_x}$, where $x \in \{s, r\}$.
For timeslot $t = 1, 2, \dots, T$
1: Select a strategy $I^x_t$ according to $p^x_{i,t}$ ($\forall i \in [1,N^x]$), with $p^x_{i,t}$ computed following Eq. (5).
2: Compute the channel selection probability $q^x_{f,t}$ ($\forall f \in [1,n]$) as $q^x_{f,t} = \sum_{i: f \in i} p^x_{i,t}$.
3: Transmit a packet if and only if the channel is sensed to be idle.
4: Perform verification and jamming detection once a packet is received on channel $f$. Transmit back an acknowledgement on $f$ if the received packet passes the check.
5: Compute rewards $g^x_{f,t}$ ($\forall f \in I^x_t$) and virtual rewards $g^{x\prime}_{f,t}$ with the revealed $g_{f,t}$ ($\forall f \in [1,n]$), following Eqs. (3) and (4).
6: Update the channel weight $w^x_{f,t}$ and the strategy weight $w^x_{i,t}$ following Eqs. (1) and (2), respectively. Update the total strategy weight as $W^x_t = \sum_{i=1}^{N^x} w^x_{i,t}$.
End

As shown in Algorithm 1, the algorithm comprises two subalgorithms: $A_s$ at the sender side and $A_r$ at the receiver side. In Algorithm 1, the system parameters $\beta$, $\gamma$ and $\eta$ are determined by the regret bound, and their derivation is shown in the proof of Theorem 1.

Let $N^x$ ($x \in \{s, r\}$) denote the total number of strategies. Each strategy is assigned a strategy weight, and each channel is assigned a channel weight. During each timeslot, the channel weight is dynamically adjusted based on the virtual channel rewards revealed to the sender and the receiver:

\[ w^x_{f,t} = w^x_{f,t-1}\, e^{\eta_x g^{x\prime}_{f,t}}, \quad x \in \{s, r\}. \tag{1} \]

We use exponentially weighted forecasters that follow Exp3 ("Exponential-weight algorithm for Exploration and Exploitation"), first proposed in [21]. In a multi-armed bandit setting, at time $t$ an expert is chosen with a probability that increases with the past performance of the expert. In practice, the most popular choice for such a function is the exponential function. It is easy to see that an increase in the virtual channel rewards leads to larger channel weights.

A strategy indicates the choice of channels for use, so we define the weight of a strategy as the product of the weights of all its channels:

\[ w^x_{i,t} = \prod_{f \in i} w^x_{f,t} = w^x_{i,t-1}\, e^{\eta_x g^{x\prime}_{i,t}}, \quad x \in \{s, r\}, \tag{2} \]

where $g^{x\prime}_{i,t} = \sum_{f \in i} g^{x\prime}_{f,t}$.

Here, the reason for estimating the reward of each channel first, instead of estimating the reward of each strategy directly, is that the reward of each channel provides useful information about the other unchosen strategies containing the same channel. The parameter $\beta$ is used to control the bias in estimating the virtual channel rewards $g^{s\prime}_{f,t}$ and $g^{r\prime}_{f,t}$, which are computed as:

\[ g^{s\prime}_{f,t} = \begin{cases} \dfrac{g^s_{f,t} + \beta_s}{\varepsilon\, q^s_{f,t}}\, R_t & \text{if } f \in I^s_t, \\[2ex] \dfrac{\beta_s}{\varepsilon\, q^s_{f,t}}\, R_t & \text{otherwise,} \end{cases} \tag{3} \]

\[ g^{r\prime}_{f,t} = \begin{cases} \dfrac{g^r_{f,t} + \beta_r}{q^r_{f,t}} & \text{if } f \in I^r_t, \\[2ex] \dfrac{\beta_r}{q^r_{f,t}} & \text{otherwise,} \end{cases} \tag{4} \]

where $q^x_{f,t}$ ($x \in \{s, r\}$) denotes the selection probability of channel $f$, and $R_t$ is a Bernoulli random variable satisfying $P\{R_t = 1\} = \varepsilon$. The parameter $\beta$ is a fixed value that is determined before the execution of the protocol (see the proof of Theorem 1). Based on the true rewards revealed to the sender and the receiver, we define the virtual rewards so as to increase the weights of "good" channels, i.e., to increase the access probabilities of "good" channels that have been sensed less often.
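The virtual-reward estimate and the channel-weight update can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: channels are 0-based, the receiver form of Eq. (4) is obtained by setting `eps = 1` and `r_t = 1`, and the function names and example values are hypothetical.

```python
import math
from typing import List, Set


def virtual_rewards(g: List[int], chosen: Set[int], q: List[float],
                    beta: float, eps: float = 1.0, r_t: int = 1) -> List[float]:
    """Virtual rewards in the form of Eqs. (3)/(4).

    g[f] is the observed reward (used only when f is in `chosen`),
    q[f] is the selection probability of channel f, and r_t is the
    Bernoulli sensing indicator (r_t = 0 means no sensing this timeslot).
    """
    out = []
    for f in range(len(q)):
        observed = g[f] if f in chosen else 0
        out.append((observed + beta) / (eps * q[f]) * r_t)
    return out


def update_channel_weights(w: List[float], g_virtual: List[float],
                           eta: float) -> List[float]:
    """Exponential weight update in the form of Eq. (1)."""
    return [wf * math.exp(eta * gv) for wf, gv in zip(w, g_virtual)]


# Hypothetical example: n = 4 channels, channels 0 and 2 chosen, reward on 0.
q = [0.5, 0.5, 0.5, 0.5]
vr = virtual_rewards([1, 0, 0, 1], {0, 2}, q, beta=0.1)
w_new = update_channel_weights([1.0] * 4, vr, eta=0.1)
silent = virtual_rewards([1, 0, 0, 1], {0, 2}, q, beta=0.1, eps=0.5, r_t=0)
print(vr, w_new, silent)
```

Note that channel 3 has a true reward of 1 in the example, but since it was not chosen its reward is never revealed and it receives only the bias term $\beta / q$.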

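In Algorithm 1, at the beginning of each timeslot, the transceiver chooses a strategy based on the probability distribution $p^x_{i,t}$:

\[ p^x_{i,t} = \begin{cases} (1-\gamma_x)\dfrac{w^x_{i,t-1}}{W^x_{t-1}} + \dfrac{\gamma_x}{|C^x|} & \text{if } i \in C^x, \\[2ex] (1-\gamma_x)\dfrac{w^x_{i,t-1}}{W^x_{t-1}} & \text{otherwise,} \end{cases} \tag{5} \]

where $x \in \{s, r\}$. The parameter $\gamma_x$ is used to balance between $\frac{w^x_{i,t-1}}{W^x_{t-1}}$ and $\frac{1}{|C^x|}$.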

In the calculation of the strategy probability distribution, the first part is a distribution that assigns to each action a probability mass exponential in the estimated cumulative reward for that action, and the second part is the uniform distribution over the covering set. If it is not mixed with the uniform distribution, the algorithm might have large deviations with high probability, i.e., from time to time it may concentrate on the wrong strategy for too long and then incur a large regret. The mixing is therefore done to make sure that the algorithm tries out all strategies and obtains good estimates of the gains for each channel [21]. Note that $\gamma_x$ is a fixed value that is determined before the execution of the protocol (see the proof of Theorem 1). The covering strategy set $C^x$ is defined to ensure that each channel/frequency is sampled sufficiently often. The covering set has the property that for each channel $f$, there is a strategy $i$ in the covering set such that $f \in i$. By the definition of a strategy, each strategy includes $k_x$ ($x \in \{s, r\}$) "active" channels. Thus, we can construct one typical and simple covering set with size $|C^x| = \lceil n/k_x \rceil$ ($x \in \{s, r\}$).
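The strategy distribution of Eq. (5) and the channel probabilities $q_{f,t}$ can be sketched by brute-force enumeration. This is a sketch under stated assumptions: it enumerates all $\binom{n}{k}$ strategies, so it is only practical for small $n$; the particular covering set built here (consecutive blocks of $k$ channels, the last block padded with low-numbered channels) is one possible choice and is not taken from the paper; channels are 0-based and all names are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Dict, List, Tuple

Strategy = Tuple[int, ...]


def covering_set(n: int, k: int) -> List[Strategy]:
    """ceil(n/k) strategies that together contain every channel (assumes k <= n)."""
    cover = []
    for start in range(0, n, k):
        block = list(range(start, min(start + k, n)))
        # Pad a short last block with the lowest-numbered channels not in it.
        pad = [f for f in range(n) if f not in block][: k - len(block)]
        cover.append(tuple(sorted(block + pad)))
    return cover


def strategy_distribution(w_channel: List[float], k: int, gamma: float
                          ) -> Tuple[Dict[Strategy, float], List[float]]:
    n = len(w_channel)
    strategies = list(combinations(range(n), k))
    # Strategy weight = product of its channel weights, as in Eq. (2).
    w = {i: prod(w_channel[f] for f in i) for i in strategies}
    total = sum(w.values())
    cover = set(covering_set(n, k))
    # Mixture of the exponential-weight distribution and the covering set, Eq. (5).
    p = {i: (1 - gamma) * w[i] / total + (gamma / len(cover) if i in cover else 0.0)
         for i in strategies}
    # Channel selection probability: q_f = sum of p_i over strategies containing f.
    q = [sum(p[i] for i in strategies if f in i) for f in range(n)]
    return p, q


p, q = strategy_distribution([1.0, 2.0, 1.0, 0.5, 1.0], k=2, gamma=0.1)
print(covering_set(5, 2), sum(p.values()), sum(q))
```

Two sanity checks follow from the definitions: the strategy probabilities sum to 1, and the channel probabilities sum to $k$ because every strategy contains exactly $k$ channels.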

Discussions. In practice, the transceiver (i.e., the sender and the receiver) may not have the same sensing outcomes due to sensing errors. So, in our design we let the sender perform sensing in each timeslot, and the receiver only selects channels to listen on. Note that the operating point of the spectrum sensor is set as the probability of collision with PUs [2], which includes two types of sensing errors: false alarm probability and miss detection probability. Without loss of generality, we use $\tau$ to denote the sensing error probability in the following analysis, where $\tau = P_{\text{false alarm}}(1 - P_{\text{PU active}}) + P_{\text{miss detection}} P_{\text{PU active}}$.
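As a quick worked example of the sensing error probability $\tau$, the sketch below evaluates the expression for hypothetical operating values (a false alarm probability of 0.1, a miss detection probability of 0.05 and a PU activity probability of 0.4); these numbers are illustrative assumptions, not values from the paper.

```python
def sensing_error_probability(p_false_alarm: float, p_miss: float,
                              p_pu_active: float) -> float:
    """tau = P_fa * (1 - P_active) + P_md * P_active."""
    return p_false_alarm * (1 - p_pu_active) + p_miss * p_pu_active


# 0.1 * 0.6 + 0.05 * 0.4 = 0.06 + 0.02
tau = sensing_error_probability(0.1, 0.05, 0.4)
print(round(tau, 4))
```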

To eliminate the information asymmetry between the sender and the receiver, the sender uses the acknowledgement information to update the probability distribution over the strategy set. Thus, the accumulated rewards for the sender and the receiver are equivalent, i.e., $G^s_t = G^r_t$. (Note that we make this assumption to obtain the upper-bound performance of the proposed anti-jamming scheme. In Section VI, we evaluate the case where ACKs are randomly jammed by the attacker, showing the strong resilience of our proposed scheme.) In addition, because the sender and the receiver are not perfectly synchronized, it is necessary and important to evaluate how close the sender's and the receiver's strategies become as time goes on, since the closer the transceivers' chosen strategies are, the more rewards are generated. This is equivalent to asking how well the learning-based algorithm proceeds to maximize the system throughput.

Spectrum sensing usually consumes more energy than reception, i.e., it is costly to obtain the sensing results [24]. In certain application scenarios, legitimate nodes may be able to sense only a limited number of times due to energy constraints. Let $\varepsilon$ denote the proportion of timeslots in which sensing is performed. For $T$ timeslots, the number of sensing operations is approximately $T\varepsilon$. In Algorithm 1, we introduce a Bernoulli random variable $R_t$ with $P\{R_t = 1\} = \varepsilon$ at the sender side. Thus, the sender senses the channel with probability $\varepsilon$. There are two possible cases when the sender does not perform sensing in a timeslot. In the first case, the sender remains silent in this timeslot without transmitting any packets. Due to the random sensing and access strategy, it is hard for the adversary to predict the behavior of the sender. However, as no packets are transmitted, the transmission delay may increase. In the second case, the sender still accesses the channels most likely to be free based on the previous probability distribution. In this case, there is a tradeoff between the collision probability with PUs and the number of sensing operations.

D. Theoretical Analysis

Definition 1: An algorithm $A$ is an $\alpha$-static (or $\alpha$-adaptive) approximation of the static (or adaptive) optimal solution if and only if it can transmit the message successfully in time $\alpha T$ with high probability (w.h.p.) $1 - \frac{1}{l^{\epsilon}}$ when the static (or adaptive) optimal solution can transmit the same message successfully with the same probability $1 - \frac{1}{l^{\epsilon}}$ in time $T$, where $\epsilon > 0$ is a constant and $l$ is the number of packets in the message.

Definition 2: The regret of an algorithm $A$ is the difference between the accumulated rewards using the static optimal strategy and that using $A$ over $T$ timeslots, i.e., $G^{\max}_T - G^A_T$, where $G^{\max}_T = \max_{i \in S} G_{i,T} = \max_{i \in S} \sum_{f \in i}\sum_{s=1}^{T} g_{f,s}$ and $G^A_T = \sum_{s=1}^{T} g_{I_s,s} = \sum_{s=1}^{T}\sum_{f \in I_s} g_{f,s}$.

The first definition is used to characterize the approximation ratio between the proposed algorithm and the static and adaptive optimal solutions. The second definition is used to characterize the throughput performance between the proposed algorithm and the optimal solution. In the following analysis, we will write $G^{\max}$ instead of $G^{\max}_T$ whenever the value of $T$ is clear from the context. In addition, we will write $G^{\max}_T(s)$ and $G^{\max}_T(r)$ to denote the rewards of the static optimal strategies for the sender and the receiver, respectively.

Due to the probabilistic strategy selection, the sender and the receiver are not perfectly synchronized in each timeslot. However, we show that the sender's sensing strategy and the receiver's receiving strategy will converge to their own optimal strategies. The following theorem measures how close their optimal strategies are as $T \to \infty$.


Theorem 1: The normalized reward distance $\frac{1}{T}\left|G^{\max}_T(s) - G^{\max}_T(r)\right|$ converges to 0 at rate $O(1/\sqrt{T})$ as $T \to \infty$.

Proof Due to the page limitation, please refer to our technicalreport [25] for the detailed proof.

Theorem 2: Algorithm 1 has time complexity $O(k_x n T)$ and space complexity $O(k_x n)$, where $x \in \{s, r\}$.

Proof See Appendix A.

In practice, the transmitted messages, which may be much larger than what fits in a single timeslot, have to be split into small fragments to fit the timeslots. As shown above, the proposed jamming-resistant spectrum sensing and access protocol is probabilistic in nature, so we cannot guarantee that the transmitted message is delivered within a certain number of timeslots with probability one. So, to evaluate the transmission efficiency, we consider the expected time for a message delivery with high probability, meaning that the probability goes to one when the total number of packets goes to infinity. Based on the acknowledgement information, in each timeslot the sender picks a packet that has not yet been delivered. Without loss of generality, assume a message $M$ is partitioned into $l$ packets $M_1, M_2, \dots, M_l$, each of which has size $|M_i| = |M|/l$ ($1 \le i \le l$). Then, the transmitted message $M$ can be reconstructed at the receiver if and only if all $l$ packets are successfully received. The following theorems characterize the approximation factors for the static optimal and adaptive optimal solutions.

Theorem 3: When $l \ge 36(1 + c\epsilon)k_r n \ln n / \left((c-1)^2\epsilon^2\right)$, our algorithm is a $(1 + c\epsilon)$-static approximation for any constant $c > 1$.

Proof See [1].

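Theorem 4: When $l \ge \frac{36\, n^3 \ln n\, K(1+c\epsilon)}{k_s \varepsilon (1-\tau)(n-k_j)(c-1)^2\epsilon^2}$, our algorithm is a $\frac{n^2}{k_r k_s \varepsilon(1-\tau)(n-k_j)} K(1+c\epsilon)$-adaptive approximation for any constant $c > 1$, where $K = \min\{k_r,\, k_s\varepsilon(1-\tau),\, n-k_j\}$, $\varepsilon$ is the probability of sensing a channel and $\tau$ is the sensing error probability.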

Proof See [1].

Discussions. As can be seen in the proof of Theorem 1, the parameters $\beta$, $\eta$ and $\gamma$ are fixed values that are all pre-computed before the protocol execution. If we aim at ensuring that the regret bound is achieved with probability at least $1 - \delta$, we can set a preferable value for $\delta$. The parameter selection process is as follows. We have $\beta_x = \sqrt{\frac{k_x}{nT}\ln\frac{n}{\delta}}$, $\gamma_x = 2\eta_x n$ and $\eta_x = \sqrt{\frac{\ln n}{4Tn}}$. Here, $n$ and $k_r$ are pre-selected system parameters. Once $T$ is obtained, the specific values of $\beta_x$, $\eta_x$ and $\gamma_x$ can be determined such that the regret bound holds (or asymptotic optimality is achieved). To determine $T$, in our protocol design we let the sender determine a feasible $T$ and encode it in each packet for transmission. The receiver obtains $T$ by successfully decoding any received packet and begins to run the algorithm. Assume $p$ is the probability of message delivery. The sender determines $T$ by first estimating a lower bound of $k_r$ and an upper bound of $k_j$. It then calculates $\epsilon$ such that $1 - \frac{1}{l^{\epsilon}} = p$ and determines the constant $c > 1$ such that $l = 36(1 + c\epsilon)k_r n \ln n / \left((c-1)^2\epsilon^2\right)$. Finally, the expected time for message delivery is

\[ T = \frac{(1 + c\epsilon)\, l}{\dfrac{k_r k_s \varepsilon (1-\tau)}{n} \cdot \dfrac{n - k_j}{n}}. \]

By Theorem 3, with probability at least $p$ the message $M$ can be successfully recovered at the receiver.
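The parameter selection can be sketched numerically. The sketch assumes the formulas read as $\beta_x = \sqrt{(k_x/(nT))\ln(n/\delta)}$, $\eta_x = \sqrt{\ln n/(4Tn)}$, $\gamma_x = 2\eta_x n$, and $T = (1+c\epsilon)\,l \big/ \big(\tfrac{k_r k_s \varepsilon (1-\tau)}{n}\cdot\tfrac{n-k_j}{n}\big)$; the function names and all numeric inputs are hypothetical and serve only to show the orders of magnitude involved.

```python
import math
from typing import Tuple


def protocol_parameters(n: int, k: int, T: int, delta: float
                        ) -> Tuple[float, float, float]:
    """Return (beta, gamma, eta) for one side (sender or receiver)."""
    beta = math.sqrt(k / (n * T) * math.log(n / delta))
    eta = math.sqrt(math.log(n) / (4 * T * n))
    gamma = 2 * eta * n
    return beta, gamma, eta


def expected_delivery_time(l: int, c: float, eps_const: float, n: int,
                           k_r: int, k_s: int, k_j: int,
                           sense_prob: float, tau: float) -> float:
    """T = (1 + c*eps_const) * l / (per-timeslot success factor)."""
    per_slot = (k_r * k_s * sense_prob * (1 - tau) / n) * ((n - k_j) / n)
    return (1 + c * eps_const) * l / per_slot


# Hypothetical setting: 16 channels, 4 radios per party, 10^4 timeslots.
beta, gamma, eta = protocol_parameters(n=16, k=4, T=10_000, delta=0.05)
T_est = expected_delivery_time(l=100, c=2.0, eps_const=0.5, n=16,
                               k_r=4, k_s=4, k_j=4, sense_prob=1.0, tau=0.0)
print(beta, gamma, eta, T_est)
```

For these inputs the parameters fall inside the ranges required by Algorithm 1 ($\beta \in (0,1]$, $\gamma \in (0,1/2]$), which is a useful check before running the protocol with a short horizon $T$, where $\gamma = 2\eta n$ can exceed $1/2$.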

V. TRACKING THE ADAPTIVE COMPOUND STRATEGY FOR ANTI-JAMMING SPECTRUM ACCESS

In the above discussions, regret is computed as the accumulated reward difference between the proposed anti-jamming strategy and the static optimal strategy. We have shown that the proposed jamming-resistant protocol in Algorithm 1 can track the static optimal strategy and converge to it as time goes on. By definition, the static optimal strategy is the fixed "best" strategy used for all timeslots. However, for each timeslot there always exists a best strategy against the "joint" strategy of the other parties involved in the anti-jamming game. Linking these strategies from all timeslots together yields the best compound strategy, which is called the adaptive optimal strategy. An interesting question can thus be raised: given that the good strategy may change in each timeslot, is it possible to select a sequence of strategies that approximates the adaptive/compound strategy?

A. The Proposed Construction

In Algorithm 1, the size of the static optimal strategy set is $\binom{n}{k_r}$ ($\binom{n}{k_s}$). However, for all possible compound strategies, the strategy set is extremely large, i.e., with size approximately as large as $\binom{n}{k_r}^T$ ($\binom{n}{k_s}^T$). Therefore, it is computationally expensive to track the best compound (adaptive) strategy using the previous protocol. In this section, we consider an extension of the anti-jamming protocol based on the tracking-the-best-expert problem and develop an efficient algorithm to approximate the best compound strategy.

Different from the static optimal strategy, the best compound strategy is allowed to change its strategy $m$ times in $T$ timeslots, i.e., a strategy from $\binom{n}{k_r}$ ($\binom{n}{k_s}$) is assigned in a timeslot. Consider the compound strategy $\mathbf{i} = (i_1, i_2, \dots, i_m)$ corresponding to the timeslot vector $\mathbf{t} = (t_1, t_2, \dots, t_m)$, where strategy $i_j$ is used to predict the best strategy at time instants $t_j \le t \le t_{j+1}$. Then the new regret is defined as

\[ \max_{(\mathbf{i},\mathbf{t})} G_{\mathbf{i},T} - G^x_T, \quad x \in \{s, r\}, \tag{6} \]

where $\max_{(\mathbf{i},\mathbf{t})} G_{\mathbf{i},T}$ denotes the accumulated rewards obtained by using the adaptive compound strategy with respect to $(\mathbf{i}, \mathbf{t})$. For ease of analysis, we assume $\varepsilon = 1$, i.e., the sender performs sensing in each timeslot. The new algorithm for tracking the adaptive compound strategy differs from Algorithm 1 in step 6. For ease of notation, we drop the superscript $x$ in the following expressions. In step 6, the sender and the receiver both update the strategy weight as

\[ v_{i,t} = w_{i,t-1}\, e^{\eta g'_{i,t}}, \tag{7} \]

\[ w_{i,t} = (1-\alpha)\, v_{i,t} + \frac{\alpha}{N} W_t, \tag{8} \]


where $N = \binom{n}{k_s}$ ($N = \binom{n}{k_r}$), $g'_{i,t} = \sum_{f \in i} g'_{f,t}$ and $W_t$ is the sum of the total weights, i.e.,

\[ W_t = \sum_{i=1}^{N} v_{i,t}. \tag{9} \]
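The modified step 6 can be sketched directly from Eqs. (7) to (9). This is a minimal illustration over an explicit list of strategy weights, so it is only practical when the number of strategies $N$ is small; the function name and the example numbers are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_share_update(w_prev: List[float], g_virtual: List[float],
                       eta: float, alpha: float) -> Tuple[List[float], float]:
    """One weight update: exponential step, then share a fraction alpha."""
    v = [w * math.exp(eta * g) for w, g in zip(w_prev, g_virtual)]  # Eq. (7)
    total = sum(v)                                                  # Eq. (9)
    n_strategies = len(v)
    w = [(1 - alpha) * vi + alpha * total / n_strategies for vi in v]  # Eq. (8)
    return w, total


# Hypothetical example with N = 3 strategies.
w_new, W_t = fixed_share_update([1.0, 1.0, 1.0], [2.0, 0.5, 0.0],
                                eta=0.1, alpha=0.2)
w_plain, _ = fixed_share_update([1.0, 1.0, 1.0], [2.0, 0.5, 0.0],
                                eta=0.1, alpha=0.0)
print(w_new, W_t, w_plain)
```

The sharing step redistributes weight without changing the total, since $\sum_i w_{i,t} = (1-\alpha)W_t + \alpha W_t = W_t$. Every strategy therefore keeps at least a share $\alpha W_t / N$, which is what lets the algorithm recover quickly when the best strategy changes. With $\alpha = 0$ the update reduces to the exponential update of Algorithm 1.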

B. A Fast Implementation of The Proposed Construction

As can be seen, the time complexity of the proposed construction for tracking the adaptive compound strategy is $O(n^{k_s} T)$. In this section, we present an alternative method of implementing the above algorithm in $O(k_s n T^2)$ time.

The basic idea of our fast implementation is to select channels one by one in each timeslot/round, instead of computing each strategy from a large strategy set. We let $S(f,k)$ denote the strategy set in which each strategy chooses $k$ channels from channels $f, f+1, \dots, n$, and $\overline{S}(f,k)$ denote the strategy set in which each strategy chooses $k$ channels from channels $1, 2, \dots, f$. In addition, we let $G'_{t',t-1}(f)$ denote the sum of cumulative gains in the interval $[t', t-1]$ of channel $f$, $G'([t', t-1], i)$ denote the sum of cumulative gains in the interval $[t', t-1]$ of strategy $i$, and $M_{t',t-1}(f,k)$ denote the sum of exponential cumulative gains in the interval $[t', t-1]$ of all the strategies in $S(f,k)$. Formally, we have

\[ M_{t',t-1}(f,k) = \sum_{i \in S(f,k)} e^{\eta \sum_{f' \in i} G'_{t',t-1}(f')}, \]

where $G'_{t',t-1}(f) = \sum_{j=t'}^{t-1} g'_{f,j}$.

Similarly, we define

\[ \overline{M}_{t',t-1}(f,k) = \sum_{i \in \overline{S}(f,k)} e^{\eta \sum_{f' \in i} G'_{t',t-1}(f')}. \]

Correspondingly, we have the following properties:

\[ M_{t',t-1}(f,k) = M_{t',t-1}(f+1,k) + e^{\eta G'_{t',t-1}(f)}\, M_{t',t-1}(f+1,k-1), \tag{10} \]

\[ \overline{M}_{t',t-1}(f,k) = \overline{M}_{t',t-1}(f-1,k) + e^{\eta G'_{t',t-1}(f)}\, \overline{M}_{t',t-1}(f-1,k-1). \tag{11} \]
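The recursion in Eq. (10) is a standard dynamic program over suffixes of the channel list, and it can be checked against the defining sum on a small instance. The sketch below fixes one interval $[t', t-1]$ and takes the cumulative gains as a plain list `G` (a hypothetical input); channels are 1-based to match the text, and the function names are illustrative only.

```python
import math
from itertools import combinations
from typing import List


def suffix_table(G: List[float], k_max: int, eta: float) -> List[List[float]]:
    """M[f][k]: sum over k-subsets i of channels f..n of exp(eta * sum of G over i).

    Boundary values: M[f][0] = 1 (empty selection) and M[n+1][k] = 0 for k >= 1.
    Row 0 is unused so that indices match the 1-based channel numbers.
    """
    n = len(G)
    M = [[0.0] * (k_max + 1) for _ in range(n + 2)]
    for f in range(1, n + 2):
        M[f][0] = 1.0
    for f in range(n, 0, -1):
        for k in range(1, k_max + 1):
            # Eq. (10): either skip channel f, or take it and need k-1 more.
            M[f][k] = M[f + 1][k] + math.exp(eta * G[f - 1]) * M[f + 1][k - 1]
    return M


def brute_force(G: List[float], f: int, k: int, eta: float) -> float:
    """Direct evaluation of the defining sum over S(f, k)."""
    channels = range(f, len(G) + 1)
    return sum(math.exp(eta * sum(G[c - 1] for c in i))
               for i in combinations(channels, k))


G = [0.3, 1.2, 0.0, 2.5, 0.7]  # hypothetical cumulative virtual gains, n = 5
M = suffix_table(G, k_max=3, eta=0.5)
print(M[1][3], brute_force(G, 1, 3, 0.5))
```

Filling the table costs $O(nk)$ operations per interval, compared with $\binom{n}{k}$ terms in the direct sum. The prefix table $\overline{M}$ of Eq. (11) can be built the same way by iterating $f$ upward from 1.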

At timeslot $t$, for any $t' \in [1, t-1]$, if $k < k_s$ channels have been chosen from channels $1, \dots, f-1$, we choose channel $f$ with probability

\[ \frac{e^{\eta G'_{t',t-1}(f)}\, M_{t',t-1}(f+1, k_s-k-1)}{M_{t',t-1}(f, k_s-k)}. \tag{12} \]

If $t' = t$, all channels are chosen with the same probability $\frac{1}{nN}$.

Note that $t'$ is chosen before the computation of Eq. (12) according to the following distribution,

\[ p_{t'} = \begin{cases} \dfrac{(1-\alpha)^{t-1} Z_{1,t-1}}{N W_t} & \text{if } t' = 1, \\[2ex] \dfrac{\alpha(1-\alpha)^{t-t'}\, W_{t'} Z_{t',t-1}}{N W_t} & \text{if } t' = 2, \dots, t, \end{cases} \tag{13} \]

Algorithm 2 A Fast Implementation of Tracking the Adaptive Compound Strategy for Anti-jamming Spectrum Access.
Input: $n$, $k$, $\delta \in (0,1)$, $T$, $\beta \in (0,1]$, $\gamma \in (0,1/2]$, $\eta \in (0,1)$, $m \in \{0, 1, \dots, T-1\}$, $\alpha = \frac{m}{T-1}$.
Initialization: Set the initial gain $G'_{t',0}(f) = g'_{f,0} = 0$ and the total weight $W_1 = 1$. Let $M_{t',t-1}(f,0) = 1$ and $M_{t',t-1}(n+1,k) = \overline{M}_{t',t-1}(0,k) = 0$, and compute $M_0(f,k)$ and $\overline{M}_0(f,k)$ following Eqs. (10) and (11), respectively.
For timeslot $t = 1, \dots, T$,
1: Choose $t'$ from $[1,t]$ randomly following Eq. (13).
2: Select channel $f$ ($\forall f \in [1,n]$) one by one following Eq. (12) until a strategy $I_t$ with $k$ chosen channels is obtained.
3: Compute probability $q_{f,t}$ ($\forall f \in [1,n]$) following Eq. (15).
4: Obtain the channel reward $g_{f,t-1}$ and compute the virtual reward $g'_{f,t}$ ($\forall f \in [1,n]$) as $g'_{f,t} = \frac{g_{f,t}+\beta}{q_{f,t}}$ if $f \in I_t$, and $g'_{f,t} = \frac{\beta}{q_{f,t}}$ otherwise.
5: Update $M_{t',t}(f,k)$ and $\overline{M}_{t',t}(f,k)$ for $t' = 1, \dots, t$ following Eqs. (10) and (11), respectively.
6: Update $W_t$ following Eq. (14).
End

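where $Z_{t',t-1} = \sum_{i=1}^{N} e^{\eta G'([t',t-1],\, i)}$ and $Z_{t,t-1} = N$. Here, $W_t$ can be computed efficiently as follows:

\[ W_t = \frac{\alpha}{N} \sum_{t'=2}^{t-1} (1-\alpha)^{t-t'-1}\, W_{t'} Z_{t',t-1} + \frac{(1-\alpha)^{t-2}}{N} Z_{1,t-1}. \tag{14} \]

Note that $G'_{t',t}(f) = G'_{t',t-1}(f) + g'_{f,t}$, so $Z_{t',t} = M_{t',t}(1, k_s)$.

Instead of maintaining the weight $w_{i,t}$ of each strategy, we compute the probability $q_{f,t}$ for each channel as follows:

\[ q_{f,t} = (1-\gamma)\, \frac{\sum_{k=0}^{k_s-1} \overline{M}_{t-1,t-1}(f-1,k)\, e^{\eta G'_{t',t-1}(f)}\, M_{t-1,t-1}(f+1, k_s-k-1)}{M_{t-1,t-1}(1, k_r)} + \gamma\, \frac{|\{i \in C : f \in i\}|}{|C|}. \tag{15} \]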

Algorithm 2 shows a complete description of the fast implementation algorithm. It is easy to see that calculating Eqs. (10) and (11) for a given $t'$ requires only $O(nk)$ computations. So, at each timeslot $t$, the computational cost of calculating $M_{t',t}(f,k)$ and $\overline{M}_{t',t}(f,k)$ for all $t' = 1, \dots, t$ ($t \in [1,T]$) and $f \in [1,n]$ is approximately $O(Tnk_s)$. In addition, the computation of $W_t$ and $q_{f,t}$ can be done in $O(T)$ and $O(k_s)$, respectively. Therefore, over all timeslots, the total time complexity is approximately $O(T^2 n k_s)$ while the space complexity is $O(Tnk_s)$.

We next show the correctness of Algorithm 2. Let $G'(f) = G'_{t',t-1}(f)$ and $c(f) = 1$ if channel $f$ is chosen in the strategy $i$; otherwise $G'(f) = c(f) = 0$. Then, the number of channels chosen among channels $1, 2, \ldots, f$ is denoted by $\sum_{j=1}^{f} c(j)$. It is obvious that the virtual reward of strategy $i$ is $G'_{t',t-1}(i) = \sum_{f=1}^{n} G'(f)$. Therefore, the probability that a strategy $i$ is chosen for any $t' \in [1, t-1]$ at timeslot $t$ is

$$\prod_{f=1}^{n} \frac{e^{\eta G'(f)}\, M_{t',t-1}\big(f+1,\, k_s - \sum_{j=1}^{f} c(j)\big)}{M_{t',t-1}\big(f,\, k_s - \sum_{j=1}^{f-1} c(j)\big)} = \frac{e^{\eta \sum_{f=1}^{n} G'(f)}}{M_{t',t-1}(1, k_s)} = \frac{e^{\eta G'_{t',t-1}(i)}}{Z_{t',t-1}}. \qquad (16)$$
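The telescoping argument behind Eq. (16) can be checked numerically. The sketch below assumes that $M(f, k)$ is the sum, over all $k$-subsets of channels $f, \ldots, n$, of the product of the per-channel weights $e^{\eta G'(f)}$, which is how the text uses it when it sets $Z = M(1, k_s)$. The function names and the sample gains are hypothetical.

```python
import math
from itertools import combinations


def suffix_table(weights, k_max):
    """M[f][k]: total weight of all k-subsets of channels f..n (channels are 1-indexed)."""
    n = len(weights)
    M = [[0.0] * (k_max + 1) for _ in range(n + 2)]
    for f in range(1, n + 2):
        M[f][0] = 1.0  # the empty selection has weight 1
    for f in range(n, 0, -1):
        for k in range(1, k_max + 1):
            # either skip channel f, or take it and pick k-1 more from f+1..n
            M[f][k] = M[f + 1][k] + weights[f - 1] * M[f + 1][k - 1]
    return M


def sequential_probability(strategy, weights, k_s, M):
    """Left-hand side of Eq. (16): product of the per-channel decision factors."""
    prob, used = 1.0, 0
    for f in range(1, len(weights) + 1):
        remaining = k_s - used
        if f in strategy:
            prob *= weights[f - 1] * M[f + 1][remaining - 1] / M[f][remaining]
            used += 1
        else:
            prob *= M[f + 1][remaining] / M[f][remaining]
    return prob


# Hypothetical cumulative virtual gains G'(f) for n = 5 channels.
eta, k_s = 0.5, 2
gains = [0.3, 1.2, 0.0, 2.0, 0.7]
weights = [math.exp(eta * g) for g in gains]
M = suffix_table(weights, k_s)
strategies = [set(s) for s in combinations(range(1, len(gains) + 1), k_s)]
```

For every strategy in `strategies`, `sequential_probability` should agree with the closed form $e^{\eta G'(i)} / M(1, k_s)$, and the probabilities should sum to one.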


Besides, if $t' = t$, the probability to choose strategy $i$ is $\frac{1}{N}$. Thus, according to the conditional probability formula and Eqs. (13) and (16), we can derive the probability to choose the strategy $i$ as $\frac{w_{i,t-1}}{W_{t-1}}$, which is exactly the same as in the original algorithm shown in Section V-A. Therefore, this fast implementation and the original algorithm are equivalent in the sense that the prediction sequences of strategies have the same distribution.

Finally, we have the following theorem to characterize the performance bound on the new regret defined in Eq. (6), which measures the difference between the proposed algorithm and the adaptive optimal compound strategy.

Theorem 5: For the new algorithm tracking the best compound strategy, the normalized reward distance $\frac{1}{T} |G^{\max}_T(s) - G^{\max}_T(r)|$ is upper bounded by $O(12k\sqrt{n \ln n})$, where $k = \max\{k_s, k_r\}$.

Proof: Following the same proof strategy as in Theorem 1 and with a slight modification of the proof of tracking the best expert in [23], we can show that at the receiver side, with probability $1 - \delta$, the regret for the compound strategy $G^{\max}_T(r) - G^{A_r}_T$ is at most

$$2\sqrt{T k_r} \left( \sqrt{4 k_r |C| D} + \sqrt{n(T+1) \ln \frac{n(T+1)}{\delta}} \right)$$

when $\beta_r = \sqrt{\frac{k_r T}{T n} \ln \frac{nT}{\delta}}$, $\gamma_r = 2\eta_r k_r |C|$, $\eta_r = \sqrt{\frac{D}{4 T k_r^2 |C|}}$, $D = T \ln N + T - 1$, and $T \ge \max\left\{ \frac{k_r T}{n} \ln \frac{nT}{\delta},\, 4|C|D \right\}$. Using the facts $|C| = \lceil \frac{n}{k_r} \rceil$ and $N \le n^{k_r}$, we prove that when $T \to \infty$, the regret for the compound strategy is at most $6 k_r \sqrt{T^2 n \ln n}$ by properly choosing $k_r$, $n$ and $\delta$.

Similarly, we obtain the bound $6 k_s \sqrt{T^2 n \ln n}$ at the sender side. Finally, as $G^{A_s}_T = G^{A_r}_T$, $|G^{\max}_T(s) - G^{\max}_T(r)| \le 12k\sqrt{T^2 n \ln n}$, where $k = \max\{k_s, k_r\}$. Thus, $\frac{1}{T} |G^{\max}_T(s) - G^{\max}_T(r)|$ is bounded by $O(12k\sqrt{n \ln n})$.

Discussion. In comparison to the regret bound obtained using the static optimal strategy, the proposed algorithm cannot guarantee that the normalized reward distance converges to 0 when tracking the best compound strategy. This is because the best strategy may change in every timeslot, and it is hard for the decision maker to adapt his choices to the adaptive optimal strategy. However, Theorem 5 guarantees that the normalized reward distance between the sender and the receiver is at most $O(12k\sqrt{n \ln n})$ when $T$ goes to infinity. In practice, $k$ and $n$ are pre-set system parameters, so the normalized reward distance between the two transceivers is bounded by a constant that does not grow with $T$.
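To make the dependence on the system parameters concrete, the expression inside the $O(\cdot)$ of Theorem 5 can be evaluated directly. This is a minimal sketch; the function name and the sample parameter values are assumptions chosen to match the simulation settings ($n = 8$, $k_s = k_r = 3$), and the returned number is the value of the expression $12k\sqrt{n \ln n}$, not a measured quantity.

```python
import math


def normalized_distance_bound(k_s, k_r, n):
    """Evaluate 12 * k * sqrt(n * ln n) with k = max(k_s, k_r), as in Theorem 5."""
    k = max(k_s, k_r)
    return 12 * k * math.sqrt(n * math.log(n))


# The expression depends only on k and n, not on the horizon T.
bound_n8 = normalized_distance_bound(k_s=3, k_r=3, n=8)
bound_n14 = normalized_distance_bound(k_s=3, k_r=3, n=14)
```

The expression grows with both $k$ and $n$, which is consistent with the statement that the bound is a constant once these parameters are fixed.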

VI. NUMERICAL AND SIMULATION RESULTS

In our simulation, we assume both the sender and the receiver use the proposed probabilistic anti-jamming protocol, i.e., the MAB-based online channel selection strategy. Meanwhile, the PU dynamically accesses the whole spectrum with $p^{11}_i > p^{01}_i$. The jammer, however, chooses his jamming strategy from static jamming, random jamming, myopic jamming and adaptive jamming (i.e., MAB-based jamming). For ease of illustration, we let a four-element tuple denote the four parties' respective strategies. For example, “mab sta dyn mab” denotes the simulation setting in which the sender uses the MAB-based strategy, the jammer uses the static jamming strategy, the PU dynamically uses the spectrum according to certain traffic statistics and the receiver uses the MAB-based strategy. Without loss of generality, we assume $k_s = k_r = 3$, under which we vary the jammer's jamming capabilities and the total number of channels in the simulation.

Fig. 3: The comparisons of the different jamming strategies on the system performance. (a) Cumulative distribution function of the number of timeslots to achieve message delivery ($T^*$), with $\varepsilon = 0.8$, $\tau = 0.1$, $n = 8$, $l = 10$, $k_s = k_r = k_j = 3$. (b) Average cumulative rewards/packets vs. number of timeslots ($T$), with $\varepsilon = 0.8$, $\tau = 0.1$, $n = 8$, $k_s = k_r = k_j = 3$. Curves in both panels: mab sta dyn mab, mab ran dyn mab, mab mab dyn mab, mab myo dyn mab.

Fig. 4: The effects of sensing probability $\varepsilon$ and jamming capability $k_j$ on the system performance under “mab myo dyn mab”. Both panels plot the average cumulative rewards/packets vs. number of timeslots ($T$). (a) $\varepsilon \in \{1, 0.8, 0.6\}$ with $\tau = 0.1$, $n = 8$, $k_s = k_r = k_j = 3$. (b) $k_j \in \{3, 5, 7\}$ with $\varepsilon = 0.8$, $\tau = 0.1$, $n = 14$, $k_s = k_r = 3$.

A. Message Delivery Performance Evaluation

We first evaluate the performance of Algorithm 1. Fig. 1 shows (i) the average number of delivered packets as a function of $T$ and (ii) the CDF of the expected time to achieve message delivery when $l = 10$, $k_j = 3$, $n = 8$ and $p^{11}_i > p^{01}_i$. Fig. 1 (a), (c), (e), (g) show that the performances of static opt and adaptive opt remain nearly the same, especially when the jammer uses the static jamming strategy. This implies that the PU's dynamics incur relatively “static” channel status from the perspective of SUs. So, we cannot gain much more by using the adaptive opt than the static opt. We also compare the effect of different jamming strategies on the throughput performance in Fig. 3. Fig. 3 (a) shows that when static, random or MAB-based jamming strategies are adopted and the number of packets to be transmitted is relatively small, the whole message can be delivered with high probability before $T = 150$. Under the myopic jamming attack, it takes $T = 250$ for the receiver to recover the whole message with high probability. However, as shown in Fig. 3 (b), if $T$ continues to increase to 150 timeslots, the adaptive jammer incurs nearly the same performance deterioration as the myopic jammer. The key reason why the myopic jammer and the adaptive jammer are the most effective jammers is that they can make use of traffic statistics and/or acknowledgement information to dynamically adjust their jamming strategies.


Fig. 1: Average number of received packets vs. the number of timeslots ($T$) and CDF of expected time to achieve message delivery under different strategy settings with $p^{11}_i > p^{01}_i$. Panels (a), (c), (e), (g): cumulative distribution function of the number of timeslots to achieve message delivery ($T^*$), with $\varepsilon = 0.8$, $\tau = 0.1$, $n = 8$, $l = 10$, $k_s = k_r = k_j = 3$. Panels (b), (d), (f), (h): average cumulative rewards/packets vs. number of timeslots ($T$), with $\varepsilon = 0.8$, $\tau = 0.1$, $n = 8$, $k_s = k_r = k_j = 3$. Jammer strategy: static (a, b), random (c, d), MAB-based (e, f), myopic (g, h). Curves in each panel: mab, sta-opt and adp-opt as the fourth tuple element (e.g., mab sta dyn mab, mab sta dyn sta-opt, mab sta dyn adp-opt).

Fig. 2: The effects of sensing error probability $\tau$ on the system performance. All panels plot the average cumulative rewards/packets vs. number of timeslots ($T$), with $\varepsilon = 0.8$, $n = 8$, $k_s = k_r = k_j = 3$. (a) mab sta dyn mab, $\tau \in \{0.1, 0.15, 0.2\}$. (b) mab ran dyn mab, $\tau \in \{0.1, 0.15, 0.2\}$. (c) mab mab dyn mab, $\tau \in \{0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.45\}$. (d) mab myo dyn mab, $\tau \in \{0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.45\}$.

Fig. 4 (a) and (b) illustrate how the sensing probability $\varepsilon$ and the jamming capability $k_j$ affect the performance, respectively. Not surprisingly, an increase of $k_j$ leads to fewer delivered packets, and a larger sensing probability enables the sender to update the strategy distributions using the sensing outcomes. In Fig. 2, we evaluate how the sensing error probability $\tau$ affects the system performance. It is shown that under static jamming or random jamming attacks, the average number of cumulative delivered packets decreases when $\tau$ increases. Interestingly, if adaptive jamming and myopic jamming attacks are launched, the system performance first improves as $\tau$ increases and then deteriorates once $\tau$ reaches a certain threshold. This is because a small $\tau$ can help disrupt the predictions of the two types of intelligent jammers on the available channels. If the sensing error probability $\tau$ continues to increase, sensing errors begin to dominate the performance and cause a performance deterioration.

In Fig. 5 and Fig. 6 we use the setting “mab myo dyn mab” as an example to show how the parameters $n$ and $l$ affect the system performance. Fig. 5 shows that when $l$ increases (i.e., from 10 to 30), the expected time to receive the message with high probability increases correspondingly. On the other hand, different values of $n$ also affect the performance as $T$ increases. For example, see the circled points in Fig. 5 and Fig. 6 (a). When $T < 180$, the case of $n = 8$ gives the best performance; after $T > 180$, the case of $n = 10$ outperforms that of $n = 8$; when the time reaches $T = 240$, the case of $n = 14$ outperforms the case of $n = 8$, and it gives the best performance after $T = 320$. This means that it is better to choose a small $n$ when the message is short, whereas a larger $n$ is preferred when the message is relatively large. However, this does not imply that a larger $n$ always gives the best performance. As shown in Fig. 6 (b), when $n$ increases from 12 to 14, the performance gain is very small, and when $n$ further increases to $n = 16$, the performance deteriorates. This is because the use of a large $n$ also makes it difficult for the sender and the receiver to hop to the same set of channels.

Fig. 5: The effects of $n$ and $l$ on the system performance with respect to the CDF of the expected time to achieve message delivery. Setting: mab myo dyn mab with $\varepsilon = 0.8$, $\tau = 0.1$, $k_s = k_r = k_j = 3$; curves for $n \in \{8, 10, 14\}$ and $l \in \{10, 30\}$.

Fig. 6: The effect of $n$ on the system performance with respect to the average cumulative rewards/packets. Setting: mab myo dyn mab with $\varepsilon = 0.8$, $\tau = 0.1$, $k_s = k_r = k_j = 3$. (a) $n \in \{8, 10, 14\}$, $T$ up to 800. (b) $n \in \{8, 10, 12, 14, 16\}$, $T$ up to 2000.

We next evaluate the performance of Algorithm 2. In Fig. 7 (a), (b), (c) and (d), we show the impact of parameter $m$ on the system performance. Let $m_s$, $m_r$ denote the number of times the strategy is changed in $T$ timeslots by the sender and the receiver, respectively. As expected, a larger $m_s$ helps to improve the system performance, which indicates that the transceiver requires more time to learn to choose good channels. The selection of a new strategy contributes to the update of system parameters so that available channels are chosen with a higher probability. However, when $m_s = 1500$, a larger $m_r$ leads to fewer received packets, and more time is required to deliver the message with high probability. This is because it is difficult for the sender and the receiver to hop on the same channels when both parties choose new channels too frequently.

We also evaluate the system performance when the jammer randomly jams the ACK information. In practice, this is the best strategy the jammer can adopt to disrupt the strategy convergence between the sender and the receiver. In Fig. 8, we can see that when ACKs get jammed, the number of successfully received packets decreases and more time is required to deliver the whole message. However, it also indicates that our proposed anti-jamming spectrum sensing and access protocol can still defend against such a powerful jammer.

Fig. 7: Average number of received packets vs. the number of timeslots ($T$) and CDF of expected time to achieve message delivery under different strategy settings with $p^{11}_i > p^{01}_i$ and the change of $m$. (a) Average cumulative rewards/packets vs. $T$ for mab mab dyn mab. (b) CDF of $T^*$ for mab mab dyn mab. (c) Average cumulative rewards/packets vs. $T$ for mab ran dyn mab. (d) CDF of $T^*$ for mab ran dyn mab. Curves in each panel: $(m_s, m_r) \in \{(500, 500), (1000, 500), (1500, 500), (1500, 1000), (1500, 1500)\}$.

Fig. 8: The effect of jammed ACK information on the system performance. (a) Average cumulative rewards/packets vs. number of timeslots ($T$). (b) CDF of the number of timeslots to achieve message delivery ($T^*$), with $k_s = k_r = k_j = 3$, $\varepsilon = 0.8$, $\tau = 0.1$. Curves in both panels: mab ran dyn mab and mab mab dyn mab, each with and without jammed ACKs.

VII. CONCLUSION

In this paper, we identified the vulnerability of the existing OSA protocols under malicious jamming attacks. Motivated by this observation, we designed efficient and robust online OSA algorithms and analytically showed the regret bounds and approximation ratios of our methods with respect to optimal strategies. Our extensive simulations validate the theoretical analysis, showing that our methods perform extremely well and are very effective in defending against malicious jamming attacks.

ACKNOWLEDGMENT

Kui’s research is supported in part by US National Science Foundation under grant CNS-1318948. Qian’s research is supported in part by National Natural Science Foundation of China (Grant No. 61373167), National Basic Research Program of China (973 Program, Grant No. 2014CB340600), National High Technology Research and Development Program of China (863 Program, Grant No. 2015AA016004), and Wuhan Science and Technology Bureau (Grant No. 2015010101010020).

APPENDIX A
PROOF OF THEOREM 2

Proof: We prove that, by using dynamic programming, both the sender’s sensing and access algorithm and the receiver’s receiving algorithm can be efficiently implemented with time and space complexities that are linear in $n$ and $k_s$ ($k_r$). We prove it for the receiver side; the proof for the sender side is the same.

In the proposed algorithm, steps 1 and 2 are time consuming since the total number of possible strategies is $N = O(n^{k_r})$. In this proof, we show that the time complexity can be reduced by using dynamic programming. Let $S(\bar{f}, k)$ denote the strategy set in which each strategy chooses $k$ channels from channels $\bar{f}, \bar{f}+1, \cdots, n$. We also use $\bar{S}(\bar{f}, k)$ to denote the strategy set in which each strategy chooses $k$ channels from channels $1, 2, \cdots, \bar{f}$. We define

$$W_t(\bar{f}, k) = \sum_{i \in S(\bar{f}, k)} \prod_{f \in i} w_{f,t} \quad \text{and} \quad \bar{W}_t(\bar{f}, k) = \sum_{i \in \bar{S}(\bar{f}, k)} \prod_{f \in i} w_{f,t}.$$

Note that $W_t(\bar{f}, k) = W_t(\bar{f}+1, k) + w_{\bar{f},t} W_t(\bar{f}+1, k-1)$ and $\bar{W}_t(\bar{f}, k) = \bar{W}_t(\bar{f}-1, k) + w_{\bar{f},t} \bar{W}_t(\bar{f}-1, k-1)$, which implies that both $W_t(\bar{f}, k)$ and $\bar{W}_t(\bar{f}, k)$ can be computed in time $O(k_r n)$ (letting $W_t(\bar{f}, 0) = 1$ and $W(n+1, k) = \bar{W}(0, k) = 0$) by dynamic programming for all $1 \le \bar{f} \le n$ and $1 \le k \le k_r$.
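The two recursions can be sketched as a pair of dynamic-programming tables. This is an illustrative sketch under the boundary conditions stated in the proof (weight 1 for the empty selection, weight 0 when channels run out); the function name `weight_tables`, the list-of-lists layout and the sample weights are assumptions made for the example.

```python
from itertools import combinations
from math import prod


def weight_tables(w, k_r):
    """Build suffix[f][k] (channels f..n) and prefix[f][k] (channels 1..f).

    Each entry is the sum, over all k-subsets of the given channel range,
    of the product of the channel weights. Channels are 1-indexed; w[f-1]
    holds the weight of channel f.
    """
    n = len(w)
    suffix = [[0.0] * (k_r + 1) for _ in range(n + 2)]
    prefix = [[0.0] * (k_r + 1) for _ in range(n + 2)]
    for f in range(n + 2):
        suffix[f][0] = 1.0  # empty selection
        prefix[f][0] = 1.0
    for f in range(n, 0, -1):  # suffix table: fill from channel n down to 1
        for k in range(1, k_r + 1):
            suffix[f][k] = suffix[f + 1][k] + w[f - 1] * suffix[f + 1][k - 1]
    for f in range(1, n + 1):  # prefix table: fill from channel 1 up to n
        for k in range(1, k_r + 1):
            prefix[f][k] = prefix[f - 1][k] + w[f - 1] * prefix[f - 1][k - 1]
    return suffix, prefix


# Hypothetical channel weights for n = 4 channels and k_r = 2.
w = [0.5, 1.5, 2.0, 0.7]
suffix, prefix = weight_tables(w, k_r=2)

# Brute-force total weight of all 2-channel strategies, for comparison only.
brute_total = sum(prod(w[f - 1] for f in s) for s in combinations(range(1, 5), 2))
```

Both `suffix[1][2]` and `prefix[4][2]` should equal `brute_total`, while each table is filled with only $O(k_r n)$ additions and multiplications instead of enumerating all strategies.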

In step 1, a strategy should be drawn from $N$ strategies. Instead of drawing a strategy, we choose the channels for the strategy one by one. Assume we make the decision on each channel in increasing order of their indices, i.e., we first decide whether channel 1 should be chosen or not, then channel 2, and so on. For any channel $f$, if $k \le k_r$ channels have been chosen among channels $1, \cdots, f-1$, we choose channel $f$ with probability

$$\frac{w_{f,t-1} W_{t-1}(f+1, k_r-k-1)}{W_{t-1}(f, k_r-k)}$$

and we do not choose channel $f$ with probability $\frac{W_{t-1}(f+1, k_r-k)}{W_{t-1}(f, k_r-k)}$. Let $w(f) = w_{f,t-1}$ if channel $f$ is chosen in the strategy $i$, and $w(f) = 1$ otherwise, so that $w(f)$ is the contribution of channel $f$ to the total weight of the strategy. In our algorithm, $w_{i,t-1} = \prod_{f=1}^{n} w(f)$. Let $c(f) = 1$ if channel $f$ is chosen in the strategy $i$, and $c(f) = 0$ otherwise. Then $\sum_{j=1}^{f} c(j)$ denotes the number of channels chosen among channels $1, 2, \cdots, f$ in strategy $i$. In this implementation, the probability that a strategy $i$ is chosen is

$$\prod_{f=1}^{n} \frac{w(f)\, W_{t-1}\big(f+1,\, k_r - \sum_{j=1}^{f} c(j)\big)}{W_{t-1}\big(f,\, k_r - \sum_{j=1}^{f-1} c(j)\big)} = \frac{\prod_{f=1}^{n} w(f)}{W_{t-1}(1, k_r)} = \frac{w_{i,t-1}}{W_{t-1}}.$$

The probability is exactly the same as that in Algorithm 1, which implies the correctness of this implementation.
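The channel-by-channel drawing procedure can be sketched as follows. The sketch assumes strictly positive channel weights and uses a suffix table $W(f, k)$ as defined in the proof; the function names, the seeded random generator and the sample weights are assumptions made for illustration.

```python
import random


def suffix_weights(w, k_r):
    """W[f][k]: total weight of all k-subsets of channels f..n (1-indexed channels)."""
    n = len(w)
    W = [[0.0] * (k_r + 1) for _ in range(n + 2)]
    for f in range(n + 2):
        W[f][0] = 1.0
    for f in range(n, 0, -1):
        for k in range(1, k_r + 1):
            W[f][k] = W[f + 1][k] + w[f - 1] * W[f + 1][k - 1]
    return W


def sample_strategy(w, k_r, rng):
    """Draw k_r channels one by one, in increasing index order.

    Channel f is taken with probability w_f * W(f+1, r-1) / W(f, r),
    where r is the number of channels still to be chosen.
    """
    W = suffix_weights(w, k_r)
    chosen = []
    for f in range(1, len(w) + 1):
        remaining = k_r - len(chosen)
        if remaining == 0:
            break  # strategy is complete
        p_choose = w[f - 1] * W[f + 1][remaining - 1] / W[f][remaining]
        if rng.random() < p_choose:
            chosen.append(f)
    return chosen


# Hypothetical weights: channels 1 and 4 carry almost all of the weight.
rng = random.Random(0)
draws = [sample_strategy([1e9, 1.0, 1.0, 1e9], k_r=2, rng=rng) for _ in range(100)]
```

When only as many channels remain as are still needed, the suffix term for skipping becomes zero, so the remaining channels are taken with probability one and every draw contains exactly $k_r$ channels.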

Note that in this implementation, we do not maintain the total weight of each strategy $w_{i,t}$. So we cannot compute $q_{f,t}$ as we described in step 2 of our algorithm. The probability $q_{f,t}$ can be computed within $O(n)$ for each round as follows:

$$q_{f,t} = (1-\gamma)\, \frac{\sum_{k=0}^{k_r-1} \bar{W}_{t-1}(f-1, k)\, w_{f,t-1}\, W_{t-1}(f+1, k_r-k-1)}{W_{t-1}(1, k_r)} + \gamma\, \frac{|\{i \in C : f \in i\}|}{|C|}.$$
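The per-channel probability can be sketched by combining a prefix table and a suffix table. This is a minimal sketch: the function names, the sample weights and in particular the covering set `C` below are hypothetical choices for illustration (any set of $k_r$-channel strategies that together cover all channels would do).

```python
def tables(w, k_r):
    """Return (suffix, prefix) subset-weight tables for 1-indexed channels."""
    n = len(w)
    suffix = [[0.0] * (k_r + 1) for _ in range(n + 2)]
    prefix = [[0.0] * (k_r + 1) for _ in range(n + 2)]
    for f in range(n + 2):
        suffix[f][0] = prefix[f][0] = 1.0
    for f in range(n, 0, -1):
        for k in range(1, k_r + 1):
            suffix[f][k] = suffix[f + 1][k] + w[f - 1] * suffix[f + 1][k - 1]
    for f in range(1, n + 1):
        for k in range(1, k_r + 1):
            prefix[f][k] = prefix[f - 1][k] + w[f - 1] * prefix[f - 1][k - 1]
    return suffix, prefix


def channel_probabilities(w, k_r, gamma, covering):
    """q_f = (1 - gamma) * P(f in drawn strategy) + gamma * (share of covering strategies containing f)."""
    suffix, prefix = tables(w, k_r)
    total = suffix[1][k_r]
    q = []
    for f in range(1, len(w) + 1):
        # k channels before f, channel f itself, and k_r - k - 1 channels after f
        exploit = sum(
            prefix[f - 1][k] * w[f - 1] * suffix[f + 1][k_r - k - 1]
            for k in range(k_r)
        ) / total
        explore = sum(1 for i in covering if f in i) / len(covering)
        q.append((1 - gamma) * exploit + gamma * explore)
    return q


# Hypothetical setting: n = 5 channels, k_r = 2, and a covering set of 3 strategies.
w = [0.5, 1.5, 2.0, 0.7, 1.1]
C = [{1, 2}, {3, 4}, {5, 1}]
q = channel_probabilities(w, k_r=2, gamma=0.1, covering=C)
```

Because every strategy contains exactly $k_r$ channels, the probabilities $q_f$ sum to $k_r$ over all channels, which gives a quick sanity check on an implementation.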

REFERENCES

[1] Q. Wang, K. Ren, and P. Ning, “Anti-jamming communication in cognitive radio networks with unknown channel statistics,” in Proc. of ICNP’11, 2011, pp. 393–402.

[2] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework,” IEEE JSAC, vol. 25, no. 3, pp. 589–600, 2007.

[3] S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, “Optimality of myopic sensing in multi-channel opportunistic access,” IEEE Transactions on Information Theory, vol. 55, no. 9, pp. 4040–4050, 2009.

[4] K. Liu, Q. Zhao, and B. Krishnamachari, “Dynamic multichannel access with imperfect channel state detection,” IEEE Transactions on Signal Processing, vol. 58, no. 5, pp. 2795–2808, 2010.

[5] J. Unnikrishnan and V. V. Veeravalli, “Algorithms for dynamic spectrum access with learning for cognitive radio,” IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 750–760, 2010.

[6] K. Liu and Q. Zhao, “A restless bandit formulation of multi-channel opportunistic access: Indexability and index policy,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5547–5567, 2010.

[7] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication. Addison Wesley, 1995.

[8] M. Strasser, C. Popper, S. Capkun, and M. Cagalj, “Jamming-resistant key establishment using uncoordinated frequency hopping,” in Proc. of IEEE Security and Privacy, May 2008.

[9] M. Strasser, C. Popper, and S. Capkun, “Efficient uncoordinated FHSS anti-jamming communication,” in Proc. of ACM MobiHoc’09, 2009.

[10] D. Slater, P. Tague, R. Poovendran, and B. J. Matt, “A coding-theoretic approach for efficient message verification over insecure channels,” in Proc. of ACM WISEC’09. ACM, 2009.

[11] A. Liu, P. Ning, H. Dai, and Y. Liu, “USD-FH: Jamming-resistant wireless communication using frequency hopping with uncoordinated seed disclosure,” in Proc. of MASS’10, 2010.

[12] Y. Liu, P. Ning, H. Dai, and A. Liu, “Randomized differential DSSS: Jamming-resistant wireless broadcast communication,” in Proc. of IEEE INFOCOM’10, 2010.

[13] A. Liu, P. Ning, H. Dai, Y. Liu, and C. Wang, “Defending DSSS-based broadcast communication against insider jammers via delayed seed-disclosure,” in Proc. of ACSAC, 2010, pp. 367–376.

[14] H. Li and Z. Han, “Dogfight in spectrum: Combating primary user emulation attacks in cognitive radio systems, part I: Known channel statistics,” IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3566–3577, 2010.

[15] H. Li and Z. Han, “Dogfight in spectrum: Combating primary user emulation attacks in cognitive radio systems, part II: Unknown channel statistics,” IEEE Transactions on Wireless Communications, vol. 10, no. 1, pp. 274–283, 2011.

[16] S. Gao, L. Qian, D. R. Vaman, and Z. Han, “Distributed cognitive sensing for time varying channels: Exploration and exploitation,” in Proc. of WCNC, 2010.

[17] Y. Wu, B. Wang, K. R. Liu, and T. C. Clancy, “Anti-jamming games in multi-channel cognitive radio networks,” IEEE Journal on Selected Areas in Communications, vol. 30, no. 1, pp. 4–15, 2012.

[18] B. Wang, Y. Wu, K. J. R. Liu, and T. C. Clancy, “A stochastic anti-jamming game in cognitive radio networks,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 4, pp. 877–889, 2011.

[19] P. Whittle, “Restless bandits: activity allocation in a changing world,” Journal of Applied Probability, vol. 25A, pp. 287–298, 1988.

[20] M. Strasser, B. Danev, and S. Capkun, “Detection of reactive jamming in sensor networks,” in ACM Transactions on Sensor Networks (TOSN). ACM, 2010.

[21] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “The non-stochastic multiarmed bandit problem,” SIAM J. Comput., 2002.

[22] B. Awerbuch and R. D. Kleinberg, “Adaptive routing with end-to-end feedback: distributed learning and geometric approaches,” in Proc. of ACM STOC’04, 2004, pp. 45–53.

[23] A. Gyorgy, T. Linder, G. Lugosi, and G. Ottucsak, “The on-line shortest path problem under partial monitoring,” J. Mach. Learn. Res., 2007.

[24] V. Namboodiri, “Are cognitive radios energy efficient? A study of the wireless LAN scenario,” in Proc. of IPCCC’09, 2009, pp. 437–442.

[25] Q. Wang, K. Ren, P. Ning, and S. Hu, “Jamming-resistant multi-radio multi-channel opportunistic spectrum access in cognitive radio networks,” Wuhan University, Tech. Rep., http://www.ece.iit.edu/%7Eqian/TechReport2014II.pdf.

Qian Wang received the B.S. degree from Wuhan University, China, in 2003, the M.S. degree from Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, China, in 2006, and the Ph.D. degree from Illinois Institute of Technology, USA, in 2012, all in Electrical Engineering. He is currently a Professor with the School of Computer Science, Wuhan University. His research interests include wireless network security and privacy, cloud computing security, and applied cryptography. Qian is an expert under “1000 Young Talents Program” of China. He is a co-recipient of the Best Paper Award from IEEE ICNP 2011. He is a Member of the IEEE and a Member of the ACM.


Kui Ren is an Associate Professor of computer science at State University of New York at Buffalo. He received his PhD degree from Worcester Polytechnic Institute and both BE and ME degrees from Zhejiang University. Kui’s research interests include Cloud Security, Wireless Security, and Smartphone-enabled Crowdsourcing Systems. His research has been supported by NSF, DoE, AFRL, MSR, and Amazon. He is a recipient of NSF CAREER Award in 2011 and Sigma Xi Research Excellence Award in 2012. Kui has published 135 peer-review journal and conference papers. Kui received several Best Paper Awards including IEEE ICNP 2011. Kui currently serves as an associate editor for IEEE Transactions on Information Forensics and Security, IEEE Wireless Communications, IEEE Internet of Things Journal, IEEE Transactions on Smart Grid, IEEE Communications Surveys and Tutorials, Elsevier Pervasive and Mobile Computing, and Oxford The Computer Journal. Kui is a Fellow of IEEE, a member of ACM, a Distinguished Lecturer of IEEE Vehicular Technology Society, and a past board member of Internet Privacy Task Force, State of Illinois.

Peng Ning is a Professor with the Department of Computer Science, North Carolina State University, Raleigh, NC, USA. He is currently on leave at Samsung Mobile, Santa Clara, CA, USA, where he is leading the Samsung KNOX Research and Development Team. His research interests are primarily in mobile security, wireless security, and cloud computing security.

Shengshan Hu received the B.S. degree in computer science and technology from Wuhan University, China, in 2014. He is currently pursuing his Ph.D. degree in the School of Computer Science, Wuhan University. His research interests include cloud computing and network security, with a current focus on secure outsourcing computation.