NONPARAMETRIC AND PARTIALLY
NONPARAMETRIC STATISTICAL INFERENCE IN
WIRELESS SENSOR NETWORKS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Ting He
August 2007
© 2007 Ting He
ALL RIGHTS RESERVED
NONPARAMETRIC AND PARTIALLY NONPARAMETRIC STATISTICAL
INFERENCE IN WIRELESS SENSOR NETWORKS
Ting He, Ph.D.
Cornell University 2007
Statistical inference has been extensively studied in a parametric framework. In
wireless sensor networks, increased concerns about performance under unknown
conditions have urged researchers to reconsider many parametric assumptions that
were widely accepted before. This thesis aims at solving selected statistical
inference problems arising in wireless sensor networks while minimizing parametric
assumptions about the underlying distributions.
The main problem considered in the thesis is the detection of information
flows based on timing information. The problem is to detect on-going flows of
information-carrying packets in a multi-hop network by measuring node trans-
mission epochs. This problem is formulated as a testing against correlated point
processes, where the correlation is modelled by constraints on relaying packets
such as causality, packet conservation, etc. The problem is divided into three
subproblems: centralized detection without chaff noise, centralized detection with
chaff noise, and distributed detection with chaff noise. For flows without chaff
noise, linear-time detection algorithms are proposed which are shown to outper-
form existing detectors in error exponents. For flows with chaff noise, it is shown
that there exists a threshold on the fraction of chaff noise such that the flows
are undetectable for noise levels above the threshold and detectable with vanish-
ing error probabilities otherwise. The value of the threshold is characterized both
analytically and algorithmically. Optimal detectors that can tolerate the max-
imum amount of chaff noise are developed. The problem is then extended to
distributed detection where there are capacity constraints in data collection. Joint
quantization-detection schemes are developed and analyzed to give achievability
results.
The other problem considered in the thesis is the detection and estimation of
changes in large random sensor fields. The problem is formulated as the detection
of changes in the geographical distribution of alarmed sensors and the estimation
of the set with the maximum change. A nonparametric detector is proposed based
on the largest change in the empirical distributions. Exponentially decaying up-
per bounds on the error probabilities are derived using statistical learning theory.
Polynomial-time algorithms are developed to implement the detector, which also
give consistent estimation results.
BIOGRAPHICAL SKETCH
Ting He was born in 1980 in Urumqi, Xinjiang, in northwest China. She received
her BS degree in Computer Science and Technology from Peking University, China,
in 2003, ranking in the top 2% of her class. Since then, she has been in the M.S./Ph.D.
program in the School of Electrical and Computer Engineering at Cornell Univer-
sity. Ting joined the Adaptive Communications & Signal Processing (ACSP) group
under the supervision of Prof. Lang Tong upon her arrival at Cornell, and has
worked as a graduate research assistant. Previously, she worked as an under-
graduate research assistant in the Micro Processor Research & Development Center of
Peking University from 2001 to 2003, during which period she participated in the
development of Unicore System, a key project in the National 863 Plan of China.
Ting has been a student member of IEEE since 2004. She received the Best Stu-
dent Paper Award at the 2005 International Conference on Acoustics, Speech, and
Signal Processing (ICASSP). Her research topics include nonparametric change
detection and estimation, stepping-stone detection, and both centralized and dis-
tributed information flow detection in wireless and sensor networks. Her research
aims at applying signal processing tools to network layer analysis and design. Her
general research interests include signal processing, information theory, algorithm
design, and network security.
To my parents and family.
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my advisor Prof. Lang Tong without
whom my Ph.D. would not have been possible. I am grateful for his guidance and
generous sharing of experiences. I have learnt from him not only methodologies,
but also what it means to be a great researcher. I have always been impressed by
his enthusiasm about research and high standards of excellence. I firmly believe
that the in-depth discussions and group meeting presentations have benefited me
more than any course I have taken.
I would like to thank Prof. Stephen Wicker for serving on my committee. I
would like to thank Prof. Toby Berger and Prof. Sergio Servetto for not only being
on my committee but also teaching me fantastic courses such as Information
Theory, Communication Networks, and Network Information Theory. I would like
to thank Dr. Ananthram Swami and Prof. Shai Ben-David for our extremely
successful collaborations which have not only led to fruitful publications but also
left truly pleasant experiences in my memory.
I would also like to thank Prof. Narahari U. Prabhu, Prof. David B. Shmoys,
and Prof. Charles Van Loan for teaching great courses in applied mathematics
which laid the foundation of my research. It was a true pleasure and privilege for
me to have learnt from these exemplary professors.
I would also like to thank all the people in the ACSP research group for their
help and company. I will always remember Parv and Youngchul with whom I share
numerous experiences and thoughts, Qing and Min who clearly set the standards
for me, Cris who is my only and best bowling partner, Zhiyu and Gokhan for their
sharpness and confidence. I have to thank Vidyut especially because he is remark-
ably talented in simplifying complicated problems, as a result of which this thesis is
based on his template. I will always remember Anima for her enthusiastic attitude
to life, Chin-Chen and Abhishek for never minding that I never really understand
what they are saying, Stefan and Saswat for their great work efficiency, Oliver for
his interesting view of many things in life and unbelievably broad interest, Matt
for his remarkable versatility and kindness, and Tae eung for always being polite
and modest. Without these people, my Ph.D. life would have been incomplete.
I also thank Min Jiang and Jun Liu for their selfless help when I first came to
Cornell, Jun Cui and Yanling Wu for sharing the happy moments when I was most
lonely, Juan for being the best roommate I have ever had, Jian Kong, Qing He, Jing
Tao, Junyun Yang, and all the friends at Winston Ct. for giving me tremendous
fun, Din, Edward, Wen, Matt, Hui, Birsen, Azadeh, Kwangtaik, Peter, Christina,
John, Frank, and all the acquaintances I have made in the ECE department for
making my years in Rhodes Hall a lot more enjoyable than they would have been
otherwise.
Special thanks to Huajun without whom life would become so much lonelier.
Last but not least, I would like to thank my parents and all my family for
all that they have done for me. I can never be grateful enough for their endless
care, support, and love.
This work was supported in part by the Multidisciplinary University Research
Initiative (MURI) under the Office of Naval Research Contract N00014-00-1-
0564, Army Research Laboratory CTA on Communication and Networks under
Grant DAAD19-01-2-0011, the National Science Foundation under Contract CCR-
0311055, TRUST (The Team for Research in Ubiquitous Secure Technology) which
receives support from the National Science Foundation (NSF award number CCF-
0424422), and the National Science Foundation under award CCF-0635070.
TABLE OF CONTENTS

Biographical Sketch  iii
Dedication  iv
Acknowledgements  v
Table of Contents  vii
List of Tables  x
List of Figures  xi

1 Introduction  1
  1.1 Nonparametric Statistical Inference in Wireless Sensor Networks  1
  1.2 Dissertation Outline  2
  1.3 Nonparametric Change Detection and Estimation in 2D Random Sensor Fields  5
    1.3.1 Summary of Results  7
    1.3.2 Related Work  9
  1.4 Information Flow Detection by Timing Analysis  10
    1.4.1 Summary of Results  12
    1.4.2 Related Work  14

2 Nonparametric Change Detection and Estimation  18
  2.1 Outline  18
  2.2 The Problem Statement  18
    2.2.1 The Model  18
    2.2.2 Distance Measures  20
    2.2.3 Detection and Estimation  22
  2.3 Performance Guarantee  24
  2.4 Algorithms  27
    2.4.1 Complete Algorithms  28
    2.4.2 Heuristic Algorithms  39
  2.5 Simulation  47
    2.5.1 Simulation Setup  47
    2.5.2 Detector Sensitivity  48
    2.5.3 Performance  49
  2.6 Extension to Finite-level Sensor Measurements  53
  2.7 Summary  54
  2.A Proof of Chapter 2  57

3 Detecting Information Flows Without Chaff Noise  62
  3.1 Outline  62
  3.2 Problem Formulation  62
    3.2.1 Notations  62
    3.2.2 Flow Models  63
    3.2.3 Hypotheses  65
  3.3 Detecting Information Flows with Bounded Delay  65
  3.4 Detecting Information Flows with Bounded Memory  69
  3.5 Comparing the Algorithms  71
    3.5.1 DMV vs. DA  72
    3.5.2 DM vs. DMV  75
  3.6 Numerical Results  77
  3.7 Summary  79
  3.A Proof of Chapter 3  80
  3.B Algorithms of Chapter 3  85

4 Detecting Information Flows With Chaff Noise  87
  4.1 Outline  87
  4.2 Problem Formulation  87
    4.2.1 Multi-hop Flow Models  88
    4.2.2 Problem Statement  89
  4.3 Flow Detectability  91
  4.4 Detectability of Two-hop Flows  94
    4.4.1 Two-hop Flows with Bounded Delay  94
    4.4.2 Two-hop Flows with Bounded Memory  97
  4.5 Detectability of Multi-hop Flows  99
    4.5.1 Multi-hop Flows with Bounded Delay  100
    4.5.2 Multi-hop Flows with Bounded Memory  104
  4.6 Detector  109
  4.7 Generalization of Poisson Assumption  114
  4.8 Simulations  117
    4.8.1 Synthetic Data  117
    4.8.2 Traces  121
  4.9 Summary  124
  4.A Proof of Chapter 4  125
  4.B Asymptotic CTR of MBMR  140
  4.C Algorithms of Chapter 4  142

5 Distributed Detection of Information Flows  153
  5.1 Outline  153
  5.2 The Problem Formulation  154
    5.2.1 Problem Statement  154
    5.2.2 System Architecture  155
  5.3 Performance Criteria  156
    5.3.1 Level of detectability  156
    5.3.2 Level of Undetectability  158
    5.3.3 General Converse and Achievability  160
  5.4 Quantizer Design  162
  5.5 Detection Algorithms  163
    5.5.1 Case I: Slotted Quantization, Full Side-Information  164
    5.5.2 Case II: Slotted Quantization, Equal Capacity Constraints  167
    5.5.3 Case III: One-Bit Quantization, Full Side-Information  170
    5.5.4 Case IV: One-Bit Quantization, Equal Capacity Constraints  173
  5.6 Analysis and Comparison  175
    5.6.1 Performance Analysis  175
    5.6.2 Numerical Comparison  177
  5.7 Summary  180
  5.A Proof of Chapter 5  181
  5.B Algorithms of Chapter 5  197

6 Conclusions  201
  6.1 Publications  204
  6.2 Future Directions  205

Bibliography  206
LIST OF TABLES

2.1 Time Complexity Comparison  55
2.2 Space Complexity Comparison  55
3.1 Detect-Match (DM)  85
3.2 Alternative Implementation of (*)  85
3.3 Detect-Maximum-Variation (DMV)  86
4.1 Levels of undetectabilities (Poisson null hypothesis)  109
4.2 Parameters for Simulations on Synthetic Data  118
4.3 Parameters for Simulations on Traces  121
4.4 Bounded-Greedy-Match (BGM)  142
4.5 Multi-Bounded-Delay-Relay (MBDR)  143
4.6 Expanded-Multi-Bounded-Delay-Relay (E-MBDR)  144
4.7 Bounded-Memory-Relay (BMR)  145
4.8 Multi-Bounded-Memory-Relay (MBMR)  146
4.9 Detect-Bounded-Delay (DBD)  147
4.10 Detect-Multi-Bounded-Delay (DMBD)  149
4.11 Detect-Bounded-Memory (DBM)  150
4.12 Detect-Multi-Bounded-Memory (DMBM)  152
5.1 Detector for Case I  197
5.2 Detector for Case II  198
5.3 Detector for Case III  199
5.4 Detector for Case IV  200
LIST OF FIGURES

1.1 Reported alarmed sensors (red) in two collections.  6
1.2 Is S communicating with D through R?  10
1.3 Transmission patterns of S, R, and D suggest a communication between S and D through R.  10
1.4 In a wireless network, eavesdroppers are deployed to report the transmission activities of nodes A and B to a detector at the fusion center, which in turn decides whether there are information flows through these nodes. Si (i = 1, 2): a sequence of transmission epochs of node A (or B).  11

2.1 Members of HD and HCD; ◦: sample point in S1, •: sample point in S2.  29
2.2 The set {s1, s2, s3, s4} is shatterable by axis-aligned rectangles, but the set {s1, s2, s3, s4, s5} is not.  31
2.3 Members of HR; ◦: sample point in S1, •: sample point in S2.  32
2.4 The set {s1, s2, s3, s4} is shatterable by A ∪ B.  35
2.5 Members of HV and HH; ◦: sample point in S1, •: sample point in S2.  36
2.6 Members of HDR; ◦: sample point in S1, •: sample point in S2.  44
2.7 Detection threshold as a function of the sample size for different VC-dimensions.  48
2.8 Detection threshold as a function of the detector size for different sample sizes.  49
2.9 Miss detection probability of δdA as a function of the sample size: simulation results. Here p = 0.98, q = 0.02, r = s/12. Use 1000 Monte Carlo runs.  50
2.10 Miss detection probability of δφA as a function of the sample size: simulation results. Here p = 0.98, q = 0.02, r = s/12. Use 10000 Monte Carlo runs.  50
2.11 Detection probability of δdA as a function of detector size, 1000 Monte Carlo runs.  51
2.12 Detection probability of δφA as a function of detector size, 10000 Monte Carlo runs.  51

3.1 Detecting information flows through nodes A and B by analyzing their transmission activities S1 and S2.  63
3.2 Both the solid and the dotted lines denote matchings that are causal and bounded in delay, but the dotted lines also preserve the order of incoming packets.  67
3.3 Finding the match of s1(1): there are three candidates in the ∆-length interval following s1(1).  68
3.4 (a) the cumulative counting functions ni(w) (i = 1, 2); (b) the cumulative difference d(w) and the maximum variation v(w).  70
3.5 The statistic of DA is no larger than that of DMV.  75
3.6 PF(δDA), PF(δDMV), and their bounds; M = 40 packets, 100000 Monte Carlo runs.  78
3.7 PF(δDM) under various rates; ∆ = 10 seconds, 100000 Monte Carlo runs.  79
3.8 PF(δDA), PF(δDMV), and PF(δDM); M = 40 packets, ∆ = 10 seconds, 100000 Monte Carlo runs.  79

4.1 Detecting information flows through nodes R1, R2, . . . , Rn by measuring their transmission activities; dotted lines denote a potential route.  88
4.2 An information flow along the path R1 → . . . → Rn.  88
4.3 BGM: a sequential greedy match algorithm.  95
4.4 Example: •: sk ∈ S1; ◦: sk ∈ S2; M1(k): the statistics calculated by BMR. Initially, M1(0) = 0, indicating that the memory is empty. The first packet is a departure, and it is assigned as chaff because otherwise the memory will be underflowed. The second packet is an arrival, and thus the memory size is increased by one. Such updating occurs at each arrival or departure.  98
4.5 Example: (a) The scheduling obtained by repeatedly using BGM. (b) Another scheduling. It shows that repeatedly using BGM is suboptimal.  101
4.6 MBDR: a recursive greedy match algorithm.  102
4.7 λ = 0.9.  105
4.8 λ = 2.  105
4.9 λ = 4.  105
4.10 MBMR for n = 4 and M = 3 (s = s1 ⊕ · · · ⊕ s4): monitor the memory sizes of the relay nodes and assign a chaff packet if the memory of any node will be underflowed or overflowed. Initially, Mi(0) = 0 (i = 1, 2, 3); at the end of this realization (after the 10th packet), (M1(10), M2(10), M3(10)) = (1, 1, 0).  106
4.11 The level of undetectability βMn and its bounds as functions of n: M = 4; compute βMn on 10000 packets.  108
4.12 The c.d.f. of the CTR of BGM for ∆ = 5: CTR on traces vs. CTR on Poisson processes.  116
4.13 The c.d.f. of the CTR of BMR for M = 20: CTR on traces vs. CTR on Poisson processes.  116
4.14 Generating information flows with bounded memory (⌊M2⌋ = 3): f2 is generated by storing ⌊M2⌋ packets from f1 and randomly releasing these packets during the arrival of the next ⌊M2⌋ packets.  118
4.15 The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, 100 packets per process, 10000 Monte Carlo runs.  119
4.16 The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 40 packets per process, 10000 Monte Carlo runs.  119
4.17 The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, 200 packets in total over all processes, 10000 Monte Carlo runs.  120
4.18 The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 100 packets in total over all processes, 10000 Monte Carlo runs.  120
4.19 PF(δBM), PF(δDAC), PF(δBD), and PF(δS-III) on LBL-PKT-4: M = 20, ∆ = 5, threshold for δBD = 1/14, threshold for δBM = 1/21, tested on 134 × 133 trace pairs.  122
4.20 PM(δBM) and PM(δDAC): M = 20, Nc = 1000, threshold for δBM = 1/21, tested on 4000 bounded memory flows.  123
4.21 PM(δBD) and PM(δS-III): ∆ = 5, Nc = 1000, threshold for δBD = 1/14, tested on 4000 bounded delay flows.  124
4.22 Inserting virtual packets to calculate the delays of chaff packets.  126
4.23 The Markov chain formed by d′(w); p = λ1/(λ1 + λ2), q = 1 − p.  127
4.24 Every relay sequence in P∗ corresponds to a relay sequence in P; solid line: sequences in P; dashed line: sequences in P∗.  130
4.25 Solid lines denote the original relay sequences; dashed lines denote the reorganized relay sequences which preserve the order of packets.  131
4.26 A "batched" arrival process generated from a Poisson process. ■: arrival epochs; ◦: points in the underlying Poisson process; M = 2, period = 5.  134
4.27 The Markov chain of (M1(k), M2(k)), k ≥ 0. All straight lines have transition probability 1/3. All the states are marked with their limiting probabilities, e.g., π(0, 2) = 1/15.  141

5.1 In a wireless network, nodes A and B may serve on one or multiple routes. Eavesdroppers are deployed to collect their transmission activities Si (i = 1, 2), which are then sent to a detector at the fusion center.  154
5.2 A distributed detection system. This system consists of two quantizers q1(t) and q2(t) and a detector δt.  156
5.3 Inserting one chaff packet can destroy the alignment of measurements.  158
5.4 IC-SF: Match s1 with s2 subject to delay bound T + ∆. ◦: directly observed epoch in s2; •: reconstructed epoch in s1.  165
5.5 IC-OF: Backward greedy matching. Each epoch is matched to the first unmatched nonempty slot that is no more than ∆ earlier.  170
5.6 λ = 0.1.  179
5.7 λ = 0.5.  179
5.8 λ = 1.  179
5.9 Construct f1: ◦: original epochs; •: constructed epochs.  183
5.10 Construct (f1, f2) from (x′n, y′n) (T ≥ ∆). The matching found by IC-SE guarantees that x′j = y′j,2 + y′j+1,1.  186
Chapter 1
Introduction
1.1 Nonparametric Statistical Inference in Wireless Sensor
Networks
Wireless sensor networks have become increasingly popular in the past few years.
The development of such networks was originally motivated by military applica-
tions such as battlefield surveillance. Now the use of wireless sensor networks is ex-
tended to many civilian applications, including environment and habitat monitor-
ing, healthcare applications, home automation, process monitoring, traffic control,
etc. These applications all require the collaborative inference of certain physical
or environmental conditions based on information collected by the sensors.
In classical statistical inference, the conditions to be inferred are assumed to
be characterized by a known parametric family, and the problem is reduced to
finding the correct index in this family. This approach corresponds to the case where
there is a thorough understanding of the conditions and their influence on sensor
measurements so that it is possible to formulate a parametric model properly.
Unlike in classical inference, the phenomena to be monitored by wireless sensor
networks are often not known at the time of inference or are too diverse to fit into
specific parametric models. Therefore, it is desirable to consider nonparametric
statistical inference in applications of wireless sensor networks.
The need for nonparametric inference also arises from concerns about network
security. Wireless sensors may be deployed in an open environment and are thus
subject to tampering by malicious intruders. In this case, it has been proposed to use
statistical inference methods to identify misbehaving sensors, but the knowledge
about how compromised sensors will behave is very limited. Recent research interest
has grown in defending against intelligent adversaries, where the intruder can
control compromised sensors to disrupt the inference collaboratively and
intelligently. In the presence of intelligent adversaries, it is highly
desirable that inference methods can guarantee certain performance even in the
worst case.
It is generally impossible to design a single inference method that is optimal
for all possible underlying distributions. Thus insisting on nonparametric
methods will inevitably result in a loss of performance for any specific distribution. In
the presence of intelligent adversaries, there may even be scenarios in which reliable
inference is impossible. Therefore, it is crucial to investigate the performance of
nonparametric inference techniques and their fundamental limits.
1.2 Dissertation Outline
This thesis attempts to study nonparametric statistical inference in wireless sensor
networks from the perspectives of both theoretical analysis and practical algorithm
design. The thesis addresses two problems. The first problem is the nonparamet-
ric detection and estimation of changes in the geographical distribution of alarmed
sensors, where detectors with exponentially decaying error probabilities and consis-
tent estimators are developed. The second problem is the detection of information
flows by timing analysis. The problem is further divided into three subproblems,
which deal with the detection of information flows without chaff noise, the detec-
tion in the presence of chaff noise, and distributed detection. Various detectors are
developed under the assumption of intelligent adversaries, and their asymptotic
performance is evaluated by error exponents (in the case of no chaff) or the maxi-
mum amount of chaff noise to guarantee vanishing error probabilities (otherwise).
In Chapter 2, we consider nonparametric change detection and estimation in
planar random sensor fields. Sensors are deployed to measure a certain underlying
phenomenon and make binary decisions (i.e., alarmed or normal). Given samples
of the locations of alarmed sensors from two data collections, we want to know
whether and where the underlying phenomenon has changed. Assuming that in
each collection, the samples are drawn i.i.d. from an unknown geographical dis-
tribution, we formulate the problem as detecting changes in this distribution be-
tween two collection periods and estimating the location of the maximum change
if changes do occur. Our main contributions include a threshold detector based on
the distance between empirical distributions and uniform upper bounds on its error
probabilities under arbitrary distributions. Polynomial-time algorithms are devel-
oped to implement the detector for several types of empirical distances. Solutions
to the detection problem also give an estimate of the set with the largest change.
We show that under certain regularity conditions, such estimation is consistent.
In Chapter 3, we consider the detection of information flows. Given a wireless
or wired ad hoc network, we want to know if there are flows of information-carrying
packets through the nodes of interest by measuring the transmission activities of
these nodes in timing. Timing analysis has the advantages of robustness against
encryption and padding and easily obtainable measurements (especially in wireless
networks). Its challenges include perturbations imposed by delays, permutations,
etc. and chaff noise, which consists of dummy traffic and unrelated traffic mul-
tiplexed at intermediate nodes. In this chapter, we decompose the detection into
pairwise detection of every two hops of the information flows and only consider
timing perturbations. Assuming that the perturbations are bounded in delay or
memory, and there is no chaff noise, we develop linear-time detectors which have no
miss detection. We show that the proposed detectors outperform existing detectors
in false alarm probability.
In Chapter 4, we generalize the detection of information flows to allow the
insertion of chaff noise. Assuming that nodes can collaboratively perturb timing
and insert chaff noise to evade detection, we show that there exists a threshold
on the fraction of chaff noise, beyond which Chernoff-consistent detection is im-
possible. The threshold is characterized as the minimum chaff noise required for
an information flow to mimic the distribution under the null hypothesis. Optimal
chaff-inserting algorithms are developed to compute the threshold, and closed-
form expressions are obtained under the assumption that traffic under the null
hypothesis can be modelled as independent Poisson processes. Furthermore, we
develop a threshold detector based on the optimal chaff-inserting algorithms, which
can achieve Chernoff-consistent detection in the presence of chaff noise arbitrar-
ily close to the threshold. Therefore, we obtain a tight bound on the fraction of
chaff noise, within which the proposed detector is Chernoff-consistent, and beyond
which there exists an information flow embedded in chaff noise that is statisti-
cally identical with traffic under the null hypothesis so that no detector can be
Chernoff-consistent. We use this bound to characterize the level of detectability of
information flows in chaff noise. Furthermore, we show that joint detection over
multiple hops can greatly increase the level of detectability.
Chapter 5 addresses distributed detection of information flows. We focus on
pairwise detection of bounded delay flows in chaff noise. In distributed detection,
the collection of measurements is subject to capacity constraints in the commu-
nication channels, which makes the problem more applicable to wireless sensor
networks because the wide deployment and limited power supply make it neces-
sary to limit the communication rates. We derive theoretical upper and lower
bounds on the level of detectability as functions of the capacity constraints, and
then direct the focus to the design of practical detection systems. The detec-
tion systems consist of simple slot-based quantizers and threshold detectors based
on optimal chaff-inserting algorithms which compute the minimum chaff noise re-
quired to generate the received (compressed) measurements. Performance of the
proposed detection systems is analyzed and compared to gain heuristics on system
design.
The rest of this chapter elaborates each problem that we have introduced,
including our initial motivation for the problem, a summary of results, and a brief
overview of the related work.
1.3 Nonparametric Change Detection and Estimation in
2D Random Sensor Fields
We consider the detection of changes in an underlying phenomenon over a large-scale,
randomly deployed sensor field. For example, sensors may be designed to detect certain
chemical components. When the sensor measurement exceeds a certain threshold,
the sensor is "alarmed". The state of a sensor depends on where it resides; sensors
in some areas are more likely to be in the alarmed state than others. We are not
interested in the event that certain sensors are alarmed. We are interested instead
in whether there is a change in the geographical distribution of alarmed sensors
from data collections at two different times. Such a change in distribution could
be an indication of abnormality.
Figure 1.1: Reported alarmed sensors (red) in two collections (first and second data collections).
We assume that some (not necessarily all) of the alarmed sensors are reported
to a fusion center, either through the use of a mobile access point (SENMA [36])
or using a certain in-network routing scheme. Suppose that the fusion center ob-
tains reports of the locations of alarmed sensors, as illustrated in Fig. 1.1, from
two separate data collections. In the ith report, let the location of alarmed sen-
sors have some unknown distribution Pi, and each sample Si be a set of locations
drawn independently according to Pi. The change detection problem is one of
testing whether P1 = P2 without making prior assumptions about the data gener-
ating distributions Pi. Note that Pi only specifies the geographical distribution of
alarmed sensors. The joint distribution of alarmed and non-alarmed sensors is not
specified completely. A change in Pi may be caused by a change in the actual
phenomenon or a change in the sensor layout.
Such a general nonparametric assumption comes at the cost of usually requiring
a large sample size, which renders the solution most applicable in large-scale sensor
networks where it is possible to obtain a large amount of sensor data.
There is also a related estimation problem in which, assuming that the detection
of change has been made, we would like to know where in the sensor field the change
has occurred, or where the change is the most significant (in a sense that will be
made precise later).
1.3.1 Summary of Results
We present a number of nonparametric change detection and estimation algorithms
based on an application of Vapnik-Chervonenkis Theory [40]. The basis of this ap-
proach has been outlined in [3] where we provided a mathematical characterization
of changes in distribution. Our focus is on the algorithmic side, aiming at obtain-
ing practical algorithms that scale with the sample size along with a certain level
of performance guarantee.
We first present results that establish a theoretical guarantee of performance.
The nonparametric detection problem considered here depends on the choice of the
distance measure between two probability distributions, and the choice is usually
subjective. We consider two distance measures. The first is the so-called A-
distance (also used in [3]) that measures the maximum change in probability on
A—a collection of measurable sets. The second is called relative A-distance—a
variation from that in [3]—for cases when the change in probability is concentrated
in areas of small probability. With these two distance measures, we apply the
Vapnik-Chervonenkis Theory to obtain exponential bounds (i.e., bounds under which
the error probabilities decay exponentially with the sample size) on detection error
probabilities and establish the consistency results for the proposed detector and
estimator.
Next we derive a number of practical algorithms. The complexity of applying
the Vapnik-Chervonenkis Theory comes from the search among a (possibly infinite)
collection of measurable sets. In particular, given data S being the union of the
samples from the two collections, i.e., S = S1 ∪ S2, the key is to reduce the search
in an infinite collection of sets (e.g., planar disks) to a search in a finite collection
H(S) (a function of S). Here we need a constraint on H(S) such that this reduction
does not affect the performance.
We consider three commonly used geometrical shapes—disks, rectangles, and
stripes—as our choices of measurable sets A. For the A-distance measure, if
M = |S| is the total number of data points in the two collections, we show that
a direct implementation of exhaustive search among the collection of all planar
disks has the complexity O(M^4). We present a suboptimal algorithm, the Search
in sample-Centered Disks (SCD), that has the complexity O(M^2 log M). Under
mild assumptions on Pi, the loss of performance of SCD diminishes as the sample
size increases. For the class of axis-aligned rectangles, we show that the opti-
mal search Search in Axis-aligned Rectangles (SAR) has complexity O(M^3). A
suboptimal approach Search in Diagonal-defined axis-aligned Rectangles (SDR)
reduces the complexity to O(M^2), again with diminishing loss of performance
under mild assumptions. For the collection of stripes, we present two algorithms:
Search in Axis-aligned Stripes (SAS) and Search in Random Stripes (SRS), both
have complexity O(M log M). Similar analysis has also been obtained for the
relative distance metric. See Table 2.1.
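To make the flavor of these searches concrete, the sketch below computes the empirical A-distance when A is restricted to vertical stripes {(x, y) : a ≤ x ≤ b}. It is only an illustration of the O(M log M) order of complexity (sort once, then a single maximum-subarray scan); it is not necessarily the SAS algorithm of Section 2.4, and the function name and array layout are assumptions.

```python
import numpy as np

def stripe_distance(s1, s2):
    """Empirical A-distance over vertical stripes {(x, y): a <= x <= b}.

    Points of S1 carry weight +1/|S1| and points of S2 weight -1/|S2|;
    the difference of empirical measures on a stripe is the sum of the
    weights of the points it contains, so the supremum over stripes is
    a maximum-magnitude contiguous-run sum after sorting by x.
    """
    xs = np.concatenate([s1[:, 0], s2[:, 0]])
    w = np.concatenate([np.full(len(s1), 1.0 / len(s1)),
                        np.full(len(s2), -1.0 / len(s2))])
    w = w[np.argsort(xs, kind="stable")]            # O(M log M)
    best_hi = best_lo = run_hi = run_lo = 0.0
    for v in w:                                     # Kadane-style O(M) scan
        run_hi, run_lo = max(v, run_hi + v), min(v, run_lo + v)
        best_hi, best_lo = max(best_hi, run_hi), min(best_lo, run_lo)
    # Ties in x are not grouped here, which can slightly overestimate the
    # supremum when points from the two samples share an x-coordinate.
    return max(best_hi, -best_lo)
```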
We implement several algorithms and verify their performance through simu-
lation. We also answer some practical questions arising in the implementation of
the detector, e.g., how to decide the detection threshold and how to estimate the
minimum sample size.
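For orientation, the threshold choice is driven by a uniform deviation bound of the following general form (the constants c1 and c2 are illustrative placeholders, not the constants derived in Chapter 2). Writing Si(A) for the empirical probability of A in the ith collection, m for the per-collection sample size, and ΠA for the growth function of A, a VC-type inequality under H0 reads

```latex
\Pr\Bigl\{\,\sup_{A\in\mathcal{A}}\bigl|S_1(A)-S_2(A)\bigr|>\epsilon\,\Bigr\}
  \;\le\; c_1\,\Pi_{\mathcal{A}}(2m)\,e^{-c_2\,m\,\epsilon^2},
\qquad
\Pi_{\mathcal{A}}(2m)=O\bigl(m^{\,d_{\mathrm{VC}}(\mathcal{A})}\bigr),
```

so equating the right-hand side to a target false alarm level and solving for ε yields a threshold that shrinks as the sample size grows and grows with the VC dimension of A, which is the qualitative behavior examined in the simulations of Section 2.5.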
1.3.2 Related Work
The problem of change detection in sensor fields has been considered in different
(mostly parametric) settings [24,27]. The underlying statistical problem belongs to
the category of two-sample nonparametric change detection. A classical approach
is the Kolmogorov-Smirnov two-sample test [13] in which the empirical cumulative
distributions are compared, and the maximum difference between the empirical
cumulative distribution functions is used as the test statistic. In a way, the proposed
methods generalize the idea of the Kolmogorov-Smirnov test to a more general col-
lection of measurable sets using general forms of distance measures. Indeed, the
Kolmogorov-Smirnov two-sample test becomes a special case of the SAR (Search
in Axis-aligned Rectangles) algorithm presented in Section 2.4.1.
There is a wealth of nonparametric change detection techniques for one-
dimensional data sets in which data are completely ordered. Examples include
testing the number of runs (successive sample points from the same collection) such
as Wald-Wolfowitz runs test, or testing the relative order of the sample points, e.g.
median test, control median test, Mann-Whitney U test, and linear rank statistic
tests [13,22,33]. Such techniques, however, do not have natural generalizations for
two-dimensional sensor network applications.
Vapnik-Chervonenkis Theory (VC Theory) is a statistical theory about com-
putational learning processes developed by Vapnik and Chervonenkis [38–40]. The
theory lays the theoretical foundation for learning-based inference methods. The
parts of the theory most related to our problem are the theory of consistency in
learning and the nonasymptotic theory of convergence rates. The theory has been
substantially developed since the original study of Vapnik and Chervonenkis; see
the book chapter by Bousquet et al. in [6] and the references therein.
1.4 Information Flow Detection by Timing Analysis
Consider a wireless ad hoc network illustrated in Fig. 1.2. We want to know if there
is information flowing between nodes S, R, and D. Suppose that we can observe
their transmission epochs (for example, if the nodes use transmitter-directed signaling to communicate, and we know the transmitters' codes, then we can deploy eavesdroppers tuned to the transmitters' codes to detect transmission activities), as shown in Fig. 1.3. Then from the transmission
patterns, we can probably infer that S is communicating with D, and R is the
relay node.
Figure 1.2: Is S communicating with D through R?
Figure 1.3: Transmission patterns of S, R, and D suggest a communication between S and D through R.
This example belongs to the problem of detecting information flows by timing
analysis. Generally, in a wireless ad hoc network illustrated in Fig. 1.4, there
may be information flows along multiple potential routes. We want to decide
whether a particular information flow is going on by eavesdropping on the traffic on
the route. Suppose that we cannot rely on nodes in the network to report the
information flows, and all the packets are encrypted and padded at every hop,
leaving only timing information to be observable. Given eavesdroppers deployed
to record transmission epochs of the nodes of interest, the problem is how to
correlate these transmission epochs to detect whether the corresponding nodes are
transmitting an information flow.
Figure 1.4: In a wireless network, eavesdroppers are deployed to report the transmission activities of nodes A and B to a detector at the fusion center, which in turn decides whether there are information flows through these nodes. Si (i = 1, 2): a sequence of transmission epochs of node A (or B).
Timing measurements are subject to a number of sources of perturbations. For
example, a relay node can hold the incoming packets for random periods of time,
reshuffle them, relay them in batches, etc. Furthermore, traffic on different routes
will multiplex at the intersecting nodes, and relay nodes may selectively drop
certain packets or insert dummy packets. Both traffic multiplexing and packet
dropping/insertion cause our measurements to contain packets that do not belong
to the information flow of interest. We will refer to such packets as chaff noise.
The presence of chaff noise significantly increases the difficulty of the problem.
Another challenge comes from capacity constraints in the uplink channels. In
wide-area networks such as wireless sensor networks, eavesdroppers are often pow-
ered by batteries and have to report to the fusion center with limited power.
Therefore, the uplink channels are subject to limited capacity constraints. The
direct consequence is that the measurements received at the fusion center will not
be identical with the raw measurements of the eavesdroppers, but will be distorted
to a certain extent. An exception is when the detector is located at one of the
eavesdroppers, in which case the detector knows the raw measurements of that
eavesdropper perfectly, referred to as the case of full side-information.
1.4.1 Summary of Results
We consider the detection of information flows through certain nodes of interest
by measuring their transmission activities in timing. We first consider detecting
information flows with exact timing measurements (i.e., centralized detection) and
then add capacity constraints in data collection (i.e., distributed detection). With
transmission activities modelled by point processes, the problem is formulated as a
hypothesis test against point processes conforming to certain flow models. We
consider two types of flow models derived from constraints in reliable communi-
cations: the bounded delay flow and the bounded memory flow. Chaff noise does
not need to satisfy any of the constraints.
For centralized detection of information flows without chaff noise, we develop
pairwise linear-time detection algorithms by packet matching or counting schemes.
We show that these algorithms have no miss detection and exponentially decaying
false alarm probabilities if traffic under the null hypothesis can be modelled as
independent Poisson processes. We compare our algorithms with existing detec-
tion algorithms by both error exponent analysis and numerical simulations. The
comparison shows that our algorithms outperform the existing ones.
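As a rough illustration of the matching idea behind such detectors (this is not the exact DM or DMV algorithm of Chapter 3; the function name and interface are assumptions), the sketch below checks whether a departure sequence could be a causal relay of an arrival sequence under a delay bound ∆, using an earliest-available greedy matching. A pairwise detector of this kind would declare a flow exactly when such a matching exists; with no chaff, unmatched departures would also have to be ruled out, a bookkeeping step omitted here for brevity.

```python
def has_bounded_delay_matching(arrivals, departures, delta):
    """Check whether every arrival epoch can be matched to a distinct
    departure epoch that is no earlier than the arrival and at most
    `delta` later.  Both inputs are assumed sorted in increasing order.
    Matching each arrival to the earliest still-available departure is
    sufficient: if this greedy pass fails, no valid matching exists."""
    j = 0
    for t in arrivals:
        # Departures before t cannot serve t (nor any later arrival).
        while j < len(departures) and departures[j] < t:
            j += 1
        if j == len(departures) or departures[j] > t + delta:
            return False            # arrival t cannot be relayed in time
        j += 1                      # consume the matched departure
    return True
```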
For centralized detection of information flows with chaff noise, we give an ex-
act characterization of the level of detectability of information flows, defined as
the maximum fraction of chaff noise allowed for Chernoff-consistent detection (de-
tailed definition is in Chapter 4). Our contributions include a converse result
and an achievability result. For the converse, we show that there is a bound on
the fraction of chaff noise beyond which Chernoff-consistent detection is impos-
sible. Specifically, the bound is characterized as the minimum fraction of chaff
noise needed to make an information flow statistically identical with traffic under
the null hypothesis. This bound is used to establish a level of undetectability for
information flows. Optimal chaff-inserting algorithms are proposed to calculate
the level of undetectability, and closed-form expressions are derived under the as-
sumption that traffic under the null hypothesis can be modelled as independent
Poisson processes. For the achievability, we develop a detector based on the opti-
mal chaff-inserting algorithms, which claims detection if the fraction of chaff noise
in the measurements computed by these algorithms is bounded by a predeter-
mined threshold. Under Poisson null hypothesis, the proposed detector is proved
to be Chernoff-consistent for all the information flows with fractions of chaff noise
bounded by the level of undetectability. Therefore, the level of detectability is
equal to the level of undetectability, and the proposed detector is optimal. We
show that the level of detectability increases to one as the number of hops in the
information flow increases, indicating that it is impossible to hide information flows
over arbitrarily long paths.
For distributed detection of information flows (with chaff noise), we focus on
the bounded delay flow model and pairwise detection. Our results have both
theoretical and algorithmic elements. In the theoretical aspect, we extend the
notions of detectability and undetectability to the context of distributed detec-
tion. Theoretical upper and lower bounds on the level of detectability are derived.
In the algorithmic aspect, we propose a three-stage detection procedure which
consists of quantization, data transmission, and detection. Quantization is per-
formed based on fixed slot partition. We propose a slotted quantizer and a one-bit
quantizer which compress each slot into the number of epochs and the indicator
of nonempty slot, respectively. Under each quantization, we develop a threshold
detector based on the optimal chaff-inserting algorithm analogous to those in cen-
tralized detection except that its input is quantized. With the performance of a
detection system measured by the maximum fraction of chaff noise such that the
system remains Chernoff-consistent, we compare the proposed detection systems
together with their analytical upper bounds. The comparison shows that slotted
quantization outperforms one-bit quantization for heavy traffic, and the detector
under slotted quantization and full side-information is near optimal. Performance
of the proposed detection systems gives lower bounds on the level of detectability
as a function of capacity constraints.
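The two quantizers themselves are simple to picture; a minimal sketch is given below, in which the slot length and observation horizon are assumed parameters and the function names are placeholders (the actual quantizer/detector pairs are specified in Chapter 5).

```python
import numpy as np

def slotted_quantize(epochs, slot, horizon):
    """Slotted quantization: report the number of epochs in each slot."""
    edges = np.arange(0.0, horizon + slot, slot)
    counts, _ = np.histogram(epochs, bins=edges)
    return counts

def one_bit_quantize(epochs, slot, horizon):
    """One-bit quantization: report only whether each slot is nonempty."""
    return (slotted_quantize(epochs, slot, horizon) > 0).astype(np.int8)
```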
1.4.2 Related Work
Information flow detection is a special case of timing analysis, which in turn belongs
to the family of traffic analysis problems [12]. For wireless networks, the idea of
traffic analysis is especially promising because the shared wireless medium is open
to interception. Most of the existing work on traffic analysis in a wireless context
is experiment-oriented, e.g., [26, 35,50].
The problem of detecting information flows has mainly been addressed in the
framework of intrusion detection in wired networks, especially the Internet. In 1995,
Staniford and Heberlein [34] first considered the problem of stepping-stone detec-
tion. The key problem in stepping-stone detection is to reconstruct the intrusion
path by analyzing various characteristics of the attacking traffic. Related work in
the literature only considers pairwise detection.
Early detection techniques are based on the content of the traffic; see, e.g.,
[34, 46]. To deal with encrypted traffic, timing characteristics are used in detec-
tion, such as the On-Off detection by Zhang and Paxson [51], the deviation-based
detection by Yoda and Etoh [48], and the packet interarrival-based detection by
Wang et al. [45]. The drawback of these approaches is that they are vulnerable to
active timing perturbations by the attacker.
Donoho et al. [11] were the first to consider bounded delay perturbation. They
showed that if packet delays are bounded by a maximum amount, then it is possible
to distinguish traffic containing information flows from independent traffic. Their
work was followed by several practical detectors, including the watermark-based
detector by Wang and Reeves [44] and the counting-based detector by Blum et
al. [5].
The problem becomes much more challenging when chaff can be inserted, with
only incomplete solutions in the literature, e.g., [5, 11, 29, 49]. Donoho et al. [11]
showed that there will be a distinguishable difference between information flows in
chaff and independent traffic if chaff noise is independent of the information flows.
Peng et al. [29] and Zhang et al. [49] separately proposed active and passive packet-
matching schemes which can detect information flows with bounded delay in chaff
if chaff packets only appear in the outgoing traffic of the relay node. Blum et al. [5]
modified their counting-based detector to handle a limited number of chaff packets
at the cost of an increased false alarm probability. These techniques can only deal
with a fixed number of chaff packets if nodes can insert chaff noise intelligently.
The dual problem of information flow detection is how to randomize transmis-
sion activities to maximally conceal information flows. This is a critical problem
in protecting anonymous communications against timing analysis attacks. In the
context of wireless ad hoc networks, Hong et al. in [23] proposed to add random
delays to prevent correlation of specific packets, and Deng et al. in [10] proposed to
randomize sensor transmission epochs within each data collection period to thwart
the detection of routes. At flow level, however, transmission activities of nodes on
the same information flow are still correlated. Zhu et al. in [52] proposed to make
traffic on all the outgoing links from a certain node identical in timing by inserting
chaff noise. Although this approach completely hides the information flow, it is
inefficient in terms of the required amount of chaff noise. More efficient methods
to hide information flows are developed based on the chaff-inserting algorithms
developed in Chapter 4; see [41].
Distributed detection of information flows belongs to hypothesis testing un-
der multiterminal data compression [16]. Solutions in this field can model spatial
correlation across nodes, which generalize the conditional i.i.d. assumption made
in classical distributed detection [37], but they cannot deal with temporal cor-
relation. Specifically, existing work only deals with temporal i.i.d. data, i.e., the
observations (xi, yi) (i = 1, 2, . . .) are drawn i.i.d. from a distribution P , where
P = P0 under H0, and P = P1 under H1. The best error exponent (or its lower
bounds) is derived as a function of data compression rates under the Neyman Pear-
son framework; see [16] and the references therein. Our problem is fundamentally
different because the timing measurements of information flows are not i.i.d. , and
our hypotheses do not have a single-letter characterization.
The problem of compressing Poisson processes has been studied previously
in [31, 42]. Rubin in [31] derived the rate distortion function and practical com-
pression schemes under the absolute-error distortion measure. Verdu in [42] derived
a closed-form expression for the rate distortion function under an asymmetric dis-
tortion measure on interarrival times. One of our quantization schemes, slotted
quantization, is the same as the quantization scheme proposed by Rubin. Although
slotted quantization is near optimal in Rubin’s problem, it is not necessarily near
optimal in our problem since we want to optimize the overall detection performance
whereas Rubin just wanted to reconstruct the processes.
Chapter 2
Nonparametric Change Detection and
Estimation
2.1 Outline
In this chapter, statistical learning based techniques are proposed to detect changes
in a 2D random field without statistical knowledge about the underlying distribu-
tions. Section 2.2 specifies the model and defines the detector and the estimator.
Section 2.3 states the main theorems about the exponential bounds on error prob-
abilities of the detector and the consistency of the estimator. Section 2.4 presents
the detection and estimation algorithms, and Section 2.5 provides simulation re-
sults. The chapter is concluded with comments about the strengths and weaknesses
of the proposed approach.
2.2 The Problem Statement
2.2.1 The Model
Let set Ω denote the sensor field and F the σ-field on Ω. We assume that in each
data collection, we draw i.i.d. samples from the locations of alarmed sensors. Let
Pi (i = 1, 2) be a probability measure on (Ω, F) modelling the drawing in the ith
collection. Drawings in different collections are independent. Let Si denote the set
of locations collected in the ith collection and S = S1 ∪ S2 the set that contains
data from the two collections.
We point out that the joint distribution of sensor location and report, which
is influenced by sensor layout, readings, and sampling strategy, is not completely
specified. Note that although the i.i.d. assumption implies that the decisions of
alarm occur independently, the decisions are not necessarily identically distributed,
and the probability of alarm may vary at different locations. Moreover, the prob-
ability that an alarmed sensor reports to the fusion center may also be different
across sensors. Both of these probabilities can be incorporated into Pi. Note that
how unalarmed sensors are distributed is not specified; we can model arbitrary
correlations among them, and they will not have any impact on our result. This
allows us to model certain types of correlated sensor readings.
The probability measures Pi (i = 1, 2) are not known. Instead of making specific
assumptions on the form of Pi, we introduce a collection A ⊆ F of measurable sets
to model the geographical areas of practical interest and only look for changes in
the probabilities of sets in A. The collection A represents our prior knowledge of
what changes are expected. It does not have to be finite or even countable, and
is part of the algorithm design. For example, if we expect changes in the mean of a symmetric distribution whose density decreases monotonically away from its center, it may be good to choose A as the collection of disks. The choice of A is subjective,
and it depends on the application at hand. We will focus in this chapter on
regular geometrical shapes: disks, rectangles, and stripes. Intuitively, disks and
rectangles are suitable for changes in the location or spread of the probability mass,
and stripes (a special type of rectangle) are better for changes in correlation or marginal distributions. We point out that although a parametric model for Pi is not needed, prior knowledge helps detection by allowing us to choose an A that “fits” the changes best, as discussed after Theorem 1.
Given a pair of samples S1, S2 drawn i.i.d. from distributions P1, P2, and a
collection A ⊆ F , we are interested in whether there is a change in probability
measure on A and, if there is a change, which set in A has the maximum change
of probability. Specifically, the detection problem considered in this chapter is the
test of the following hypotheses on A
H0 : P1 = P2   vs.   H1 : P1 ≠ P2.¹
The estimation problem, conditioned on there being a change, is to estimate the set A∗ ∈ A that has the maximum change. For example, using the absolute difference, we want to estimate
A∗ = arg max_{A∈A} |P1(A) − P2(A)|.
We will also consider a normalized difference measure in Section 2.2.2.
2.2.2 Distance Measures
To measure changes, we need some notion of distance between two probability
distributions. In this chapter, we will consider two distance measures: A-distance
and relative A-distance.
Definition 1 (A-distance and empirical A-distance [3]) Given probability spaces
(Ω,F , Pi) and a collection A ⊆ F , the A-distance between P1 and P2 is defined as
dA(P1, P2) = sup_{A∈A} |P1(A) − P2(A)|.   (2.1)
The empirical A-distance dA(S1, S2) is similarly defined by replacing Pi(A) by the empirical measure
Si(A) ≜ |Si ∩ A| / |Si|,   (2.2)
where |Si ∩ A| is the number of points in both Si and set A.
¹Here H0 means P1(A) = P2(A) for all A ∈ A, and H1 means that ∃A ∈ A s.t. P1(A) ≠ P2(A).
This notion of empirical A-distance dA(S1, S2) is related to the Kolmogorov-
Smirnov two-sample statistic. For the case where the domain set is the real line,
the Kolmogorov-Smirnov test considers
sup_x |F1(x) − F2(x)|,   Fi(x) ≜ Pi({y : y ≤ x}),
as the measure of difference between two distributions. By setting A to be the collection of all one-sided intervals (−∞, x], dA(S1, S2) is the Kolmogorov-Smirnov
statistic.
The A-distance does not take into account the relative significance of the
change. For example, one could argue that changing the probability of a set from
0.99 to 0.999 is less significant than a change from 0.001 to 0.01 because the latter
is a ten-fold increase whereas the former is just an increase by less than 1%. For
applications in which small probability sets are of interest, we introduce the fol-
lowing notion of relative A-distance that takes the relative magnitude of a change
into account.
Definition 2 (Relative and Empirical Relative A-distance) Given proba-
bility spaces (Ω,F , Pi) and a collection A ⊆ F , the relative A-distance between
P1 and P2 is defined as
φA(P1, P2) = sup_{A∈A} fφ(P1(A), P2(A)),   (2.3)
where fφ : [0, 1] × [0, 1] → [0, √2] is defined as
fφ(x, y) = 0 if x = y = 0, and fφ(x, y) = |x − y| / √((x + y)/2) otherwise.   (2.4)
The empirical relative A-distance is defined similarly by replacing Pi(A) with the empirical measure defined in (2.2).
The above definition is slightly different from that used in [3]. It is obvious
that |P1(A) − P2(A)| is a metric. The proof that |P1(A) − P2(A)| / √((P1(A) + P2(A))/2) is a metric follows
from [2]. Note that in general dA(P1, P2) = 0 or φA(P1, P2) = 0 does not imply
P1 = P2, but implies P1(A) = P2(A) for any A ∈ A. If we only care about sets in
A, then dA and φA defined above are pseudo-metrics.
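To make the two distance measures concrete, the following is a minimal Python sketch (not part of the thesis) that computes the empirical A-distance and relative A-distance over a finite collection of candidate sets, each represented here by a membership predicate; the function names and set representation are illustrative assumptions.

```python
import math

def empirical_measure(sample, A):
    """Empirical probability S_i(A) = |S_i ∩ A| / |S_i| for a set A given as a predicate."""
    return sum(1 for s in sample if A(s)) / len(sample)

def f_phi(x, y):
    """Relative difference f_phi(x, y) from (2.4)."""
    if x == 0 and y == 0:
        return 0.0
    return abs(x - y) / math.sqrt((x + y) / 2)

def empirical_distances(S1, S2, collection):
    """Empirical A-distance and relative A-distance over a finite collection of sets."""
    d_A = max(abs(empirical_measure(S1, A) - empirical_measure(S2, A)) for A in collection)
    phi_A = max(f_phi(empirical_measure(S1, A), empirical_measure(S2, A)) for A in collection)
    return d_A, phi_A

# Example: one-sided intervals (-inf, x] on the real line recover the Kolmogorov-Smirnov statistic.
S1, S2 = [0.1, 0.4, 0.7], [0.2, 0.5, 0.9]
intervals = [lambda s, x=x: s <= x for x in sorted(S1 + S2)]
print(empirical_distances(S1, S2, intervals))
```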
2.2.3 Detection and Estimation
With the distance measure defined, we can now specify the class of detectors and
estimators considered in this chapter.
Definition 3 (Detector δ(S1, S2; ǫ)) Given two collections of sample points S1 and S2, drawn i.i.d. from probability distributions P1 and P2 respectively, and threshold ǫ ∈ (0, 1), for hypotheses H0 vs. H1, the detector using the A-distance is defined as²
δdA(S1, S2; ǫ) = 1 if dA(S1, S2) > ǫ, and 0 otherwise.   (2.5)
The detector δφA(S1, S2; ǫ) using the relative A-distance is defined the same way by replacing dA(S1, S2) by φA(S1, S2) and letting ǫ ∈ (0, √2).
Assuming that a change of probability distribution has occurred, we define the
estimator for the event that gives the maximum change in probabilities.
Definition 4 (Estimator A∗(S1, S2)) Given two collections of sample points S1 and S2, drawn i.i.d. from probability distributions P1 and P2 respectively, the estimator for the event that gives the maximum change of probability is defined as³
A∗dA(S1, S2) = arg max_{A∈A} |S1(A) − S2(A)|.
The estimator A∗φA(S1, S2) using the relative A-distance is defined similarly.
The definitions given above require searching in a possibly infinite collection
of sets. At the moment, we only specify what the outcome should be without
addressing the algorithmic procedure generating it. We will address that issue in
Section 2.4.
²We use the convention that the detector gives the value 1 for H1 and 0 for H0. Note that if nǫ is an integer, then dA(S1, S2) will be equal to ǫ with positive probability. Ideally, in the Neyman-Pearson framework, randomization is used when dA(S1, S2) = ǫ to achieve a lower miss probability, although it complicates the analysis. Instead, we stick to such a deterministic detector and derive an explicit expression for the threshold by Vapnik-Chervonenkis inequalities. See the discussions following Theorem 1.
³In the case of a tie, choose any one of the sets achieving the maximum change of empirical probability.
2.3 Performance Guarantee
We present in this section consistency results for the detector and estimator pre-
sented earlier. The results are given in the forms of error exponents.
First let us look at some technical preliminaries from [39]. For a measurable space
(Ω,F), let A ⊆ F . We say a set S ⊂ Ω is shatterable by A if for all B ⊆ S,
∃A ∈ A s.t.
B = A ∩ S.
Definition 5 (VC-Dimension) The Vapnik-Chervonenkis dimension of a col-
lection A of sets is
VC-d(A) = sup{n : ∃S s.t. |S| = n and S is shatterable by A}.
The VC-dimension of a class of sets quantifies its ability to separate sets of points.
Intuitively, the VC-dimension of a class A is the maximum number of free param-
eters needed to specify a set in A. For example, if A is the collection of 2D disks, then we see that at most 3 free parameters are needed (the x, y-coordinates of the center and a radius), and it is shown that the VC-dimension of A is indeed 3 [47].
Note that the VC-dimension of a class may be infinite; e.g., the VC-dimension of the entire σ-field F is ∞ because any set is shatterable by F.
Theorem 1 (Detector Error Exponents) Given probability spaces (Ω,F , Pi)
and a collection A ⊆ F with finite VC-dimension d, let Si ⊂ Ω be a set of n sample
points drawn according to Pi. The false alarm probabilities for the detectors defined
in (2.5) are bounded by
PF(δdA) ≤ 8(2n + 1)^d e^{−nǫ²/32},   (2.6)
PF(δφA) ≤ 2(2n + 1)^d e^{−nǫ²/4}.   (2.7)
Furthermore, if dA(P1, P2) > ǫ and φA(P1, P2) > ǫ, the miss detection probabilities satisfy, respectively,
PM(δdA, P1, P2) ≤ 8(2n + 1)^d e^{−n[dA(P1,P2)−ǫ]²/32},   (2.8)
PM(δφA, P1, P2) ≤ 16(2n + 1)^d e^{−n[φA(P1,P2)−ǫ]²/16}.   (2.9)
Proof: See Appendix 2.A.
A few remarks are in order. First, if the maximum change between P1 and P2
on A exceeds ǫ, the detector detects the change with probability arbitrarily close
to 1 as the sample size goes to infinity. Similarly, if there is no change in Pi on A,
then the probability of false alarm also goes to zero. Notice that the decay rates
of the error probabilities are different when the two different distance measures
are used; from (2.6,2.7), the decay rate of false alarm probabilities for the detector
using φA is eight times that using dA.
Second, the above theorem provides a way of deciding the detection threshold
ǫ for a particular detection criterion. For example, the threshold (not necessarily
optimal) of the Neyman-Pearson detection for a given size α can be obtained from
the bounds on false alarm probabilities. Theorem 1 suggests that we should choose
(n, ǫ) such that
8(2n + 1)^d e^{−nǫ²/32} ≤ α for δdA,   (2.10)
2(2n + 1)^d e^{−nǫ²/4} ≤ α for δφA.   (2.11)
Taking ǫ(n) to make the inequalities hold with equality gives a threshold⁴
ǫ(n) = √((32/n) log(8(2n + 1)^d / α)) for δdA,   and   ǫ(n) = √((4/n) log(2(2n + 1)^d / α)) for δφA.   (2.12)
We shall think of ǫ(n) as a measure of detector sensitivity. From (2.8,2.9)
in Theorem 1, we see that miss detection probability starts to drop exponen-
tially when ǫ(n) < dA(P1, P2) or ǫ(n) < φA(P1, P2). Thus, roughly, ǫ(n) is a
lower bound on the amount of changes in order for the change to be detected
with high probability. Furthermore, the smaller the ǫ(n), the larger the values of
[dA(P1, P2) − ǫ(n)]²/32 and [φA(P1, P2) − ǫ(n)]²/16, and the lower the upper bound on miss detection probability. One should be cautioned that although the error probabilities decay exponentially, the error exponents could be small, and thus a large sample size may be required. For example, for d = 2 and ǫ = 0.1, 10⁵ sample points are required to guarantee a false alarm probability bounded by 5% for the A-distance based detector. We can reduce the sample size to 10⁴ by using the detector based on the relative A-distance.
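As an illustration only (the thesis does not give code), the threshold in (2.12) can be evaluated numerically; the sample sizes 10⁵ and 10⁴ quoted above can be reproduced this way. The function name is a hypothetical choice of this sketch.

```python
import math

def threshold(n, d, alpha, relative=False):
    """Detection threshold eps(n) from (2.12); relative=True gives the phi_A version."""
    if relative:
        return math.sqrt((4.0 / n) * math.log(2 * (2 * n + 1) ** d / alpha))
    return math.sqrt((32.0 / n) * math.log(8 * (2 * n + 1) ** d / alpha))

# For d = 2 and alpha = 0.05: the d_A detector needs roughly 1e5 samples to reach eps = 0.1,
# while the relative-distance detector needs roughly 1e4.
print(threshold(10**5, d=2, alpha=0.05))                 # about 0.1
print(threshold(10**4, d=2, alpha=0.05, relative=True))  # about 0.1
```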
Third, note that the VC-dimension d of A has diminishing effects on the rate
of decay of error probabilities. The selection of A, however, may affect the error
exponent through dA or φA. Furthermore, the selection of A has a significant
impact on the complexity of practically implementable algorithms.
Finally, we should also note that, while we have stated the above theorem under |Si| = n, the results generalize easily to the case when the two collections have different sizes.
⁴All the logs in this thesis are natural logarithms.
The consistency of the estimator is implied by the following theorem.
Theorem 2 Given probability spaces (Ω, F, Pi) (i = 1, 2) and a collection A ⊆ F with finite VC-dimension, if A∗dA ≜ arg max_{A∈A} |P1(A) − P2(A)| is separated from the rest of A in the sense that⁵
|P1(A∗dA) − P2(A∗dA)| − sup_{B∈A\{A∗dA}} |P1(B) − P2(B)| > 0,
then A∗dA(S1, S2) converges to A∗dA in probability. A similar result holds for A∗φA.
Proof: See Appendix 2.A.
2.4 Algorithms
We now turn our attention to practically implementable algorithms and their com-
plexities. The key step is to obtain test statistics within a finite number of oper-
ations, preferably with complexity that scales well with the total number of data points M = |S1 ∪ S2|.
Given sample points S = S1 ∪ S2 and a possibly infinite collection of sets A,
we need to reduce the search in A to a search in a finite collection H(S) ⊂ A, and
replace dA(S1, S2) by dH(S1, S2). If H is not chosen properly, such a reduction of the search domain may lead to a loss of performance. Thus we need the notion of completeness when choosing the search domain.
⁵If the Pi's are continuous, and A∗dA can be approximated arbitrarily well by other sets in A, then this condition will not be satisfied. In that case, we have results on the estimation performance evaluated by the amount of change in the estimated set [20].
Definition 6 (Completeness) Given A being a collection of measurable subsets
of space Ω, and S ⊂ Ω be a set of points in Ω. Let H(S) ⊂ A be a finite sub-
collection of measurable sets which is a function of S. We call the collection H(S)
complete for S with respect to A if ∀A ∈ A, there exists a B ∈ H(S) such that
S ∩ A = S ∩ B.
The significance of the completeness is that, if H(S1∪S2) is complete w.r.t. A,
then dA(S1, S2) = dH(S1, S2) and φA(S1, S2) = φH(S1, S2).
For the choice of A, we consider regular geometric areas, e.g., disks, rectangles, and stripes. We next present six algorithms for different choices of A and sub-collection H. We first present complete algorithms, i.e., the sub-collection H is
complete with respect to A. Next we give a couple of heuristic algorithms which
simplify the computation at the cost of a loss in completeness.
Hereinafter all sets defined are closed sets unless otherwise stated.
2.4.1 Complete Algorithms
Search in Planar Disks (SPD)
Let A be the collection of two-dimensional disks. Let VC-d denote the VC-dimension of a class. The following result is proved in [15]:
Proposition 1
VC-d(A) = 3.
For the set of sample points S ⊆ Ω, consider the finite sub-collection of A defined by
HD(S) ≜ ∪_{(si,sj,sk)∈T} HD(si, sj, sk),   (2.13)
where
T ≜ {(si, sj, sk) ∈ S³ : si, sj, sk are not collinear},
and
HD(si, sj, sk) ≜ {D(si, sj, sk), D(si, sj, sk) \ {si}, D(si, sj, sk) \ {sj}, . . . , D(si, sj, sk) \ {si, sj, sk}},
where D(si, sj, sk) is the disk with si, sj, and sk on its boundary, i.e., HD(si, sj, sk) consists of D(si, sj, sk) and all 7 variations that exclude some of the 3 boundary points. See Figure 2.1.
Figure 2.1: Members of HD and HCD, e.g., D(s1, s2, s3) ∈ HD and D′(s4, s5) ∈ HCD; sample points in S1 and S2 are shown with different markers.
In [18] we have proved the following result:
Proposition 2 Let A be the collection of two dimensional disks. For S1 and S2
drawn from P1 and P2, if P1 and P2 are such that any set with Lebesgue measure 0 has probability 0,⁶ then the finite collection HD(S1 ∪ S2) in (2.13) is complete with respect to A a.s. (almost surely).
With HD(S) defined above, the algorithm SPD(dA) (Search in Planar Disks using distance metric dA) computes
max_{A∈HD} |S1(A) − S2(A)|.
Algorithm SPD(dA) includes three steps: (i) generating elements of HD; (ii) computing ||S1 ∩ A|/|S1| − |S2 ∩ A|/|S2|| by counting |S1 ∩ A| and |S2 ∩ A| for every A ∈ HD; and (iii) finding the maximum.
Algorithm SPD(φA) (Search in Planar Disks using the metric φA) is the same
as SPD(dA) except in step (ii) where the relative empirical measure is computed.
We now analyze the complexity of SPD. The complexities of both SPD(dA) and SPD(φA) are O(M⁴) for sample size M = |S1 ∪ S2|. This is because there are O(M³) disks to consider, and the counting of |S1 ∩ A| and |S2 ∩ A| for each disk takes M steps.
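Purely as an illustration (this is not the thesis's implementation), a brute-force Python sketch of SPD(dA) could look as follows; points are assumed to be (x, y) tuples, and the circumcircle helper and numerical tolerance are details of this sketch.

```python
import itertools

def circumcircle(p, q, r):
    """Center and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None  # collinear
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5

def spd_dA(S1, S2, tol=1e-9):
    """Brute-force SPD(d_A): scan disks through triples of sample points and the
    variations that exclude boundary points (cf. H_D in (2.13))."""
    S = S1 + S2
    best = 0.0
    for p, q, r in itertools.combinations(S, 3):
        circ = circumcircle(p, q, r)
        if circ is None:
            continue
        (ux, uy), rad = circ
        inside = lambda s: (s[0] - ux) ** 2 + (s[1] - uy) ** 2 <= (rad + tol) ** 2
        # the disk itself and all 7 variations that exclude subsets of the 3 boundary points
        for k in range(4):
            for excl in itertools.combinations((p, q, r), k):
                member = lambda s, e=set(excl): inside(s) and s not in e
                diff = abs(sum(map(member, S1)) / len(S1) - sum(map(member, S2)) / len(S2))
                best = max(best, diff)
    return best
```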
Search in Axis-aligned Rectangles (SAR)
We now consider the collection A of axis-aligned rectangles. Then we have the
following property:
⁶This is true if P1, P2 are absolutely continuous, i.e., have a pdf, because any measurable function integrates to 0 on a set of measure 0.
Proposition 3
VC-d(A) = 4.
Proof: It is easy to see that VC-d(A) ≥ 4; see Fig. 2.2. The set {s1, s2, s3, s4} is shatterable by A.
Figure 2.2: The set {s1, s2, s3, s4} is shatterable by axis-aligned rectangles, but the set {s1, s2, s3, s4, s5} is not.
For any set S of more than 4 points, let xmin, xmax, ymin, ymax be the minimum and maximum x, y-coordinates of points in S, and let the points attaining these coordinates be s1, s2, s3, s4 (some of them may coincide). Then any axis-aligned rectangle containing {s1, s2, s3, s4} contains S. The subset {s1, s2, s3, s4} therefore cannot be obtained by shattering S with A, and S is not shatterable. Hence VC-d(A) ≤ 4.
□
Given samples S1 and S2, let S = S1 ∪ S2 = {(x1, y1), · · · , (xM, yM)} where, at the cost of O(M log M), we may assume that x1 ≤ x2 ≤ · · · ≤ xM. Let the finite collection HR(S) be defined by
HR(S) ≜ {R(yi, yj, xm, xn) : (xk, yk) ∈ S, k = i, j, m, n},   (2.14)
where R(yi, yj, xm, xn) is the rectangle defined by the four lines y = yi, y = yj, x =
xm, x = xn. See Figure 2.3.
Figure 2.3: Members of HR, e.g., R(y1, y4, x2, x3) ∈ HR; sample points in S1 and S2 are shown with different markers.
Proposition 4 Let A be the class of two dimensional axis-aligned rectangles.
Given S1 and S2, the finite collection HR(S1∪S2) in (2.14) is complete with respect
to A.
The reason for this proposition is that for any axis-aligned rectangle R and
given S, we can find an axis-aligned rectangle R′ such that R′ ∩ S = R ∩ S and R′ has
at least one sample point on each side of the boundary, where points on different
sides are not necessarily different. Since HR includes all those rectangles, it is
complete w.r.t. A.
Algorithm SAR(dA) computes dHR(S1, S2). Because of the ordering in xi’s, the
collection HR allows a recursive calculation of distance measures. Specifically, for
fixed yi and yj s.t. yi ≤ yj, define
f^k_ij(n) ≜ |Sk ∩ R(yi, yj, x1, xn)|/|Sk|, k = 1, 2,   (2.15)
Fij(n) ≜ f^1_ij(n) − f^2_ij(n).   (2.16)
Then f^k_ij(n) (n = 1, . . . , M) can be computed recursively by
f^k_ij(n) = f^k_ij(n − 1) + 1/|Sk| if yn ∈ [yi, yj] and (xn, yn) ∈ Sk, and f^k_ij(n) = f^k_ij(n − 1) otherwise.
Then find
imax = arg max_n Fij(n),   imin = arg min_n Fij(n),
l ≜ min{imax, imin} + 1,   u ≜ max{imax, imin}.
The optimal rectangle, for fixed yi and yj, is then given by R(yi, yj, xl, xu), and the maximum difference in empirical probabilities is given by Fij(imax) − Fij(imin). Finally, compute
dHR(S1, S2) = max_{i,j: yi≤yj} (Fij(imax) − Fij(imin)).
The pair (i, j) that achieves this maximum gives the best rectangle in HR.
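A rough Python sketch of SAR(dA), assuming points are (x, y) tuples; it implements the x-sorted sweep of (2.15)-(2.16) and the max/min reduction of (2.18)-(2.19) below, but only as an illustration, not the thesis's implementation.

```python
def sar_dA(S1, S2):
    """SAR(d_A) sketch: for each pair of y-boundaries, sweep the x-sorted points and track
    the cumulative difference F = f1_ij - f2_ij; the answer is the largest max-min spread."""
    S = sorted([(x, y, 1) for (x, y) in S1] + [(x, y, 2) for (x, y) in S2])  # sort by x
    ys = sorted({y for (_, y, _) in S})
    n1, n2 = len(S1), len(S2)
    best = 0.0
    for yi in ys:
        for yj in ys:
            if yi > yj:
                continue
            F, Fmax, Fmin = 0.0, 0.0, 0.0
            for (x, y, k) in S:
                if yi <= y <= yj:
                    F += 1.0 / n1 if k == 1 else -1.0 / n2
                Fmax, Fmin = max(Fmax, F), min(Fmin, F)
            best = max(best, Fmax - Fmin)
    return best
```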
Algorithm SAR(φA) computes φHR(S1, S2). For fixed yi and yj (yi ≤ yj), we compute f^1_ij(n) and f^2_ij(n) for n = 1, . . . , M as before. Compute empirical probabilities for every pair xm < xn by
Sk(R(yi, yj, xm, xn)) = f^k_ij(n) − f^k_ij(m), k = 1, 2.   (2.17)
Then optimizing over all pairs of x's and y's,
max_{i,j,m,n: yi≤yj, m<n} |S1(R(yi, yj, xm, xn)) − S2(R(yi, yj, xm, xn))| / √((S1(R(yi, yj, xm, xn)) + S2(R(yi, yj, xm, xn)))/2)
gives φHR(S1, S2) and the best rectangle.
We now analyze the complexity of Algorithm SAR. SAR(dA) has complexity O(M³), and SAR(φA) has complexity O(M⁴). This is because in computing dA we can use the fact that
max_{m,n} |(f^1_ij(n) − f^1_ij(m)) − (f^2_ij(n) − f^2_ij(m))|
= max_{m,n} |(f^1_ij(n) − f^2_ij(n)) − (f^1_ij(m) − f^2_ij(m))|   (2.18)
= max_n (f^1_ij(n) − f^2_ij(n)) − min_m (f^1_ij(m) − f^2_ij(m))   (2.19)
and reduce the two-variable optimization to two one-variable optimizations, which are done in linear time. To compute φA, however, we have to check all the O(M²) (xm, xn) pairs. The search is then repeated for all the O(M²) (yi, yj) pairs. Note
that the VC-dimension of the collection of axis-aligned rectangles is 4 while the
VC dimension of the collection of planar disks is 3, which results in a larger sample
size M for Algorithm SAR as we discuss later.
Search in Axis-aligned Stripes (SAS)
The complexities of algorithms SPD and SAR may be formidable for large M. This motivates a simplified algorithm that deals with axis-aligned stripes. The basic idea is to project sample points onto the x and y
coordinates, and then perform change detection/estimation on each coordinate.
Let A be the collection of vertical stripes, i.e., axis-aligned rectangles with
height equal to the field height. Similarly, let B be the collection of horizontal stripes. The following property holds:
Proposition 5
VC-d(A ∪ B) = 4.
Proof: It is easy to see that VC-d(A ∪ B) ≥ 4; see Fig. 2.4. The set {s1, s2, s3, s4} is shatterable by A ∪ B.
Figure 2.4: The set {s1, s2, s3, s4} is shatterable by A ∪ B.
For any set S of more than 4 points, let sl, sr, su, so be the points with the minimum and maximum x, y-coordinates in S accordingly (not necessarily distinct). Then any vertical stripe containing {sl, sr} contains S, and any horizontal stripe containing {su, so} also contains S. The subset {sl, sr, su, so} cannot be obtained by shattering S with A ∪ B, and thus S is not shatterable by A ∪ B. Hence VC-d(A ∪ B) ≤ 4.
□
Given a collection of sample points S = S1 ∪ S2, consider finite subsets HV(S) ⊂ A and HH(S) ⊂ B defined by
HV(S) ≜ {V(xi, xj) : si = (xi, yi), sj = (xj, yj) ∈ S},   (2.20)
HH(S) ≜ {H(yk, yl) : sk = (xk, yk), sl = (xl, yl) ∈ S},   (2.21)
where V(xi, xj) is the vertical stripe with left and right boundaries xi and xj, and H(yk, yl) is the horizontal stripe with lower and upper boundaries yk and yl. See Figure 2.5.
Figure 2.5: Members of HV and HH, e.g., V(xi, xj) ∈ HV and H(yk, yl) ∈ HH; sample points in S1 and S2 are shown with different markers.
Proposition 6 Let A be the class of vertical stripes and B be the class of horizontal stripes. Given S1 and S2, the finite collection HV(S1 ∪ S2) ∪ HH(S1 ∪ S2) defined in (2.20) and (2.21) is complete with respect to A ∪ B.
The proposition is easy to verify because for any axis-aligned stripe, we can
find another axis-aligned stripe with the same intersection with S and at least one
sample point on each boundary. Thus it suffices to consider stripes with sample
points on the boundary.
Given S, Algorithm SAS(dA) performs the following search:
max_{A∈HV∪HH} |S1(A) − S2(A)|.
The algorithm includes the following steps: (i) project sample points onto the x and y coordinates; (ii) sort the projected sample points into increasing order; (iii) in the x coordinate (so that x1 ≤ x2 ≤ · · · ≤ xM), for i = 1, . . . , M, compute f^k_x(i) ≜ Sk(V(0, xi)) (k = 1, 2) recursively by
f^k_x(i) = f^k_x(i − 1) + 1/|Sk| if si ∈ Sk, and f^k_x(i) = f^k_x(i − 1) otherwise,   (2.22)
and then compute Fx(i) ≜ f^1_x(i) − f^2_x(i); compute Fy(j) ≜ S1(H(0, yj)) − S2(H(0, yj)) similarly; (iv) find
m1 = arg max_i Fx(i),   m2 = arg min_i Fx(i),
n1 = arg max_j Fy(j),   n2 = arg min_j Fy(j).
We then have
max_{A∈HV∪HH} |S1(A) − S2(A)| = max(Fx(m1) − Fx(m2), Fy(n1) − Fy(n2)),   (2.23)
and the estimate of the changed area is V(xm1, xm2) if Fx(m1) − Fx(m2) > Fy(n1) − Fy(n2), or H(yn1, yn2) otherwise.
Algorithm SAS(φA) does the same in steps (i), (ii), and (iii), but (iv) is changed to finding
φHV(S1, S2) = max_{i,j: i<j} |S1(V(xi, xj)) − S2(V(xi, xj))| / √((S1(V(xi, xj)) + S2(V(xi, xj)))/2),   (2.24)
where Sk(V(xi, xj)) is given by f^k_x(j) − f^k_x(i). φHH(S1, S2) is computed similarly. Then
φ_{HV∪HH}(S1, S2) = max(φHV(S1, S2), φHH(S1, S2)),
and the changed area is the stripe on which the maximum is attained.
Now we analyze the complexities of Algorithm SAS(dA) and Algorithm SAS(φA). Given M = |S1 ∪ S2|, the complexity of Algorithm SAS(dA) is O(M log M). This is because after projection we only need to perform two linear-complexity searches, so the dominating part is the sorting of sample points, which takes O(M log M). The complexity of Algorithm SAS(φA) is O(M²) because in the two-variable optimization there are O(M²) (xi, xj) pairs to consider.
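For comparison with the previous sketches, a minimal illustration of SAS(dA) (again assuming points are (x, y) tuples, and again not the thesis's code):

```python
def sas_dA(S1, S2):
    """SAS(d_A) sketch: project points onto each axis, sort, and take the maximum
    variation of the cumulative difference F (cf. (2.22)-(2.23))."""
    n1, n2 = len(S1), len(S2)
    best = 0.0
    for axis in (0, 1):                       # 0: vertical stripes, 1: horizontal stripes
        pts = sorted([(p[axis], 1) for p in S1] + [(p[axis], 2) for p in S2])
        F, Fmax, Fmin = 0.0, 0.0, 0.0
        for (_, k) in pts:
            F += 1.0 / n1 if k == 1 else -1.0 / n2
            Fmax, Fmin = max(Fmax, F), min(Fmin, F)
        best = max(best, Fmax - Fmin)
    return best
```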
Search in Random Stripes (SRS)
Note that in Algorithm SAS the choice of x and y axes for projection is subjective,
and this choice should be part of algorithm design. When we know nothing about
the change, introducing randomness may give more robustness to the algorithms.
For θ randomly selected from [0, π/2], choose Aθ to be the collection of vertical stripes rotated (counter-clockwise) by θ, and Bθ to be the collection of horizontal stripes rotated by θ. Define HθV(S) and HθH(S) to be the members of Aθ, Bθ accordingly, with sample points on the boundary, similarly to definitions (2.20) and (2.21).
We claim similar properties for Aθ ∪ Bθ and HθV(S) ∪ HθH(S), i.e., VC-d(Aθ ∪ Bθ) = 4 and HθV(S) ∪ HθH(S) is complete with respect to Aθ ∪ Bθ. Note that
introducing θ does not increase the VC-dimension to 5 because the projection
direction is randomly chosen but not optimized over.
Algorithm SRS is a randomized variation of Algorithm SAS. It is based on
the same projection and search idea as in Algorithm SAS. The difference is
when performing the projection, we project sample points onto random directions
instead of the fixed directions of x and y axes. The rest of the algorithm is the
same as Algorithm SAS.
Algorithm SRS has the same order of complexity as Algorithm SAS in
computing both dA and φA. The advantage of Algorithm SRS is that it is more
robust than Algorithm SAS. Specifically, as a randomized algorithm, SRS
will perform equally well under a wider range of change patterns (the way change
occurs) while SAS can be affected significantly by the change pattern. For
example, SAS is vulnerable to the pattern where changes always occur along a tilted line of angle 45° or 135°, because in that case the increasing and decreasing parts of the change will largely cancel when projected onto the axes.
A quick comment is in order. Both Algorithm SAS and Algorithm SRS can
be easily generalized to algorithms of multiple projections. By doing multiple
projections and line searches, we can increase the accuracy of the algorithm at the
cost of a constant factor increase in the complexity.
2.4.2 Heuristic Algorithms
Some complete algorithms may be good in performance but too expensive to implement in practice, while the simplified complete algorithms SAS and SRS may not be sensitive enough to detect the changes despite their improved complexities. A
trade-off is heuristic algorithms which have lower complexities than their complete
counterparts and perform reasonably well for certain classes of distributions.
Search in sample-Centered Disks (SCD)
In calculating the distances on HD in SPD, it is difficult to reuse the calculation
since sample-defined disks may overlap in arbitrary ways. We define here a dif-
ferent sub-collection in which disks form nested sets, which allows the recursive
computation of distances.
Let A be the collection of two-dimensional disks. Given sample S = S1 ∪ S2, HCD(S) ⊂ A is the sub-collection of sample-centered disks defined by
HCD(S) ≜ {D′(si, sj) : si, sj ∈ S},   (2.25)
where D′(si, sj) is the disk with si at the center and sj on the boundary. See
Figure 2.1.
Proposition 7
VC-d(HCD) = 2.
Proof:
It is easy to see that VC-d(HCD) ≥ 2 because any set of two points can be
shattered (a singleton also belongs to HCD).
For any set S of 3 points, say S = {s1, s2, s3}, let
|s1s2| = max_{i,j∈{1,2,3}} |sisj|.
Then {s1, s2} cannot be shattered (i.e., obtained by shattering) because the only way to obtain it is by D′(s1, s2) or D′(s2, s1), but both of these contain s3. Hence any such S is not shatterable, and VC-d(HCD) ≤ 2.
□
Unfortunately, HCD is not complete with respect to A. For some classes of
probability distributions, however, it turns out that SCD has the same performance
as SPD asymptotically. For example, if there exists some center point such that any
neighborhood around the center has reasonably high probability, SCD is expected
to perform almost as well as SPD. Generally, if probability measures Pi are such
that any disk with positive Lebesgue measure has positive probability, then the
loss of performance vanishes asymptotically. For a disk and an arbitrary neighborhood of its center, the strong law of large numbers guarantees that, as the sample size goes to infinity, there is a sample point within this neighborhood of the
center almost surely. This implies that as sample size goes to infinity, Algorithm
SCD will give the same output as Algorithm SPD, i.e., the search of SCD is
asymptotically complete.
Algorithm SCD(dA) computes
max_{A∈HCD} |S1(A) − S2(A)|.
The presence of nested disks allows the counting procedure to be incremental, i.e., fix a center and count the number of sample points recursively from the innermost disk to the outermost disk.
Algorithm SCD(dA) does the following:
Fix a center si and define
Fi(j) ≜ S1(D′(si, sj)) − S2(D′(si, sj)),   (2.26)
where Sk(D′(si, sj)), k ∈ {1, 2}, is the empirical probability of D′(si, sj) in Sk. First sort the sample points into increasing order sj1, sj2, . . . according to their distance to si⁷ (sj1 = si), and then set Fi(j0) = 0 and compute Fi(jk) (k = 1, 2, . . . , M) recursively by
Fi(jk) = Fi(jk−1) + 1/|S1| if sjk ∈ S1, and Fi(jk) = Fi(jk−1) − 1/|S2| if sjk ∈ S2.
⁷This sort is at the cost of O(M log M).
Next compute
j∗(i) = arg max_j |Fi(j)|.   (2.27)
The search is repeated for all possible si. Finally, we find the maximum among |Fi(j∗(i))|, ∀i, i.e.,
imax = arg max_i |Fi(j∗(i))|.   (2.28)
Then the optimal disk in HCD for the A-distance is given by D′(simax, sj∗(imax)), and the maximum difference is
max_{A∈HCD} |S1(A) − S2(A)| = |Fimax(j∗(imax))|.
Algorithm SCD(φA) computes
max_{A∈HCD} |S1(A) − S2(A)| / √((S1(A) + S2(A))/2).
Clearly, when computing Fi(j) we can get S1(D′(si, sj)) and S2(D′(si, sj)) by similar updates, so we can compute
Gi(j) = |S1(D′(si, sj)) − S2(D′(si, sj))| / √((S1(D′(si, sj)) + S2(D′(si, sj)))/2).
Then
max_{A∈HCD} |S1(A) − S2(A)| / √((S1(A) + S2(A))/2) = max_{i,j} Gi(j).
The complexities of Algorithm SCD(dA) and Algorithm SCD(φA) are of the same order. Their complexity, compared with the O(M⁴) complexity of Algorithm SPD, is reduced to O(M² log M). The dominating term is the sorting of the sample points according to their distances to a given sample point, which takes O(M log M) for each center and is repeated for M centers.
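A minimal illustrative sketch of SCD(dA) under the same point representation as before; it follows the center-and-sort idea just described but is not the thesis's code.

```python
def scd_dA(S1, S2):
    """SCD(d_A) sketch: for each sample point as a center, sort the remaining points by
    distance and update the empirical difference incrementally over nested disks."""
    S = [(p, 1) for p in S1] + [(p, 2) for p in S2]
    n1, n2 = len(S1), len(S2)
    best = 0.0
    for (c, _) in S:
        order = sorted(S, key=lambda t: (t[0][0] - c[0]) ** 2 + (t[0][1] - c[1]) ** 2)
        F = 0.0
        for (_, k) in order:              # grow the disk D'(c, s) outward
            F += 1.0 / n1 if k == 1 else -1.0 / n2
            best = max(best, abs(F))
    return best
```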
Search in Diagonal-defined axis-aligned Rectangles (SDR)
Algorithm SDR is a heuristic simplification of Algorithm SAR. A major drawback of Algorithm SAR is that it is much slower in computing the φA distance (O(M⁴) compared to O(M³) for computing dA). Aiming at reducing the cost of computing φA for rectangles, we propose a simplified variation of SAR: Algorithm SDR. Inspired by the Kolmogorov-Smirnov two-sample test [13], we reduce the search to the class of axis-aligned rectangles having sample points on diagonal vertices.
Let A be the collection of axis-aligned rectangles. Given sample S = S1 ∪ S2,
consider the following finite subset of A defined by
HDR(S) ≜ {R(yi, yj, xm, xn) : (xm, yi), (xn, yj) ∈ S or (xm, yj), (xn, yi) ∈ S},   (2.29)
where R(yi, yj, xm, xn) is the axis-aligned rectangle defined as in (2.14). See
Fig. 2.6.
Proposition 8
VC-d(HDR) = 2.
Proof: It is easy to see that VC-d(HDR) ≥ 2 because any set of two points can be
shattered (a singleton also belongs to HDR).
Figure 2.6: Members of HDR, e.g., the rectangle R(yi, yj, xm, xn) with sample points s1, s2 on its diagonal, and the regions I–IV used in computing its probability; sample points in S1 and S2 are shown with different markers.
For any set S of 3 points, say S = {s1, s2, s3}: if there is no set in HDR containing S, then S is not shatterable. Otherwise, let s1, s2 be the points defining such a set, i.e., the axis-aligned rectangle with diagonal vertices s1, s2 contains S. Then {s1, s2} cannot be shattered because the only way to shatter it is by the axis-aligned rectangle with s1, s2 as diagonal vertices, but this rectangle also contains s3. Hence VC-d(HDR) ≤ 2.
□
HDR is not complete w.r.t. A. However, by the same argument as in Algorithm
SCD, we see that if the probability distributions are such that any disk with positive
measure has positive probability, the loss of performance vanishes as sample size
goes to infinity.
Algorithm SDR(dA) and Algorithm SDR(φA) share the following steps:
Initially, the algorithm builds two matrices C1 and C2 to store the empirical cdf (cumulative distribution function) of S1 and S2. Specifically, assuming x1 ≤ x2 ≤ . . . ≤ xM and y1 ≤ y2 ≤ . . . ≤ yM, define
Ck(j, i) ≜ |Sk ∩ R(0, yj, 0, xi)|/|Sk|, k = 1, 2.
Construct C1 and C2 recursively:
(i) Sort S by the abscissa and the ordinate respectively. Define the function δk : {1, . . . , M} → {0, 1}, k = 1, 2, by δk(j) = 1 if the sensor with ordinate yj belongs to Sk, and the function g : {1, . . . , M} → {1, . . . , M} by g(j) = i if (xi, yj) ∈ S.
(ii) Compute the first row:
Ck(1, m) = δk(1)/|Sk| if m ≥ g(1), and Ck(1, m) = 0 otherwise,   (2.30)–(2.31)
for k ∈ {1, 2}, m = 1, . . . , M.
(iii) Compute the j-th row, j = 2, . . . , M:
Ck(j, m) = Ck(j − 1, m) + δk(j)/|Sk| if m ≥ g(j), and Ck(j, m) = Ck(j − 1, m) otherwise,   (2.32)–(2.33)
for k ∈ {1, 2}, m = 1, . . . , M.
Then compute empirical probabilities for members of HDR: for every rectangle R(yi, yj, xm, xn) ∈ HDR with i ≤ j, m ≤ n, its empirical probabilities are given by
Sk(R(yi, yj, xm, xn)) = S′k(R(yi, yj, xm, xn)) + δk(i)/|Sk| if (xm, yi) ∈ S, and Sk(R(yi, yj, xm, xn)) = S′k(R(yi, yj, xm, xn)) + (δk(i) + δk(j))/|Sk| otherwise,   (2.34)
where
S′k(R(yi, yj, xm, xn)) = Ck(j, n) − Ck(i, n) − Ck(j, m) + Ck(i, m),   (2.35)
for k ∈ {1, 2}. As seen in Fig. 2.6, the probability of the bold rectangle is the probability of region I minus that of II, minus that of III, plus that of IV, and we need the amendments to take care of boundary points.
Then Algorithm SDR(dA) computes
max_{R∈HDR} |S1(R) − S2(R)|,
and Algorithm SDR(φA) computes
max_{R∈HDR} |S1(R) − S2(R)| / √((S1(R) + S2(R))/2).
Algorithm SDR(dA) and Algorithm SDR(φA) both have complexity O(M²) because constructing the matrices C1 and C2 takes O(M²) steps and the search exhausts the O(M²) rectangles in HDR. Note that this algorithm requires a substantial amount of space, O(M²), due to the storage of C1 and C2.
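The following sketch (an illustration under stated assumptions, not the thesis's implementation) shows the cdf-matrix idea of SDR(dA): it assumes all coordinates are distinct, uses NumPy cumulative sums in place of the row recursion (2.30)-(2.33), and counts closed rectangles directly on the coordinate grid, so the boundary amendments of (2.34) are not needed in this form.

```python
import numpy as np

def sdr_dA(S1, S2):
    """SDR(d_A) sketch: build empirical CDF matrices on the grid of sample coordinates and
    evaluate rectangles with sample points on diagonal vertices by inclusion-exclusion."""
    S = S1 + S2
    xs, ys = sorted(p[0] for p in S), sorted(p[1] for p in S)
    xi = {x: i for i, x in enumerate(xs)}     # coordinate -> grid index (coordinates assumed distinct)
    yi = {y: j for j, y in enumerate(ys)}
    M = len(S)
    C = [np.zeros((M + 1, M + 1)) for _ in range(2)]
    for k, Sk in enumerate((S1, S2)):
        for (x, y) in Sk:
            C[k][yi[y] + 1, xi[x] + 1] += 1.0 / len(Sk)
        C[k] = C[k].cumsum(axis=0).cumsum(axis=1)   # 2D cumulative sum = empirical CDF

    def prob(k, i, j, m, n):
        # closed rectangle with x grid indices in [m, n] and y grid indices in [i, j]
        c = C[k]
        return c[j + 1, n + 1] - c[i, n + 1] - c[j + 1, m] + c[i, m]

    best = 0.0
    for p in S:                 # diagonal vertices are pairs of sample points
        for q in S:
            m, n = sorted((xi[p[0]], xi[q[0]]))
            i, j = sorted((yi[p[1]], yi[q[1]]))
            best = max(best, abs(prob(0, i, j, m, n) - prob(1, i, j, m, n)))
    return best
```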
2.5 Simulation
2.5.1 Simulation Setup
In the simulation, we consider the case when the distribution of collected sensors
is a mixture of 2D uniform distributions, one on an s × s square D and the other
centered at x0 ∈ D with radius r. Specifically, the PDF of the 2D random vector
x is given by
px0(x) = p/(πr²p + (s² − πr²)q) for x ∈ D, ||x − x0|| ≤ r;   px0(x) = q/(πr²p + (s² − πr²)q) for x ∈ D, ||x − x0|| > r;   and px0(x) = 0 otherwise,
where x0, p, q, and r are parameters, 0 < r ≪ s and 0 ≤ q < p ≤ 1.
This model corresponds to the scenario when sensors are uniformly distributed
in D, and a sensor is alarmed with probability p if it is within distance r from x0 ∈
D and q if it falls outside this distance. If we view the disk {x ∈ D : ||x − x0|| ≤ r} as the area where a noiseless sensor measurement should be “alarm” and the area outside this disk as the area where a noiseless measurement should be “non-alarm”, then
1− p is the (uniform) miss detection probability and q is the (uniform) false alarm
probability at sensors.
Under hypothesis H0, two sets of sample points are drawn i.i.d. from the same
px0 ; under H1, one set of sample points are drawn from px0 , and the other set of
sample points are drawn independently from px′0
for some other center x′0.
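As an illustration of this setup (with hypothetical parameter values chosen for this sketch), samples from px0 can be drawn by rejection: generate a uniform location on D and keep it with probability p or q depending on whether it falls in the disk around x0.

```python
import math, random

def sample_px0(n, x0, p, q, r, s):
    """Draw n i.i.d. points from the mixture density p_{x0}: sensors uniform on the s-by-s
    square D, alarmed with probability p within distance r of x0 and q otherwise; the
    collected points are the alarmed-sensor locations (rejection sampling sketch)."""
    points = []
    while len(points) < n:
        x, y = random.uniform(0, s), random.uniform(0, s)
        inside = math.hypot(x - x0[0], y - x0[1]) <= r
        if random.random() < (p if inside else q):   # keep the location if the sensor alarms
            points.append((x, y))
    return points

# H0: both samples from the same center; H1: the second sample uses a shifted center.
S1 = sample_px0(1000, x0=(5.0, 5.0), p=0.98, q=0.02, r=1.0, s=12.0)
S2 = sample_px0(1000, x0=(7.0, 7.0), p=0.98, q=0.02, r=1.0, s=12.0)
```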
2.5.2 Detector Sensitivity
We consider Neyman-Pearson detection with detector size α, and choose detection
threshold according to (2.12) to guarantee that the detector’s false alarm will not
exceed α.
Recalling that ǫ(n) measures detector sensitivity, we examine the relation be-
tween ǫ(n), the VC-dimension and the distance measure. Note that for fixed
false alarm, we need more sample points to achieve the same threshold for a
test searching in a class of larger VC-dimension. For searches in classes of the
same VC-dimension, the test using the relative A-distance needs fewer sample points to achieve the same threshold than the one using the A-distance. See Fig. 2.7.
Figure 2.7: Detection threshold as a function of the sample size for different VC-dimensions (false alarm α = 0.05).
Fig. 2.8 shows that the detection threshold is not sensitive to the maximum
false alarm α. We see that given a certain sample size, a detector with a larger size
would not have a much smaller detection threshold. Hence increasing the sample
size is usually the only way to improve the accuracy of the detector.
Figure 2.8: Detection threshold as a function of the detector size for different sample sizes (VC-dimension = 2).
2.5.3 Performance
We focus on miss detection in our Monte Carlo simulations. Fig. 2.9 and Fig. 2.10
show the miss detection probability vs. sample size. We observe that there is a
threshold sample size beyond which the miss detection probability drops sharply.
This can be explained using Theorem 1, which states that the upper bound on
miss detection probability begins to drop when ǫ(n) < dA(P1, P2) for δdA or ǫ(n) <
φA(P1, P2) for δφA, and once it starts to drop, it drops exponentially. A heuristic
argument on the minimum sample size would be that the sample size n should be such that
ǫ(n) = √((32/n) log(8(2n + 1)^d / α)) ≤ dA(P1, P2)  for δdA,   (2.36)
ǫ(n) = √((4/n) log(2(2n + 1)^d / α)) ≤ φA(P1, P2)  for δφA.   (2.37)
If we know P1 and P2, we can calculate dA(P1, P2) and φA(P1, P2) to obtain a lower bound on n by solving the inequalities (2.36) and (2.37). An observation is that this estimate is close to the minimum sample size required in the simulation. For example, in our simulation setup, the estimated minimum sample sizes for Algorithms SAS and SCD using the A-distance metric are both 2725, and that for SCD using the relative A-distance metric is 53. As indicated in Fig. 2.9 and Fig. 2.10, they all agree well with the sharp drop in miss detection probabilities.
Figure 2.9: Miss detection probability of δdA as a function of the sample size: simulation results for Algorithms SAS(dA), SRS(dA), SCD(dA), and SDR(dA). Here p = 0.98, q = 0.02, r = s/12, α = 0.05; 1000 Monte Carlo runs.
Figure 2.10: Miss detection probability of δφA as a function of the sample size: simulation results for Algorithms SAS(φA), SRS(φA), SCD(φA), and SDR(φA). Here p = 0.98, q = 0.02, r = s/12, α = 0.05; 10000 Monte Carlo runs.
As expected, both the threshold and the miss detection probability are decreasing functions of the sample size, which reflects a trade-off between detection precision on the one hand and sampling time, energy consumption, and data processing expense on the other.
We also plot the detection probability w.r.t. the size of the detector. See
Fig. 2.11 and Fig. 2.12. The plots show that the detection probability does not increase significantly with the increase of the detector size, which is expected because the
size affects detection probability only through the threshold, and the threshold is
not sensitive to the change of size (see Fig. 2.8).
Figure 2.11: Detection probability of δdA as a function of detector size for Algorithms SAS, SRS, SCD, and SDR (sample size 3000), 1000 Monte Carlo runs.
Figure 2.12: Detection probability of δφA as a function of detector size for Algorithms SAS, SRS, SCD, and SDR (sample size 100), 10000 Monte Carlo runs.
Note that by choosing the threshold from the upper bound in (2.38) and (2.41),
we only guarantee that the false alarm is upper bounded by α. Our simulation shows that the actual false alarm probability can be much less than the size of the detector⁸,
which implies that the theoretical threshold is a loose upper bound of the actual
minimum threshold needed to guarantee the required detector size. This is be-
cause of the nonparametric nature of the theoretical threshold. This threshold is
proved to satisfy the size constraint under arbitrary distributions by the Vapnik-
Chervonenkis Theory. Therefore for a given distribution, this threshold may be
loose.
For comparison among the algorithms, an obvious observation is that δφA outperforms δdA in detection probability. This is because, on the one hand, given n and α, using (2.36) and (2.37) to choose the threshold yields an ǫ(n) for φA that is roughly a factor of 1/(2√2) of that for dA; on the other hand, we have φA(S1, S2) ≥ dA(S1, S2). Therefore in our simulation it is easier for algorithms using the statistic φA(S1, S2) to detect a change. However, this is caused by the specific way of choosing the detection threshold, and does not imply that δφA is uniformly better than δdA.
An intuitive guideline in algorithm design is that the better the sets in A separate the probability mass of P1 and P2, and the simpler A is, the better the detector performance; e.g., Algorithm SCD performs better than Algorithms SAS and SRS. Moreover, we can introduce random factors into an algorithm to make it more robust; e.g., we randomize SAS into SRS so as to make it independent of the direction in which the change occurs.
⁸For example, in our simulation of Algorithms SAS and SRS, for sample sizes up to 10,000 using 1000 Monte Carlo runs, we encountered no false alarms at all.
2.6 Extension to Finite-level Sensor Measurements
We have presented our results based on collecting sensor locations of sensors with
the same report (i.e., “alarm”). Extension can be made to applications with finite-
level sensor measurements.
Without loss of generality, let each sensor report either that it is alarmed (say, measurement level 1) or that it is not alarmed (level 0). In such a case, the ith data collection is modelled by a probability space (Ω × {0, 1}, F, Pi) where F is a σ-field on Ω × {0, 1}. Let random variable x ∈ Ω denote the sensor location, and L ∈ {0, 1} denote the sensor report. In the ith collection, (x, L) has joint distribution Pi, and
the location of alarmed sensors has conditional distribution Pi|L=1. It is easy to
see that there are cases when Pi changes but Pi|L=1 does not. Hence by collecting
both types of sensor reports, we are able to detect a wider range of changes.
To apply the algorithms presented previously, choose the class A′ to be the collection of sets from A in either the 0-plane or the 1-plane, i.e., A′ = A × {0, 1}. For instance, the collection of planar disks becomes the collection of planar disks with either
measurement 0 or measurement 1. Algorithms should be applied to both 0-plane
and 1-plane and we choose the larger as the test statistics dA(S1, S2) or φA(S1, S2).
The detection and estimation performance guarantee still holds, but note that the
sample size now becomes the total number of sensor reports collected (rather than
the number of alarms collected). Note that the VC-dimension of such a class A′
remains the same as that of A:
Proposition 9 For a class A of planar sets,
VC-d(A × {0, 1}) = VC-d(A).
Proof:
It is easy to see that VC-d(A × {0, 1}) ≥ VC-d(A).
For any set S, if S contains points from different planes, S is not shatterable because no set in A × {0, 1} contains points from different planes. If S only contains points in one plane, it is shatterable only if |S| ≤ VC-d(A). Therefore, VC-d(A × {0, 1}) ≤ VC-d(A).
□
2.7 Summary
We have presented in this chapter a nonparametric approach to the detection of
changes in the distribution of alarmed sensors. We have provided exponential
bounds for the miss detection and false alarm probabilities. The error exponents of these probabilities provide useful guidelines for determining the number of sample points required.
We have also proposed several nonparametric change detection and estimation
algorithms. Here we have aimed at reducing the computation complexity while
preserving the theoretical performance guarantee by using recursive search strate-
gies that reuse earlier computations, which gives us two near linear-complexity
algorithms SAS and SRS. The more expensive algorithms SCD and SDR also have
their roles, despite their near square cost, especially in detecting changes of highly
clustered distributions. This is because the search classes in Algorithm SCD and
SDR may yield larger distance than the more simplified classes, which in turn gives
54
larger error exponents as indicated in Theorem 1. Moreover, Algorithm SCD is
much more efficient than the exhaustive algorithm SPD with complexity O(M⁴),
and Algorithm SDR also improves the complexity of its exhaustive counterpart
Algorithm SAR significantly. Complexities of different algorithms presented so far
are summed up in the following table.
Table 2.1: Time Complexity Comparison
        dA              φA
SPD     O(M⁴)           O(M⁴)
SCD     O(M² log M)     O(M² log M)
SAR     O(M³)           O(M⁴)
SDR     O(M²)           O(M²)
SAS     O(M log M)      O(M²)
SRS     O(M log M)      O(M²)
Besides running time, one may also care about the amount of storage used for
executing the algorithms. Obviously O(M) space is needed to store S1 and S2,
and the extra space needed scales as follows:
Table 2.2: Space Complexity Comparison
        dA        φA
SPD     O(1)      O(1)
SCD     O(1)      O(1)
SAR     O(1)      O(M)
SDR     O(M²)     O(M²)
SAS     O(1)      O(M)
SRS     O(1)      O(M)
Comparing these tables, one can see the time-space trade-off in algorithm de-
sign. For example, although Algorithm SDR has comparable running time with
Algorithm SCD, it requires much more space to execute, i.e., O(M²) instead of O(1). The choice of algorithm should be a trade-off between running time, space
requirement and detection performance, with the significance of each highly de-
pendent on applications.
One should be further cautioned that the techniques considered in this chapter
typically require a large number of sample points. Since no information about the
distribution is used, and the performance guarantee must hold for all distributions,
bounds derived here are conservative. While we have adhered to the principle of the nonparametric approach, the incorporation of certain prior knowledge about
the distribution, in the selection of A for example, would lead to more effective
detection and estimation schemes in practice.
APPENDIX 2.A
PROOF OF CHAPTER 2
Proof of Theorem 1
We first prove the theorem for detectors using the A-distance metric dA(S1, S2) =
supA∈A |S1(A) − S2(A)|. From [17], we have
Pr{∃A ∈ A : ||P1(A) − P2(A)| − |S1(A) − S2(A)|| > ǫ} ≤ 8(2n + 1)^d e^{−nǫ²/32}.   (2.38)
Under H0, P1 = P2, and the false alarm probability satisfies
PF(δ) = Pr{dA(S1, S2) > ǫ; H0}
 = Pr{∃A ∈ A : |S1(A) − S2(A)| > ǫ; H0}
 = Pr{∃A ∈ A : ||P1(A) − P2(A)| − |S1(A) − S2(A)|| > ǫ; H0}
 ≤ 8(2n + 1)^d e^{−nǫ²/32},   (2.39)
where inequality (2.39) follows from (2.38).
For the miss probability, let A∗ = arg max_{A∈A} |P1(A) − P2(A)|. Then
PM(δ, P1, P2) = Pr{dA(S1, S2) ≤ ǫ; P1, P2}
 ≤ Pr{|S1(A∗) − S2(A∗)| ≤ ǫ; P1, P2}
 ≤ Pr{||P1(A∗) − P2(A∗)| − |S1(A∗) − S2(A∗)|| ≥ ||P1(A∗) − P2(A∗)| − ǫ|; P1, P2}
 ≤ 8(2n + 1)^d e^{−n[|P1(A∗)−P2(A∗)|−ǫ]²/32}.   (2.40)
Now consider the relative distance. The proof for the relative distance metric goes line by line as that for the non-relative metric, replacing inequality (2.38) with the following results from [17]:
P^{2n}(φA(S1, S2) > ǫ) ≤ 2(2n + 1)^d e^{−nǫ²/4},   (2.41)
P^{2n}[|φA(P1, P2) − φA(S1, S2)| > ǫ] ≤ 16(2n + 1)^d e^{−nǫ²/16}.   (2.42)
We have
PF(δ) ≤ 2(2n + 1)^d e^{−nǫ²/4},   (2.43)
PM(δ, P1, P2) ≤ 16(2n + 1)^d e^{−n[φA(P1,P2)−ǫ]²/16}.   (2.44)
□
Proof of Theorem 2
Let VC-d(A) = d < ∞. We first prove the theorem for the A-distance.
Let
A∗dA = arg max_{B∈A} |P1(B) − P2(B)|,
and define η to be
η ≜ |P1(A∗dA) − P2(A∗dA)| − sup_{B∈A, B≠A∗dA} |P1(B) − P2(B)|.
Then η > 0.
By results of [17], we have
Pr{sup_{B∈A} ||P1(B) − P2(B)| − |S1(B) − S2(B)|| ≤ η/3} ≥ 1 − 8(2n + 1)^d e^{−nη²/288}.
So with probability ≥ 1 − 8(2n + 1)^d e^{−nη²/288},
|S1(A∗dA) − S2(A∗dA)| − sup_{B∈A, B≠A∗dA} |S1(B) − S2(B)|
 ≥ |P1(A∗dA) − P2(A∗dA)| − sup_{B∈A, B≠A∗dA} |P1(B) − P2(B)|
   − ||S1(A∗dA) − S2(A∗dA)| − |P1(A∗dA) − P2(A∗dA)||
   − |sup_{B∈A, B≠A∗dA} |S1(B) − S2(B)| − sup_{B∈A, B≠A∗dA} |P1(B) − P2(B)||   (2.45)
 ≥ η − 2 sup_{B∈A} ||P1(B) − P2(B)| − |S1(B) − S2(B)||   (2.46)
 ≥ η/3.   (2.47)
That is,
Pr{A∗dA = arg max_{B∈A} |S1(B) − S2(B)|} ≥ 1 − 8(2n + 1)^d e^{−nη²/288}.
As n → ∞, we see that
lim_{n→∞} Pr{A∗dA = arg max_{B∈A} |S1(B) − S2(B)|} = 1.
For the relative A-distance, let
A∗φA = arg max_{B∈A} |P1(B) − P2(B)| / √((P1(B) + P2(B))/2).
Let
η ≜ fφ(P1(A∗φA), P2(A∗φA)) − sup_{B∈A, B≠A∗φA} fφ(P1(B), P2(B)).
Then η > 0.
In [2] it was proved that fφ(x, y) is a metric on [0, 1]. The rest of the proof is similar to that for the A-distance. By [17] we have
Pr{sup_{B∈A} fφ(Si(B), Pi(B)) ≤ η/5} ≥ 1 − 8(2n + 1)^d e^{−nη²/100},  i = 1, 2.
Thus with probability ≥ [1 − 8(2n + 1)^d e^{−nη²/100}]², we have
fφ(S1(A∗φA), S2(A∗φA)) − sup_{B∈A, B≠A∗φA} fφ(S1(B), S2(B))
 ≥ fφ(P1(A∗φA), P2(A∗φA)) − sup_{B∈A, B≠A∗φA} fφ(P1(B), P2(B))
   − |fφ(P1(A∗φA), P2(A∗φA)) − fφ(S1(A∗φA), S2(A∗φA))|
   − |sup_{B∈A, B≠A∗φA} fφ(P1(B), P2(B)) − sup_{B∈A, B≠A∗φA} fφ(S1(B), S2(B))|   (2.48)
 ≥ η − fφ(P1(A∗φA), S1(A∗φA)) − fφ(P2(A∗φA), S2(A∗φA))
   − sup_{B∈A, B≠A∗φA} fφ(P1(B), S1(B)) − sup_{B∈A, B≠A∗φA} fφ(P2(B), S2(B))   (2.49)
 ≥ η − 2 sup_{B∈A} fφ(P1(B), S1(B)) − 2 sup_{B∈A} fφ(P2(B), S2(B))   (2.50)
 ≥ η/5.   (2.51)
That is,
Pr{A∗φA = arg max_{B∈A} |S1(B) − S2(B)| / √((S1(B) + S2(B))/2)} ≥ [1 − 8(2n + 1)^d e^{−nη²/100}]².
Letting n → ∞ completes the proof.
□
Chapter 3
Detecting Information Flows Without
Chaff Noise
3.1 Outline
This chapter addresses centralized detection of information flows without chaff
noise. The detection procedure is decomposed into pairwise detection of 2-hop
flows. Section 3.2 gives a mathematical definition of the problem. Section 3.3
presents a packet matching algorithm for detecting information flows with bounded
delay. Section 3.4 presents a variation-based algorithm for detecting information
flows with bounded memory. Section 3.5 compares the performance of the pro-
posed algorithms with existing algorithms. Section 3.6 verifies the performance by
simulations. Section 3.7 concludes the chapter with remarks on the application of
the proposed detection schemes.
3.2 Problem Formulation
3.2.1 Notations
For the ease of presentation, we use the convention that boldface letters denote
vectors, plain letters denote scalars, uppercase letters denote random variables (or
stochastic processes), and lowercase letters denote realizations. For example, we
denote a point process by S, its realization by s, the kth epoch in S by S(k),
and the kth epoch in s by s(k). Given a realization s, we use the script letter S to denote the set of elements in this realization. Given two realizations of point processes (a1, a2, . . .) and (b1, b2, . . .), ⊕ is the superposition operator defined as (ak)_{k=1}^∞ ⊕ (bk)_{k=1}^∞ = (ck)_{k=1}^∞, where c1 ≤ c2 ≤ . . . and {ak}_{k=1}^∞ ∪ {bk}_{k=1}^∞ = {ck}_{k=1}^∞.
3.2.2 Flow Models
Consider two nodes of interest, denoted by A and B, in a wireless sensor network,
as illustrated in Fig. 3.1. Let the transmission epochs of each node be represented
by a point process
Si = (Si(1), Si(2), Si(3), . . .), i = 1, 2,
where Si(k) (k ≥ 1) is the kth transmission epoch1 of the node (let i = 1 for node
A, and i = 2 for node B).
Figure 3.1: Detecting information flows through nodes A and B by analyzing their transmission activities S1 and S2.
If A and B are carrying an information flow and not involved in other trans-
missions, then (S1, S2) needs to satisfy the following definition.
Definition 7 A pair of processes (F1, F2) is an information flow if for every
realization, there exists a bijection g : F1 → F2 such that g(s) − s ≥ 0 for all
1Assume no simultaneous transmissions.
s ∈ F1. For an information flow with bounded delay ∆, g(s) − s ≤ ∆ for all
s ∈ F1. For an information flow with bounded memory M , g satisfies that
0 ≤ |F1 ∩ [0, t]| − |F2 ∩ [0, t]| ≤ M, ∀t ≥ 0. (3.1)
The bijection g is a mapping between the transmission and the relay epochs of
the same packets at nodes A and B, allowing permutation of order during the relay.
The condition that g is a bijection imposes a packet-conservation constraint, i.e.,
no packets are generated or dropped at the relay node. The condition g(s)− s ≥ 0
is the causality constraint, which means that a packet cannot leave a node before
it arrives. Communication constraints are additional constraints on g which are
imposed by the requirement of reliable communications. We consider two types of
commonly encountered constraints: bounded delay constraint and bounded mem-
ory constraint. The condition g(s) − s ≤ ∆ is a bounded delay constraint which
implies that the maximum delay at the relay node is uniformly bounded by ∆.
This condition was first proposed by Donoho et al. in [11]. The condition in (3.1)
is a bounded memory constraint which implies that the relay node has a limited
memory that can store at most M relay packets2. The values of ∆ and M are
assumed to be known.
There is a natural correspondence between the bounded delay model and the bounded memory model given by Little's Theorem [4]. If M = λ∆, where λ is the
rate of information flow, then in the bounded delay model, the maximum delay is
bounded by ∆ and the average memory size by M , whereas in the bounded mem-
ory model, the maximum memory size is bounded by M and the average delay by
∆. The two models are fundamentally different. It has been shown that the two
models have very different scaling behavior of the mutual information between F1 and F2 [14]. In Section 3.5.2, we will show that they also have different detection performance with respect to changes in traffic rate.
²A similar requirement on buffer size has been considered by Giles and Hajek in the context of timing channels [14].
3.2.3 Hypotheses
We want to test the following hypotheses:
H0 : S1 and S2 are independent,
H1 : (S1, S2) = (F1, F2)
by observing3 Si (i = 1, 2) for a finite time t (t > 0), where (F1, F2) is an infor-
mation flow (with bounded delay or memory). This is a partially nonparametric
hypothesis testing problem. No statistical assumptions are made for Fi (i = 1, 2)
although under H0, Si (i = 1, 2) are assumed to be Poisson processes in the
analysis.
3.3 Detecting Information Flows with Bounded Delay
In this section, we consider detecting information flows with bounded delay. We
propose a linear-time, packet matching-based algorithm, called “Detect-Match”
(DM).
Given measurements (s1, s2), we want to match the packets in s1 with their
possible relays in s2 subject to the delay bound ∆. Under H1, there must be at
least one way of matching packets that satisfies the causality and the bounded
3Note that under H1, the detector may not observe the beginning of the information flow.
delay constraints, i.e., the matching induced by the mapping g. For pairs of
independent processes, however, such matching is not always possible. Algorithm
DM is designed based on this observation. Note, however, that if we exhaustively
search all the matchings, the complexity of the algorithm will be exponential4.
Instead of looking for arbitrary matching, we prove that it suffices to search for
matchings which preserve the order of incoming packets, as stated in the following
proposition.
Proposition 10 If matching sk with s′k (k ≥ 1) satisfies the causality and the bounded delay constraints, then there exists a permutation (jk)_{k=1}^∞ such that matching sk with s′_{jk} also satisfies the above constraints, and this matching is order-preserving, i.e., sk ≤ sl if and only if s′_{jk} ≤ s′_{jl} for all (k, l).
Proof: As illustrated in Fig. 3.2, if matching sk with s′k (k = 1, 2) satisfies the
causality and the bounded delay constraints, then we can match s1 with s′2 and
s2 with s′1 such that the matching still satisfies the constraints, but the order of
packets is preserved. By induction on the number of out-of-order pairs, we can
reorder matchings of any length into a matching that satisfies the constraints and
is order-preserving.
□
By Proposition 10, it suffices to only consider the matchings that preserve the
order of packets, which reduces the problem to finding the match of the first packet
⁴For example, if there are at most L transmissions during time ∆, then the exhaustive search for a length-n matching has complexity O(L^n).
Figure 3.2: Both the solid and the dotted lines denote matchings that are causal and bounded in delay, but the dotted lines also preserve the order of incoming packets.
in s1. Based on this idea, we develop the following detector⁵:
δDM(s1, s2) = 1 if ∃ m ∈ [l, u] s.t. s2(k + m − 1) − s1(k) ∈ [0, ∆] for k = 1, . . . , n, and δDM(s1, s2) = 0 otherwise,
where n = |S1|, s2(l) is the first epoch in s2 after s1(1), and s2(u) is the last epoch in s2 before s1(1) + ∆ (including boundaries). See Appendix 3.B for its implementation.
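The thesis's implementation is given in Appendix 3.B; purely as an illustration, a direct Python sketch of the DM test (with 0-based indexing and sorted lists of epochs) could look like this.

```python
def detect_match(s1, s2, delta):
    """DM sketch: return 1 (declare a flow) if the order-preserving matching that starts at
    some candidate match of s1[0] within [s1[0], s1[0] + delta] is causal and delay-bounded."""
    n = len(s1)
    candidates = [m for m, t in enumerate(s2) if s1[0] <= t <= s1[0] + delta]
    for m in candidates:
        if m + n <= len(s2) and all(0 <= s2[m + k] - s1[k] <= delta for k in range(n)):
            return 1
    return 0
```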
Now we analyze the performance of DM. We claim that DM can detect all the
information flows with bounded delay ∆, i.e., there is no miss detection. Specif-
ically, we have shown by Proposition 10 that an information flow with bounded
delay ∆ must have an order-preserving matching which satisfies the causality and
the bounded delay constraints. Since the match of the first packet in s1 must be
in the interval [s1(1), s1(1) + ∆], which is equivalent to m ∈ [l, u] (as illustrated
in Fig. 3.3), DM must be able to detect such flows.
Next we examine the false alarm probability of DM.
⁵We use the convention that the detector gives the value 1 for H1 and 0 for H0.
Figure 3.3: Finding the match of s1(1): there are three candidates in the ∆-length interval following s1(1).
Theorem 3 If S1 and S2 are independent Poisson processes of rates λ1 and λ2, respectively, then the false alarm probability of DM satisfies
PF(δDM) ≤ γ^{n−1},
where γ = 1 − e^{−λ1λ2∆/(λ1+λ2)}.
Proof: See Appendix 3.A.
□
Remark: Theorem 3 gives a few insights into the problem. Since γ ≤ 1 − e^{−min(λ1,λ2)∆}, we have γ → 0 if min(λ1, λ2) → 0, i.e., DM almost never falsely detects slow independent traffic. Intuitively, it is easier to match two processes of equal rate. This intuition is strengthened by Theorem 3 because γ ≤ 1 − e^{−λ∆/2}, where λ = max(λ1, λ2), and thus the upper bound for equal rates is larger.
In the Neyman-Pearson framework, we can estimate the sample size required by
DM to achieve a given false alarm probability α by calculating the value n that
makes the upper bound in Theorem 3 equal to α, that is,
n = log α/ log γ + 1.
For example, if λ1 = λ2 = 1 and ∆ = 10, then a match length of 682 suffices to guarantee a false alarm probability bounded by 1%. Note that for this match length, DM needs up to 2n + λ2∆ = 1374 packets on average to find a valid match.
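As a quick check of this arithmetic, the following Python fragment (an illustrative sketch, not part of the proposed detectors) evaluates γ and the resulting match length for the example above:

    import math

    def dm_sample_size(lam1, lam2, delta, alpha):
        # Match length from the Theorem 3 bound: n = log(alpha)/log(gamma) + 1.
        gamma = 1.0 - math.exp(-lam1 * lam2 * delta / (lam1 + lam2))
        return math.log(alpha) / math.log(gamma) + 1

    n = dm_sample_size(1.0, 1.0, 10.0, 0.01)
    print(round(n))           # 682, the match length quoted above
    print(2 * round(n) + 10)  # 1374, the average number of packets needed (2n + lambda2*delta)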
3.4 Detecting Information Flows with Bounded Memory
In this section, we consider the problem of detecting information flows when the
relay node has limited memory. Specifically, assuming that the node’s memory
can hold at most M packets, we use the property that the difference between the
number of incoming packets and the number of departure packets never exceeds
M during any period of time. Based on this property, we derive a counting-based
algorithm—“Detect-Maximum-Variation” (DMV).
A few definitions are needed to present the algorithm. Given realizations s_i (i = 1, 2), let (s_w)_{w≥1} = s1 ⊕ s2. Let n_i(w) (i = 1, 2) be the number of packets in s_i when the total number of packets is w, i.e.,
\[
n_i(w) \triangleq \sum_{j=1}^{w} I_{\{s_j \in S_i\}},
\]
where I_{·} is the indicator function. Sample paths of n1(w) and n2(w) are illustrated in Fig. 3.4(a). Define the cumulative difference between s1 and s2 as
\[
d(w) \triangleq n_1(w) - n_2(w),
\]
and let the maximum variation of d(w) be
\[
v(w) \triangleq \max_{1\le i\le w} d(i) - \min_{1\le i\le w} d(i).
\]
See Fig. 3.4(b) for an illustration of d(w) and v(w).
Figure 3.4: (a) the cumulative counting functions ni(w) (i = 1, 2); (b) the cumulative difference d(w) and the maximum variation v(w).
If (s1, s2) is a realization of an information flow with bounded memory, then the sample path of d(w) will have bounded variation. Specifically, note that
\[
v(w) = \max_{1\le i\le j\le w} |d(j) - d(i)| = \max_{1\le i\le j\le w} |(n_1(j)-n_1(i)) - (n_2(j)-n_2(i))|,
\]
where |(n1(j) − n1(i)) − (n2(j) − n2(i))| is the difference in the numbers of packets in s1 and s2 between the ith and the jth packets. For memory bound M, this difference is bounded by M, i.e., v(w) ≤ M for all w. Algorithm DMV detects information flows with bounded memory based on the maximum variation. The detector is defined as follows:
\[
\delta_{\mathrm{DMV}}(s_1, s_2) = \begin{cases} 1 & \text{if } v(n) \le M, \\ 0 & \text{otherwise}, \end{cases}
\]
where n = |S1| + |S2|. An implementation of the detector can be found in Appendix 3.B.
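For concreteness, a minimal Python sketch of δDMV is given below; it mirrors the pseudocode of Appendix 3.B, and the function and variable names are illustrative only.

    def detect_maximum_variation(s1, s2, M):
        # Sketch of DMV: merge the two epoch sequences, track the cumulative
        # difference d(w), and declare H1 iff its maximum variation stays <= M.
        merged = sorted([(t, +1) for t in s1] + [(t, -1) for t in s2])
        d = d_max = d_min = 0
        for _, step in merged:
            d += step                      # +1 for a packet of s1, -1 for a packet of s2
            d_max, d_min = max(d_max, d), min(d_min, d)
            if d_max - d_min > M:
                return 0                   # variation exceeds M: declare H0
        return 1                           # declare H1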
Since any information flow with bounded memory M will be detected after n
packets, i.e., miss detection is totally avoided, we only need to take care of the
false alarm probability, which is bounded as follows.
Theorem 4 For independent Poisson processes, the false alarm probability of DMV satisfies
\[
P_F(\delta_{\mathrm{DMV}}) \le (M+1)\,\frac{\rho^n}{1-\rho},
\]
where ρ = cos(π/(M+2)). Furthermore, if the two processes have the same rate, then the upper bound is tight with respect to the error exponent, i.e.,
\[
\lim_{n\to\infty} -\frac{1}{n}\log P_F(\delta_{\mathrm{DMV}}) = -\log\rho.
\]
Proof: See Appendix 3.A. □
Remarks: For a given false alarm constraint α, we can guarantee the satisfaction of this constraint by making the upper bound in Theorem 4 equal to α, yielding a sample size
\[
n = \frac{\log[\alpha(1-\rho)] - \log(M+1)}{\log\rho}, \qquad (3.2)
\]
which grows as O(M^2 log(M/α)) as M → ∞ and α → 0. For example, if M = 20, (3.2) says that using 1196 packets will guarantee a false alarm probability no greater than 1%.
3.5 Comparing the Algorithms
We have introduced detection algorithms under the bounded delay and the
bounded memory models. In practice, the information flows of interest may sat-
isfy the assumptions of more than one detection algorithm. This section aims at
comparing the performance of different algorithms when they are both applicable.
3.5.1 DMV vs. DA
Blum et al. [5] consider the detection of information flows that satisfy both the
bounded delay and the bounded peak rate conditions. The underlying idea is that
in interactive information flows, usually not only is the delay bounded, but the peak
rate at which packets are issued is also bounded. Specifically, consider information
flows in which the maximum delay is bounded by ∆, and the maximum number
of arrivals within time t (t ≥ 0) is L(t). The second condition is referred to as the
bounded peak rate condition. The formal definition of such information flows is as
follows.
Definition 8 A pair of processes (F1, F2) is an information flow with bounded
delay ∆ and bounded peak rate L(·) if it is an information flow with bounded delay
∆, and for every realization, |F1 ∩ [s, t]| ≤ L(t − s) for all 0 ≤ s ≤ t.
Information flows with bounded delay and bounded peak rate always have
bounded memory, as stated in the following proposition.
Proposition 11 Given a realization (f1, f2) of an information flow with bounded
delay and bounded peak rate, let ni(a, b) (i = 1, 2) be the number of packets in fi
in the interval [a, b] (a ≤ b). Then
|n1(a, b) − n2(a, b)| ≤ L(∆), ∀a ≤ b.
Proof: See Appendix 3.A. □
By Proposition 11, we see that information flows with bounded delay and
bounded peak rate are also information flows with bounded memory, where the
memory bound is L(∆). Note that the converse is not true, i.e., bounded delay and bounded memory do not imply bounded peak rate.
For information flows with bounded delay and bounded peak rate, Blum et al.
in [5] propose a detection algorithm called “Detect-Attacks” (DA). Algorithm DA
merges the observations and divides the result into groups of 2(M+1)^2 packets. Then it computes the cumulative differences in each group (starting from zero in every group). The algorithm returns H0 if there exists a group with cumulative difference greater than M in absolute value. The detector is defined as
\[
\delta_{\mathrm{DA}}(s_1, s_2) = \prod_{k=1}^{n/(2(M+1)^2)} \delta^{(k)}_{\mathrm{DA}}(s_1, s_2),
\]
where
\[
\delta^{(k)}_{\mathrm{DA}}(s_1, s_2) = \begin{cases} 1 & \text{if } \max_{0\le w\le 2(M+1)^2} |d^{(k)}(w)| \le M, \\ 0 & \text{otherwise}, \end{cases}
\]
where d^{(k)}(w) (w = 0, . . . , 2(M+1)^2) is the cumulative difference for the kth group (d^{(k)}(0) = 0).
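A minimal Python sketch of this group-wise test is given below. It assumes that the merged observations have already been reduced to a ±1 step sequence (as in the DMV sketch earlier) and that the sample size is a multiple of the group size; the names are illustrative only.

    def detect_attacks(steps, M):
        # Sketch of DA: split the +/-1 step sequence into groups of 2(M+1)^2,
        # restart the cumulative difference in each group, and declare H0 as soon
        # as the difference in some group leaves [-M, M].
        group_len = 2 * (M + 1) ** 2
        for start in range(0, len(steps), group_len):
            d = 0
            for step in steps[start:start + group_len]:
                d += step
                if abs(d) > M:
                    return 0
        return 1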
Blum et al. showed that DA has no miss detection. Moreover, they proved that 2(M+1)^2 log(1/α) packets are needed to guarantee a false alarm probability no more than α. We are more interested in the asymptotic detection performance in
more than α. We are more interested in the asymptotic detection performance in
terms of error exponents. Since [5] does not compute the error exponent for the
false alarm probability of DA, we introduce the following lemma.
Lemma 1 For independent Poisson processes,
\[
\Pr\Big\{\max_{i\in\{1,\ldots,m\}} |d(i)| \le M\Big\} \le \frac{\sigma^m}{1-\sigma},
\]
and when m is large enough,
\[
\Pr\Big\{\max_{i\in\{1,\ldots,m\}} |d(i)| \le M\Big\} \ge K\sigma^m,
\]
where σ = cos(π/(2(M+1))) and K = \frac{\sin\frac{\pi}{2(M+1)}}{2(M+1)(1-\sigma)}.
Proof: See Appendix 3.A. □
If M is large, we can apply Lemma 1 to each group of 2(M+1)^2 epochs to obtain the upper and the lower bounds on the false alarm probability of that group. Moreover, it was proved in [5] that the single group false alarm probability is upper bounded by 1/2. Hence the false alarm probability of one group is upper bounded by
\[
\min\left(\frac{\sigma^{2(M+1)^2}}{1-\sigma},\ \frac{1}{2}\right) = \begin{cases} \frac{2+\sqrt{2}}{16} & \text{if } M = 1, \\[1ex] \frac{1}{2} & \text{if } M \ge 2. \end{cases}
\]
Algorithm DA has a false alarm if all the n/[2(M+1)^2] groups have false alarms6. Thus for large M, the total false alarm probability satisfies
\[
\left(K^{\frac{1}{2(M+1)^2}}\sigma\right)^n \le P_F(\delta_{\mathrm{DA}}) \le \left(\frac{1}{2}\right)^{\frac{n}{2(M+1)^2}}. \qquad (3.3)
\]
Therefore, for large M, the false-alarm error exponent of DA is at most −log(K^{1/(2(M+1)^2)} σ) and at least log 2/(2(M+1)^2).
Now we compare DA with DMV under the assumptions of bounded delay and
bounded peak rate. Note that since we have shown in Proposition 11 that such
information flows satisfy the bounded memory condition, DMV also has no miss
detection. It remains to compare their false alarm probabilities.
6In DA, the sample size n is always a multiple of 2(M+1)^2.
We first point out that under H0, DMV always outperforms DA for any realization. The reasons are that v(w) ≥ max_{1≤i≤w} |d(i)| within a group (see Fig. 3.5), and DA restarts its computation from zero at the beginning of each group, whereas DMV keeps accumulating v(w) across groups. Therefore, for every realization, if DMV has a false alarm, DA must have a false alarm too.
Figure 3.5: The statistic of DA is no larger than that of DMV.
Next we compare their false alarm probabilities. In particular, we are interested in whose false alarm probability has a larger error exponent. From Theorem 4 and (3.3), we see that the false-alarm error exponent of DMV is −log ρ, whereas that of DA is at most −log(K^{1/(2(M+1)^2)} σ). By Taylor expansion of the error exponents, we have that as M → ∞,
\[
-\log\rho = \frac{\pi^2}{2(M+2)^2} + o\left(\frac{1}{M^2}\right),
\]
\[
-\log\big(K^{\frac{1}{2(M+1)^2}}\sigma\big) = \frac{\frac{\pi^2}{4} + \log\frac{\pi}{2}}{2(M+1)^2} + o\left(\frac{1}{M^2}\right).
\]
The ratio of the two exponents therefore approaches π^2/(π^2/4 + log(π/2)) ≈ 3.38, i.e., for large M, the false-alarm error exponent of DMV is at least 3.38 times larger than that of DA.
3.5.2 DM vs. DMV
For information flows with bounded memory and bounded delay, both DMV and
DM are applicable. We are interested in which algorithm performs better asymptotically. Note that we need to give DMV and DM the same sample size to make a fair comparison. If we define sample size as the total number of observed packets, then by Theorems 3 and 4, we see that for Poisson processes of equal rate λ, DM is preferable if γ ≤ ρ^2, i.e.,
\[
\lambda \le -\frac{4}{\Delta}\log\left(\sin\frac{\pi}{M+2}\right). \qquad (3.4)
\]
Otherwise, DMV is preferable. For example, for M = 40 and ∆ = 10, the threshold is λ ≤ 1.0375. This threshold phenomenon has an intuitive explanation.
Algorithm DMV only uses the rank statistics, so it does not depend on the rate of
the traffic; on the other hand, DM performs better on lighter traffic and worse on
heavier traffic. The reason for the latter is that if we normalize the maximum delay
with the average interarrival time, then the normalized delay bound λ∆ clearly
satisfies that λ∆ → 0 as λ → 0 and λ∆ → ∞ as λ → ∞, which implies that
for extremely light traffic, almost perfect synchrony is required to raise an alarm,
whereas for extremely heavy traffic, the delay constraint is essentially removed,
causing DM to always raise alarms. Therefore, when the traffic rate is sufficiently
low, DM outperforms DMV, and otherwise DMV outperforms DM.
The comparison suggests that the bounded memory condition is more informative than the bounded delay condition for λ∆ > 4 log((M+2)/π). Since the right hand side grows only as log M, the memory bound can be advantageous even for modest rates and large memory. For example, for M = 10^6 packets and ∆ = 10 seconds, we only need λ > 5.1 packets per second for the bounded memory condition to outweigh the bounded delay condition.
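The threshold in (3.4) is easy to evaluate numerically; the following illustrative fragment reproduces the two examples above.

    import math

    def dm_vs_dmv_threshold(M, delta):
        # Rate threshold from (3.4): DM is preferable when lambda is below this value.
        return -(4.0 / delta) * math.log(math.sin(math.pi / (M + 2)))

    print(dm_vs_dmv_threshold(40, 10))     # about 1.0375 packets per second
    print(dm_vs_dmv_threshold(10**6, 10))  # about 5.1 packets per second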
3.6 Numerical Results
We simulate our algorithms on synthetic data to verify their performance. Since all
the algorithms (DA, DM, and DMV) are free of miss detection, we only simulate
the false alarm probabilities. We generate measurements by independent Poisson
processes of equal rate. We let M = 40 packets, ∆ = 10 seconds, and vary
the sample size (i.e., the total number of packets in both s1 and s2) between
2500 and 5000. Note that since DA requires the sample size to be a multiple of 2(M+1)^2 = 3362 packets, we extend the sample size of DA to 6724. The
performance of DA and DMV does not depend on the traffic rate. For DM, the
rate will be specified when detailed results are presented.
We have shown the advantage of DMV over DA and have quantified their
difference in terms of error exponent as M → ∞ in Section 3.5.1. We now show
how their performance compares for finite M . In Fig. 3.6, we plot the simulated
false alarm probabilities of DMV and DA, together with the upper bound on
PF (δDMV) from Theorem 4 and the asymptotic upper and lower bounds on PF (δDA)
from (3.3). Simulation shows that the asymptotic bounds in (3.3) are valid even
for relatively small M (M = 40). Furthermore, it confirms our claim that the false
alarm probability of DMV decays much faster than that of DA.
We simulate DM for different traffic rates (λ = 3, 3.5, 4, 4.5). The simulation
results are plotted in Fig. 3.7. The upper bounds in Theorem 3 for rates between 3
and 4.5 are close to 1; the actual false alarm probabilities obtained from simulation
are much lower. The plot shows that the upper bound in Theorem 3 is not tight,
but it correctly predicts the fact that PF (δDM) increases with the increase of traffic
rate, as argued in Section 3.5.2.
Figure 3.6: PF(δDA), PF(δDMV), and their bounds; M = 40 packets, 100000 Monte Carlo runs.
Furthermore, we make an overall comparison by plotting the simulated false
alarm probabilities of DA, DMV and DM together in Fig. 3.8. From the plot, it is
clear that the comparison between DM and DMV depends on the traffic rate. In
our simulation, M = 40, ∆ = 10, the threshold rate estimated by (3.4) is about
1.0375. The simulation verifies the existence of such a threshold rate because the
false alarm probability of DM decays faster than that of DMV for λ = 3.5 and
slower for λ = 4.5. Note, however, that in the estimation of the threshold rate we
are conservative about DM. This is because for DMV, Theorem 4 gives the exact
error exponent, whereas for DM, Theorem 3 only characterizes a lower bound on
its error exponent (which is shown to be not tight). Therefore, we expect that
the actual threshold rate is larger than the one estimated by (3.4); e.g., in the simulation the threshold rate is about 4.
Figure 3.7: PF(δDM) under various rates; ∆ = 10 seconds, 100000 Monte Carlo runs.
Figure 3.8: PF(δDA), PF(δDMV), and PF(δDM); M = 40 packets, ∆ = 10 seconds, 100000 Monte Carlo runs.
3.7 Summary
In this chapter, we develop techniques to detect information flows when there is no
chaff noise. These techniques all belong to pairwise detection. If the information
flow of interest involves more than two hops, one can repeatedly apply the proposed
algorithms to detect all the 2-hop pieces and then use existing serialization methods (e.g., see [43]) to construct the flow path.
APPENDIX 3.A
PROOF OF CHAPTER 3
Proof of Theorem 4 and Lemma 1
The proof is based on the theory of random walks. Let {X_n}_{n≥0} be a simple random walk, i.e.,
\[
X_0 = 0, \qquad X_n = Z_1 + Z_2 + \cdots + Z_n \quad (n > 0),
\]
where {Z_i}_{i=1,2,...} are i.i.d. random variables taking values in {−1, 0, 1}. Let p = Pr{Z_i = 1} and q = Pr{Z_i = −1}. Define the hitting time of −b or a (a, b ≥ 0) as
\[
N_{-b,a} = \inf\{n \ge 1 : X_n = -b \text{ or } a\}.
\]
The following lemma is from [8]:
Lemma 2
\[
\Pr\{N_{-b,a} = n\} \le \frac{1}{2}\left(\frac{p}{q}\right)^{a/2}\frac{1}{s_1^{\,n-1}} + \frac{1}{2}\left(\frac{q}{p}\right)^{b/2}\frac{1}{s_1^{\,n-1}}, \qquad (3.5)
\]
where s_1 = 1/(1 − p − q + 2(pq)^{1/2} cos(π/(a+b))). If a = b, then for large n,
\[
\Pr\{N_{-b,a} = n\} \ge \frac{\sin\frac{\pi}{2a}}{2a}\,\frac{1}{s_1^{\,n-1}}. \qquad (3.6)
\]
Moreover, there exist constants c_v (v = 1, . . . , a+b−1) and s_v (v = 2, . . . , a+b−1) not depending on n, s.t.
\[
\Pr\{N_{-b,a} > n\} = \sum_{v=1}^{a+b-1}\frac{c_v}{s_v^{\,n}}, \qquad (3.7)
\]
where |s_1| ≤ |s_v| (v = 2, . . . , a+b−1).
Since
\[
\Pr\{N_{-b,a} > n\} = \sum_{r=n+1}^{\infty} \Pr\{N_{-b,a} = r\},
\]
(3.5) and (3.6) give upper and lower bounds on Pr{N_{−b,a} > n}.
For the proof of Theorem 4, note that for independent Poisson processes, d(w) is a simple random walk. Define the extreme values U_n = max_{i=0,...,n} d(i) and L_n = min_{i=0,...,n} d(i). A false alarm occurs in DMV if and only if U_n − L_n < M + 1. Note that the false alarm probability is largest if d(w) is symmetric (i.e., p = q = 1/2). Then we have
\[
P_F(\delta_{\mathrm{DMV}}) = \Pr\{U_n - L_n < M+1\}
= \Pr\Big\{\bigcup_{a=1}^{M+1}\{U_n < a,\ L_n > -(M+2-a)\}\Big\}
\le \sum_{a=1}^{M+1}\Pr\{U_n < a,\ L_n > -(M+2-a)\} \qquad (3.8)
\]
\[
\le (M+1)\,\frac{\rho^n}{1-\rho}, \qquad (3.9)
\]
where ρ = cos(π/(M+2)). Here (3.8) is by the union bound, and (3.9) is by noticing that
\[
\Pr\{U_n < a,\ L_n > -(M+2-a)\} = \Pr\{N_{-(M+2-a),\,a} > n\},
\]
and then applying (3.5) with p = q = 1/2. Furthermore, by (3.7) it is easy to see that
\[
\lim_{n\to\infty} -\frac{1}{n}\log P_F(\delta_{\mathrm{DMV}}) = -\log\rho.
\]
For the proof of Lemma 1, note that
\[
\Pr\Big\{\max_{i\in\{1,\ldots,n\}} |d(i)| \le M\Big\} = \Pr\{N_{-(M+1),\,(M+1)} > n\}.
\]
Applying (3.5) and (3.6) with a = b = M + 1 and p = q = 1/2 gives the desired result.
□
Proof of Theorem 3
Given a matching {(s_i, s′_i)}_{i=1,2,...}, define Y_i ≜ s′_i − s_i. Algorithm DM has a false alarm if and only if there exists s′_1 such that the order-preserving matching {(s_i, s′_i)}_{i=1,...,n} satisfies 0 ≤ Y_i ≤ ∆ for all i = 1, . . . , n.
For i ≥ 2, define the interarrival times U_i ≜ s_i − s_{i−1} and V_i ≜ s′_i − s′_{i−1}, and let Z_i ≜ V_i − U_i. Then
\[
Y_i = (s'_{i-1} - s_{i-1}) + (s'_i - s'_{i-1}) - (s_i - s_{i-1}) = Y_{i-1} + Z_i.
\]
Therefore, given Y_1, {Y_i}_{i=2}^∞ is a general random walk with steps Z_i (i ≥ 2). We know that V_i and U_i are independent exponential random variables with means 1/λ2 and 1/λ1, respectively, and thus the Z_i's are i.i.d. with distribution function
\[
\Pr\{Z_i \le z\} = \int_{\max(0,-z)}^{\infty} p_{U_i}(u)\,\Pr\{V_i \le u+z\}\,du
= \begin{cases} 1 - \frac{\lambda_1}{\lambda_1+\lambda_2}e^{-\lambda_2 z} & \text{if } z \ge 0, \\[1ex] \frac{\lambda_2}{\lambda_1+\lambda_2}e^{\lambda_1 z} & \text{if } z < 0. \end{cases}
\]
The probability density function (pdf) of Z_i is
\[
p_Z(z) = \begin{cases} \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}e^{-\lambda_2 z} & \text{if } z \ge 0, \\[1ex] \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}e^{\lambda_1 z} & \text{if } z < 0. \end{cases}
\]
The false alarm probability satisfies
\[
P_F(\delta_{\mathrm{DM}}) = \Pr\{\exists\, s'_1 \text{ s.t. } 0 \le Y_1^n \le \Delta\} \le \max_{y_1\in[0,\Delta]} \Pr\{0 \le Y_2^n \le \Delta \mid Y_1 = y_1\}.
\]
Fix a y_1 ∈ [0, ∆]. For n ≥ 2, define
\[
p_n(z)\,dz \triangleq \Pr\{Y_2^{n-1} \in [0,\Delta],\ z < Y_n < z + dz \mid Y_1 = y_1\}.
\]
Define p_1(z) = δ(z − y_1) (the Dirac delta function). In [8] (page 53) it is shown that
\[
p_n(z) = \int_0^{\Delta} p_{n-1}(x)\,p_Z(z-x)\,dx, \qquad n = 2, 3, \ldots
\]
Then we have
\[
\Pr\{0 \le Y_2^n \le \Delta \mid Y_1 = y_1\} = \int_0^{\Delta} p_n(z_n)\,dz_n
= \int_0^{\Delta} p_{n-1}(z_{n-1})\,dz_{n-1}\int_0^{\Delta} p_Z(z_n - z_{n-1})\,dz_n
= \int_0^{\Delta} p_Z(z_2 - y_1)\,dz_2\int_0^{\Delta} p_Z(z_3 - z_2)\,dz_3\cdots\int_0^{\Delta} p_Z(z_n - z_{n-1})\,dz_n.
\]
Let γ ≜ max_{t∈[0,∆]} ∫_{−t}^{∆−t} p_Z(z) dz. A simple calculation yields γ = 1 − e^{−λ1λ2∆/(λ1+λ2)}. Then
\[
\Pr\{0 \le Y_2^n \le \Delta \mid Y_1 = y_1\} \le \gamma^{n-1}.
\]
Since this holds for all y_1 ∈ [0, ∆], we have P_F(δDM) ≤ γ^{n−1}.
□
Proof of Proposition 11
Throughout the proof we write M ≜ L(∆) for brevity. If b − a ≤ ∆, then
\[
|n_1(a,b) - n_2(a,b)| \le \max(n_1(a,b),\, n_2(a,b)) \le M.
\]
For b − a > ∆, let n′_1(a−∆, a) be the number of packets that arrive at the relay node in [a−∆, a) and depart after a, and let n″_1(b−∆, b) be the number of packets that arrive in (b−∆, b] and depart before b. Then
\[
n_1(a,b) = n_1(a, b-\Delta) + n_1(b-\Delta, b), \qquad
n_2(a,b) = n_1(a, b-\Delta) + n'_1(a-\Delta, a) + n''_1(b-\Delta, b).
\]
We have
\[
n_2(a,b) - n_1(a,b) = n'_1(a-\Delta, a) + n''_1(b-\Delta, b) - n_1(b-\Delta, b).
\]
Since n″_1(b−∆, b) ≤ n_1(b−∆, b) and n′_1(a−∆, a) ≤ M, we have
\[
n_2(a,b) - n_1(a,b) \le n'_1(a-\Delta, a) \le M.
\]
Since n′_1(a−∆, a) ≥ 0, n″_1(b−∆, b) ≥ 0, and n_1(b−∆, b) ≤ M, we have
\[
n_2(a,b) - n_1(a,b) \ge -n_1(b-\Delta, b) \ge -M.
\]
□
APPENDIX 3.B
ALGORITHMS OF CHAPTER 3
Algorithm DM
A pseudo code implementation of δDM is presented in Table 3.1.
Table 3.1: Detect-Match (DM).
    Detect-Match(s1, s2, ∆, n):
        l = inf{k : s2(k) ≥ s1(1)};
        u = sup{k : s2(k) ≤ s1(1) + ∆};
        for m = l, . . . , u
            for k = 1, . . . , n
    (*)         if s2(k + m − 1) − s1(k) < 0 or s2(k + m − 1) − s1(k) > ∆ break;
            end
            if k == n + 1 return H1;
        end
        return H0;
To analyze the complexity of DM, note that the inner loop has O(n) operations,
and the number of such loops is O(1). Thus the complexity of DM is O(n). Zhang
et al. in [49] proposed to improve the complexity of DM by replacing step (*) with
the steps in Table 3.2, which enables DM to decide H0 earlier.
Table 3.2: Alternative Implementation of (*).
    if s2(k + m − 1) − s1(k) < 0
        break;
    else if s2(k + m − 1) − s1(k) > ∆
        return H0;
    end
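The following Python sketch combines Table 3.1 with the early-termination step of Table 3.2; it uses 0-indexed lists of epochs, so indices are shifted by one relative to the pseudocode, and the names are illustrative only.

    import bisect

    def detect_match(s1, s2, delta):
        # Sketch of DM: try each candidate match of the first packet of s1 and check
        # whether an order-preserving, causal, bounded-delay matching of all of s1 exists.
        n = len(s1)
        l = bisect.bisect_left(s2, s1[0])           # first epoch of s2 at or after s1[0]
        u = bisect.bisect_right(s2, s1[0] + delta)  # one past the last epoch <= s1[0] + delta
        for m in range(l, u):
            for k in range(n):
                if k + m >= len(s2) or s2[k + m] - s1[k] > delta:
                    return 0                        # too late: no larger m can help either
                if s2[k + m] - s1[k] < 0:
                    break                           # causality violated: try the next m
            else:
                return 1                            # all n packets matched: declare H1
        return 0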
Algorithm DMV
An implementation of δDMV is shown in Table 3.3. This algorithm has complexity
O(n) and uses only constant memory (O(log M), to be precise).
Table 3.3: Detect-Maximum-Variation (DMV).
    Detect-Maximum-Variation(s1, s2, M, n):
        (s_w)_{w=1}^n = s1 ⊕ s2;
        dmax = dmin = d(0) = 0;
        for w = 1 : n
            d(w) = d(w − 1) + 1 if s_w ∈ S1; d(w − 1) − 1 if s_w ∈ S2;
            dmax = max(dmax, d(w));
            dmin = min(dmin, d(w));
            if dmax − dmin > M return H0;
        end
        return H1;
Chapter 4
Detecting Information Flows With Chaff
Noise
4.1 Outline
In this chapter, we address the detection of information flows mixed with chaff
noise. The main contribution is a tight characterization of flow detectability as
the maximum amount of chaff noise allowed for consistent detection. The rest
of the chapter is organized as follows. Section 4.2 defines the problem. Section
4.3 summarizes our results on the detectability of information flows. Sections 4.4
and 4.5 present chaff-inserting algorithms for the optimal embedding. Section 4.6
presents the detector and analyzes its performance. The analysis is supported by
simulation results in Section 4.8. Section 4.7 comments on the generalization of the
Poisson assumption. Then Section 4.9 concludes the chapter with remarks on its
contributions. Appendix 4.A includes all the proofs, and Appendix 4.C contains
pseudo code implementations of all the proposed algorithms.
4.2 Problem Formulation
We use the same convention for notations as in Chapter 3.
4.2.1 Multi-hop Flow Models
The two-hop flow models in Section 3.2.2 can be extended, in a natural way, to
flows over multiple hops. Suppose that we are interested in detecting information
flows through n (n ≥ 2) nodes, as illustrated in Fig. 4.1. Let Si (i = 1, . . . , n) be
the process of transmission epochs of node Ri, i.e.,
Si = (Si(1), Si(2), Si(3), . . .), i = 1, 2, . . . , n,
where Si(k) (k ≥ 1) is the kth transmission epoch of Ri.
Figure 4.1: Detecting information flows through nodes R1, R2, . . . , Rn by measuring their transmission activities; dotted lines denote a potential route.
Figure 4.2: An information flow along the path R1 → . . . → Rn.
If (S_i)_{i=1}^n contains an information flow, then it can be decomposed into an information-carrying part (F_i)_{i=1}^n and a chaff part (W_i)_{i=1}^n:
\[
S_i = F_i \oplus W_i, \qquad i = 1, \ldots, n, \qquad (4.1)
\]
where the information-carrying part consists of packets sent by R1 and relayed sequentially by R_i (i = 2, . . . , n) as illustrated in Fig. 4.2. Note that chaff noise is not subject to any constraints on information flows and can be correlated with the information flows.
We extend the definition of information flows from two hops to arbitrary hops
as follows.
Definition 9 A sequence of processes (F1, . . . , Fn) is an information flow if for every realization f_i (i = 1, . . . , n), there exist bijections g_i : F_i → F_{i+1} (i = 1, . . . , n−1) such that g_i(s) − s ≥ 0 for all s ∈ F_i. For an information flow with bounded delay ∆, g_i(s) − s ≤ ∆ for all s ∈ F_i; for an information flow with bounded memory M, g_i satisfies
\[
0 \le |F_i \cap [0, t]| - |F_{i+1} \cap [0, t]| \le M \qquad (4.2)
\]
for any t ≥ 0.
The bijection gi is a mapping between the transmission epochs of the same
packets at nodes Ri and Ri+1. For explanation of this definition, we refer to the
comments after Definition 7. Although in this definition, we have assumed equal
delay or memory constraint at every relay node, it can be easily generalized to
unequal constraints. Again, the constants ∆ and M are assumed to be known.
4.2.2 Problem Statement
We are interested in testing the following hypotheses:
H0 : S1, S2, . . . , Sn are jointly independent;
H1 : (Si)ni=1 contains an information flow,
by observing Si (i = 1, . . . , n) for some time t (t > 0). No statistical assumptions
are made for Fi and Wi (i = 1, . . . , n) under H1, but the distributions of Si
(i = 1, . . . , n) are assumed to be known under H0 (they are assumed to be Poisson
processes in our analysis). We point out that although Poisson assumption is
needed to obtain explicit expressions, the idea of detection is applicable for general
point processes.
Remark: The above is a test of independent traffic against end-to-end informa-
tion flows. Since the complement of H0 is not H1, one should view this test as part
of an overall detection scheme. For example, if we observe realizations s1, . . . , sN ,
and we want to find out whether a subset of the processes contains an information
flow, we can first apply the above hypothesis testing to every pair of realizations
(s_i, s_j) (i, j ∈ {1, . . . , N}) to test whether this pair contains an information flow, and then, if there is no detection on pairs, we extend the scope to every triple, etc. That is, we can sequentially test H0 versus H1 on every subset (s_i)_{i∈I} (I ⊆ {1, . . . , N}) for |I| = 2, . . . , N. This procedure helps us to simplify the detection of partial
information flows which may only go through a subset of the monitored nodes to
the detection of end-to-end flows.
To characterize the amount of chaff noise, we introduce the following definition.
Definition 10 Given realizations of an information flow (f_i)_{i=1}^n and chaff noise (w_i)_{i=1}^n, the chaff-to-traffic ratio (CTR) is defined as
\[
\mathrm{CTR}(t) \triangleq \frac{\sum_{i=1}^{n} |W_i \cap [0,t]|}{\sum_{i=1}^{n} |S_i \cap [0,t]|}, \qquad \mathrm{CTR} \triangleq \limsup_{t\to\infty}\mathrm{CTR}(t). \qquad (4.3)
\]
In words, CTR(t) is the fraction of chaff packets in the first t period of time and CTR its asymptotic value. We are interested in the asymptotic detection performance with respect to CTR.
Since we consider a nonparametric alternative hypothesis in which the distributions of F_i and W_i (i = 1, . . . , n) are unknown, we borrow the notion of Chernoff-consistency in [32] to introduce the following performance measure.
Definition 11 A detector δ_t is called r-consistent (r ∈ [0, 1]) if it is Chernoff-consistent for all the information flows with CTR bounded by r a.s.1, that is, the false alarm probability P_F(δ_t) and the miss probability P_M(δ_t) satisfy
1. lim_{t→∞} P_F(δ_t) = 0 for any (S_i)_{i=1}^n under H0;
2. sup_{(S_i)_{i=1}^n ∈ P} lim_{t→∞} P_M(δ_t) = 0, where
\[
\mathcal{P} = \Big\{(S_i)_{i=1}^n : (S_i)_{i=1}^n \text{ contains an information flow and } \limsup_{t\to\infty}\mathrm{CTR}(t) \le r \text{ a.s.}\Big\}.
\]
The consistency of a detector is defined as the supremum of r such that the detector is r-consistent.
1Here a.s. means "almost surely".
4.3 Flow Detectability
We first give the general detectability result, starting with the following definitions.
Definition 12 For n-hop information flows with bounded delay ∆, the level of weak detectability, denoted by $\overline{\alpha}^{\Delta}_n$, is defined as
\[
\overline{\alpha}^{\Delta}_n \triangleq \sup\big\{r : \forall\,(S_i)_{i=1}^n \text{ containing an information flow with bounded delay } \Delta,\ \text{if } \limsup_{t\to\infty}\mathrm{CTR}(t)\le r \text{ a.s., then } \exists \text{ a Chernoff-consistent detector for } (S_i)_{i=1}^n\big\}.
\]
The level of strong detectability, denoted by $\underline{\alpha}^{\Delta}_n$, is defined as
\[
\underline{\alpha}^{\Delta}_n \triangleq \sup\{r : \exists\,\delta_t \text{ s.t. } \delta_t \text{ is } r\text{-consistent}\}.
\]
For information flows with bounded memory, the levels of weak and strong detectabilities, denoted by $\overline{\alpha}^{M}_n$ and $\underline{\alpha}^{M}_n$, are defined similarly.
By definition, the weak detectability allows the detector to depend on the distribution of information flows, whereas the strong detectability does not. Thus the level of weak detectability is no lower than that of strong detectability, i.e., $\underline{\alpha}^{j}_n \le \overline{\alpha}^{j}_n$ (j = ∆, M).
With a sufficient amount of chaff noise, the nodes can make traffic containing
an information flow mimic arbitrary traffic patterns, including the traffic patterns
under H0. Therefore, there must be some limits on the amount of chaff noise be-
yond which information flows are no longer detectable. A basic limit is the amount
of chaff noise sufficient to make an information flow statistically identical with in-
dependent traffic. Specifically, we define a notion of the level of undetectability as
follows.
Given H0, define the level of undetectability as2
\[
\beta^{\Delta}_n \triangleq \inf\big\{r \in [0,1] : \exists\,(F_i)_{i=1}^n, (W_i)_{i=1}^n \text{ satisfying: }
1)\ (F_i \oplus W_i)_{i=1}^n \stackrel{d}{=} (S_i)_{i=1}^n \text{ for some } (S_i)_{i=1}^n \text{ under } H_0;\
2)\ (F_i)_{i=1}^n \text{ is an information flow with bounded delay } \Delta;\
3)\ \limsup_{t\to\infty}\mathrm{CTR}(t) \le r \text{ a.s.}\big\}. \qquad (4.4)
\]
2Here "$\stackrel{d}{=}$" means equal in distribution.
That is, β^∆_n is the minimum CTR for an n-hop information flow with bounded delay ∆ to be equal in distribution to traffic under H0. The corresponding quantity β^M_n for bounded memory flows is defined similarly.
Our main results are the following relationships among the levels of weak and
strong detectabilities and the level of undetectability.
Theorem 5 If S_i (i = 1, . . . , n) are Poisson processes of bounded rates under H0, then
\[
\underline{\alpha}^{j}_n = \overline{\alpha}^{j}_n = \beta^{j}_n, \qquad j = \Delta, M.
\]
Remark: This theorem states that for a Poisson null hypothesis, the levels of weak and strong detectabilities are equal, and both equal the minimum fraction of chaff needed to mimic the null hypothesis. For CTR less than β^j_n (j = ∆, M), any information flow can be detected consistently by the same detector; for CTR above or equal to β^j_n, there is a method to hide the information flow among chaff noise such that consistent detection is impossible. We will give explicit expressions for β^j_n or its bounds later.
Proof: The proof contains a converse part and an achievability part. For the converse part, we need to show that $\overline{\alpha}^{j}_n \le \beta^{j}_n$ (j = ∆, M). By the definition of β^j_n, there exists (S_i)_{i=1}^n such that it contains an information flow with a β^j_n fraction of chaff, and S1, . . . , Sn are truly independent Poisson processes. Thus, it is impossible to have a Chernoff-consistent detector for this information flow, which implies that β^j_n is an upper bound on the level of weak detectability.
For the achievability part, we need to show that $\underline{\alpha}^{j}_n \ge \beta^{j}_n$ (j = ∆, M). The approach is to design a detector which is r-consistent for r arbitrarily close to β^j_n. The detector is presented later in Definition 13, and the analysis of its consistency is in Theorems 11 and 12. Combining the converse and the achievability results with the fact that $\underline{\alpha}^{j}_n \le \overline{\alpha}^{j}_n$ (j = ∆, M) gives Theorem 5.
□
In the following sections, we will explain how to compute βjn (j = ∆, M) and
how to do the detection.
4.4 Detectability of Two-hop Flows
In this section, we consider 2-hop information flows (i.e., n = 2). Given the
distribution of (S1, S2) under H0, we aim at characterizing the value of βj2 (j =
∆, M).
Our approach is to first find the algorithms which optimally partition Si (i =
1, 2) into Fi and Wi such that (F1, F2) is an information flow, and the CTR is
minimized, and then calculate βj2 by analyzing the CTR of these algorithms under
H0. Such algorithms are called chaff-inserting algorithms, and the CTR of these
algorithms is defined as the CTR of the partitioned traffic.
4.4.1 Two-hop Flows with Bounded Delay
Suppose that nodes R1 and R2 want to send a 2-hop information flow with bounded
delay ∆, and they are allowed to design the insertion of chaff noise. The question
is how to insert the minimum amount of chaff noise such that S1 and S2 become
statistically independent.
To answer this question, Blum et al. in [5] proposed a greedy algorithm called
“Bounded-Greedy-Match” (BGM) which works as follows: given a realization
(s1, s2),
1. match every packet transmitted at time s in the first process s1 with the first
unmatched packet transmitted in [s, s + ∆] in the second process s2;
2. label all the unmatched packets in s1 and s2 as chaff.
See Fig. 4.3 for an illustration of BGM. It is easy to see that BGM has complexity
O(|S1| + |S2|). For a pseudo code implementation of BGM, see Appendix 4.C.
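A minimal Python sketch of BGM (illustrative names; it returns the number of chaff packets found in a realization) is given below.

    def bounded_greedy_match(s1, s2, delta):
        # Sketch of BGM: greedily match each packet of s1 with the first unmatched
        # packet of s2 in [s, s + delta]; every packet left unmatched is chaff.
        matched = 0
        j = 0                                  # first not-yet-considered index in s2
        for s in s1:
            while j < len(s2) and s2[j] < s:   # these s2 packets can no longer be matched
                j += 1
            if j < len(s2) and s2[j] <= s + delta:
                matched += 1                   # match s with s2[j]
                j += 1
        return len(s1) + len(s2) - 2 * matched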
Figure 4.3: BGM: a sequential greedy match algorithm.
Algorithm BGM has been shown in [5] to be the optimal chaff-inserting algo-
rithm for 2-hop information flows with bounded delay, as stated in the following
proposition.
Proposition 12 ( [5]) For any realization (s1, s2), BGM inserts the minimum
number of chaff packets in transmitting an information flow with bounded delay ∆.
The optimality of BGM allows us to characterize the minimum chaff needed
to mimic completely independent traffic by analyzing the CTR of BGM. If, in
particular, the independent traffic can be modelled as Poisson processes, then we
prove the following results.
Theorem 6 If S1 and S2 are independent Poisson processes of rates λ1 and λ2, respectively, then with probability one, the CTR of BGM satisfies
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{BGM}}(t) = \begin{cases}
\dfrac{(\lambda_2-\lambda_1)\left(1+\left(\frac{\lambda_1}{\lambda_2}\right)e^{\Delta(\lambda_1-\lambda_2)}\right)}{(\lambda_1+\lambda_2)\left(1-\left(\frac{\lambda_1}{\lambda_2}\right)e^{\Delta(\lambda_1-\lambda_2)}\right)} & \text{if } \lambda_1 \ne \lambda_2,\\[2ex]
\dfrac{1}{1+\lambda_1\Delta} & \text{if } \lambda_1 = \lambda_2.
\end{cases}
\]
Proof: See Appendix 4.A. □
It is easy to show that if λi ≤ λ (i = 1, 2), then the CTR of BGM is lower
bounded by 1/(1 + λ∆). By the optimality of BGM, we see that the following
result holds.
Corollary 1 If under H0, S1 and S2 are independent Poisson processes with max-
imum rate λ, then the level of undetectability β∆2 = 1/(1 + λ∆).
With 1/(1+λ∆) fraction of chaff noise, the 2-hop traffic containing an informa-
tion flow with bounded delay can be made identical with traffic under H0 so that
no detector can detect this flow consistently. Note that as λ∆ → ∞, the value of
β∆2 will decrease to zero, implying that it is easy to mimic H0 if the traffic load is
heavy (large λ) or the delay bound is loose (large ∆).
4.4.2 Two-hop Flows with Bounded Memory
Consider the transmission of a 2-hop information flow with bounded memory M .
We want to find a method that schedules transmissions according to independent
traffic while inserting the minimum amount of chaff noise.
The bounded memory constraint requires that the memory size used at the
relay node to store relay packets is always bounded between 0 and M . Thus, a
feasible scheduling is to keep updating the memory size for each arrival (i.e., a
packet in S1) or departure (i.e., a packet in S2), and assign that packet to be chaff
if the memory is overflowed or underflowed. Based on this idea, we develop a chaff-
inserting algorithm called “Bounded-Memory-Relay” (BMR). Given a realization
(s1, s2) and (s_k)_{k=1}^∞ ≜ s1 ⊕ s2, let M1(k) be the memory size after the transmission of the kth packet in s1 ⊕ s2. Algorithm BMR does the following: for k = 1, 2, . . .,
1. label a packet s_k as chaff if and only if this packet would cause a memory overflow, i.e., s_k ∈ S1 and M1(k − 1) = M, or underflow, i.e., s_k ∈ S2 and M1(k − 1) = 0; initially, M1(0) = 0;
2. compute M1(k) by3
\[
M_1(k) = \begin{cases} M_1(k-1) & \text{if } s_k = \text{chaff}, \\ M_1(k-1) + I_{\{s_k\in S_1\}} - I_{\{s_k\in S_2\}} & \text{otherwise}. \end{cases}
\]
A sample path of M1(k) (k ≥ 1) is shown in Fig. 4.4.
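A minimal Python sketch of BMR (illustrative names; it returns the positions of the chaff packets in the merged sequence) is given below.

    def bounded_memory_relay(s1, s2, M):
        # Sketch of BMR: walk the merged epoch sequence, track the relay's memory
        # occupancy, and label as chaff any packet that would overflow or underflow it.
        merged = sorted([(t, 1) for t in s1] + [(t, 2) for t in s2])
        mem, chaff = 0, []
        for k, (_, src) in enumerate(merged):
            if (src == 1 and mem == M) or (src == 2 and mem == 0):
                chaff.append(k)               # overflow/underflow: this packet is chaff
            else:
                mem += 1 if src == 1 else -1  # a relayed arrival or departure
        return chaff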
The complexity of BMR is O(|S1| + |S2|). See Appendix 4.C for an imple-
mentation of BMR. Note that unlike BGM, BMR does not specify the mapping
3Here I_{·} is the indicator function.
Figure 4.4: Example: •: s_k ∈ S1; ◦: s_k ∈ S2; M1(k): the statistic calculated by BMR. Initially, M1(0) = 0, indicating that the memory is empty. The first packet is a departure, and it is assigned as chaff because otherwise the memory would be underflowed. The second packet is an arrival, and thus the memory size is increased by one. Such updating occurs at each arrival or departure.
between packets in the two processes because as long as the memory constraint is
satisfied, the order of transmission is irrelevant.
The optimality of BMR is guaranteed by the following proposition.
Proposition 13 For any realization (s1, s2), BMR inserts the minimum number
of chaff packets in transmitting an information flow with bounded memory M .
Proof: See Appendix 4.A. □
Since BMR is optimal, we can characterize βM2 by the CTR of BMR, as stated
in the following theorem.
Theorem 7 If S1 and S2 are independent Poisson processes of rates λ1 and λ2, respectively, then with probability one, the CTR of BMR satisfies
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{BMR}}(t) = \begin{cases}
\dfrac{(\lambda_2-\lambda_1)\left(1+\left(\frac{\lambda_1}{\lambda_2}\right)^{M+1}\right)}{(\lambda_1+\lambda_2)\left(1-\left(\frac{\lambda_1}{\lambda_2}\right)^{M+1}\right)} & \text{if } \lambda_1 \ne \lambda_2,\\[2ex]
\dfrac{1}{1+M} & \text{if } \lambda_1 = \lambda_2.
\end{cases}
\]
Proof: See Appendix 4.A. □
It can be shown that the CTR is minimized when λ1 = λ2, based on which we
have the following result.
Corollary 2 If under H0, S1 and S2 are independent Poisson processes, then the
level of undetectability βM2 = 1/(1 + M).
If nodes can insert at least 1/(1+M) fraction of chaff noise, then BMR gives a
feasible transmission schedule for an information flow with bounded memory such
that the overall traffic is statistically the same as traffic under H0. Therefore,
1/(1 + M) establishes a limit on the maximum amount of chaff noise under the
requirement of Chernoff-consistent detection. If M ≫ 1, then very little chaff noise
suffices to hide the information flows.
4.5 Detectability of Multi-hop Flows
The results in Section 4.4 suggest that pairwise detection of information flows
is vulnerable to chaff noise because a relatively small amount of chaff noise can
make the information flow undetectable. These results indeed reveal the weakness
of pairwise detection. As the number of hops increases, however, we see that
the constraints imposed on information-carrying packets become tighter because
only the packets satisfying the constraints at every hop can successfully reach the
destination. This observation motivates us to extend the results in Section 4.4 to
information flows over multiple hops. Specifically, we will show that the fraction of
chaff noise needed to make a multi-hop information flow mimic jointly independent
traffic increases to one as the number of hops increases, which implies that joint
detection may significantly improve the performance against chaff noise.
4.5.1 Multi-hop Flows with Bounded Delay
Consider the transmission of an n-hop (n ≥ 2) information flow with bounded
delay ∆ according to certain processes. Given a sequence of processes (Si)ni=1,
we want to decompose Si (i = 1, . . . , n) into Fi and Wi such that (Fi)ni=1 is an
information flow with bounded delay, and the CTR is minimized.
Given the 2-hop chaff-inserting algorithm BGM, one might think that we can
sequentially apply BGM to every pair of processes to obtain (Fi)ni=1. Such an ap-
proach, however, does not give the optimal decomposition. For example, consider
the realizations shown in Fig. 4.5. If we use BGM to match packets in s1 and s2,
and then repeat BGM to match the matched packets in s2 with s3, we only find
one sequence of matched packets (as shown in (a)). There is, however, another way
of matching that gives two sequences of matched packets (as shown in (b)). The
implication is that for n > 2, a hop-by-hop greedy match is not sufficient. We have
to jointly consider all the subsequent hops to find the optimal packet matching.
Figure 4.5: Example: (a) The scheduling obtained by repeatedly using BGM. (b) Another scheduling. It shows that repeatedly using BGM is suboptimal.
To solve this problem, we develop an algorithm called “Multi-Bounded-Delay-
Relay” (MBDR). The idea of MBDR is that a packet at time t1 in s1 can be
matched with a packet at t2 ∈ [t1, t1 + ∆] in s2 only if t2 has matched packets
in si for all i = 3, . . . , n. The matching of t2 and its matched packets is done by
recursions. Such recursions allow us to consider all the processes simultaneously
and achieve a smaller CTR than repeatedly applying BGM. Specifically, MBDR
works as follows: given a realization (si)ni=1,
1. match every packet at time t1 in s1 with the first unmatched packet t2 in [t1, t1 + ∆] in s2, provided that t2 has a match in s3;
2. for i = 2, . . . , n − 1, match the packet t_i in s_i with the first unmatched packet t_{i+1} in [t_i, t_i + ∆] in s_{i+1}, provided that t_{i+1} has a match in s_{i+2} (assume every packet in s_n has a match);
3. after trying to match all the packets in s1, label all the unmatched packets
as chaff.
For example, consider the 3-hop information flow illustrated in Fig. 4.6. To match
t1 ∈ S1, MBDR first tries to find a match for t2. Since t2 can be matched with
t3 ∈ S3, t1 is matched with t2. If t2 does not have a match in s3, MBDR will try
to match t1 with the next unmatched packet in [t1, t1 + ∆] in s2. If there is no
more packet left, MBDR will label t1 as chaff.
Figure 4.6: MBDR: a recursive greedy match algorithm.
A direct implementation of MBDR has complexity O((λ∆)^n |S1|), where λ is the maximum rate of S1, . . . , Sn. The complexity can be reduced to O(n^2 |S1|) by expanding the recursions (see Appendix 4.C). Note that MBDR reduces to BGM when n = 2.
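The direct recursive implementation mentioned above can be sketched in Python as follows (illustrative only; the reduced-complexity version of Appendix 4.C expands the recursion). It returns the fraction of chaff found, i.e., the CTR of MBDR on the realization.

    def mbdr(processes, delta):
        # Sketch of MBDR: greedily match each packet of the first process forward
        # through all n hops; a candidate epoch at hop i is accepted only if it can
        # itself be matched at hop i+1 (checked recursively). Unmatched packets are chaff.
        n = len(processes)
        used = [set() for _ in range(n)]       # epochs already claimed at each hop

        def match_forward(i, t):
            if i == n - 1:                     # last hop: nothing further to relay
                return True
            for j, u in enumerate(processes[i + 1]):
                if u > t + delta:              # epochs are sorted: no later candidate fits
                    break
                if u >= t and j not in used[i + 1] and match_forward(i + 1, u):
                    used[i + 1].add(j)
                    return True
            return False

        flows = sum(1 for t in processes[0] if match_forward(0, t))
        total = sum(len(p) for p in processes)
        return (total - n * flows) / total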
It is easy to verify that if we transmit information-carrying packets according
to the matching found by MBDR, the transmissions will satisfy the bounded delay
constraint at every hop. Moreover, such a transmission schedule preserves the order
of incoming packets. The following proposition states that MBDR is optimal.
Proposition 14 For any realization (si)ni=1, MBDR inserts the minimum number
of chaff packets in transmitting an n-hop information flow with bounded delay ∆.
Proof: See Appendix 4.A. □
By arguments similar to those in the proof of Theorem 6, one can show that the CTR of MBDR converges a.s. It is difficult to compute the exact limit4. Instead, we give the following bound.
Theorem 8 If S_i (i = 1, . . . , n) are independent Poisson processes of maximum rate λ, then
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{MBDR}}(t) \ge 1 - \kappa_n \quad \text{a.s.},
\]
where
\[
\kappa_n = \min\left((\lambda\Delta)^{n-2}\left(1-e^{-\lambda\Delta}\right),\ \prod_{i=1}^{n-1}\left(1-e^{-i\lambda\Delta}\right)\right).
\]
Proof: See Appendix 4.A. □
By Theorem 8, we see that the CTR of MBDR goes to one exponentially with the increase of n if λ∆ < 1. It can be shown that if we repeatedly apply BGM, then the CTR is lower bounded by 1 − (1 − e^{−λ∆})^{n−1} a.s., which always converges to one exponentially.
Although in Definition 9 we have assumed identical delay bounds at all the relay nodes, MBDR can be easily extended to different delay bounds, and κ_n in Theorem 8 becomes
\[
\min\left(\left(1-e^{-\lambda\Delta_{n-1}}\right)\prod_{i=1}^{n-2}(\lambda\Delta_i),\ \prod_{i=1}^{n-1}\left(1-e^{-i\lambda\Delta_i}\right)\right),
\]
where ∆_i is the maximum delay at the ith relay node.
4For example, for independent Poisson processes, computing the CTR of MBDR involves computing the limiting distribution of an (n − 1)-dimensional continuous state space Markov process.
The optimality of MBDR allows us to have the following result.
Corollary 3 If under H0, S1, . . . , Sn are independent Poisson processes of rates
bounded by λ, then β∆n ≥ 1 − κn.
By this result, we see that for sufficiently light traffic or small delay bound
(i.e., λ∆ < 1), β∆n converges to one exponentially fast as n increases. Numerical
calculation shows that β∆n still converges to one for λ∆ > 1, but the convergence
is slower than exponential. If we calculate the maximum rate of the information
flow by λ(1 − β∆n ), then this rate will go to zero with the increase of n, implying
that it is almost impossible to hide information flows over arbitrarily long paths.
See Fig. 4.7–4.9 for numerically computed information rate 1 − β∆n as a function
of n. From the plots, it is clear that the information rate decays exponentially at
λ∆ < 1 (Fig. 4.7) and subexponentially at λ∆ > 1 (Fig. 4.8, 4.9).
4.5.2 Multi-hop Flows with Bounded Memory
Suppose that we want to transmit an n-hop information flow with bounded memory
M according to certain processes. We generalize BMR to an algorithm called
“Multi-Bounded-Memory-Relay” (MBMR) to insert chaff noise in this case.
Algorithm MBMR borrows the idea of monitoring memory size in BMR. Specif-
ically, let Mi(k) (i = 1, . . . , n − 1) denote the memory size of Ri+1 after the
kth packet in the total traffic. Algorithm MBMR keeps updating (Mi(k))n−1i=1 for
k = 1, 2, . . . and assigns chaff packets if memory underflow or overflow occurs.
Figures 4.7–4.9: the normalized rate of information flow 1 − β^∆_n as a function of n (∆ = 1); solid line: 1 − β^∆_n computed for 1000 packets per process; dashed line: κ_n. Figure 4.7: λ = 0.9. Figure 4.8: λ = 2. Figure 4.9: λ = 4.

Given a realization (s_i)_{i=1}^n and (s_k)_{k=1}^∞ ≜ s1 ⊕ · · · ⊕ s_n, MBMR works as follows: for k = 1, 2, . . .,
1. label s_k ∈ S_i as chaff if and only if M_{i−1}(k − 1) = 0 or M_i(k − 1) = M (initially, M_j(0) = 0 for j = 1, . . . , n − 1; M_0(0) = ∞; M_n(0) = −∞);
2. compute (M_j(k))_{j=1}^{n−1} by
\[
M_{i-1}(k) = \begin{cases} M_{i-1}(k-1) - 1 & \text{if } s_k \ne \text{chaff}, \\ M_{i-1}(k-1) & \text{otherwise}, \end{cases}
\qquad
M_i(k) = \begin{cases} M_i(k-1) + 1 & \text{if } s_k \ne \text{chaff}, \\ M_i(k-1) & \text{otherwise}, \end{cases}
\]
and M_j(k) = M_j(k − 1) for j = 1, . . . , i − 2, i + 1, . . . , n − 1.
See Fig. 4.10 for an example of MBMR.
Figure 4.10: MBMR for n = 4 and M = 3 (s = s1 ⊕ · · · ⊕ s4): monitor the memory sizes of the relay nodes and assign a chaff packet if the memory of any node would be underflowed or overflowed. Initially, M_i(0) = 0 (i = 1, 2, 3); at the end of this realization (after the 10th packet), (M1(10), M2(10), M3(10)) = (1, 1, 0).
Algorithm MBMR has complexity O(Σ_{i=1}^{n} |S_i|). See Appendix 4.C for its implementation. Note that MBMR reduces to BMR when n = 2. If we sequentially
match the non-chaff packets found by MBMR, then we will have a transmission
schedule that satisfies the bounded memory constraint. The optimality of MBMR
is provided by the following proposition.
Proposition 15 For any realization (si)ni=1, MBMR inserts the minimum number
of chaff packets to schedule the transmission of an n-hop information flow with
bounded memory M .
Proof: The proof follows the same arguments as in the proof of Proposition 13.
¥
We can now characterize β^M_n by the CTR of MBMR. If S1, . . . , Sn are independent Poisson processes, then the CTR of MBMR converges almost surely, and the
limit can be calculated by the limiting distribution of a Markov chain, as shown
in Appendix 4.B.
It is difficult to give a closed-form expression for the exact CTR of MBMR. Alternatively, we derive the following upper and lower bounds. Let
\[
\mathcal{A} \triangleq \{(S_i)_{i=1}^n : S_1, \ldots, S_n \text{ are independent Poisson processes}\}.
\]
We have the following theorem.
Theorem 9 For any (S_i)_{i=1}^n ∈ A,
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{MBMR}}(t) \ge 1 - u_n \quad \text{a.s.}
\]
Furthermore,
\[
\inf_{\mathcal{A}}\ \lim_{t\to\infty}\mathrm{CTR}_{\mathrm{MBMR}}(t) \le 1 - l_n \quad \text{a.s.}
\]
Here l_n and u_n are given by
\[
l_{n+1} = \frac{l_n\left(1 - l_n^{M}\right)}{1 - l_n^{M+1}} \qquad \text{and} \qquad u_{n+1} = u_n\left(1 - \frac{1}{M+1}\,2^{-M/u_n}\right)
\]
for n ≥ 2, and l_2 = u_2 = M/(M + 1).
Proof: See Appendix 4.A. □
Although identical memory constraints have been assumed in Definition 9, MBMR can be easily modified to allow different memory constraints, and it can be shown that the CTR is bounded between 1 − u′_n and 1 − l′_n, where
\[
l'_{n+1} = \frac{l'_n\left(1 - {l'_n}^{K_n}\right)}{1 - {l'_n}^{K_n+1}} \qquad \text{and} \qquad u'_{n+1} = u'_n\left(1 - \frac{1}{K_n+1}\,2^{-K_n/u'_n}\right)
\]
for n ≥ 2, and l′_2 = u′_2 = K_1/(K_1 + 1). Here K_i (i = 1, . . . , n − 1) is the memory constraint at the ith relay node.
Based on Theorem 9 and the optimality of MBMR, we have the following result.
Corollary 4 If under H0, S1, . . . , Sn are independent Poisson processes, then
1 − un ≤ βMn ≤ 1 − ln.
The bounds in Corollary 4 are not far from the actual value of βMn at small n;
see the numerical results in Fig. 4.11.
Figure 4.11: The level of undetectability β^M_n and its bounds as functions of n: M = 4; β^M_n computed on 10000 packets.
Another interpretation of Corollary 4 is that the normalized maximum rate of an information flow, calculated as 1 − β^M_n, is bounded between l_n and u_n. Numerical calculation shows that l_n and u_n both decay polynomially: l_n decays at approximately Θ(n^{−1/M}) and u_n at Θ(n^{−1/(2M−2)}). Furthermore, numerical comparison shows that if λ∆ = M, β^M_n increases more slowly than β^∆_n as n → ∞, suggesting that it is relatively easier to hide information flows with bounded memory.
4.6 Detector
In Section 4.4 and 4.5, we have characterized the levels of undetectabilities for infor-
mation flows with bounded delay or bounded memory. The results are summarized
in Table 4.1. These results provide upper bounds on the level of detectability.
Table 4.1: Levels of undetectabilities (Poisson null hypothesis).
    β^∆_2 = 1/(1 + λ∆)          β^M_2 = 1/(1 + M)
    β^∆_n ≥ 1 − κ_n             1 − u_n ≤ β^M_n ≤ 1 − l_n
In this section, we will present an explicit detector whose consistency can approximate β^j_n (j = ∆, M) arbitrarily closely. Our main theorem is stated as follows.
Theorem 10 For any ε > 0, there exists a detector such that its consistency is no smaller than β^j_n − ε (j = ∆, M).
Remark: The theorem states that as ε → 0, there exists a sequence of detectors with consistency approaching β^j_n (j = ∆, M). Therefore, the level of strong detectability is no smaller than β^j_n, i.e., $\underline{\alpha}^{j}_n \ge \beta^{j}_n$ (j = ∆, M).
The proof of Theorem 10 is by constructing a detector and showing that its
consistency approximates βjn (j = ∆, M) arbitrarily. Ideally, we would like to know
what strategy is used to perturb timing and insert chaff noise so that we can design
a detector accordingly. The difficulty here is that we do not know what strategy
is going to be used when information flows are transmitted, and therefore our goal
is to design a single detector which has good performance for a wide variety of
information flows.
The key idea is to design the detector based on the amount of chaff noise needed
by the optimal chaff-inserting algorithms. If the detector is designed to guarantee
that even the optimal algorithms need a sufficiently large amount of chaff to evade
detection, then any other chaff-inserting algorithm would have to insert no less
chaff noise to evade detection. Therefore, we can make sure that the detector is
r-consistent against fractions of chaff up to a certain level. Specifically, we propose
the following detector.
Definition 13 Given observations5 (s_i)_{i=1}^n (n ≥ 2), the detector is defined as
\[
\delta_t((s_i)_{i=1}^n; \tau_n) = \begin{cases} 1 & \text{if } \mathrm{CTR}(t) \le \tau_n, \\ 0 & \text{otherwise}, \end{cases}
\]
where τ_n is a predetermined threshold, and CTR(t) is the minimum fraction of chaff in the measurements.
Remark: The statistic CTR(t) is computed by the optimal chaff-inserting algorithm followed by certain adjustments. Specifically, it is calculated by the following procedure:
5To be precise, the detector is only given the part of s_i (i = 1, . . . , n) that falls into the length-t observation interval.
1. compute $\mathcal{C}$, the set of chaff packets found by the optimal chaff-inserting algorithm (MBDR for bounded delay flows or MBMR for bounded memory flows);
2. calculate a number C by
\[
C = \left|\mathcal{C} \setminus \left(\bigcup_{i=1}^{n} S_i \cap [0, (i-1)\Delta)\right)\right|
\]
for bounded delay flows, or
\[
C = |\mathcal{C}| + \min_{0 \le k \le w^*} d(k)
\]
for bounded memory flows, where d(k) is the cumulative difference defined as
\[
d(k) \triangleq \sum_{j=1}^{k}\left(I_{\{s_j\in S_1\}} - I_{\{s_j\in S_2\}}\right), \qquad (4.5)
\]
d(0) = 0, and w* is the first time that d(k) varies by M, i.e., w* ≜ inf{w : max_{0≤k≤w} d(k) − min_{0≤k≤w} d(k) = M};
3. compute CTR(t) = C/N, where N = Σ_{i=1}^{n} |S_i|.
For implementation details, we refer to Appendix 4.C. We point out that for large
N , the influence of the adjustment in step (2) on CTR(t) is negligible.
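Putting the pieces together, the detector itself is a simple threshold test on that statistic; a sketch (reusing, e.g., the MBMR sketch given earlier and omitting the boundary adjustment of step 2) is given below.

    def flow_detector(processes, tau, chaff_fraction):
        # Sketch of the detector of Definition 13 with the boundary adjustment omitted:
        # declare an information flow iff the minimum chaff fraction found by the
        # optimal chaff-inserting algorithm does not exceed the threshold tau.
        return 1 if chaff_fraction(processes) <= tau else 0

    # e.g., flow_detector(observed, tau_n, lambda p: mbmr(p, M)) for bounded-memory flows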
The reason for CTR(t) to be the minimum fraction of chaff in the measurements
hinges on two facts. The first is the optimality of the chaff-inserting algorithm used
to find C . The second is the adjustment in step (2). The adjustment is needed
because the detector may not observe the beginning of the information flow. At the
time the detector starts, there may have been packets stored at the relay nodes, and
when these packets are relayed, the relay packets may appear to be chaff noise from
the detector’s perspective since they do not correspond to any observed packets.
We solve this problem by ignoring certain chaff packets found at the beginning of
the measurements. For bounded delay flows, these are the packets in [0, (i− 1)∆)
in si (i = 1, . . . , n). For bounded memory flows, these are the packets which may
be relays of packets stored in the memory initially. Detailed explanations can be
found in Appendix 4.C.
Now that CTR(t) is the minimum CTR in the measurements, we can guarantee
detection as follows.
Theorem 11 The detector in Definition 13 has vanishing miss probability for all
the information flows with CTR bounded by τn a.s.
Theorem 11 is a direct implication of the fact that CTR(t) is the minimum
fraction of chaff packets in the measurements. Actually, a stronger statement holds,
which is that the detector has no miss detection for all realizations of information
flows with no more than τn fraction of chaff packets.
The threshold value needs to be carefully chosen such that the detector satisfies
certain false alarm constraint. Specifically, under the assumption that Si’s are
independent Poisson processes of maximum rate λ under H0, we have the following
theorems on the false alarm probabilities.
Theorem 12 If τ_n < β^j_n (j = ∆, M), then the false alarm probability satisfies
\[
\lim_{N\to\infty} \frac{1}{N}\log P_F(\delta_t) \le -\Gamma_n(\tau_n; \lambda, \Delta) < 0
\]
for bounded delay flows, and
\[
\lim_{N\to\infty} \frac{1}{N}\log P_F(\delta_t) \le -\Gamma_n(\tau_n; M) < 0
\]
for bounded memory flows, where N = Σ_{i=1}^{n} |S_i|.
Proof: See Appendix 4.A. □
The theorem states that the false alarm probability of the proposed detector
decays exponentially as long as the threshold is less than βjn (j = ∆, M). The
functions Γn(τn; λ, ∆) and Γn(τn; M) give lower bounds on the error exponents;
see the proof for their definitions. We point out that Γn(τn; λ, ∆) and Γn(τn; M)
are positive for all τn < βjn (j = ∆, M), and they are both decreasing functions of
τn.
Combining Theorems 11 and 12 yields the following result.
Corollary 5 If τ_n < β^j_n (j = ∆, M), then the proposed detector is τ_n-consistent.
Remark: As τn → βjn, the consistency of the proposed detector converges to βj
n,
which proves that the level of strong detectability is lower bounded by βjn. From
Corollary 5, we see that the proposed detector is optimal in terms of consistency.
In particular, since βjn → 1 as n increases, the proposed detector can detect almost
all the long-lasting information flows with sufficiently long paths.
The threshold τn represents a tradeoff between the consistency and the false
alarm probability. A larger τn enables consistent detection against more chaff
noise at the cost of a higher false alarm probability, whereas a smaller τn leads to
a smaller false alarm probability but less consistency against chaff noise.
4.7 Generalization of Poisson Assumption
We have assumed that the node transmission epochs can be modelled as indepen-
dent Poisson processes under H0. Poisson assumption allows us to obtain clean
analytical results, but it is known that wide-area traffic such as internet traffic does
not fit the Poisson model. It, however, can be argued that Poisson processes are
less bursty than real-world traffic, and therefore, our results provide lower bounds
on the levels of detectability of actual information flows.
Specifically, suppose that traffic under H0 can be modelled as independent
renewal processes with Pareto interarrival distributions [28]. It was shown in [28]
that Pareto distribution fits experimental data over many time scales. We show
that such processes are more difficult to mimic than independent Poisson processes,
as stated in the following theorem.
Theorem 13 Let CTR′_BMR(t) denote the chaff-to-traffic ratio found by BMR in independent renewal processes with Pareto interarrival distributions and CTR_BMR(t) that in independent Poisson processes of the same rates. Then
\[
\liminf_{t\to\infty}\mathrm{CTR}'_{\mathrm{BMR}}(t) \ge \lim_{t\to\infty}\mathrm{CTR}_{\mathrm{BMR}}(t) \quad \text{a.s.}
\]
A similar statement holds for the CTR of BGM.
Proof: See Appendix 4.A. □
By this theorem, we see that it requires more chaff noise to mimic the null
hypothesis under Pareto interarrival distributions. The results can be generalized
to MBDR and MBMR.
If traffic under H0 has Pareto interarrival distributions, the idea in prov-
ing Theorem 5 is still applicable. Specifically, let CTR′(t) be the fraction of
chaff packets inserted by the optimal chaff-inserting algorithm (i.e., MBDR or
MBMR) in the interval [0, t] under Pareto interarrival distribution. Then the up-
per bound on the level of weak detectability is the minimum r (r ∈ [0, 1]) such that limsup_{t→∞} CTR′(t) ≤ r a.s., and the lower bound on the level of strong detectability is the maximum r such that liminf_{t→∞} CTR′(t) ≥ r a.s.6 We see that the levels of detectabilities under the Pareto distribution are no smaller than those under the exponential distribution.
To verify the claim that Poisson assumption provides lower bounds on the ac-
tual detection performance, we simulate BGM and BMR on the traces LBL-PKT-
4, which contains an hour’s worth of all wide-area traffic between the Lawrence
Berkeley Laboratory and the rest of the world7. We compute the CTR of pairs of
different traces8, and then compare the empirical cumulative distribution function
(c.d.f.) of the computed CTR with the c.d.f. of the CTR predicted by Theorems
6 and 7 for independent Poisson processes of the same rates as the empirical rates
of the traces. See Fig. 4.12 and 4.13. From these plots, it is clear that at the
same threshold, the traces have much lower false alarm probabilities than Poisson
processes.
We point out that the results in Theorem 13 also apply to renewal processes
with other interarrival distributions which have the heavy-tailed property [21].
6Note that for Pareto interarrival distributions, the upper and the lower bounds may not meet.
7The traces were made by Paxson and were first used in his paper [28].
8We extract 134 TCP traces from the data, each of which is truncated to 1000 packets.
Figure 4.12: The c.d.f. of the CTR of BGM for ∆ = 5: CTR on traces vs. CTR on Poisson processes.
Figure 4.13: The c.d.f. of the CTR of BMR for M = 20: CTR on traces vs. CTR on Poisson processes.
On the other hand, if the interarrival distributions are light-tailed such as the
uniform distribution, it can be shown that the opposite results hold. In terms of
tailweight, we have analyzed a popular medium-tailed distribution, the exponential
distribution, and our results should be viewed as a benchmark for other tailweights.
4.8 Simulations
In this section, we simulate the proposed detectors on both synthetic Poisson traffic
and internet traces. The simulations on Poisson traffic are meant to verify our
analysis and examine properties of the proposed detectors, whereas the simulations
on traces are mainly used to verify the performance on actual traffic and show the
relative advantages of our detectors compared with existing flow detectors.
4.8.1 Synthetic Data
For synthetic data, (S1, . . . , Sn) is a sequence of independent Poisson processes of
rate λ under H0. Under H1, it is the mixture of an information flow (F1, . . . , Fn)
of rate (1− fc)λ (for some fc ∈ (0, 1)) and chaff traffic (W1, . . . , Wn), where Wi
(i = 1, . . . , n) are independent Poisson processes of rate fcλ. Here the parameter
fc is the CTR. The process F1 is a Poisson process of rate (1− fc)λ, and its relays
Fi (i > 1) are generated as follows. For information flows with bounded delay,
Fi = sort(Fi−1(1) + D1, Fi−1(2) + D2, . . .), i > 1,
where Fi−1 = (Fi−1(1), Fi−1(2), . . .), and D1, D2, . . . are i.i.d. delays uniformly
distributed in [0, ∆]. For information flows with bounded memory, we partition
the epochs of Fi−1 into groups of size ⌊M/2⌋, where the jth group is
(Fi−1((j − 1)⌊M/2⌋), . . . , Fi−1(j⌊M/2⌋ − 1)).
Then Fi is generated by selecting ⌊M/2⌋ epochs independently and uniformly from
the interval [Fi−1((j − 1)⌊M/2⌋), Fi−1(j⌊M/2⌋)) for each j ≥ 2. As illustrated in
Fig. 4.14, if we match epochs in the generated realizations fi−1 and fi (i ≥ 2)
sequentially, then the matching satisfies the bounded memory constraint.
Figure 4.14: Generating information flows with bounded memory (⌊M/2⌋ = 3): f2 is generated by storing ⌊M/2⌋ packets from f1 and randomly releasing these packets during the arrival of the next ⌊M/2⌋ packets.
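To make the generation procedure concrete, the following Python sketch produces synthetic bounded delay flows with superposed chaff in the manner described above; it is only an illustration of this section's model, and the helper names (poisson_on_interval, synthetic_bounded_delay), the horizon parameter, and the example values are ours, not part of the original simulation code.

import random

def poisson_on_interval(rate, horizon, rng):
    # Epochs of a Poisson process of the given rate on [0, horizon).
    t, epochs = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return epochs
        epochs.append(t)

def synthetic_bounded_delay(n_hops, lam, f_c, delta, horizon, seed=0):
    # (S_1, ..., S_n) under H1 for bounded delay flows: F_1 is Poisson of rate
    # (1 - f_c) * lam, each relay adds an i.i.d. Uniform[0, delta] delay, and
    # independent Poisson chaff of rate f_c * lam is superposed on every hop,
    # so that the fraction of chaff (the CTR) is approximately f_c.
    rng = random.Random(seed)
    flows = [poisson_on_interval((1.0 - f_c) * lam, horizon, rng)]
    for _ in range(1, n_hops):
        flows.append(sorted(t + rng.uniform(0.0, delta) for t in flows[-1]))
    return [sorted(f + poisson_on_interval(f_c * lam, horizon, rng)) for f in flows]

# e.g. the setting of Fig. 4.15: lam = 4, delta = 1, f_c = 0.2
s1, s2, s3 = synthetic_bounded_delay(n_hops=3, lam=4.0, f_c=0.2, delta=1.0, horizon=25.0)

Bounded memory flows can be generated analogously by replacing the per-packet delay with the group-wise resampling illustrated in Fig. 4.14.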
Explanations of the parameters used in this simulation are summarized in Ta-
ble 4.2. We are mainly interested in the influence of changing n on the detection
performance. Since it can be shown that increasing n has opposite effects on the
false alarm and the miss probabilities, we plot the receiver operating characteristics
(ROCs) [30] for different n.
Table 4.2: Parameters for Simulations on Synthetic Data.
n    the number of processes
λ    the rate of Si (i = 1, . . . , n)
∆    maximum delay
M    maximum memory size
fc   CTR
We first fix the sample size per process and vary the threshold to plot the ROCs
for bounded delay flows and bounded memory flows; see Fig. 4.15, 4.16. From
the plots, we see that the ROCs approach the upper left corner (i.e., zero error
probabilities) as n increases, implying that the detector has better performance
as the number of processes increases. This is as expected because as n increases,
the detector has more observations, and thus the detection performance should be
improved.
We then fix the total sample size and plot the ROCs for different n; see Fig. 4.17,
Figure 4.15: The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, 100 packets per process, 10000 Monte Carlo runs.
Figure 4.16: The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 40 packets per process, 10000 Monte Carlo runs.
4.18. We want to find out whether, given a total sample size, we should do pairwise
detection or joint detection over multiple hops. As illustrated in Fig. 4.17, given
a total sample size of 200, the ROC for n = 3 outer-bounds the ROCs for both n = 2 and n = 4, . . . , 6. Similar observations can be obtained from Fig. 4.18.
The observations suggest that if the total sample size is constrained, then there
is an optimal n such that joint detection over n nodes optimizes the performance.
Intuitively, this is because as n increases, the sample size per process decreases,
making the detection more difficult, but the constraints on information flows be-
come tighter, making the detection easier. These contradictory effects lead to a
tradeoff between sample size and path length, which results in the optimal n. Note
that here we have assumed the flow path to be sufficiently long.
Figure 4.17: The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, a total of 200 packets over all processes, 10000 Monte Carlo runs.
Figure 4.18: The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, a total of 100 packets over all processes, 10000 Monte Carlo runs.
Note that the levels of detectability at n = 2 for bounded delay flows and
bounded memory flows are both equal to 0.2. By the discussions following Corol-
laries 1 and 2, we know that with 20% chaff noise, we can generate information
flows which make the detection no better than random guessing. In the simulation
(Fig. 4.15–4.18), however, the detection is clearly much better than random guess-
ing. This observation shows that the flow-generating models used in the simulation
are not optimal. If we compare the ROCs for bounded delay flows with those for
bounded memory flows9 (i.e., Fig. 4.15 vs. Fig. 4.16 and Fig. 4.17 vs. Fig. 4.18),
we see that the detector of bounded memory flows outperforms that of bounded
delay flows for these flow-generating models.
4.8.2 Traces
For simulation on traces, we use the TCP traces in LBL-PKT-4 referenced in
Section 4.7. We extract 134 flows from the TCP packets in LBL-PKT-4. Each
flow has at least 1000 packets, and 4 of them have at least 10000 packets. Only
pairwise detection is simulated due to the limited data. Under H0, (S1, S2) is a
pair of different traces of size 1000. Under H1, Si = Fi ⊕ Wi (i = 1, 2), where
Wi consists of Nc packets i.i.d. uniformly distributed on the range of Fi. The
process F1 is a trace of size 10000, and F2 is generated by bounded delay or
bounded memory perturbations as those in Section 4.8.1. Parameters used in this
simulation are explained in Table 4.3.
Table 4.3: Parameters for Simulations on Traces.
N    total number of epochs
∆    maximum delay
M    maximum memory size
Nc   number of chaff packets per process
We compare the proposed detectors for bounded delay flows (denoted by δBD)
9 We have made M = λ∆ for a fair comparison.
and bounded memory flows (denoted by δBM) with the detector δDAC using algorithm
“Detect-Attacks-Chaff” (DAC) for bounded memory flows10 (by Blum et al. in [5])
and the detector δS-III using algorithm S-III for bounded delay flows (by Zhang et
al. in [49]). We first simulate the false alarm probabilities; see Fig. 4.19. We
choose the thresholds of δBD and δBM such that their false alarm probabilities are
comparable with that of δDAC. The false alarm probabilities of these three detectors
level off after the sample size 1000; the false alarm probability of δS-III, however,
keeps decreasing to a much smaller value. From the plot, we see that the false
alarm probabilities of δBM, δBD, and δDAC do not decay exponentially. It is possible
that the false alarm probability of δS-III decays exponentially, but we do not have
enough data in these traces to verify that.
Figure 4.19: PF(δBM), PF(δDAC), PF(δBD), and PF(δS-III) on LBL-PKT-4: M = 20, ∆ = 5, threshold for δBD = 1/14, threshold for δBM = 1/21, tested on 134 × 133 trace pairs.
We then simulate the miss probabilities of δBM and δDAC; see Fig. 4.20. For
each of the 4 traces of size 10000, we generate 1000 bounded memory flows inde-
pendently. The simulation shows that δBM has much lower miss probability than
δDAC. In fact, δBM detects all the information flows, whereas δDAC has up to 27.58%
10 Originally, DAC was proposed for information flows with bounded delay and bounded peak rate, but it is applicable to bounded memory flows as discussed in Section 3.5.1.
misses by sample size 22000. The plot also shows that the miss probability of δDAC
increases with the sample size. This is because as the sample size increases, the
average number of chaff packets also increases, and δDAC can only handle a fixed
number of chaff packets. Note that although our analysis says that δBM is only
consistent for CTR up to 0.0476, δBM survives CTR = 0.1 in the simulation, which
implies that the uniform chaff insertion is not optimal for bounded memory flows.
Figure 4.20: PM(δBM) and PM(δDAC): M = 20, Nc = 1000, threshold for δBM = 1/21, tested on 4000 bounded memory flows.
Next we simulate the miss probabilities of δBD and δS-III; see Fig. 4.21. We
generate 1000 bounded delay flows independently from each of the traces of size
10000. The plot confirms that δBD has a much smaller miss probability than δS-III;
actually, in the simulation, δBD has no miss for almost all the sample sizes11. This
is because δBD can tolerate a certain fraction of chaff packets no matter where
they are inserted, whereas δS-III is vulnerable to chaff packets in S1. As in the case
of bounded memory flows, we see that δBD handles much more chaff noise than
predicted by the analysis, which shows that uniform chaff insertion is also not
optimal for bounded delay flows. Moreover, comparing Fig. 4.20 and Fig. 4.21, we
see that δDAC is more robust to chaff noise than δS-III.
11 The exception is at the sample size 3000, where we have 6 misses out of 4000 information flows.
Figure 4.21: PM(δBD) and PM(δS-III): ∆ = 5, Nc = 1000, threshold for δBD = 1/14, tested on 4000 bounded delay flows.
4.9 Summary
This chapter addresses timing-based detection of information flows in the pres-
ence of active perturbations and chaff noise. It characterizes the detectability
of information flows in terms of the maximum amount of chaff noise that allows
consistent detection and shows how to design the detector to achieve consistent de-
tection based on knowledge of the null hypothesis. The Poisson assumption under
the null hypothesis makes our results lower bounds on the detection performance
of practical information flows. The proposed detector coupled with capacity con-
straints between neighbor nodes can capture all the long-lived information flows
with positive rates and sufficiently long paths.
APPENDIX 4.A
PROOF OF CHAPTER 4
Proof of Theorem 6
Let Yj be the jth packet delay, i.e., Yj = S2(j) − S1(j). Define
Zj ≜ Yj − Yj−1 = (S2(j) − S2(j − 1)) − (S1(j) − S1(j − 1)).
We see that the Zj's are i.i.d. random variables; each Zj is the difference between two independent exponential random variables with means 1/λ2 and 1/λ1, respectively. The process {Yj}∞j=1 is a general random walk with step Zj. Define Y0 = 0.
Now for every chaff packet inserted at t in S2, we insert a virtual packet at t
in S1; for every chaff packet at s in S1, we insert a virtual packet at s + ∆ in S2,
as illustrated in Fig. 4.22. Let the new packet delays after the insertion of virtual packets be {Y′j}∞j=0. It can be shown that {Y′j}∞j=0 is also a random walk with step Zj, but it has two reflecting barriers at 0 and ∆, i.e.,
Y′j = min(max(Y′_{j−1} + Zj, 0), ∆).
Since it is almost surely impossible for Y′_{j−1} + Zj to be exactly equal to 0 or ∆, each time Y′j = 0 corresponds to a chaff packet in S2, and Y′j = ∆ corresponds to a chaff packet in S1. Thus, the limiting probability for a packet to be chaff is h∆/(1 − h0) in S1, and h0/(1 − h∆) in S2, where h0 = lim_{j→∞} Pr{Y′j = 0} and h∆ = lim_{j→∞} Pr{Y′j = ∆}. The overall probability for a packet in S1 ⊕ S2 to be chaff is the weighted sum
λ1h∆/((λ1 + λ2)(1 − h0)) + λ2h0/((λ1 + λ2)(1 − h∆)).     (4.6)
Figure 4.22: Inserting virtual packets to calculate the delays of chaff packets.
By ergodicity of {Y′j}∞j=0, the CTR of BGM converges to the limiting probability in (4.6) almost surely.
Now we calculate h0 and h∆. Let the equilibrium distribution function of Y′j be H(x), i.e., H(x) = lim_{j→∞} Pr{Y′j ≤ x}. It is shown in Example 2.16 in [8] that
h0 = H(0) = (1 − λ1/λ2) / (1 − (λ1/λ2)^2 e^{∆(λ1−λ2)})  if λ1 ≠ λ2, and 1/(2 + λ1∆) otherwise,
and
h∆ = 1 − H(∆−) = (λ1/λ2) e^{∆(λ1−λ2)} (1 − λ1/λ2) / (1 − (λ1/λ2)^2 e^{∆(λ1−λ2)})  if λ1 ≠ λ2, and 1/(2 + λ1∆) otherwise.
Therefore, by (4.6), we have that the CTR of BGM satisfies
lim_{t→∞} CTR_BGM(t) = (λ2 − λ1)(1 + (λ1/λ2) e^{∆(λ1−λ2)}) / ((λ1 + λ2)(1 − (λ1/λ2) e^{∆(λ1−λ2)}))  if λ1 ≠ λ2, and 1/(1 + λ1∆) if λ1 = λ2,
almost surely.
■
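As an informal check of the closed form above, the following Python sketch simulates two independent Poisson processes of equal rate, applies the greedy matching rule of BGM, and compares the empirical fraction of unmatched (chaff) packets with 1/(1 + λ∆). The function name and the parameter values are ours and purely illustrative.

import random

def simulated_bgm_ctr(lam1, lam2, delta, n_pkts=200000, seed=1):
    # Generate two independent Poisson processes and apply the BGM matching rule;
    # the returned value is the fraction of packets left unmatched (the CTR).
    rng = random.Random(seed)
    s1, s2, t1, t2 = [], [], 0.0, 0.0
    for _ in range(n_pkts):
        t1 += rng.expovariate(lam1); s1.append(t1)
        t2 += rng.expovariate(lam2); s2.append(t2)
    m = n = matched = 0
    while m < n_pkts and n < n_pkts:
        d = s2[n] - s1[m]
        if d < 0:
            n += 1                      # s2(n) arrives too early: chaff
        elif d > delta:
            m += 1                      # s1(m) cannot be relayed in time: chaff
        else:
            matched += 1; m += 1; n += 1
    return 1.0 - matched / n_pkts

lam, delta = 2.0, 1.0
print(simulated_bgm_ctr(lam, lam, delta), 1.0 / (1.0 + lam * delta))  # both close to 1/3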
Proof of Proposition 13
Algorithm BMR is feasible since the non-chaff part (f1, f2) satisfies the bounded
memory constraint. It remains to show the optimality.
Assume that C∗ is an optimal chaff-inserting algorithm. If M1(k − 1) = M
and sk ∈ S1, then node R2 has an arrival when the memory is full, and C∗ has to
drop at least one arriving packet at or before sk to prevent memory overflow. If
M1(k − 1) = 0 and sk ∈ S2, then R2 has a departure when the memory is empty,
so C∗ has to insert at least one dummy packet at or before sk in s2 to prevent
memory underflow. Therefore, BMR inserts no more chaff than C∗.
■
Proof of Theorem 7
If S1 and S2 are independent Poisson processes of rates λ1 and λ2 respectively,
then it is known that the cumulative differences d(w) defined in (4.5) form
a simple random walk. Algorithm BMR assigns chaff such that the cumulative
differences d′(w) of the processes F1 and F2 satisfy 0 ≤ d′(w) ≤ M for all w.
By the memoryless property of exponential interarrival times, it is easy to see that
d′(w) is a random walk with reflecting barriers at 0 and M (i.e., a Markov chain
with state space 0, . . . , M). Its transition probabilities are shown in Fig. 4.23.
Figure 4.23: The Markov chain formed by {d′(w)}: p = λ1/(λ1 + λ2), q = 1 − p.
It is easy to see that {d′(w)} is an irreducible, aperiodic, and positive recurrent Markov chain, and thus has a limit distribution (π0, . . . , πM). Since the limit distribution satisfies
πi = (λ1/λ2) π_{i−1},  i = 1, . . . , M,
we have
π0 = (1 − λ1/λ2) / (1 − (λ1/λ2)^{M+1})  if λ1 ≠ λ2, and 1/(1 + M) otherwise,
πM = (λ1/λ2)^M (1 − λ1/λ2) / (1 − (λ1/λ2)^{M+1})  if λ1 ≠ λ2, and 1/(1 + M) otherwise.
The physical meaning of d′(w) is the memory size after the transmission of the wth packet in S1 ⊕ S2. The self-loop at state 0 corresponds to chaff packets in S2 because these transmissions occur when the memory is empty (so they have to be dummy packets); the self-loop at state M corresponds to chaff in S1 because these transmissions occur when the memory is full (so the packets will be dropped). By ergodicity of {d′(w)}, as w → ∞, the CTR of BMR converges to the limiting probability of self-loops almost surely. This limiting probability is the weighted sum π0q + πMp, which is equal to
(λ2 − λ1)(1 + (λ1/λ2)^{M+1}) / ((λ1 + λ2)(1 − (λ1/λ2)^{M+1}))  if λ1 ≠ λ2, and 1/(1 + M) if λ1 = λ2.
■
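As a quick numerical illustration of the equal-rate case (our own remark, not part of the original proof): for λ1 = λ2 and M = 20,
lim_{t→∞} CTR_BMR(t) = 1/(1 + M) = 1/21 ≈ 0.0476,
which is precisely the threshold 1/21 used for δBM and the consistency level 0.0476 quoted in the trace simulations of Section 4.8.2.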
Proof of Proposition 14
By expanding the recursions of MBDR, it can be shown that MBDR is equivalent
to an algorithm which finds the earliest sequence of relay epochs for each packet
in s1. That is, for s ∈ S1, if p = (s, t2, . . . , tn) (ti ∈ Si) denotes a sequence of relay epochs for s, then MBDR finds the sequence p̄ = (s, t̄2, . . . , t̄n) such that
1. p̄ satisfies the causality and the bounded delay constraints;
2. t̄i ≤ ti (i = 2, . . . , n) for any other sequence of relay epochs p = (s, t2, . . . , tn) that satisfies these constraints.
We will refer to a sequence of relay epochs as a relay sequence.
A set of relay sequences preserves the order of packets if for any two sequences (t_i)_{i=1}^{n} and (t′_i)_{i=1}^{n} in the set, t1 ≤ t′1 implies ti ≤ t′i for all i = 2, . . . , n. We will use the following result.
Lemma 3 Among all sets of relay sequences satisfying the constraints of causality,
packet-conservation, and bounded delay, there always exists a set which has the
largest size and preserves the order of packets.
By this lemma, it suffices to search among order-preserving sets of relay se-
quences. It remains to show that it is optimal to find the earliest relay sequences.
Let P be the set of relay sequences found by MBDR, and P∗ be the largest
and order-preserving set of relay sequences. Suppose s1 ∈ S1 has a relay sequence
p∗1 ∈ P∗ but not in P, as illustrated in Fig. 4.24. Then there must be relay
sequences in P which start earlier than s1 and partly overlap with p∗1 (otherwise,
MBDR would have chosen p∗1 or a sequence earlier than p∗1 for s1). Let the earliest
of these sequences be p1, with starting epoch s2 ∈ S1. For j = 2, 3, . . ., do the
following.
i) If sj does not have a relay sequence in P∗, we stop searching; otherwise,
suppose that sj has a relay sequence p∗j ∈ P∗.
ii) The sequence p∗j is at least partly earlier than pj−1 because p∗j is earlier than
p∗j−1 and p∗j−1 partly overlaps with pj−1. Since MBDR has not chosen the
earlier part of p∗j , it implies that there must be sequences in P earlier than
pj−1, which partly overlap with p∗j . Let the earliest of these sequences be pj,
with starting epoch sj+1 ∈ S1. Continue with i).
Figure 4.24: Every relay sequence in P∗ corresponds to a relay sequence in P. Solid lines: sequences in P; dashed lines: sequences in P∗.
When we stop searching, we will either find an epoch in s1 which has a relay
sequence in P but not in P∗, or reach a relay sequence pm ∈ P which starts before
all the relay sequences in P∗. Therefore, for every relay sequence in P∗, we can
find a different sequence in P, which implies that the size of P is no smaller than
that of P∗.
■
Proof of Lemma 3
The proof is by induction. As illustrated in Fig. 4.25, suppose that
(s1(1), s2(2), s3(1)) and (s1(2), s2(1), s3(2)) are relay sequences satisfying causal-
ity, packet-conservation, and bounded delay. By switching the intersected part, we
obtain two sequences (s1(1), s2(1), s3(1)) and (s1(2), s2(2), s3(2)) which satisfy
these constraints and also preserve the packet order. By repeatedly applying such
switching, we can reorganize any set of relay sequences into an order-preserving
set and maintain satisfaction of the constraints.
Figure 4.25: Solid lines denote the original relay sequences; dashed lines denote the reorganized relay sequences which preserve the order of packets.
■
Proof of Theorem 8
We bound the CTR of MBDR by deriving upper bounds on the probability for an
arbitrary packet in S1 to have a match. Then the result of Theorem 8 holds by
ergodicity. Compared with the first packet, subsequent packets are more difficult
to match because some of their relay epochs may have been used to relay previous
packets. Thus, it suffices to upper bound the probability for the first packet to
have a match. Denote this probability by Pn.
First, note that a necessary condition for the first packet at time t to have a
match is that the corresponding intervals [t, t + (i − 1)∆] in Si (i = 2, . . . , n)
in which the packet can be relayed are all nonempty. The probability for this
event is at most ∏_{i=1}^{n−1} (1 − e^{−iλ∆}) (achievable if all the processes have rate λ). Thus,
Pn ≤ ∏_{i=1}^{n−1} (1 − e^{−iλ∆}).
Next, we prove by induction that Pn is also upper bounded by (λ∆)^{n−2}(1 − e^{−λ∆}). For n = 2, this bound is the same as the upper bound derived above. Assume that the result holds for P_{n−1} (n ≥ 3). By writing Pn in parts with respect to the number of epochs within delay ∆ in S2, we have
Pn ≤ ∑_{k=1}^{∞} ((λ∆)^k / k!) e^{−λ∆} · Pr{at least one of the k epochs has a match}
   ≤ ∑_{k=1}^{∞} ((λ∆)^k / k!) e^{−λ∆} · k P_{n−1}     (4.7)
   = λ∆ P_{n−1},
where the union bound is used to obtain (4.7). Hence, we have shown that Pn ≤ (λ∆)^{n−2}(1 − e^{−λ∆}).
Combining these two bounds, we have that the CTR of MBDR is lower bounded by
1 − min( ∏_{i=1}^{n−1} (1 − e^{−iλ∆}), (λ∆)^{n−2}(1 − e^{−λ∆}) )  a.s.
■
Proof of Theorem 9
We prove the theorem by induction.
For n = 2, we have seen from Theorem 7 that the minimum CTR of MBMR is
1/(1 + M).
Assume the result holds up to n (n ≥ 2). For (n + 1)-hop flows, it suffices to show that 1 − u_{n+1} ≤ lim_{t→∞} CTR_MBMR(t) ≤ 1 − l_{n+1} a.s. when the Si's have equal rate. This is because equal rate is the case that minimizes the CTR (which can be shown by arguments similar to Theorem 7). We prove the result by showing that the asymptotic fraction of non-chaff packets (i.e., 1 − CTR) is bounded between l_{n+1} and u_{n+1}.
Note that the output of a relay node is no longer a Poisson process. This
is because the probability of finding another information-carrying packet af-
ter an information-carrying packet is greater than the probability of finding an
information-carrying packet after a chaff packet. The precise model to decide
whether a packet is chaff or not is the Markov chain shown in Appendix 4.B. As a
result, the arrival process at node Rn+1 is more regular than a Poisson process of
the same rate.
For the lower bound, assuming Si’s all have rate λ, we substitute the arrival
process at node Rn+1 with a Poisson process of rate λln. Since we destroy the
regularity and may also reduce the rate (because λln is a lower bound on the
rate), this substitution gives us a lower bound on the fraction of non-chaff packets.
For this arrival process and an independent Poisson process of rate λ which is
the departing process of Rn+1, we know from the proof of Theorem 7 that the
asymptotic fraction of chaff packets in the departing process is
π0 = (1 − λ1/λ2) / (1 − (λ1/λ2)^{M+1}),
where λ1 = λ·l_n and λ2 = λ. Therefore, we have that the asymptotic fraction of non-chaff packets is lower bounded by
1 − π0 = 1 − (1 − l_n)/(1 − l_n^{M+1}),
which is equal to l_{n+1}.
For the upper bound, we consider the following arrival process at node Rn+1.
The process is generated by dividing points in a Poisson process of rate λ into
consecutive groups of size M/un and selecting M consecutive points from the
beginning of each group. Analogous to conventional batched processes, we refer to
the group size M/un as the period, and M as the batch size. A realization of such
a process is drawn in Fig. 4.26.
Figure 4.26: A “batched” arrival process generated from a Poisson process. Filled marks: arrival epochs; open marks: points in the underlying Poisson process; M = 2, period = 5.
We consider such a batched process because it maximizes the time between the
(kM)th arrival and the (kM + 1)th arrival (k ∈ N) so that it is least likely for
the memory to be overflowed. Moreover, we choose the period to make the arrival
rate equal to λun (which may be higher than the actual rate). Therefore, using
this arrival process allows us to obtain an upper bound on the fraction of non-chaff
packets.
Consider such an arrival process and an independent Poisson process of rate λ. After the first arrival in a period, with probability 2^{−M/u_n}, there will be no departure until the first arrival in the next period. In this case, there are M + 1 consecutive arrivals, and thus at least one packet will be dropped. Hence, the fraction of dropped packets at node R_{n+1} is lower bounded by 2^{−M/u_n}/(M + 1), i.e., at most a 1 − 2^{−M/u_n}/(M + 1) fraction of the information-carrying packets arriving at R_{n+1} can be successfully relayed. Since at most a u_n fraction of the incoming packets of R_{n+1} is carrying information, the overall fraction of information-carrying packets relayed by R_{n+1} is upper bounded by
u_n (1 − 2^{−M/u_n}/(M + 1)),
which is equal to u_{n+1}.
■
Proof of Theorem 12
We prove the theorem for bounded delay flows and bounded memory flows sepa-
rately. Here we present the proof for n = 2; the proof for n > 2 is analogous.
Proof for Bounded Memory Flows
By Theorem 7, we know that the false alarm probability is maximized when λ1 =
λ2, where λi (i = 1, 2) is the rate of Si. Consider this equal rate case.
Define T1 to be the number of packets in S1
⊕
S2 until the first chaff packet,
including the first chaff packet, and Ti (i > 1) the number of packets between the
(i − 1)th and ith chaff packets, excluding the (i − 1)th chaff packet but including
the ith. Let C be the number of chaff packets found by BMR. Then the false alarm
probability can be written as
PF(δt) = Pr{C ≤ τ2N}
       = Pr{ ∑_{i=1}^{τ2N} Ti ≥ N }
       = Pr{ (1/(τ2N)) ∑_{i=1}^{τ2N} Ti ≥ 1/τ2 }.     (4.8)
It is known that for Poisson processes, the cumulative differences {d(k)}_{k=1,2,...} defined in (4.5) form a simple random walk with Pr{d(k) = d + 1 | d(k − 1) = d} = 1/2. The Markovian property implies that T1, T2, . . . are independent, and for i ≥ 2, Ti has the same distribution as N_{−1, M+1} defined by
N_{−1, M+1} ≜ inf{k : d(k) = −1 or M + 1 | d(0) = 0}.     (4.9)
By Theorem 7, we know that the ratio C/N will almost surely converge to 1/(1 + M) as N → ∞, i.e., lim_{c→∞} c/(∑_{i=1}^{c} Ti) = 1/(1 + M) almost surely. It implies that lim_{c→∞} (1/c) ∑_{i=1}^{c} Ti = 1 + M almost surely, and thus E[Ti] = 1 + M (i ≥ 2).
Now that the Ti's (i ≥ 2) are i.i.d., by Sanov's Theorem in [7], we have
lim_{N′→∞} (1/N′) log Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 } = − min_{W: E[W] ≥ 1/τ2} D(W||T2),
where N′ = τ2N. By (4.8), we obtain that
lim_{N→∞} (1/N) log PF(δt) = −τ2 min_{W: E[W] ≥ 1/τ2} D(W||T2) ≜ −Γ2(τ2; M).
It is difficult to compute Γ2(τ2; M) directly, but the computation can be reduced
to an optimization over a single variable by Cramer’s Theorem [9]. Nevertheless, as
long as 1/τ2 > 1 + M , we have that E[W ] > E[T2], and thus Γ2(τ2; M) is positive.
By the definition of Γ2(τ2; M), it is easy to see that it is a decreasing function of
τ2.
Proof for Bounded Delay Flows
The proof for bounded delay flows is similar to that for bounded memory flows.
By Theorem 6, we see that the false alarm probability is maximized when S1 and
S2 both have the maximum rate λ. Consider this case.
Let Ti (i ≥ 1) be defined the same as in the proof for bounded memory flows.
Then the false alarm probability can be written as
PF(δt) = Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 },     (4.10)
where N′ = τ2N.
Let Yj be defined as in the proof of Theorem 6. We have shown that the
process {Yj}_{j=1,2,...} is a general random walk. For i ≥ 2, the Ti's are i.i.d. with the same distribution as
2 · inf{j : Yj < 0 or Yj > ∆ | Y0 = 0} − 1.     (4.11)
Let C be the number of chaff packets found by BGM. By Theorem 6, we have lim_{N→∞} C/N = 1/(1 + λ∆) almost surely. Thus, lim_{c→∞} (1/c) ∑_{i=1}^{c} Ti = 1 + λ∆ almost surely, which implies that E[T2] = 1 + λ∆.
By Sanov's Theorem [7], we have
lim_{N′→∞} (1/N′) log Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 } = − min_{W: E[W] ≥ 1/τ2} D(W||T2).
Plugging in (4.10) yields that
lim_{N→∞} (1/N) log PF(δt) = −τ2 min_{E[W] ≥ 1/τ2} D(W||T2) ≜ −Γ2(τ2; λ, ∆).
For 1/τ2 > 1 + λ∆, we have that E[W] > E[T2], and therefore Γ2(τ2; λ, ∆) > 0. As τ2 increases, the minimization is over a larger set, and thus Γ2(τ2; λ, ∆) decreases. This completes the proof.
■
Proof of Theorem 13
The proof utilizes the ideas in the proofs of Theorems 6 and 7.
The classical Pareto distribution (see [28]) with shape parameter β (β ≥ 0) and location parameter α (α ≥ 0) has the probability density function
p(x) = βα^β x^{−β−1},  x ≥ α.
This distribution has a property that the conditional expectation E[X − x|X ≥ x]
is an increasing function of x.
For information flows with bounded memory, consider the cumulative differ-
ences d′(w)∞w=0 between the processes of matched epochs found by BMR, as
defined in the proof of Theorem 7. The CTR of BMR is the frequency of self-loops
in {d′(w)}∞w=0, as illustrated in Fig. 4.23. Unlike the exponential distribution, the Pareto distribution has memory, and the resulting {d′(w)}∞w=0 is not Markovian. Note,
however, that the memory of interarrival times makes it easier to reach the states
0 and M and generate self-loops. This is because whenever d′(w) increases by 1,
the next arrival is more likely to be in S1 (since the arrival in S2 has waited for
some time and it is likely to wait even longer), and thus d′(w) is likely to keep
increasing. Hence the average time to reach 0, M is shorter than that for Pois-
son processes. At the state 0 (or M), the same argument implies that it is more
likely to take more self-loops after a self-loop. Therefore, BMR inserts more chaff
noise in independent renewal processes with Pareto interarrival distributions than
in independent Poisson processes.
For information flows with bounded delay, similar arguments hold. The process
Y ′j ∞j=0 defined in the proof of Theorem 6 is no longer a Markov process under
Pareto interarrival distributions, but we can show that the endpoints 0 and ∆ are
visited more frequently and therefore produce more chaff noise.
■
APPENDIX 4.B
ASYMPTOTIC CTR OF MBMR
Here we show how to calculate the CTR of MBMR by a Markov chain. In
particular, we are interested in computing βMn (n ≥ 2). Assume the processes are
independent and Poisson under H0.
If S1, . . . , Sn are independent Poisson processes, then the vectors
(Mi(k))n−1i=1 ∞k=0 computed by MBMR form an (n − 1)-dimensional homogeneous
Markov chain. By arguments similar to that in the proof of Theorem 7, it can be
shown that the CTR is minimized when all Si’s have equal rate, in which case the
CTR of MBMR is βMn . We will focus on the equal rate case although the method
is easily generalizable to arbitrary rates.
If Si (i = 1, . . . , n) have equal rate, then the transition probabilities of {(Mi(k))_{i=1}^{n−1}} are as follows. Denote the transition probability by Pr{m_1^{n−1} | m′_1^{n−1}}, where m_1^{n−1}, m′_1^{n−1} ∈ {0, . . . , M}^{n−1}, and write (mi, . . . , mj) (i ≤ j) as m_i^j. For 2 ≤ i ≤ n − 1, m_{i−1} > 0, and m_i < M,
Pr{(m_1^{i−2}, m_{i−1} − 1, m_i + 1, m_{i+1}^{n−1}) | m_1^{n−1}} = 1/n;
for m_1 < M,
Pr{(m_1 + 1, m_2^{n−1}) | m_1^{n−1}} = 1/n;
for m_{n−1} > 0,
Pr{(m_1^{n−2}, m_{n−1} − 1) | m_1^{n−1}} = 1/n;
moreover,
Pr{m_1^{n−1} | m_1^{n−1}} = (1/n) · ( I{m_1 = M} + ∑_{i=2}^{n−1} I{m_{i−1} = 0 ∨ m_i = M} + I{m_{n−1} = 0} ).
According to MBMR, each self-loop corresponds to a chaff packet, and therefore
the CTR is equal to the probability of self-loops in the equilibrium distribution.
That is, if π is the equilibrium distribution of {(Mi(k))_{i=1}^{n−1}}, then the CTR of MBMR converges to the limiting probability of self-loops, denoted by ηn, almost surely, where
ηn = ∑_{m_1^{n−1} ∈ {0,...,M}^{n−1}} π(m_1^{n−1}) Pr{m_1^{n−1} | m_1^{n−1}}.
For example, for n = 3 and M = 2, {(M1(k), M2(k))}_{k≥0} follows the Markov chain in Fig. 4.27. Here
η3 = (1/3)(1/15 + 2·(4/45) + 2·(1/9)) + (2/3)(2·(4/45) + 2/9) = 19/45.
This is the CTR of MBMR for 3-hop information flows with memory sizes bounded by 2, i.e., β_3^2 = 19/45.
Figure 4.27: The Markov chain of {(M1(k), M2(k))}∞k=0. All straight lines have transition probability 1/3. All the states are marked with their limiting probabilities, e.g., π(0, 2) = 1/15.
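As an informal cross-check of the example above, the following Python sketch (ours, not part of the original appendix) builds the transition matrix of the chain in Fig. 4.27 for n = 3 and M = 2, computes its stationary distribution, and recovers η3 = 19/45 ≈ 0.4222.

import itertools
import numpy as np

M, n = 2, 3                                   # memory bound and number of hops
states = list(itertools.product(range(M + 1), repeat=n - 1))
index = {s: k for k, s in enumerate(states)}
P = np.zeros((len(states), len(states)))

for s in states:
    # packet from S1: a new packet enters R2's buffer if there is room, else self-loop (chaff)
    t = list(s)
    if s[0] < M:
        t[0] += 1
    P[index[s], index[tuple(t)]] += 1.0 / n
    # packet from Si (1 < i < n): node Ri forwards a buffered packet to Ri+1 if possible
    for i in range(1, n - 1):
        t = list(s)
        if s[i - 1] > 0 and s[i] < M:
            t[i - 1] -= 1
            t[i] += 1
        P[index[s], index[tuple(t)]] += 1.0 / n
    # packet from Sn: Rn releases a buffered packet if it has one, else self-loop (chaff)
    t = list(s)
    if s[-1] > 0:
        t[-1] -= 1
    P[index[s], index[tuple(t)]] += 1.0 / n

# stationary distribution = left eigenvector of P associated with eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

eta3 = sum(pi[index[s]] * P[index[s], index[s]] for s in states)
print(eta3, 19.0 / 45.0)                      # both approximately 0.4222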
APPENDIX 4.C
ALGORITHMS OF CHAPTER 4
Chaff-inserting Algorithm for Two-hop Bounded Delay
Flows
For the algorithm BGM presented in Section 4.4.1, we combine the insertion of
chaff and the matching of information-carrying packets into the implementation
presented in Table 4.4.
Table 4.4: Bounded-Greedy-Match (BGM).

Bounded-Greedy-Match(s1, s2, ∆):
  m = n = 1;
  while m ≤ |S1| and n ≤ |S2|
    if s2(n) − s1(m) < 0
      s2(n) = chaff; n = n + 1;
    else if s2(n) − s1(m) > ∆
      s1(m) = chaff; m = m + 1;
    else
      match s1(m) with s2(n);
      m = m + 1; n = n + 1;
    end
  end
This implementation of BGM uses two pointers m and n to record the current
epochs examined in s1 and s2, and keeps updating m and n depending on whether
the match is successful or not. Its complexity is O(|S1| + |S2|).
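For reference, here is a direct Python transcription of the pseudocode in Table 4.4; the function and variable names are ours, and the epoch lists are assumed to be sorted.

def bounded_greedy_match(s1, s2, delta):
    # Greedily match epochs of s1 to epochs of s2 within delay [0, delta].
    # Returns (matches, chaff1, chaff2): matched index pairs and the indices
    # of unmatched (chaff) epochs in each process.
    m, n = 0, 0
    matches, chaff1, chaff2 = [], [], []
    while m < len(s1) and n < len(s2):
        d = s2[n] - s1[m]
        if d < 0:            # s2(n) is too early to be a relay of s1(m)
            chaff2.append(n); n += 1
        elif d > delta:      # s1(m) cannot be relayed within the delay bound
            chaff1.append(m); m += 1
        else:                # causal and within delay: match the pair
            matches.append((m, n)); m += 1; n += 1
    chaff1.extend(range(m, len(s1)))   # leftovers are chaff
    chaff2.extend(range(n, len(s2)))
    return matches, chaff1, chaff2

The two pointers advance exactly as in Table 4.4, so the running time remains linear in |S1| + |S2|.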
Chaff-inserting Algorithm for Multi-hop Bounded Delay
Flows
Implementation of the algorithm MBDR presented in Section 4.5.1 is presented in
Table 4.5. The complexity of such a direct implementation is O((λ∆)^n |S1|) (λ is the maximum rate of S1, . . . , Sn).
Table 4.5: Multi-Bounded-Delay-Relay (MBDR).

Multi-Bounded-Delay-Relay(s1, . . . , sn, ∆):
  for k = 1 : |S1|
    match of s1(k) = MBDR1(s1(k), 1, s1, . . . , sn, ∆);
    if match of s1(k) = ∅
      s1(k) = chaff;
    end
  end

MBDR1(s, i, s1, . . . , sn, ∆):
  if i = n
    return s;
  end
  for t ∈ Si+1 ∩ [s, s + ∆]
    match of t = MBDR1(t, i + 1, s1, . . . , sn, ∆);
    if match of t = ∅
      t = chaff;
    else
      return t;
    end
  end
  return ∅;
Performance of recursive algorithms can often be improved by expanding recursions. An implementation of expanded MBDR is shown in Table 4.6. The complexity of this implementation is12 O(n²|S1|).
12 The dominating step is the recursive computation of the Ci,j's. Suppose that the maximum rate of S1, . . . , Sn is λ; then there are at most (i − 1)λ∆ points in Ci,j on average. The selection of these points takes (2i − 3)λ∆ steps. The total complexity can be calculated by |S1| ∑_{i=2}^{n} (2i − 3)λ∆ = λ∆(n − 1)² |S1|.
Table 4.6: Expanded-Multi-Bounded-Delay-Relay (E-MBDR).

Expanded-Multi-Bounded-Delay-Relay(s1, . . . , sn, ∆):
  (p1, . . . , pn) = (0, . . . , 0);
  for j = 1 : |S1|
    C1,j = {s1(j)};
    for i = 1 : n − 1
      for all s ∈ Ci,j in increasing order
        for all t ∈ Si+1 ∩ [s, s + ∆] with t > pi+1 and t ∉ Ci+1,j
          t.predecessor = s;
          add t to the set Ci+1,j;
        end
      end
    end
    if Cn,j ≠ ∅
      tn = min(Cn,j);
      for i = n − 1 : −1 : 1
        ti = ti+1.predecessor;
      end
      (t1, . . . , tn) is the sequence of relay epochs for s1(j);
      (p1, . . . , pn) = (t1, . . . , tn);
    end
  end
  for all s ∈ S1 ∪ · · · ∪ Sn such that s is not a selected relay epoch
    s = chaff;
  end
Chaff-inserting Algorithm for Two-hop Bounded Memory
Flows
A pseudocode implementation of BMR (presented in Section 4.4.2) is given in Table 4.7.
Note that once BMR marks out the chaff packets, the order in which
information-carrying packets are transmitted is irrelevant as far as the memory
constraint is concerned. The complexity of BMR is only O(|S1| + |S2|).
Table 4.7: Bounded-Memory-Relay (BMR).

Bounded-Memory-Relay(s1, s2, M):
  s = s1 ⊕ s2;
  d = 0;
  for w = 1 : |S|
    if (d = M and s(w) ∈ S1) or (d = 0 and s(w) ∈ S2)
      s(w) = chaff;
    else if s(w) ∈ S1
      d = d + 1;
    else
      d = d − 1;
    end
  end
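A Python transcription of the pseudocode in Table 4.7, under the same assumptions (sorted epoch lists, distinct epochs); the function and variable names are ours.

def bounded_memory_relay(s1, s2, M):
    # Mark the minimum number of chaff packets so that the remaining epochs of
    # s1 and s2 can be relayed through a node with memory for at most M packets.
    # Returns (chaff1, chaff2): indices of chaff packets in s1 and s2.
    merged = sorted([(t, 1, k) for k, t in enumerate(s1)] +
                    [(t, 2, k) for k, t in enumerate(s2)])
    d = 0                      # number of packets currently buffered
    chaff1, chaff2 = [], []
    for t, proc, k in merged:
        if proc == 1:
            if d == M:         # arrival while the buffer is full: drop as chaff
                chaff1.append(k)
            else:
                d += 1
        else:
            if d == 0:         # departure while the buffer is empty: dummy packet (chaff)
                chaff2.append(k)
            else:
                d -= 1
    return chaff1, chaff2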
Chaff-inserting Algorithm for Multi-hop Bounded Memory
Flows
Algorithm MBMR is a direct generalization of BMR. Its implementation is given
in Table 4.8.
Algorithm MBMR has complexity O(∑_{i=1}^{n} |Si|). It uses Mi (i = 1, . . . , n − 1) to
record the number of packets stored in node Ri+1. The algorithm keeps updating
Mi’s and guarantees that each Mi is always between 0 and M , which implies that
the scheduling found by MBMR satisfies the bounded memory constraint.
Detection Algorithm for Two-hop Bounded Delay Flows
Algorithm “Detect-Bounded-Delay” (DBD) is derived to detect 2-hop information
flows with bounded delay. It does detection with the help of the optimal chaff-
Table 4.8: Multi-Bounded-Memory-Relay (MBMR).

Multi-Bounded-Memory-Relay(s1, . . . , sn, M):
  s = s1 ⊕ · · · ⊕ sn;
  (M1, . . . , Mn−1) = (0, . . . , 0);
  for w = 1 : |S|
    if s(w) ∈ S1
      if M1 < M
        M1 = M1 + 1;
      else
        s(w) = chaff;
      end
    else if s(w) ∈ Sn
      if Mn−1 > 0
        Mn−1 = Mn−1 − 1;
      else
        s(w) = chaff;
      end
    else
      let i (1 < i < n) be such that s(w) ∈ Si;
      if Mi−1 > 0 and Mi < M
        Mi−1 = Mi−1 − 1; Mi = Mi + 1;
      else
        s(w) = chaff;
      end
    end
  end
inserting algorithm BGM.
Given measurements (s1, s2), DBD
1. calculates C, the number of chaff packets assigned by BGM in s1
⊕
s2 but
excluding chaff in S2 ∩ [0, ∆);
2. returns H1 if the ratio of C and the total sample size is less than or equal to
1/(1 + λ′∆) (λ′ is a design parameter), and otherwise returns H0.
Implementation of DBD is presented in Table 4.9. The complexity of DBD is
O(N), where N is the joint sample size, i.e., the total number of examined packets
in s1
⊕
s2.
Table 4.9: Detect-Bounded-Delay (DBD).

Detect-Bounded-Delay(s1, s2, ∆, N, λ′):
  i = j = 1; C = 0;
  while i + j ≤ N
    if s2(j) − s1(i) < 0
      if s2(j) ≥ ∆
        C = C + 1;
      end
      j = j + 1;
    else if s2(j) − s1(i) > ∆
      C = C + 1; i = i + 1;
    else
      i = i + 1; j = j + 1;
    end
  end
  return H1 if C/N ≤ 1/(1 + λ′∆), H0 otherwise;
Suppose H1 is true. Then the actual number of chaff packets in s1
⊕
s2 has to
be no smaller than C because BGM is optimal, and chaff packets in [0, ∆) in s2 have
been ignored (because they may be the relay packets of packets arriving before the
detector starts). It means that the actual CTR has to be more than 1/(1 + λ′∆)
to evade DBD. Therefore, DBD has no miss detection for CTR≤ 1/(1 + λ′∆).
Detection Algorithm for Multi-hop Bounded Delay Flows
We extend DBD to multiple hops by utilizing the multi-hop chaff-inserting al-
gorithm MBDR. The algorithm, called “Detect-Multi-Bounded-Delay” (DMBD),
works as follows.
Given measurements (s1, . . . , sn), DMBD
1. calculates C, the number of chaff packets found by MBDR, excluding chaff
packets in the beginning (i − 1)∆ period of si for i = 1, . . . , n;
2. returns H1 if the ratio between C and the total sample size is bounded by
τn (τn is a design parameter); otherwise, returns H0.
See Table 4.10 for an implementation of DMBD based on the extended version of MBDR. The complexity of this implementation is O(nN).
Since MBDR inserts the minimum number of chaff packets, and chaff packets
at the beginning of si are ignored (because they may be the relay packets of
information-carrying packets sent before the detector starts), C is always a lower
bound on the actual number of chaff packets, which means that the actual CTR
has to be larger than τn to evade DMBD. Therefore, DMBD has no miss detection
for CTR≤ τn.
Detection Algorithm for Two-hop Bounded Memory Flows
Algorithm “Detect-Bounded-Memory” (DBM) detects 2-hop information flows
with bounded memory based on the chaff-inserting algorithm BMR.
Table 4.10: Detect-Multi-Bounded-Delay (DMBD).

Detect-Multi-Bounded-Delay(s1, . . . , sn, ∆, N, τn):
  C = 0; K1 = 0;
  (J1, . . . , Jn) = (I1, . . . , In) = (0, . . . , 0);
  for i = 2 : n
    Ki = sup{k : si(k) < (i − 1)∆};
  end
  j = 1;
  while ∑_{i=1}^{n} Ji < N and j ≤ |S1|
    C1,j = {s1(j)};
    for i = 1 : n − 1
      for all s ∈ Ci,j in increasing order
        for all t ∈ Si+1 ∩ [s, s + ∆] with t > si+1(Ji+1) and t ∉ Ci+1,j
          t.predecessor = s; add t to the set Ci+1,j;
        end
      end
    end
    if Cn,j ≠ ∅
      In = min{k : sn(k) ∈ Cn,j};
      for i = n − 1 : −1 : 1
        Ii is such that si(Ii) = si+1(Ii+1).predecessor;
      end
      C = C + ∑_{i=1}^{n} (Ii − max(Ji, Ki) − 1);
      (J1, . . . , Jn) = (I1, . . . , In);
    end
    j = j + 1;
  end
  C = C + max(N − ∑_{i=1}^{n} max(Ji, Ki), 0);
  N = max(∑_{i=1}^{n} Ji, N);
  return H1 if C/N ≤ τn, H0 otherwise;
Given measurements (s1, s2), DBM
1. calculates d(w) (w = 1, 2, . . .), the cumulative difference between s1 and s2
defined in (4.5) (d(0) = 0);
2. if v(w) ≜ max_{0≤k≤w} d(k) − min_{0≤k≤w} d(k) is less than M for all w, returns H1; otherwise, computes the smallest index w∗ such that v(w∗) = M; let du = max_{0≤k≤w∗} d(k) and dl = min_{0≤k≤w∗} d(k);
3. calculates C, the number of chaff packets assigned by BMR to keep the
variable d between dl and du (the original BMR keeps d between 0 and M);
4. returns H1 if the ratio of C and the total sample size is bounded by 1/(1+M ′)
(M ′ is a design parameter); otherwise, returns H0.
Implementation of DBM is given in Table 4.11. Algorithm DBM has complexity
O(N).
Table 4.11: Detect-Bounded-Memory (DBM).

Detect-Bounded-Memory(s1, s2, M, N, M′):
  s = s1 ⊕ s2;
  d = dmax = dmin = 0; C = 0;
  for w = 1 : N
    if (s(w) ∈ S1 and d − dmin = M) or (s(w) ∈ S2 and dmax − d = M)
      C = C + 1;
    else
      d = d + 1 if s(w) ∈ S1, and d = d − 1 if s(w) ∈ S2;
      dmax = max(dmax, d); dmin = min(dmin, d);
    end
  end
  return H1 if C/N ≤ 1/(1 + M′), H0 otherwise;
It is shown in [19] that the actual number of chaff packets in s1
⊕
s2 is lower
bounded by C. It implies that DBM has no miss detection for realizations of
information flows with CTR up to 1/(1 + M ′).
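The following Python sketch mirrors Table 4.11 and the four steps above; it is our illustration, with names and the return convention (1 for H1, 0 for H0) chosen by us.

def detect_bounded_memory(s1, s2, M, N, M_prime):
    # DBM detector: estimate a lower bound C on the number of chaff packets
    # and declare H1 when C/N is suspiciously small.
    # s1, s2: sorted epoch lists; N: number of merged packets to examine.
    merged = sorted([(t, 1) for t in s1] + [(t, 2) for t in s2])[:N]
    d = d_max = d_min = 0      # cumulative difference and its running extremes
    C = 0
    for _, proc in merged:
        # a packet is chaff if relaying it would require a buffer larger than M
        if (proc == 1 and d - d_min == M) or (proc == 2 and d_max - d == M):
            C += 1
        else:
            d += 1 if proc == 1 else -1
            d_max = max(d_max, d)
            d_min = min(d_min, d)
    return 1 if C / N <= 1.0 / (1 + M_prime) else 0   # 1 = H1, 0 = H0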
Detection Algorithm for Multi-hop Bounded Memory Flows
We extend DBM to a joint detection algorithm called “Detect-Multi-Bounded-Memory” (DMBM) based on the chaff-inserting algorithm MBMR.
Given measurements (s1, . . . , sn), DMBM
1. for i = 1, . . . , n − 1, calculates vi(w) ≜ max_{0≤k≤w} di(k) − min_{0≤k≤w} di(k) for w = 1, 2, . . ., where di(k) is the cumulative difference between si and si+1;
2. if vi(w) < M for all w, Ui = ∞ and Li = −∞; otherwise, let Ui = max_{0≤k≤w∗} di(k) and Li = min_{0≤k≤w∗} di(k), where w∗ = inf{w : vi(w) ≥ M};
3. calculates C, the number of chaff packets assigned by MBMR to keep the
variable Mi (i = 1, . . . , n − 1) between Li and Ui (originally, MBMR keeps
Mi between 0 and M);
4. returns H1 if the ratio of C and the total sample size is bounded by τn (τn
is a design parameter); otherwise, returns H0.
Implementation of DMBM is presented in Table 4.12. The algorithm has com-
plexity O(N). The value of C is the number of times that memory overflow or
underflow would have occurred if chaff packets had not been inserted. Since the
actual number of chaff packets is at least C, the actual CTR has to be larger than
τn to evade DMBM.
Table 4.12: Detect-Multi-Bounded-Memory (DMBM).

Detect-Multi-Bounded-Memory(s1, . . . , sn, M, N, τn):
  s = s1 ⊕ · · · ⊕ sn;
  (M1, . . . , Mn−1) = (U1, . . . , Un−1) = (L1, . . . , Ln−1) = (0, . . . , 0);
  C = 0;
  for w = 1 : N
    i is such that s(w) ∈ Si;
    if i = 1
      if M1 − L1 < M
        M1 = M1 + 1; U1 = max(U1, M1);
      else
        C = C + 1;
      end
    else if i = n
      if Un−1 − Mn−1 < M
        Mn−1 = Mn−1 − 1; Ln−1 = min(Ln−1, Mn−1);
      else
        C = C + 1;
      end
    else if Ui−1 − Mi−1 < M and Mi − Li < M
      Mi−1 = Mi−1 − 1; Mi = Mi + 1;
      Li−1 = min(Li−1, Mi−1); Ui = max(Ui, Mi);
    else
      C = C + 1;
    end
  end
  return H1 if C/N ≤ τn, H0 otherwise;
Chapter 5
Distributed Detection of Information
Flows
5.1 Outline
In the previous chapters, precise timing measurements have been used for de-
tection. In wide-area networks (e.g., wireless sensor networks), there are usually
constraints on the communication rates between the points of measurements and
the detector. This chapter addresses this issue in the framework of distributed
detection. The rest of the chapter is organized as follows. Section 5.2 formulates
the problem. Section 5.3 defines the performance criteria and gives some theo-
retical results on the performance of general detection systems. Sections 5.4–5.6
are dedicated to practical detection systems, where Section 5.4 defines two simple
quantizers, Section 5.5 presents optimal chaff-inserting and detection algorithms
for each quantizer, and Section 5.6 analyzes and compares the performance of the
proposed detection systems. Then Section 5.7 concludes the chapter with a few
remarks.
5.2 The Problem Formulation
5.2.1 Problem Statement
In a wireless ad hoc network as illustrated in Fig. 5.1, nodes A and B may be car-
rying an information flow. If the nodes are transmitting an information flow, then
their transmission activities Si (i = 1, 2) can be decomposed into an information
flow (F1, F2) and chaff noise Wi, i.e., Si = Fi ⊕Wi (referred to as containing an
information flow). As in Chapter 4, we allow Wi to be any process, and it may
be correlated with Fi. In this chapter, we only consider information flows with
bounded delay ∆ defined in Definition 7.
Figure 5.1: In a wireless network, nodes A and B may serve on one or multiple routes. Eavesdroppers are deployed to collect their transmission activities Si (i = 1, 2), which are then sent to a detector at the fusion center.
We are interested in testing the following hypotheses:
H0 : S1, S2 are independent,
H1 : (S1, S2) contains an information flow,     (5.1)
by observing measurements compressed from Si (i = 1, 2). Assume that the
maximum delay ∆ is known. Moreover, assume that the marginal distributions of
Si (i = 1, 2) are known, and they are the same under both hypotheses (detailed
analysis is done for Poisson processes). This is a partially nonparametric hypothesis test because no statistical assumptions are imposed on the correlation of S1 and
S2 under H1.
We point out that the assumption that Si (i = 1, 2) have the same distributions
under both hypotheses is not a limiting assumption. This is because otherwise, an
eavesdropper can independently make a decision based on its own measurements
(e.g., by the Anderson-Darling test [28]) and send the result (a 1-bit message) to
the fusion center, and the error probabilities can be made arbitrarily small if there
are enough measurements.
5.2.2 System Architecture
The capacity constraints in the uplink channels make it necessary to incorporate
quantizers q_i^{(t)} (i = 1, 2) at the eavesdroppers, where t is the duration of the observation. As illustrated in Fig. 5.2, the processes Si (i = 1, 2) are compressed into q_i^{(t)}(Si), which are delivered to the fusion center, and then the detector makes a decision in the form of
θt = δt(q_1^{(t)}(S1), q_2^{(t)}(S2)),
where1 θt ∈ {0, 1}. The capacity constraints are expressed as2
||q_i^{(t)}|| ≤ e^{tRi},  i = 1, 2,     (5.2)
1 The value 0 denotes H0, and 1 denotes H1.
2 The unit of Ri (i = 1, 2) is nats per unit time.
for sufficiently large t, where ||q_i^{(t)}|| is the alphabet size of the output of q_i^{(t)}. Generally, R1, R2 < ∞, but if the detector is located at one of the eavesdroppers, e.g., the eavesdropper of node B, then R2 = ∞, which is called the case of full side-information.
Figure 5.2: A distributed detection system, consisting of two quantizers q_1^{(t)} and q_2^{(t)} and a detector δt.
Given (R1, R2), the problem is to design q(t)i (i = 1, 2) and δt such that the
overall detection performance is optimized.
5.3 Performance Criteria
In this section, we define the criteria for evaluating detection performance and
present theoretical results on the optimal performance under the proposed criteria.
5.3.1 Level of detectability
The detection performance in classical multiterminal hypothesis testing is usually
evaluated by the error exponents [16]. In our problem, the alternative hypothesis
is nonparametric, which makes it improper to adopt the error exponent criterion.
Instead, we measure the performance by the notion of consistency defined in Defini-
tion 11. The optimal performance establishes a level of detectability of information
flows as follows.
Given capacity constraints (R1, R2), we characterize the extent to which infor-
mation flows are detectable by a notion called the level of detectability, denoted by
α(R1, R2), which is defined as
α(R1, R2) ≜ sup{ r ∈ [0, 1] : ∃ (q_1^{(t)}, q_2^{(t)}, δt) such that
  1) δt is r-consistent;
  2) lim sup_{t→∞} (1/t) log ||q_i^{(t)}|| ≤ Ri, i = 1, 2 }.     (5.3)
That is, α(R1, R2) is the maximum consistency of all the detection systems under
capacity constraints (R1, R2). If Ri = ∞ (i = 1, 2), then this definition is reduced
to the level of strong detectability in centralized detection (see Definition 12). Our
goal is to design quantizers and detectors to achieve α(R1, R2).
Before concluding the introduction to our performance measure, we would like
to show an example which explains why our approach deviates from the classical
approaches.
Example Consider an alternative formulation where we assume Si (i = 1, 2)
are renewal processes under both hypotheses, i.e., the interarrival times Kj ≜ S1(j + 1) − S1(j) and Lj ≜ S2(j + 1) − S2(j) (j = 1, 2, . . .) are i.i.d., respectively. Moreover, assume that the process of epoch pairs {(S1(j), S2(j))}∞j=1 is also renewal, i.e., (Kj, Lj) (j ≥ 1) are i.i.d. with some distribution PKL. The testing hypotheses are
H0 : PKL = PK PL,   H1 : PKL ≠ PK PL,     (5.4)
where PK PL is the product distribution with the same marginals as PKL, defined
by PK PL(k, l) = PK(k)PL(l) (i.e., Kj and Lj are independent). This is a
testing against dependence problem under multiterminal data compression. By
similar techniques as in the testing against independence problem in [1], one can
develop the optimal test of (5.4) to minimize the error probabilities. The problem
is, however, that this is not the problem we want to solve in information flow
detection. Specifically, there are simple strategies to manipulate the information
flows such that the optimal test of (5.4) fails. For example, consider the scenario
in Fig. 5.3, where S2 is an identical copy of S1 except that a chaff packet is inserted at the beginning. Then the subsequent observations of interarrival times will be
misaligned. In particular, for j ≥ 3, the jth pair of interarrival times becomes
(Kj, Lj) = (Kj, Kj−1). Since Kj’s are independent, the test of (5.4) will fail to
detect such an obvious information flow.
Figure 5.3: Inserting one chaff packet can destroy the alignment of measurements.
The notion of consistency prevents obvious mistakes as in the above example
by guaranteeing that it is possible to have non-vanishing miss probability only if
a sufficient amount of chaff noise is inserted.
5.3.2 Level of Undetectability
Since the eavesdroppers cannot distinguish chaff noise from information flows, there
is a limit on the amount of chaff noise beyond which an information flow can be
made statistically identical with traffic under H0. We use this limit to measure
the level of undetectability. For centralized detection, the level of undetectability
is defined as the minimum CTR for an information flow to mimic the distribu-
tions under H0; see (4.4). For distributed detection, the distributions seen by the
detector depend on the quantizers, and so does the level of undetectability.
For deterministic quantizers3 qi (i = 1, 2), the level of undetectability is defined
as the minimum CTR required to mimic H0 after quantization, i.e.,
φ(H0; q1, q2) ≜ inf{ r ∈ [0, 1] : ∃ Fi, Wi (i = 1, 2) such that
  1) Fi ⊕ Wi =d Si, i = 1, 2, and (qi(Fi ⊕ Wi))_{i=1}^{2} =d (qi(Si))_{i=1}^{2} for some (Si)_{i=1}^{2} under H0;
  2) (F1, F2) is an information flow;
  3) lim sup_{t→∞} CTR(t) ≤ r a.s. }.     (5.5)
With proper perturbations and φ(H0; q1, q2) fraction of chaff noise, an information
flow can appear to be the same as traffic under H0 to both the eavesdroppers and
the detector. Therefore, the maximum consistency under quantizers (q1, q2) is
upper bounded by φ(H0; q1, q2).
Generally, the quantization schemes may involve randomization. A random-
ized quantizer of S1 is a set of conditional distributions Q1(x|s1), where s1 is a
realization of S1, x ∈ X ∞ for a finite or countable alphabet X , and Q1(x|s1) is
the probability of quantizing s1 to x. A randomized quantizer Q2(y|s2) of S2 is
3 Each qi can be viewed as the limit of a sequence of deterministic quantizers {q_i^{(t)}}_{t≥0} as t increases.
defined similarly. Given (Q1, Q2), the level of undetectability is defined as
φ(H0; Q1, Q2) ≜ inf{ r ∈ [0, 1] : ∃ Fi, Wi (i = 1, 2) such that
  1) Fi ⊕ Wi =d Si, i = 1, 2, and (X, Y)|(Fi ⊕ Wi)_{i=1}^{2} =d (X, Y)|(Si)_{i=1}^{2} for some (Si)_{i=1}^{2} under H0;
  2) (F1, F2) is an information flow;
  3) lim sup_{t→∞} CTR(t) ≤ r a.s. },     (5.6)
where (X, Y)|(Si)_{i=1}^{2} is the marginal of (X, Y) in (X, Y, S1, S2) specified by the distribution of (S1, S2) and the conditional distribution Q(X, Y|S1, S2) = Q1(X|S1)Q2(Y|S2). Note that we can write the conditional distribution in product form because the quantization of the two processes is independent, i.e., X → S1 → S2 → Y forms a Markov chain. Similar to φ(H0; q1, q2), φ(H0; Q1, Q2) gives an upper bound on the
maximum consistency under (Q1, Q2).
5.3.3 General Converse and Achievability
Given capacity constraints (R1, R2), we are interested in finding the value of
α(R1, R2) and designing detection systems to achieve it. In Chapter 4, we have
answered these questions for infinite capacities. Now we provide high-level answers
for finite capacities.
Theorem 14 For any Ri ≥ 0 (i = 1, 2),
α(R1, R2) ≤ max_{P1} φ(H0; Q1, Q2),     (5.7)
where4
P1 = { (Q1(X|S1), Q2(Y|S2)) : lim sup_{t→∞} (1/t) I(S1; X) ≤ R1, lim sup_{t→∞} (1/t) I(S2; Y) ≤ R2 }.
Furthermore, let Q∗1 and Q∗2 achieve the maximum in (5.7), and let (F∗i, W∗i) (i = 1, 2) achieve φ(H0; Q∗1, Q∗2) as defined in (5.6) without the requirement that Fi ⊕ Wi =d Si (i = 1, 2). If Q∗1 and Q∗2 are deterministic, and the CTR of (F∗i ⊕ W∗i)_{i=1}^{2} converges a.s. to some value α∗(R1, R2), then
α(R1, R2) ≥ α∗(R1, R2).
Proof: See Appendix 5.A. ■
Remark: The theorem contains a converse result and an achievability result.
The converse result states that the level of detectability under certain capacity
constraints is no more than the maximum level of undetectability over all the
quantizers satisfying the capacity constraints. The achievability result gives a lower
bound on the level of detectability by constructing a specific detection system with
consistency equal to α∗(R1, R2).
It can be shown that solving the maximization in (5.7) is equivalent to computing a distortion-rate function with distortion measure
φ(H0) − φ(H0; Q1, Q2),
4 Note that P1 is well-defined because Si (i = 1, 2) have the same distributions under both hypotheses.
which characterizes the performance loss due to quantization by Qi (i = 1, 2). How
to compute this distortion rate function is an open problem because the distortion
measure is not single-letter (and it is a function of distributions). Instead, we will
develop practical detection systems and analyze their performance to give lower
bounds on α(R1, R2).
5.4 Quantizer Design
The design of quantizers q(t)i (i = 1, 2) is complicated by the dependency on t.
To simplify design, we partition the observation into n slots of equal length T
(T = t/n) and perform independent and identical quantization in each slot. We
propose the following quantizers based on the counting measure.
Definition 14 Given a point process S, a slotted quantizer with slot length T is defined as γ(S) ≜ (Z1, Z2, . . .), where Zj (j ≥ 1) is the number of points in the jth slot (i.e., the interval [(j − 1)T, jT)) of S.
The slotted quantizer was first used to compress Poisson processes by Rubin in
[31], where combined with proper reconstruction methods, it was shown to achieve
compression performance close to the optimal predicted by the rate distortion
function under the single-letter absolute-error fidelity criterion. Note that it does
not imply that the slotted quantizer is optimal or near-optimal in our problem because our fidelity criterion is different. We refer to the quantization by a slotted quantizer
as slotted quantization. It is easy to see that the above definition is equivalent to
the point-wise quantizer γ(t) = ⌊t/T ⌋, where t ∈ R+.
For applications requiring extremely low rate, it may be desirable to further
compress the results of slotted quantization. To this end, we propose the following
quantizer.
Definition 15 Given a point process S, a one-bit quantizer is a binary quantization of the output of a slotted quantizer, defined as
γ̄(S) = (I{Zj > 0})∞j=1,
where Z = γ(S), and I{·} is the indicator function.
Quantization by a one-bit quantizer is called one-bit quantization. The rate of the one-bit quantizer decays as O(1/T) as T → ∞.
Hereafter, we will refer to the quantization results of S1 and S2 by Xn = (Xj)nj=1
and Yn = (Yj)nj=1, respectively, the meaning of which will depend on the quantizers
used. For the full side-information case (i.e., R2 = ∞), we will use Y (s, t) to denote
the number of epochs in S2 in the interval [s, t).
If Si (i = 1, 2) are Poisson processes, then the Xj's and Yj's are i.i.d., and it is known that they can be delivered almost perfectly under the capacity constraints in (5.2) if and only if
H(X1)/T ≤ R1,   H(Y1)/T ≤ R2.     (5.8)
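A minimal Python sketch of the two quantizers in Definitions 14 and 15, assuming the observation starts at time 0; the function names are ours.

import math

def slotted_quantizer(epochs, T):
    # Slotted quantizer (Definition 14): Z_j = number of epochs in slot j.
    if not epochs:
        return []
    n_slots = int(math.floor(max(epochs) / T)) + 1
    Z = [0] * n_slots
    for t in epochs:
        Z[int(math.floor(t / T))] += 1
    return Z

def one_bit_quantizer(epochs, T):
    # One-bit quantizer (Definition 15): indicator of a nonempty slot.
    return [1 if z > 0 else 0 for z in slotted_quantizer(epochs, T)]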
5.5 Detection Algorithms
In this section, we will present detectors for each of the quantization schemes pro-
posed in Section 5.4 and analyze their consistency. The detectors borrow the idea
in centralized detection, i.e., the detector should compute the minimum fraction
of chaff noise needed to generate the received measurements and make detection if
this fraction is suspiciously small. Optimal chaff-inserting algorithms are developed
to compute the minimum fraction of chaff.
In the rest of this section, we will discuss the following four cases: I) q1 is
a slotted quantizer, q2 is an identity function (full side-information); II) q1 and
q2 are both slotted quantizers; III) q1 is a one-bit quantizer, q2 is an identity
function; IV) q1, q2 are both one-bit quantizers. In Cases II and IV, equal capacity
constraints (R1 = R2) are considered for simplicity, although the idea of detection
is generalizable to unequal constraints. Since the level of detectability in the high-capacity regime is already known, our analysis will focus on the low-capacity (i.e., large slot length) regime.
5.5.1 Case I: Slotted Quantization, Full Side-Information
Consider the case when q1 is a slotted quantizer and q2 is an identity function.
Assume that the capacities are sufficient to permit reliable delivery of quantized
measurements. Then the detector needs to make a decision based on the measure-
ments xn and s2.
We want to insert the minimum chaff noise to mimic a given (xn, s2), i.e., we want to find realizations of an information flow (fi)_{i=1}^{2} and chaff noise wi (i = 1, 2) such that i) xn = γ(f1 ⊕ w1) and s2 = f2 ⊕ w2, and ii) the CTR is minimized. If both s1 and s2 are given, then the optimal chaff-inserting algorithm is BGM presented in Section 4.4.1. Now that we only know xn and s2, the idea is to reconstruct s1 from xn and apply BGM to the reconstructed processes. Based on this idea, we develop
a chaff-inserting algorithm called “Insert Chaff: Slotted, Full side-information”
(IC-SF) as follows. Given (xn, s2), IC-SF does the following:
1. construct a point process s̄1 as bursts of xj simultaneous epochs at (j − 1)T (j ≥ 1), as illustrated in Fig. 5.4;
2. run BGM on (s̄1, s2) with delay bound T + ∆.
Figure 5.4: IC-SF: match s̄1 with s2 subject to delay bound T + ∆. Open marks: directly observed epochs in s2; filled marks (•): reconstructed epochs in s̄1.
The optimality of IC-SF is provided by the following proposition.
Proposition 16 Algorithm IC-SF inserts the minimum number of chaff packets
to make an information flow mimic any (xn, s2) under the quantization in Case I.
Proof: See Appendix 5.A.
■
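A minimal sketch of IC-SF, reusing the bounded_greedy_match function sketched in Appendix 4.C; the reconstruction of the bursts and the names are ours, and the slot counts are assumed to start at slot 1.

def insert_chaff_slotted_full(x, s2, T, delta):
    # IC-SF sketch: reconstruct bursts from the slot counts x and run BGM
    # with the relaxed delay bound T + delta.
    # x: list of slot counts for S1; s2: sorted epochs of S2.
    # Returns the number of chaff packets assigned.
    s1_bar = [j * T for j, cnt in enumerate(x) for _ in range(cnt)]  # bursts at slot starts
    matches, chaff1, chaff2 = bounded_greedy_match(s1_bar, s2, T + delta)
    return len(chaff1) + len(chaff2)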
Since IC-SF is optimal, we can compute the minimum number of chaff packets
to mimic the measurements (xn, s2) using IC-SF. This idea leads to the following
detector.
Given (x^n, s_2), define a detector δ_I as

    δ_I(x^n, s_2) = 1 if C_I/N ≤ τ_I,  0 o.w.,

where N = Σ_{j=1}^n x_j + |S_2|, and C_I is the number of chaff packets found by IC-SF in (x^n, s_2), excluding chaff packets in⁵ S_2 ∩ [0, ∆). The implementation of δ_I can be found in Appendix 5.B.

The actual number of chaff packets has to be at least C_I; therefore, δ_I has vanishing miss probability for all information flows with CTR bounded by τ_I a.s.
The false alarm probability of δI is guaranteed by the following theorem.
Theorem 15 If under H_0, S_1 and S_2 are independent Poisson processes of rates bounded by λ, and T is large, then for any τ_I < 1/√(πλT) − ∆/(4T), the false alarm probability of δ_I decays exponentially with n.
Proof: See Appendix 5.A. ∎
Theorem 15 tells us how to choose τI to have exponentially decaying false alarm
probability. Combining the theorem with our discussion on miss probability, we see
that a proper choice of threshold will enable δI to be r-consistent for r arbitrarily
close to

    α_I(T) ≜ 1/√(πλT) − ∆/(4T) ≈ 1/√(πλT)    (5.9)

fraction of chaff noise, i.e., the consistency of δ_I is lower bounded by α_I(T). As expected, α_I(T) is a decreasing function of T.

⁵This is because packets in this interval may be relays of packets transmitted before the detector starts taking observations.
If we fix the quantization scheme as slotted quantization, then the capacity constraints affect detection only through T. It is known that for Poisson processes of maximum rate λ, the rate⁶

    R_I(T) ≜ H(Poi(λT))/T    (5.10)

suffices to reliably deliver X^n for large n. By Gaussian approximation, H(Poi(λT))/T ≈ log(2πeλT)/(2T) for large T, i.e., the required rate under slotted quantization decreases as O(log T / T).

Combining Theorem 15 and (5.10) gives an achievable rate-consistency pair for each T, denoted by (R_I(T), α_I(T)). Given a capacity constraint R, the achievable consistency-rate function can be obtained as α_I(R_I^{-1}(R)). The consistency-rate functions for the other cases (Cases II–IV) are characterized similarly.

⁶Here H(Poi(λT)) is the entropy of the Poisson distribution with mean λT.
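One way to trace out the achievable pairs (R_I(T), α_I(T)) parametrically in T is sketched below, assuming rates are measured in nats; the helper poisson_entropy and the chosen values of λ and ∆ are illustrative, not from the thesis.

    import math

    def poisson_entropy(mu: float, tol: float = 1e-12) -> float:
        """Entropy (in nats) of a Poisson distribution with mean mu,
        summed until the remaining tail is negligible."""
        h, p, k, cum = 0.0, math.exp(-mu), 0, 0.0
        while cum < 1.0 - tol and k < 10000:
            if p > 0:
                h -= p * math.log(p)
            cum += p
            k += 1
            p *= mu / k          # pmf recursion: p_k = p_{k-1} * mu / k
        return h

    lam, delta = 0.5, 1.0
    for T in [5.0, 10.0, 20.0, 40.0]:
        R_exact = poisson_entropy(lam * T) / T
        R_gauss = 0.5 * math.log(2 * math.pi * math.e * lam * T) / T    # Gaussian approximation
        alpha_I = 1.0 / math.sqrt(math.pi * lam * T) - delta / (4 * T)  # lower bound (5.9)
        print(f"T={T:5.1f}  R_I={R_exact:.4f} (Gauss {R_gauss:.4f})  alpha_I={alpha_I:.4f}")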
5.5.2 Case II: Slotted Quantization, Equal Capacity Constraints
Consider equal capacity constraints (R_1 = R_2 = R < ∞), and suppose q_i (i = 1, 2) are both slotted quantizers with the same slot length T. We follow the procedure in Case
I to develop a detector for this scenario.
We develop an optimal chaff-inserting algorithm called “Insert Chaff: Slotted,
Equal capacities" (IC-SE) based on ideas similar to IC-SF. Given (x^n, y^n), IC-SE works as follows:

1. construct point processes s_i (i = 1, 2) as bursts of x_j (or y_j) simultaneous points at (j − 1)T for j ≥ 1;

2. run BGM on (s_1, s_2) with delay bound ⌈∆/T⌉T.
Algorithm IC-SE is optimal in minimizing the number of chaff packets, as stated
in the following proposition.
Proposition 17 For any (xn, yn), IC-SE inserts the minimum number of chaff
packets to make an information flow mimic these observations under the quanti-
zation in Case II.
Proof: See Appendix 5.A. ∎
Algorithm IC-SE provides a method to compute the minimum amount of chaff
noise in the measurements, based on which we design a detector as follows.
Given (x^n, y^n), define a detector δ_II as

    δ_II(x^n, y^n) = 1 if C_II/N ≤ τ_II,  0 o.w.,

where N = Σ_{j=1}^n (x_j + y_j), and C_II is the number of chaff packets found by IC-SE in (x^n, y^n), except for chaff packets in⁷ S_2 ∩ [0, ⌈∆/T⌉T). See Appendix 5.B for an implementation of δ_II.

⁷As in the computation of C_I, this adjustment is needed because packets at the beginning of s_2 may be relays of packets transmitted before the detector starts.
Under H1, the optimality of IC-SE implies that the actual number of chaff
packets in the measurements is no smaller than CII. Therefore, a CTR larger
than τII is required to evade δII, i.e., δII has vanishing miss probability for all the
information flows with CTR bounded by τII a.s.
Under H0, the following theorem guarantees the false alarm probability of δII.
Theorem 16 If S_1 and S_2 are independent Poisson processes of maximum rate λ, and T is large, then for any τ_II < (c_1/(2√(λT))) e^{−λT/6}, where c_1 = 0.0014, the false alarm probability of δ_II decays exponentially with n.
Proof: See Appendix 5.A. ∎
By Theorem 16, δ_II can achieve Chernoff-consistent detection for arbitrarily close to

    α_II(T) ≜ (c_1/(2√(λT))) e^{−λT/6}    (5.11)

fraction of chaff noise, and thus its consistency is at least α_II(T). Note that as T increases, α_II(T) decays exponentially at the rate O(e^{−λT/6}); compared with the O(1/√T) decay of α_I(T), this suggests that the consistency in Case II decays much faster than that in Case I due to the quantization of S_2. The pair (R_II(T), α_II(T)), where R_II(T) = R_I(T), gives an achievable rate-consistency pair.
5.5.3 Case III: One-Bit Quantization, Full Side-Information
Consider the scenario when S1 is compressed by one-bit quantization, and S2 is
fully available.
This case is similar to Case I in Section 5.5.1 except that the observations are
indicators instead of the exact counts. Clearly, more information is lost after one-
bit quantization because when xj = 1, there can be one or more epochs in slot
j in s1. To overcome this difficulty, we use a backward matching, i.e., matching
epochs in s2 with nonempty slots in s1. Specifically, we develop a chaff-inserting
algorithm called “Insert Chaff: One-bit, Full side-information” (IC-OF) which
works as follows. Given (xn, s2), IC-OF:
1. match every epoch in s2 with the earliest unmatched nonempty slot within
delay ∆, as illustrated in Fig. 5.5;
2. unmatched epochs become chaff; each unmatched nonempty slot contains a
chaff packet.
Figure 5.5: IC-OF: backward greedy matching. Each epoch is matched to the first unmatched nonempty slot that is no more than ∆ earlier.
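A sketch of this backward matching follows, under the assumption (mine, consistent with causality and the delay bound) that an epoch of s_2 at time t may be matched to a nonempty slot j of s_1 whenever (j − 1)T ≤ t < jT + ∆; the function names and the example are illustrative.

    from math import floor
    from typing import List

    def ic_of_chaff(x: List[int], s2: List[float], T: float, delta: float) -> int:
        """IC-OF sketch: match each epoch of s2 (in time order) to the earliest
        unmatched nonempty slot of s1 that could have produced it; unmatched
        epochs and unmatched nonempty slots are counted as chaff."""
        n = len(x)
        slot_used = [False] * n
        unmatched_epochs = 0
        for t in s2:
            lo = max(0, floor((t - delta) / T))   # earliest slot whose window can still cover t
            hi = min(n - 1, floor(t / T))         # latest slot starting no later than t
            for j in range(lo, hi + 1):
                if x[j] == 1 and not slot_used[j] and t < (j + 1) * T + delta:
                    slot_used[j] = True
                    break
            else:
                unmatched_epochs += 1
        unmatched_slots = sum(1 for j in range(n) if x[j] == 1 and not slot_used[j])
        return unmatched_epochs + unmatched_slots

    x = [1, 0, 1, 1]                 # one-bit measurements of S1
    s2 = [0.6, 1.4, 3.9]             # fully observed epochs of S2
    print(ic_of_chaff(x, s2, T=1.0, delta=0.5))   # -> 2 (one chaff epoch, one chaff slot)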
Algorithm IC-OF is the optimal chaff-inserting algorithm in Case III, as stated
in the following proposition.
Proposition 18 Algorithm IC-OF inserts the minimum number of chaff packets
to make an information flow generate any given observations (xn, s2) under the
quantization in Case III.
Proof: See Appendix 5.A. ∎
Based on IC-OF, we develop the following detector. Given (x^n, s_2), the detector δ_III is defined as

    δ_III(x^n, s_2) = 1 if C_III/(n N_1 + |S_2|) ≤ τ_III,  0 o.w.,

where C_III is the number of chaff packets found by IC-OF in (x^n, s_2), excluding chaff packets in S_2 ∩ [0, ∆), and N_1 = − log(1 − x̄) for x̄ = (1/n) Σ_{j=1}^n x_j. Here N_1 is the maximum likelihood estimate of the mean number of epochs per slot in S_1 under the assumption that S_1 is Poisson. See Appendix 5.B for an implementation of δ_III.
Under H1, Proposition 18 guarantees that the actual number of chaff packets
is no smaller than CIII. Moreover, under the Poisson assumption, N1 converges to
the average traffic size per slot in S1 a.s. Thus the statistic CIII/(nN1 + |S2|) is
upper bounded by the actual CTR a.s. as n → ∞, implying that δIII has vanishing
miss probability for CTR bounded by τIII a.s.
Under H0, the performance of δIII is guaranteed by the following theorem.
Theorem 17 If S_1 and S_2 are independent Poisson processes of maximum rate λ, and T is large, then for any τ_III < (1/2) e^{−λT}, the false alarm probability of δ_III decays exponentially with n.
Proof: See Appendix 5.A. ∎
By this theorem, the consistency of δ_III is lower bounded by

    α_III(T) ≜ (1/2) e^{−λT}.    (5.12)

For the S_1 considered in Theorem 17, delivering X^n reliably for large n requires a rate of

    R_III(T) ≜ log 2 / T           if λT ≥ log 2,
               h(e^{−λT}) / T      o.w.,    (5.13)

where h(p) is the binary entropy function h(p) = −p log p − (1 − p) log(1 − p). Therefore, (R_III(T), α_III(T)) is an achievable rate-consistency pair. As T increases, α_III(T) decays exponentially with exponent λ. Note that this decay is much faster than the O(1/√T) decay of α_I(T), indicating that for the same slot length, one-bit quantization significantly reduces consistency compared with slotted quantization. This does not, however, imply that slotted quantization is better, because one-bit quantizers can use a much smaller slot length than slotted quantizers under the same capacity constraints.
5.5.4 Case IV: One-Bit Quantization, Equal Capacity Constraints
Suppose that one-bit quantizers with the same slot length T are used for both S1
and S2. This case is similar to Case II in Section 5.5.2, except that the measure-
ments (xn, yn) are binary vectors instead of exact packet counts.
To match the epochs, we can still use the idea of IC-SE, but since the number of
epochs in a nonempty slot can be one or more, we assume it to be the number that
minimizes the number of chaff packets over all positive integers. The amount of
chaff noise can be computed by an algorithm called “Insert Chaff: One-bit, Equal
capacities” (IC-OE) as follows. Given (xn, yn), IC-OE inserts a chaff packet in
slot j if
xj >
j+⌈∆/T ⌉∑
k=j
yk, or yj >
j∑
k=j−⌈∆/T ⌉xk
for j = 1, . . . , n.
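Because IC-OE reduces to a purely slot-level rule, it admits a direct sketch. The Python below is an illustration with names of my choosing; it handles the boundary slots crudely by treating missing slots as empty.

    from math import ceil
    from typing import List

    def ic_oe_chaff(x: List[int], y: List[int], T: float, delta: float) -> int:
        """IC-OE sketch: a chaff packet is inserted in slot j whenever a nonempty slot
        in one process cannot be explained by any nonempty slot of the other process
        within the slot-quantized delay bound."""
        n, w = len(x), ceil(delta / T)
        chaff = 0
        for j in range(n):
            cond1 = x[j] > sum(y[j : min(n, j + w + 1)])      # x_j unexplainable by S2
            cond2 = y[j] > sum(x[max(0, j - w) : j + 1])      # y_j unexplainable by S1
            if cond1 or cond2:
                chaff += 1
        return chaff

    x = [1, 0, 0, 1]
    y = [0, 0, 1, 1]
    print(ic_oe_chaff(x, y, T=1.0, delta=0.5))   # -> 2 with ceil(delta/T) = 1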
Algorithm IC-OE computes the minimum amount of chaff noise as stated in
the following proposition.
Proposition 19 Algorithm IC-OE inserts the minimum number of chaff pack-
ets to make an information flow mimic given binary vectors (xn, yn) under the
quantization in Case IV.
Proof: See Appendix 5.A. ∎
Based on IC-OE, we develop a detector δ_IV as follows. Given (x^n, y^n), the detector is defined as

    δ_IV(x^n, y^n) = 1 if C_IV/[n(N_1 + N_2)] ≤ τ_IV,  0 o.w.,

where C_IV is the number of chaff packets inserted by IC-OE in (x^n, y^n), excluding chaff packets in S_2 ∩ [0, ⌈∆/T⌉T), and N_i (i = 1, 2) are defined as in δ_III as functions of x^n and y^n, respectively. See Appendix 5.B for its implementation.
Under H1, δIV has vanishing miss probability as long as the CTR is bounded
by τIV a.s. because of Proposition 19 and arguments similar to those in Section
5.5.3.
Under H0, the following theorem tells us how to choose the threshold to guar-
antee vanishing false alarm probability.
Theorem 18 If S_1 and S_2 are independent Poisson processes of maximum rate λ, and T is large, then for any τ_IV < ((1 − e^{−λT})/(2λT)) e^{−2λT}, the false alarm probability of δ_IV decays exponentially with n.
Proof: See Appendix 5.A. ∎
By this theorem, the consistency of δ_IV is lower bounded by

    α_IV(T) ≜ ((1 − e^{−λT})/(2λT)) e^{−2λT}.    (5.14)

Thus we have an achievable rate-consistency pair (R_IV(T), α_IV(T)), where R_IV(T) = R_III(T). The value of α_IV(T) decays exponentially in T with exponent 2λ. Comparing this with the O(e^{−λT/6}) decay of α_II(T), the analysis suggests that for the same T, the consistency under one-bit quantization decays 12 times faster than that under slotted quantization. Again, this does not mean that slotted quantization is better, because the slot lengths under the two quantization schemes differ.
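To see the different decay rates side by side, the closed-form lower bounds (5.9), (5.11), (5.12), and (5.14) can simply be tabulated; the sketch below does this for a few illustrative values of λ, ∆, and T (my own choices, not from the thesis).

    import math

    def alpha_I(lam, T, delta): return 1/math.sqrt(math.pi*lam*T) - delta/(4*T)             # (5.9)
    def alpha_II(lam, T):       return 0.0014/(2*math.sqrt(lam*T)) * math.exp(-lam*T/6)      # (5.11)
    def alpha_III(lam, T):      return 0.5*math.exp(-lam*T)                                  # (5.12)
    def alpha_IV(lam, T):       return (1 - math.exp(-lam*T))/(2*lam*T) * math.exp(-2*lam*T) # (5.14)

    lam, delta = 0.5, 1.0
    for T in [2.0, 5.0, 10.0, 20.0]:
        print(f"T={T:5.1f}  aI={alpha_I(lam,T,delta):.4f}  aII={alpha_II(lam,T):.2e}"
              f"  aIII={alpha_III(lam,T):.2e}  aIV={alpha_IV(lam,T):.2e}")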
5.6 Analysis and Comparison
Recall that we have taken a separation-based approach, breaking the distributed detection process into three steps: quantization, data transmission, and detection. In this section, we analyze the consistency of the proposed detectors and then compare them to gain insights into quantizer design.
5.6.1 Performance Analysis
Assume that S1 and S2 are independent Poisson processes of maximum rate λ
under H0. We will analyze the consistency of the detectors proposed in Section
5.5 and give bounds on the maximum consistency in each of the four cases.
Conceptually, we can calculate the exact consistency of the proposed detectors
as follows.
Theorem 19 There exist functions α∗i (T ) (i = I, . . . , IV) such that δi has van-
ishing false alarm probability if and only if τi < α∗i (T ).
Proof: See Appendix 5.A. ∎
The theorem implies that the consistency of δi (i = I, . . . , IV) is equal to α∗i (T ).
The definition of α∗i (T ) can be found in the proof. Their computation is rather
involved; instead, we resort to closed-form lower bounds that will also guarantee
vanishing false alarm probabilities, which leads to αi(T ) in (5.9, 5.11, 5.12, 5.14).
Fixing quantization schemes as in Case i (i = I, . . . , IV), we provide a converse
result in the following theorem.
Theorem 20 The level of undetectability in Case i (i = I, . . . , IV) is bounded by

    φ(H_0; q_1, q_2) ≤ E[|X − Y|]/(2λT),

where q_j (j = 1, 2) are the quantizers in Case i, and X and Y are independent Poisson random variables with mean λT.
Proof: See Appendix 5.A. ∎
Note that detectors δi (i = I, . . . , IV) are not necessarily optimal because the
chaff-inserting algorithms used in these detectors only make an information flow
mimic the joint distribution of quantized processes under H0. The marginal distri-
butions are different from those under H_0 (e.g., the process constructed by IC-SF is not Poisson) and can still be used to distinguish the two hypotheses. In the proof of Theorem 20, we give a method to mimic both the marginal and the joint distributions under H_0 and analyze the CTR of that method to obtain the upper bound on the level of undetectability.
of undetectability.
Combining Theorems 19 and 20 yields the following result.
Corollary 6 The maximum consistency in Case i (i = I, . . . , IV) is lower bounded
by α∗i (T ) and upper bounded by E[|X − Y |]/(2λT ), where X and Y are defined as
in Theorem 20.
The relationship of the quantities discussed so far regarding the consistency in
Case i (i = I, . . . , IV) can be summarized as follows:
    α_i(T) ≤ α*_i(T) ≤ maximum consistency in Case i ≤ φ(H_0; q_1, q_2) ≤ E[|X − Y|]/(2λT),
where qj (j = 1, 2) are the quantizers in Case i.
5.6.2 Numerical Comparison
We now give some heuristics on quantizer design by comparing the consistency
of δi (i = I, . . . , IV) as functions of capacity constraints. Specifically, let the
capacity constraints be (R, ∞) in Cases I and III, and (R, R) in Cases II and IV. The consistency-rate functions are computed as α*_i(R_i^{-1}(R)) (i = I, . . . , IV). Since the form of α*_i(T) is not explicit, we calculate it numerically as the CTR of the optimal chaff-inserting algorithms (i.e., IC-SF, -SE, -OF, -OE) on independent Poisson processes of rate λ. In addition, we compare the computed consistency-rate functions with the upper bound u(R) ≜ E[|X − Y|]/(2λT) for T = R_I^{-1}(R), where X and Y are defined in Theorem 20 (it can be shown that the upper bound for T = R_III^{-1}(R) is much looser, and thus that bound is omitted). For algorithmic simplicity, we choose the range of R to guarantee that R_i^{-1}(R) ≥ ∆ (i = I, . . . , IV). See Fig. 5.6–5.8 for plots of the consistency-rate functions under different traffic rates (i.e., different λ).
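As a small example of this numerical procedure, the sketch below estimates α*_IV(T) by generating one-bit-quantized independent Poisson traffic, applying the IC-OE slot rule of Section 5.5.4, and dividing the average chaff count per slot by 2λT. It assumes ⌈∆/T⌉ = 1 and ignores edge effects; the parameter values and names are arbitrary.

    import random, math

    def estimate_alpha_star_IV(lam: float, T: float, n_slots: int = 100_000, seed: int = 0) -> float:
        rng = random.Random(seed)
        p_nonempty = 1 - math.exp(-lam * T)
        # one-bit quantized independent Poisson traffic (H0)
        x = [1 if rng.random() < p_nonempty else 0 for _ in range(n_slots + 1)]
        y = [1 if rng.random() < p_nonempty else 0 for _ in range(n_slots + 1)]
        chaff = 0
        for j in range(1, n_slots):
            if x[j] > y[j] + y[j + 1] or y[j] > x[j - 1] + x[j]:
                chaff += 1
        return (chaff / (n_slots - 1)) / (2 * lam * T)   # chaff per slot over packets per slot

    lam, T = 0.5, 2.0
    est = estimate_alpha_star_IV(lam, T)
    bound = (1 - math.exp(-lam*T)) / (2*lam*T) * math.exp(-2*lam*T)   # closed-form bound (5.14)
    print(f"alpha*_IV estimate = {est:.4f}, closed-form lower bound alpha_IV = {bound:.4f}")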
The plots yield the following observations: i) for small λ (Fig. 5.6), the consis-
tency of δI is similar to that of δIII, and the same holds for δII and δIV; as λ increases
(Fig. 5.7, 5.8), the consistency of δI (or δII) becomes increasingly larger than that
of δIII (or δIV); at λ = 1 (Fig. 5.8), the consistency of δII exceeds that of δIII even
though δIII has full side-information; ii) at the same R, the consistency of all the
detectors decreases with the increase of λ; iii) the consistency of δI is close to the
upper bound on the maximum consistency in Case I, especially at small R.
Observation (i) clearly suggests that which quantizer to use should depend on
the traffic rate. For very light traffic, we can use the simpler one-bit quantizer to
achieve the same performance as the more complicated slotted quantizer, whereas
we should use the slotted quantizer to obtain better performance if the traffic is
not so light. The intuition behind this observation is that for very small λ, the
probability that a slot contains more than one epoch is small, and thus we will
not lose much information by further compressing the results of slotted quanti-
zation by one-bit quantization; otherwise, there is nonnegligible probability for a
nonempty slot to contain multiple epochs, and this information will be lost after
one-bit quantization, making it more difficult to distinguish the two hypotheses.
Observation (ii) says that it is more difficult to detect information flows in heavy traffic. The intuition is that if we normalize the maximum delay by the average interarrival time, the normalized maximum delay constraint λ∆ becomes relatively loose for large λ, making detection more difficult (see parallel results in Section 3.5.2). Observation (iii) implies that the detector δ_I is close to optimal in Case I; its consistency and the upper bound jointly specify the maximum consistency under slotted quantization and full side-information.

[Figures 5.6–5.8: The consistency-rate functions α*_I, . . . , α*_IV of δ_I, . . . , δ_IV and the upper bound u versus R, for various traffic rates; ∆ = 1; α*_i is computed over 10^4 slots. Figure 5.6: λ = 0.1. Figure 5.7: λ = 0.5. Figure 5.8: λ = 1.]
5.7 Summary
In this chapter, we consider distributed detection of bounded delay flows in chaff
noise. We give a theoretical characterization of the optimal performance and then
focus on the development of practical detection systems. We are especially inter-
ested in the detection performance at extremely low rates. Our results suggest that
slotted quantization coupled with the proposed detector gives satisfactory perfor-
mance in terms of the consistency-rate tradeoff. Open problems for future work include tighter converse results, better lower bounds, and ultimately the optimal quantizers and detectors.
APPENDIX 5.A
PROOFS FOR CHAPTER 5
Proof of Theorem 14
For the converse result, since for any quantizers Qi (i = 1, 2), the consistency is
upper bounded by φ(H0; Q1, Q2), the largest of such upper bounds under the given
capacity constraints gives an upper bound on the maximum consistency, i.e., the
level of detectability.
For the achievability result, consider the detection system⁸ (Q*^(n)_1, Q*^(n)_2, δ*_n), where δ*_n is a threshold detector defined as follows. Given (x^n, y^n),

    δ*_n(x^n, y^n) = 1 if CTR*(x^n, y^n) ≤ τ,  0 o.w.,

where

    CTR*(x^n, y^n) ≜ min_{P_2} CTR(nT),

and the minimization is over the set

    P_2 = { (f_i, w_i)_{i=1}^2 :  1) Q*^(n)_1(x^n | f_1 ⊕ w_1) > 0 and Q*^(n)_2(y^n | f_2 ⊕ w_2) > 0;  2) (f_1, f_2) is a realization of an information flow }.

That is, CTR*(x^n, y^n) is the minimum CTR over all realizations of information flows and chaff noise that can generate (x^n, y^n) after being quantized by (Q*^(n)_1, Q*^(n)_2). The detector δ*_n makes a detection if this minimum CTR is upper bounded by a predetermined threshold τ.

⁸Quantizers Q*^(n)_i (i = 1, 2) are the marginalizations of Q*_i on [0, nT].
Since the statistic CTR∗(xn, yn) is a lower bound on the actual CTR in the
measurements, it is easy to see that δ∗n has vanishing miss probability as long as
the CTR is upper bounded by τ a.s.
Generally, the statistic is smaller than the CTR required to mimic H_0. If, however, Q*_i (i = 1, 2) are deterministic, then CTR*(X^n, Y^n) is the minimum CTR for an information flow to mimic the distribution of (X^n, Y^n) after quantization, and the minimum of CTR*(X^n, Y^n) under H_0 is the minimum CTR to mimic the joint distribution of the quantization results of some (S_1, S_2) under H_0. By definition, this is the CTR of (F*_i ⊕ W*_i)_{i=1}^2. Note that the processes achieving CTR*(X^n, Y^n) do not necessarily mimic the marginal distributions of S_i (i = 1, 2) under H_0. By assumption, there exists a constant α*(R_1, R_2) such that the CTR of (F*_i ⊕ W*_i)_{i=1}^2 converges to α*(R_1, R_2) a.s. Thus, for any τ < α*(R_1, R_2), we have

    lim_{n→∞} P_F(δ*_n) = lim_{n→∞} Pr{CTR*(X^n, Y^n) ≤ τ} = 0.

Combining this result with the arguments on miss probability, we conclude that the consistency of δ*_n is at least α*(R_1, R_2). Therefore, α*(R_1, R_2) is a lower bound on α(R_1, R_2). ∎
Proof of Proposition 16
First, we show that the matched pairs found by IC-SF indeed form a realization
of an information flow. Let x′n be the vector of the numbers of matched epochs
in s1, and f2 = (t1, t2, . . .) be the sequence of matched epochs in s2. We construct
a sequence f1 = (sj)j≥1 as follows. As illustrated in Fig. 5.9, for an epoch t1 in
f2 matched to the same slot, we construct an epoch s1 = t1; for an epoch t2 in
f2 matched to a previous slot, we construct s2 at the end of that slot. It is easy
to see that slotted quantization of f1 yields x′n, and (f1, f2) is a realization of an
information flow.
Figure 5.9: Construction of f_1. ◦: original epochs; •: constructed epochs.
Then we show that IC-SF is optimal. Since it is known that for given real-
izations and delay bound, BGM inserts the minimum number of chaff packets, it
only remains to show that our construction of s_1 and choice of delay bound minimize the need for chaff. Given x^n, the x_j packets in slot j can be anywhere in
[(j − 1)T, jT ). By the causality and bounded delay constraints, the maximum
interval for these packets to be relayed is [(j − 1)T, jT +∆). By putting all the xj
packets at (j−1)T and allowing delays up to T +∆, we allow the matched packets
in s2 to be anywhere in the maximum interval. Thus, any other chaff-inserting
algorithm will have to insert no fewer chaff packets than IC-SF. Therefore, IC-SF
mimics (xn, s2) by inserting the minimum number of chaff packets.
∎
Proof of Theorem 15
Let C_k be the number of chaff packets inserted in the kth slot. Since T ≫ ∆, the C_k (k = 1, 2, . . .) are approximately i.i.d., and

    C_k =_d max(Y(υ, T) − X_1, 0) + max(X_1 − Y(υ, T + ∆), 0),

where υ is a random variable in [0, ∆] denoting the time used in each slot to relay packets sent in the previous slot. It is easy to see that the CTR is minimized when the processes have equal rates, because unequal rates will make υ drift towards 0 or ∆ and increase the mean of C_k. Moreover, if we prove the theorem for processes of equal rate λ, then the result also holds for smaller rates. For example, if S_i (i = 1, 2) have rate λ′ < λ, then by the result of the theorem, Pr{C_I/N < 1/√(πλ′T) − ∆/(4T)} decays exponentially, implying that Pr{C_I/N < 1/√(πλT) − ∆/(4T)} also decays exponentially. Therefore, it suffices to consider independent Poisson processes of equal rate λ.
We first show that Pr{(1/n) Σ_{k=1}^n C_k ≤ η} decays exponentially for any η < 2λT α_I(T). By Cramer's Theorem [9], this result holds if we show that E[C_k] ≥ 2λT α_I(T). Fix a value of υ in [0, ∆]. By the Gaussian approximation of Poisson random variables,

    Y(υ, T) − X_1 ∼ N(−λυ, λ(2T − υ)) ≈ N(−λυ, 2λT).

Then

    E[max(Y(υ, T) − X_1, 0)]
      ≈ ∫_0^∞ z/√(4πλT) · e^{−(z+λυ)²/(4λT)} dz
      = √(λT/π) e^{−λυ²/(4T)} − λυ Q(λυ/√(2λT))
      ≈ (√(λT/π) − (1/2)λυ) e^{−λυ²/(4T)}
      ≈ √(λT/π) − (1/2)λυ.

Similarly, E[max(X_1 − Y(υ, T + ∆), 0)] ≈ √(λT/π) − (1/2)λ(∆ − υ). Therefore,

    E[C_k] ≈ 2√(λT/π) − (1/2)λ∆ = 2λT α_I(T).
Next, let β ≜ τ_I/α_I(T) (β ∈ [0, 1)). A necessary condition for a false alarm is that (1/n) Σ_{k=1}^n C_k ≤ √β · 2λT α_I(T) or N/n ≥ (2λT)/√β. By the union bound,

    P_F(δ_I) ≤ Pr{(1/n) Σ_{k=1}^n C_k ≤ √β · 2λT α_I(T)} + Pr{N/n ≥ 2λT/√β}.

We have shown that the first term decays exponentially with n, and by Cramer's Theorem, the second term can be shown to decay exponentially as well. Therefore, the overall false alarm probability decays exponentially. This completes the proof. ∎
Proof of Proposition 17
First, we show that IC-SE indeed finds realizations of an information flow and chaff noise such that the slotted quantization results are equal to (x^n, y^n). Let (x′^n, y′^n) denote the vectors of matched numbers found by IC-SE. We will show that (x′^n, y′^n) is the result of slotted quantization of a pair of sequences (f_1, f_2) which is a realization of an information flow. As illustrated in Fig. 5.10, for T ≥ ∆, we construct f_1 as x′_j (j ≥ 1) epochs at the end of slot j, and f_2 as y′_{j,1} epochs at the beginning and y′_{j,2} epochs at the end of slot j, where y′_{j,1} is the number of epochs out of y′_j which are matched to the (j − 1)th slot, and y′_{j,2} is the number of epochs matched to the jth slot (we have y′_j = y′_{j,1} + y′_{j,2}). This construction preserves the quantization results, and (f_1, f_2) forms a realization of an information flow. For T < ∆, the construction of f_i (i = 1, 2) is the same except that for f_2, y′_{j,1} is the number of epochs matched to slots before the jth slot, and y′_{j,2} is the number of epochs matched to the jth slot.
Figure 5.10: Construction of (f_1, f_2) from (x′^n, y′^n) (T ≥ ∆). The matching found by IC-SE guarantees that x′_j = y′_{j,2} + y′_{j+1,1}.
Next, we show that IC-SE is optimal. Due to the constraints of causality
and bounded delay, a packet in slot j can only be matched to packets from slots
j, . . . , j + ⌈∆/T⌉, and IC-SE allows all such matches. Combining this argument with
the fact that BGM is optimal yields the optimality of IC-SE.
∎
Proof of Theorem 16
By arguments parallel to those in the proof of Theorem 15, we only need to consider
independent Poisson processes of equal rate λ. Following the idea of that proof,
we will prove Theorem 16 if we show that Pr{C_II/n ≤ η} decays exponentially for any η < 2λT α_II(T).

In δ_II, no matter how large T is (relative to ∆), the numbers of chaff packets in consecutive slots are still correlated. If, however, we run δ_II only on every other slot, and let C_{2i} (i = 1, 2, . . .) be the number of chaff packets inserted in the (2i)th slot, then C_2, C_4, C_6, . . . will be i.i.d. Obviously⁹, C_II ≥ Σ_{i=1}^{n/2} C_{2i}. Then we have

    Pr{C_II/n ≤ η} ≤ Pr{(2/n) Σ_{i=1}^{n/2} C_{2i} ≤ 2η}.

By Cramer's Theorem, we can prove the exponential decay if we show that E[C_2] ≥ 4λT α_II(T). Since

    E[C_2] = E[max(Y_2 − X_1 − X_2, 0) + max(X_2 − Y_2 − Y_3, 0)],
and (Y_2 − X_1 − X_2), (X_2 − Y_2 − Y_3) ∼ N(−λT, 3λT) for large T, we have

    (1/2) E[C_2] = E[max(Y_2 − X_1 − X_2, 0)]
                 ≈ ∫_0^∞ z/√(6πλT) · e^{−(z+λT)²/(6λT)} dz
                 = √(3λT/(2π)) e^{−λT/6} − λT Q(√(λT/3))
                 ≈ c_1 √(λT) e^{−λT/6} = 2λT α_II(T),    (5.15)

where c_1 = 0.0014, and (5.15) is obtained by the following approximation of Q(·) in [25]:

    Q(x) ≈ e^{−x²/2} / (1.64x + √(0.76x² + 4)).

This completes the proof. ∎

⁹Note that C_II is not exactly equal to the total number of chaff packets found by IC-SE, but the difference becomes negligible for large n.
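As a side note (my own numerical check, not from the thesis), the constant c_1 can be recovered as the large-λT limit of the coefficient of √(λT) e^{−λT/6} above when Q(·) is replaced by the approximation from [25]; the common exponential factor cancels, so no underflow occurs. The convergence is slow, but the limit is approximately 0.0014.

    import math

    def c1_ratio(lam_T: float) -> float:
        """Coefficient of sqrt(lam_T)*exp(-lam_T/6) in (5.15) when Q is replaced by
        Kingsbury's approximation; the exponential factor is cancelled analytically."""
        x = math.sqrt(lam_T / 3)
        d = 1.64 * x + math.sqrt(0.76 * x * x + 4)   # Q(x) ~ exp(-x^2/2) / d
        return math.sqrt(3 / (2 * math.pi)) - math.sqrt(lam_T) / d

    for lam_T in [1e2, 1e3, 1e4, 1e6]:
        print(f"lambda*T = {lam_T:9.0f}:  ratio = {c1_ratio(lam_T):.5f}")

    # limiting value as lambda*T -> infinity
    print("limit:", math.sqrt(3) * (1/math.sqrt(2*math.pi) - 1/(1.64 + math.sqrt(0.76))))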
Proof of Proposition 18
By the construction in the proof of Proposition 16, we can construct an information
flow based on the matching found by IC-OF. Since IC-OF is a backward bounded greedy match, its optimality can be proved by arguments parallel to those proving
the optimality of BGM; see [5].
∎
Proof of Theorem 17
By arguments similar to those in the proof of Theorem 15, we only need to consider
independent Poisson processes of equal rate λ. Let Ck be the number of chaff
packets inserted in the kth slot. Note that if chaff is inserted in S1, then only one
chaff packet is needed in each slot. For large T, C_1, . . . , C_n are approximately i.i.d.
First, we show that Pr{(1/n) Σ_{k=1}^n C_k ≤ η} decays exponentially for any η < 2λT α_III(T). By Cramer's Theorem, this reduces to showing that E[C_k] ≥ 2λT α_III(T). Note that if X_k = 0 (with probability e^{−λT}), then all the epochs in [(k − 1)T + ∆, kT) in S_2 will be chaff; if X_k = 1 and Y((k − 1)T, kT + ∆) = 0, then there will be at least one chaff packet in S_1 in slot k. Thus,

    E[C_k] ≥ e^{−λT} λ(T − ∆) + (1 − e^{−λT}) e^{−λ(T+∆)} ≈ 2λT α_III(T).
Next, since the X_k's are i.i.d., x̄ converges to the true mean 1 − e^{−λT} exponentially (by Cramer's Theorem), and thus N_1 converges to λT exponentially. Following the
arguments in the proof of Theorem 15 leads to the conclusion that PF (δIII) decays
exponentially for any τIII < αIII(T ).
∎
Proof of Proposition 19
If IC-OE inserts a chaff packet in slot j, then we must have either xj = 1 and
yk = 0 (k = j, . . . , j + ⌈∆/T ⌉) or yj = 1 and xk = 0 (k = j −⌈∆/T ⌉, . . . , j). That
is, there is either a nonempty slot in s1 such that all the slots that can relay its
epochs are empty, or a nonempty slot in s2 such that all the slots that can generate
a relay packet in that slot are empty. Thus, any other chaff-inserting algorithm
will have to insert a chaff packet in slot j as well.
To find a realization of an information flow, we use the following variation of
BGM: for every x_j = 1, we match it with the first y_k = 1 for k ∈ {j, . . . , j + ⌈∆/T⌉}. Let x′_j (or y′_j) be the number of times that x_j (or y_j) is matched. Then we can construct a realization of an information flow based on (x′^n, y′^n) as in the proof of Proposition 17. This realization plus the chaff noise inserted by IC-OE generates (x^n, y^n) after one-bit quantization. This completes the proof.
∎
Proof of Theorem 18
It suffices to consider independent Poisson processes of equal rate λ. Let Ti (i ≥ 1)
denote the number of slots after the (i−1)th chaff packet until the ith chaff packet
is inserted (including the slot with the ith chaff packet). Then we can characterize
the number of chaff packets using Ti as
CIV
n≤ τ
d=
1
nτ
nτ∑
i=1
Ti ≥1
τ
. (5.16)
We would be able to bound the probability of this event if we knew the joint distribution of the T_i's. Unfortunately, the joint distribution is difficult to characterize. If, however, we only consider even slots, and let T̄_i be the number of slots between consecutive chaff packets in even slots, then the T̄_i's are i.i.d. and T_i ≤ T̄_i. We claim that T̄_i =_d 2Z, where Z has the geometric distribution

    Pr{Z = n} = (1 − ρ)^{n−1} ρ,  n ≥ 1,

in which ρ = 2e^{−2λT}(1 − e^{−λT}). To understand this result, note that the event {∃ chaff in slot 2i} is equivalent to

    A_{2i} ≜ {X_{2i−1} + X_{2i} < Y_{2i},  or  X_{2i} > Y_{2i} + Y_{2i+1}},

which has probability ρ and is i.i.d. over i = 1, 2, . . .. Then Z is the number of slot pairs until the event A_{2i} occurs, i.e., Z =_d inf{i ≥ 1 : A_{2i} occurs}.
By (5.16), we have

    Pr{C_IV/n ≤ τ} = Pr{(1/n̄) Σ_{i=1}^{n̄} T_i ≥ 1/τ} ≤ Pr{(1/n̄) Σ_{i=1}^{n̄} T̄_i ≥ 1/τ},

where n̄ = nτ. By Cramer's Theorem, we can show that Pr{C_IV/n ≤ τ} decays exponentially for any τ < 2λT α_IV(T) if E[T̄_i] ≤ 1/(2λT α_IV(T)). Since E[T̄_i] = 2E[Z] = 2/ρ, and 2λT α_IV(T) = e^{−2λT}(1 − e^{−λT}) = ρ/2, we have E[T̄_i] = 1/(2λT α_IV(T)).
This result coupled with arguments similar to those in the proof of Theorem
17 completes the proof.
∎
Proof of Theorem 19
By arguments similar to those in the proof of Theorem 14, we can prove the results
of Theorem 19 if the minimum statistics in δi (i = I, . . . , IV) converge a.s. under
H_0. The limits define α*_i(T) (i = I, . . . , IV). Since equal-rate Poisson processes of the maximum rate λ generate the minimum statistics (see the proofs of Theorems 15, 16, 17, and 18), it suffices to consider this case.
Case I
In Case I, it suffices to show that the CTR of IC-SF converges a.s. Let the kth interarrival time in S_2 be

    V_k ≜ S_2(k) − S_2(k − 1),  k ≥ 1.

Let Z_j (j ≥ 0) be the starting time for finding matches in the jth slot of S_2 (Z_0 = 0 by definition). For each X_j (j ≥ 1), IC-SF matches the X_j reconstructed epochs with epochs in S_2. Let K_j (j ≥ 1) be the index of the last epoch in S_2 that is matched or assigned as chaff after the X_j epochs are matched, and K_0 = 0. Then Z_j (j ≥ 1) satisfies the recursion

    Z_j = min( max( Z_{j−1} + V_{K_{j−1}+1} + Σ_{k=K_{j−1}+2}^{K_{j−1}+X_j} V_k − T, 0 ), ∆ ),

where V_{K_{j−1}+1} = S_2(K_{j−1} + 1) − (j − 1)T − Z_{j−1} is a truncated interarrival time. Since the V_k's are i.i.d. exponential random variables, by the memoryless property of the exponential distribution, {Z_j}_{j=0}^∞ is a random walk with reflecting barriers at 0 and ∆, and the step distribution is equal to the distribution of Σ_{k=1}^{X_1} V_k − T, where X_1 is a Poisson variable with mean λT, and the V_k (k ≥ 1) are i.i.d. exponential with mean 1/λ, independent of X_1. It is easy to check that {Z_j}_{j=0}^∞ is an ergodic process, and thus its limiting distribution exists.

Let C_j (j ≥ 1) denote the number of chaff packets in slot j. Then

    C_j = max( Y((j − 1)T + Z_{j−1}, jT) − X_j,  X_j − Y((j − 1)T + Z_{j−1}, jT + ∆),  0 ).

It is clear that the C_j's depend on each other only through the Z_j's. Since {Z_j}_{j=0}^∞ is ergodic, (Σ_{j=1}^n C_j)/n converges a.s. Since the average traffic size per slot also converges a.s., their ratio, which is equal to the CTR, converges a.s. The limit is given by

    α*_I(T) ≜ E[max(Y(Z, T) − X_1, X_1 − Y(Z, T + ∆), 0)] / (2λT),

where Z is a random variable with the limiting distribution of {Z_j}_{j=0}^∞.
Case II
In Case II, it suffices to show that the CTR of IC-SE converges a.s. Consider the
case T ≥ ∆ for simplicity; the proof can be easily modified for T < ∆.
Let Z_j (j ≥ 0) be the number of packets in slot j of S_2 which are matched with packets before slot j in S_1 (Z_0 = 0 by definition). It can be shown that Z_j satisfies the recursion

    Z_{j+1} = min( max(Z_j + X_j − Y_j, 0), Y_{j+1} ).

Note that {Z_j}_{j=0}^∞ is not Markovian because, given Z_j, Z_{j+1} still depends on Z_{j−1} through Y_j. We can resolve this issue by including Y_j in the state. Specifically, it can be shown that {(Z_j, Y_j)}_{j=0}^∞ is a Markov chain (note that it is not a random walk) and is ergodic.

If C_j (j ≥ 1) denotes the number of chaff packets in slot j, we have

    C_j = max( Y_j − Z_j − X_j,  X_j − Y_j + Z_j − Y_{j+1},  0 ).

Since {(Z_j, Y_j)}_{j=0}^∞ is ergodic and the X_j's are i.i.d., (Σ_{j=1}^n C_j)/n converges a.s., implying that the CTR converges a.s. The limiting CTR can be computed as

    α*_II(T) ≜ E[max(Y − Z − X_1, X_1 − Y + Z − Y_1, 0)] / (2λT),

where (Z, Y) is distributed according to the limiting distribution of {(Z_j, Y_j)}_{j=0}^∞, and X_1 and Y_1 are independent Poisson variables with mean λT, also independent of (Z, Y).
Case III
In Case III, we need to show that the CTR of IC-OF converges a.s. Let U_j (j ≥ 0) be the starting time for finding matches in the (j + 1)th slot in S_2 if X_{j+1} = 0 (define U_0 = ∆); similarly, let L_j (j ≥ 0) be the starting time if X_{j+1} = 1 (define L_0 = 0). Then it can be shown that U_j and L_j satisfy the recursions

    U_j = max(U_{j−1}, jT)        if X_j = 0,
          jT + ∆                  o.w.;

    L_j = max(L_{j−1}, jT)        if X_j = 0,
          S_2(K)                  if X_j = 1 and S_2(K) ≥ jT + ∆,
          max(S_2(K + 1), jT)     o.w.,

where K ≜ inf{k : S_2(k) ≥ L_{j−1}}. It is easy to see that {(U_j, L_j)}_{j=0}^∞ is a Markov process. Moreover, it can be shown that the process {(U_j − jT, L_j − jT)}_{j=0}^∞ is ergodic.

Let C_j (j ≥ 1) be the number of chaff packets in slot j. Then

    C_j = Y(U_{j−1}, jT)              if X_j = 0,
          I_{Y(L_{j−1}, jT+∆) = 0}    o.w.

By the ergodicity of {(U_j − jT, L_j − jT)}_{j=0}^∞ and the (time) homogeneity of Poisson processes, one can show that (Σ_{j=1}^n C_j)/n converges a.s., and so does the CTR. The limit is given by

    α*_III(T) ≜ [ E[Y(U, T)] e^{−λ_1 T} + Pr{Y(L, T + ∆) = 0} (1 − e^{−λ_1 T}) ] / (2λT),

where (U, L) is distributed according to the limiting distribution of {(U_j − jT, L_j − jT)}_{j=0}^∞.
Case IV
In Case IV, it suffices to show that the CTR of IC-OE converges a.s. It is easy to check that the process {(X_{j−1}, X_j, Y_j, Y_{j+1})}_{j=1}^∞ (where X_0 ≡ 1) is an ergodic Markov chain. Since the number of chaff packets in slot j is computed by

    C_j = max(Y_j − X_j − X_{j−1},  X_j − Y_j − Y_{j+1},  0),

we see that (Σ_{j=1}^n C_j)/n converges a.s. Therefore, the CTR converges a.s. The limit is given by

    α*_IV(T) ≜ E[max(Y − X − X_1, X − Y − Y_1, 0)] / (2λT),

where (X_1, X, Y, Y_1) has the limiting distribution of {(X_{j−1}, X_j, Y_j, Y_{j+1})}_{j=1}^∞.
∎
Proof of Theorem 20
Let “id” denote the identity function. Since the quantization in Cases II–IV can be
viewed as further compression of the quantization in Case I, it suffices to prove the
bound for Case I, i.e., φ(H0; γ, id) ≤ E[|X − Y |]/(2λT ) for independent Poisson
random variables X, Y with mean λT .
Consider the following method to mimic H0 in Case I. Given (xn, s2), construct
a realization s1 of a point process (over n slots) as follows. For j = 1, . . . , n, if
xj ≤ y((j−1)T, jT ), then randomly select xj epochs from the epochs in the jth slot
of s2; otherwise, select the epochs in the jth slot of s2, and select xj−y((j−1)T, jT )
more epochs randomly (i.i.d. uniformly) from [(j − 1)T, jT ). The overall method
is the following:
195
1. generate a Poisson process S2 of rate λ, and i.i.d. Poisson random variables
Xj (j ≥ 1) of mean λT , which are independent of S2;
2. construct a process S1 as described above;
3. use BGM with delay bound ∆ to decompose (S1, S2) into an information
flow and chaff noise.
The traffic containing an information flow generated by this method is identical to traffic under H_0 (specifically, independent Poisson processes of equal rate λ) both marginally and jointly after quantization. Therefore, its CTR gives an
upper bound on φ(H0; γ, id), the minimum CTR to mimic H0.
The rest of the proof follows by observing that the construction of S1 guarantees
at least min(Xj, Yj) (j ≥ 1) pairs of epochs can be matched in slot j (BGM may
find even more matches), which implies that the average number of chaff packets
per slot is upper bounded by E[|X1 − Y1|]. Thus, the CTR of the generated traffic
is no more than E[|X1 − Y1|]/(2λT ).
∎
APPENDIX 5.B
ALGORITHMS FOR CHAPTER 5
The pseudocode implementation of δ_I is presented in Table 5.1. In this implementation, δ_I uses BGM with delay bound T + ∆ to compute the number of chaff packets in (s_1, s_2), denoted by C_I. It then makes a detection if the fraction of chaff packets is upper bounded by a threshold τ_I.
Table 5.1: Detector for Case I.

δI(x^n, s2, ∆, τI):
    i = 1; CI = 0;
    for k = 1 : n
        if s2(i + xk − 1) < kT
            CI = CI + |S2 ∩ [max(s2(i + xk), ∆), kT)|;  i = inf{j : s2(j) ≥ kT};
        else if s2(i + xk − 1) > kT + ∆
            CI = CI + xk − |S2 ∩ [s2(i), kT + ∆)|;  i = inf{j : s2(j) ≥ kT + ∆};
        else
            i = i + xk;
        end
    end
    return H1 if CI/(Σ_{k=1}^n xk + |S2|) ≤ τI, H0 o.w.;
Detector δII for T ≥ ∆ is implemented in Table 5.2. This implementation can be
generalized for arbitrary T . In this implementation, δII computes CII, the number
of chaff packets inserted by BGM in the batched processes (s1, s2) constructed
from (x^n, y^n) with delay bound T (in general, the delay bound should be ⌈∆/T⌉T).
Then it returns H1 if the fraction of chaff packets is bounded by a given threshold
τII.
Table 5.2: Detector for Case II.

δII(x^n, y^n, ∆, τII):
    i = max(0, y1 − x1); CII = 0;
    for k = 1 : n
        if xk < yk − i
            CII = CII + yk − i − xk;  i = 0;
        else if xk > yk − i + y_{k+1}
            CII = CII + xk − yk + i − y_{k+1};  i = y_{k+1};
        else
            i = i + xk − yk;
        end
    end
    return H1 if CII/(Σ_{k=1}^n (xk + yk)) ≤ τII, H0 o.w.;
The implementation of δIII is presented in Table 5.3. Detector δIII computes
the number of chaff packets CIII inserted by BGM in (s1, s2) with delay bound
T + ∆. If xk = 1, then we assume that the kth slot contains the number of epochs
that minimizes the number of chaff packets among all positive integers. Then δIII
estimates the total number of packets and returns H1 if the estimated fraction of
chaff packets is bounded by τIII.
An implementation of δIV for the case T ≥ ∆ is presented in Table 5.4. This im-
plementation can be easily amended for other values of T . In the implementation,
δIV uses a variable CIV to count the number of chaff packets inserted by IC-OE,
estimates the total traffic size, and then reports H1 if their ratio is bounded by τIV.
Table 5.3: Detector for Case III.

δIII(x^n, s2, ∆, τIII):
    v = 0; u = ∆; CIII = 0;
    for k = 1 : n
        if xk == 0
            if y(u, kT) > 0
                CIII = CIII + y(u, kT);
            end
            v = max(v, kT);  u = max(u, kT);
        else
            if y(v, kT + ∆) == 0
                CIII = CIII + 1;
            end
            j′ = inf{j : s2(j) ≥ v};
            if s2(j′) < kT + ∆
                v = max(s2(j′ + 1), kT);
            else
                v = s2(j′);
            end
            u = kT + ∆;
        end
    end
    N = |S2| − n log(1 − (1/n) Σ_{k=1}^n xk);
    return H1 if CIII/N ≤ τIII, H0 o.w.;
Table 5.4: Detector for Case IV.

δIV(x^n, y^n, ∆, τIV):
    CIV = 0;  x0 = 1;
    for k = 1 : n
        if (xk > yk + y_{k+1}) or (yk > x_{k−1} + xk)
            CIV = CIV + 1;
        end
    end
    N = −n ( log(1 − (1/n) Σ_{k=1}^n xk) + log(1 − (1/n) Σ_{k=1}^n yk) );
    return H1 if CIV/N ≤ τIV, H0 o.w.;
Chapter 6
Conclusions
In this dissertation, we investigate statistical inference in sensor networks and
general ad hoc networks when there is no, or only an incomplete, parametric description of the underlying distributions.
In Chapter 2, we considered the problem of detecting unknown changes in the
unknown distribution of alarmed sensors in a randomly deployed sensor field. We
proposed a threshold detector based on the distance between the empirical dis-
tributions in two data collections and provided an estimate of the set with the
maximum change by the set with the maximum change in the empirical distri-
butions. By applying the Vapnik-Chervonenkis Theory, we derived exponential
upper bounds on detection error probabilities and proved the consistency of the
estimator under certain regularity conditions for arbitrary distributions. We also
developed several practical algorithms to implement the detector and the estimator
efficiently. Specifically, we simplified the search in infinitely many sets to a search
in a finite number of sets defined by sample points and developed polynomial-time
algorithms for regular sets such as disks, rectangles, and stripes. Comparison of
their complexity and performance suggests that prior knowledge about the changes
allows us to design searching sets to fit the changed sets and therefore significantly
improve the performance.
In Chapter 3, we considered the problem of detecting information flows by
timing analysis when there is no chaff noise in the measurements. We modelled in-
formation flows by constraints such as causality, packet conservation, and bounded
delay or bounded memory. While the bounded delay condition is only applicable to
interactive information flows, the bounded memory condition is always satisfied in
sensor networks due to limited memory size per sensor. We proposed a matching-
based algorithm under the bounded delay model and a rank-based algorithm under
the bounded memory model. We showed that the algorithms have zero miss de-
tection and exponentially decaying false alarm probabilities if independent traffic
can be modelled as Poisson processes. A comparison of error exponents and sim-
ulations both show that the proposed algorithms outperform existing algorithms.
Comparison between the proposed algorithms suggests that it is easier to detect
information flows with bounded delay than with bounded memory if the traffic
rate is sufficiently low and vice versa. Since pairwise detection already yields suf-
ficiently good performance, we can safely decompose the detection of multi-hop flows into subproblems of detecting 2-hop flows for every pair.
In Chapter 4, we generalized the detection of information flows to allow chaff
noise in the measurements. The insertion of chaff noise makes it impossible to
detect information flows when the mixture of information flows and chaff noise
becomes statistically independent. We used the minimum fraction of chaff noise
required to mimic independent traffic to characterize the level of detectability of
information flows. Optimal chaff-inserting algorithms are developed to compute
the minimum fraction of chaff, and threshold detectors based on these algorithms
are proposed to achieve Chernoff-consistent detection in the presence of chaff noise.
Our analysis shows that pairwise detection can be easily defeated by a relatively
small amount of chaff noise. Thus, unlike the case of no chaff noise, pairwise detec-
tion alone can no longer provide satisfactory performance. To solve this problem,
we extended the scope of the detector to multiple hops. This extension significantly
improves the robustness against chaff noise. In particular, for Poisson null hy-
pothesis, the fraction of chaff noise for which Chernoff-consistent detection can be
achieved converges to one as the number of hops increases, implying that it is al-
most impossible to hide arbitrarily long paths. Although the Poisson assumption was made under the null hypothesis to facilitate analysis, we showed both theo-
retically and experimentally that independent traffic in practice is even easier to
distinguish from information flows, implying that our results for Poisson processes
provide lower bounds on the detection performance of practical information flows.
In Chapter 5, we further extended the detection of information flows to the
scenario where there are capacity constraints in data collection. There, we focused on bounded-delay information flows through a pair of nodes. Still mea-
suring performance by the maximum fraction of chaff noise for Chernoff-consistent
detection, we extended the definitions in Chapter 4 to incorporate quantization
performed at the eavesdroppers. The minimum fraction of chaff noise required
to mimic both the marginal distributions at the eavesdroppers and the joint dis-
tribution of quantized measurements at the fusion center gives an upper bound
on the level of detectability as a function of the capacity constraints. Although
the optimal performance remains unknown, we designed practical detection sys-
tems to give achievable lower bounds. The detection systems consist of simple
slot-based quantizers and threshold detectors based on the optimal chaff-inserting
algorithms for quantized measurements. Specifically, we proposed a slotted quan-
tizer which quantizes transmission epochs to numbers of epochs in each slot and
a one-bit quantizer which further compresses the results of slotted quantization
to binary indicators of empty or nonempty slots. For each quantizer, linear-time
algorithms are developed to implement the detector both with and without full
side-information. Numerical comparison of the performance of the proposed de-
tection systems for Poisson processes shows that the two types of quantization
schemes have similar performance at low traffic rates, but slotted quantization becomes increasingly advantageous as the traffic rate increases. This result combined
with previous results in [31] suggests that slotted quantization is a reasonably
good method to compress Poisson processes.
The change detection and estimation problem in Chapter 2 is purely non-
parametric. The information flow detection problem in Chapters 3–5 is partially
nonparametric because no parametric assumption is made for information flows,
but distributions under the null hypothesis are assumed to be known (indepen-
dent Poisson processes in the analysis). Moreover, in Chapter 5, the processes are
assumed to have the same marginal distributions under both hypotheses.
6.1 Publications
The following is a list of journal publications/submissions that contain parts of
this thesis.
• T. He, S. Ben-David, and L. Tong, “Nonparametric Change Detection and
Estimation in Large-Scale Sensor Networks,” IEEE Transactions on Signal
Processing, vol. 54, no. 4, pp. 1204–1217, April 2006.
• T. He and L. Tong, “Detecting Encrypted Stepping-Stone Connections,”
IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 1612–1623,
May 2007.
• T. He and L. Tong, “Detection of Information Flows,” submitted to IEEE
Transactions on Information Theory, 2007.
• T. He and L. Tong, “Distributed Detection of Information Flows,” submitted
to IEEE Transactions on Information Theory, 2007.
6.2 Future Directions
The advantage of nonparametric techniques over their parametric counterparts lies
in that they provide reasonable performance without specific parametric knowledge
about the actual distributions. Therefore, it is crucial that in partially nonpara-
metric techniques such as those for information flow detection, the parametric
assumptions are generally satisfied in applications of practical interest. Although
we have shown by analytical arguments and some experimental data that the pro-
posed detectors will probably have even better performance on real traffic, it is
desirable to verify the statement by more extensive study and experiments with
actual traces. Moreover, since most of the related experimental work has been done in the context of the Internet, it is of interest to implement the detection schemes
in wireless networks, especially wireless sensor networks, to investigate the oppor-
tunities and challenges present in these contexts.
BIBLIOGRAPHY
[1] R. Ahlswede and I. Csiszar. Hypothesis testing with communication con-straints. Information Theory, IEEE Transactions on, 32(4):533–542, 1986.
[2] S. Ben-David, J. Gehrke, and D. Kifer. Detecting Change in Data Streams.In Proc. 2004 VLDB Conference, Toronto, Canada, 2004.
[3] S. Ben-David, T. He, and L. Tong. Non-Parametric Approach to ChangeDetection and Estimation in Large Scale Sensor Networks. In Proceedings ofthe 2004 Conference on Information Sciences and Systems, Princeton, NJ,March 2004.
[4] D. P. Bertsekas and R. Gallager. Data Networks. Prentice Hall, 1992.
[5] A. Blum, D. Song, and S. Venkataraman. Detection of Interactive Stepping Stones: Algorithms and Confidence Bounds. In Conference of Recent Advances in Intrusion Detection (RAID), Sophia Antipolis, French Riviera, France, September 2004.
[6] O. Bousquet, U. V. Luxburg, and G. R atsch. Advanced Lectures on MachineLearning. Springer, Heidelberg, Germany, 2004.
[7] T. Cover and J. Thomas. Elements of Information Theory. John Wiley &Sons, Inc., 1991.
[8] D.R. Cox and H.D. Miller. The Theory of Stochastic Processes. John Wiley& Sons Inc., New York, 1965.
[9] Frank den Hollander. Large Deviations (Fields Institute Monographs, 14). American Mathematical Society, 2000.
[10] J. Deng, R. Han, and S. Mishra. Intrusion tolerance and anti-traffic analysisstrategies for wireless sensor networks. In IEEE International Conferenceon Dependable Systems and Networks (DSN), pages 594–603, Florence, Italy,June 2004.
[11] D. Donoho, A.G. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Staniford.Multiscale stepping-stone detection: Detecting pairs of jittered interactivestreams by exploiting maximum tolerable delay. In 5th International Sympo-sium on Recent Advances in Intrusion Detection, Lecture Notes in ComputerScience 2516, 2002.
[12] N. Ferguson and B. Schneier. Practical Cryptography. John Wiley & Sons,Inc., Indianapolis,IN, 2003.
[13] J. D. Gibbons and S. Chakraborti. Nonparametric Statistical Inference. Mar-cel Dekker, 2003.
[14] J. Giles and B. Hajek. An Information-Theoretic and Game-Theoretic Studyof Timing Channels. IEEE Transactions on Information Theory, 48(9):2455–2477, September 2002.
[15] Piyush Gupta and P. R. Kumar. The capacity of wireless networks. IEEETrans. Inform. Theory, 46(2):388–404, March 2000.
[16] Te Sun Han and S. Amari. Statistical inference under multiterminal datacompression. IEEE Trans. Inform. Theory, 44(6):2300–2324, Oct. 1998.
[17] T. He and L. Tong. On A-distance and Relative A-distance.Technical Report ACSP-TR-08-04-02, Cornell University, August 2004.http://acsp.ece.cornell.edu/pubR.html.
[18] T. He and L. Tong. An Almost Surely Complete Subset of PlanarDisks. Technical Report ACSP-TR-04-05-01, Cornell University, April 2005.http://acsp.ece.cornell.edu/pubR.html.
[19] T. He and L. Tong. Detecting Encrypted Stepping-Stone Connections. IEEETransactions on Signal Processing, 55(5):1612–1623, May 2007.
[20] T. He, L. Tong, and A. Swami. Nonparametric Change Estimation in 2DRandom Fields. In Proc. of IEEE MILCOM’05, Atlantic City, NJ, October2005.
[21] T. Hettmansperger and M. Keenan. Tailweight, Statistical Inference, andFamilies of Distributions - A Brief Survey. Statistical Distributions in Scien-tific Work, 1:161–172, 1980.
[22] Myles Hollander and Douglas A. Wolfe. Nonparametric Statistical Methods.Wiley Interscience, 1973.
[23] X. Hong, P. Wang, J. Kong, Q. Zheng, and J. Liu. Effective ProbabilisticApproach Protecting Sensor Traffic. In Military Communications Conference,2005, pages 1–7, Atlantic City, NJ, Oct. 2005.
[24] Y. Hong and A. Scaglione. Distributed change detection in large scale sensornetworks through the synchronization of pulse-coupled oscillators. In Proc.Intl. Conf. Acoust., Speech, and Signal Processing, pages 869 – 872, Montreal,Canada, May 2004.
[25] N. Kingsbury. Approximation formulae for the Gaussian error integral Q(x). Technical Report m11067, Connexions, June 2005. http://cnx.org/content/m11067/latest/.
[26] D. Kotz and K. Essien. Analysis of a campus-wide wireless network. ACMWireless Networks Journal, 11(1-2):115–133, Jan. 2005.
[27] N. Patwari, A. O. Hero, and B. M. Sadler. Hierarchical censoring sensors forchange detection. In 2003 IEEE Workshop on Statistical Signal Processing,pages 21–24, St. Louis, MO, September 2003.
[28] V. Paxson and S. Floyd. Wide-Area Traffic: The Failure of Poisson Modeling.IEEE/ACM Transactions on Networking, 3(3):226–244, June 1995.
[29] P. Peng, P. Ning, D.S. Reeves, and X. Wang. Active Timing-Based Correla-tion of Perturbed Traffic Flows with Chaff Packets. In Proc. 25th IEEE In-ternational Conference on Distributed Computing Systems Workshops, pages107–113, Columbus, OH, June 2005.
[30] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1994.
[31] I. Rubin. Information Rates and Data-Compression Schemes for Poisson Processes. IEEE Transactions on Information Theory, 20(2):200–210, March 1974.
[32] J. Shao. Mathematical Statistics. Springer, 1999.
[33] David J. Sheskin. Handbook of Parametric and Nonparametric StatisticalProcedures. Chapman & Hall/CRC, 2004. 3rd Ed.
[34] S. Staniford-Chen and L.T. Heberlein. Holding intruders accountable on theinternet. In Proc. the 1995 IEEE Symposium on Security and Privacy, pages39–49, Oakland, CA, May 1995.
[35] D. Tang and M. Baker. Analysis of a local-area wireless network. In MOBI-COM, pages 1–10, Boston, MA, Aug. 2000.
[36] L. Tong, Q. Zhao, and S. Adireddy. Sensor Networks with Mobile Agents. InProc. 2003 Intl. Symp. Military Communications, Boston, MA, Oct. 2003.
[37] John N. Tsitsiklis. Decentralized Detection. Advances in Statistical SignalProcessing, 2:297–344, 1993.
[38] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag,New York, NY, 1995.
[39] V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York,NY, 1998.
[40] V.N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of rela-tive frequencie of events to their probabilities. Theory of Probability and itsApplications, 16:264–280, 1971.
[41] P. Venkitasubramaniam, T. He, and L. Tong. Anonymous Networking amidstEavesdroppers. Submitted to IEEE Transactions on Information Theory: Spe-cial Issue on Information-Thoeretic Security, Feb. 2007.
[42] S. Verdu. The Exponential Distribution in Information Theory. Problems ofInformation Transmission, 32(1):86–95, 1996.
[43] X. Wang. The loop fallacy and serialization in tracing intrusion connectionsthrough stepping stones. In Proc. of the 2004 ACM Symposium on AppliedComputing, pages 404–411, Nicosia, Cyprus, March 2004.
[44] X. Wang and D. Reeves. Robust correlation of encrypted attack traffic throughstepping stones by manipulation of inter-packet delays. In Proc. of the 2003ACM Conference on Computer and Communications Security, pages 20–29,2003.
[45] X. Wang, D. Reeves, and S. Wu. Inter-packet delay-based correlation for trac-ing encrypted connections through stepping stones. In 7th European Sympo-sium on Research in Computer Security, Lecture Notes in Computer Science2502, pages 244–263, 2002.
[46] X. Wang, D. Reeves, S. Wu, and J. Yuill. Sleepy watermark tracing: Anactive network-based intrusion response framework. In Proc. of the 16th In-ternational Information Security Conference, pages 369–384, 2001.
[47] R. S. Wenocur and R. M. Dudley. Some Special Vapnik-Chervonenkis Classes.Discrete Mathematics, 33:313–318, 1981.
[48] K. Yoda and H. Etoh. Finding a connection chain for tracing intruders. In6th European Symposium on Research in Computer Security, Lecture Notesin Computer Science 1895, Toulouse, France, October 2000.
[49] L. Zhang, A.G. Persaud, A. Johson, and Y. Guan. Stepping Stone AttackAttribution in Non-cooperative IP Networks. In Proc. of the 25th IEEE Inter-national Performance Computing and Communications Conference (IPCCC2006), Phoenix, AZ, April 2006.
[50] Y. Zhang, W. Lee, and Y. Huang. Intrusion detection techniques for mobilewireless networks. ACM Wireless Networks Journal, 9(5):545–556, Sept. 2003.
[51] Y. Zhang and V. Paxson. Detecting stepping stones. In Proc. the 9th USENIXSecurity Symposium, pages 171–184, August 2000.
[52] Y. Zhu, X. Fu, B. Graham, R.Bettati, and W. Zhao. On flow correlationattacks and countermeasures in mix networks. In Proceedings of Privacy En-hancing Technologies workshop, May 26-28 2004.