NONPARAMETRIC AND PARTIALLY
NONPARAMETRIC STATISTICAL INFERENCE IN
WIRELESS SENSOR NETWORKS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Ting He
August 2007
© 2007 Ting He
ALL RIGHTS RESERVED
NONPARAMETRIC AND PARTIALLY NONPARAMETRIC STATISTICAL
INFERENCE IN WIRELESS SENSOR NETWORKS
Ting He, Ph.D.
Cornell University 2007
Statistical inference has been extensively studied in a parametric framework. In
wireless sensor networks, increased concerns about performance under unknown
conditions have urged researchers to reconsider many parametric assumptions that
were widely accepted before. This thesis aims at solving selected statistical
inference problems arising in wireless sensor networks while minimizing parametric
assumptions about the underlying distributions.
The main problem considered in the thesis is the detection of information
flows based on timing information. The problem is to detect on-going flows of
information-carrying packets in a multi-hop network by measuring node trans-
mission epochs. This problem is formulated as a testing against correlated point
processes, where the correlation is modelled by constraints on relaying packets
such as causality, packet conservation, etc. The problem is divided into three
subproblems: centralized detection without chaff noise, centralized detection with
chaff noise, and distributed detection with chaff noise. For flows without chaff
noise, linear-time detection algorithms are proposed which are shown to outper-
form existing detectors in error exponents. For flows with chaff noise, it is shown
that there exists a threshold on the fraction of chaff noise such that the flows
are undetectable for noise levels above the threshold and detectable with vanish-
ing error probabilities otherwise. The value of the threshold is characterized both
analytically and algorithmically. Optimal detectors that can tolerate the max-
imum amount of chaff noise are developed. The problem is then extended to
distributed detection where there are capacity constraints in data collection. Joint
quantization-detection schemes are developed and analyzed to give achievability
results.
The other problem considered in the thesis is the detection and estimation of
changes in large random sensor fields. The problem is formulated as the detection
of changes in the geographical distribution of alarmed sensors and the estimation
of the set with the maximum change. A nonparametric detector is proposed based
on the largest change in the empirical distributions. Exponentially decaying up-
per bounds on the error probabilities are derived using statistical learning theory.
Polynomial-time algorithms are developed to implement the detector, which also
give consistent estimation results.
BIOGRAPHICAL SKETCH
Ting He was born in 1980 in Urumqi, Xinjiang, in northwest China. She received
her BS degree in Computer Science and Technology from Peking University, China,
in 2003, ranking in the top 2% of her class. Since then, she has been in the M.S./Ph.D.
program in the School of Electrical and Computer Engineering at Cornell Univer-
sity. Ting joined the Adaptive Communications & Signal Processing (ACSP) group
under the supervision of Prof. Lang Tong upon her arrival at Cornell, and has
worked as a graduate research assistant. Previously, she worked as an under-
graduate research assistant in the Micro Processor Research & Development Center of
Peking University from 2001 to 2003, during which period she participated in the
development of Unicore System, a key project in the National 863 Plan of China.
Ting has been a student member of IEEE since 2004. She received the Best Stu-
dent Paper Award at the 2005 International Conference on Acoustics, Speech, and
Signal Processing (ICASSP). Her research topics include nonparametric change
detection and estimation, stepping-stone detection, and both centralized and dis-
tributed information flow detection in wireless and sensor networks. Her research
aims at applying signal processing tools to network layer analysis and design. Her
general research interests include signal processing, information theory, algorithm
design, and network security.
To my parents and family.
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my advisor Prof. Lang Tong without
whom my Ph.D. would not have been possible. I am grateful for his guidance and
generous sharing of experiences. I have learnt from him not only methodologies,
but also what it means to be a great researcher. I have always been impressed by
his enthusiasm about research and high standards of excellence. I firmly believe
that the in-depth discussions and group meeting presentations have benefited me
more than any course I have taken.
I would like to thank Prof. Stephen Wicker for serving on my committee. I
would like to thank Prof. Toby Berger and Prof. Sergio Servetto for not only being
on my committee but also teaching me fantastic courses such as Information
Theory, Communication Networks, and Network Information Theory. I would like
to thank Dr. Ananthram Swami and Prof. Shai Ben-David for our extremely
successful collaborations which have not only led to fruitful publications but also
left truly pleasant experiences in my memory.
I would also like to thank Prof. Narahari U. Prabhu, Prof. David B. Shmoys,
and Prof. Charles Van Loan for teaching great courses in applied mathematics
which laid the foundation of my research. It was a true pleasure and privilege for
me to have learnt from these exemplary professors.
I would also like to thank all the people in the ACSP research group for their
help and company. I will always remember Parv and Youngchul with whom I share
numerous experiences and thoughts, Qing and Min who clearly set the standards
for me, Cris who is my only and best bowling partner, Zhiyu and Gokhan for their
sharpness and confidence. I have to thank Vidyut especially because he is remark-
ably talented in simplifying complicated problems, as a result of which this thesis is
based on his template. I will always remember Anima for her enthusiastic attitude
to life, Chin-Chen and Abhishek for never minding that I never really understand
what they are saying, Stefan and Saswat for their great work efficiency, Oliver for
his interesting view of many things in life and unbelievably broad interest, Matt
for his remarkable versatility and kindness, and Tae eung for always being polite
and modest. Without these people, my Ph.D. life would have been incomplete.
I also thank Min Jiang and Jun Liu for their selfless help when I first came to
Cornell, Jun Cui and Yanling Wu for sharing the happy moments when I was most
lonely, Juan for being the best roommate I have ever had, Jian Kong, Qing He, Jing
Tao, Junyun Yang, and all the friends at Winston Ct. for giving me tremendous
fun, Din, Edward, Wen, Matt, Hui, Birsen, Azadeh, Kwangtaik, Peter, Christina,
John, Frank, and all the acquaintances I have made in the ECE department for
making my years in Rhodes Hall a lot more enjoyable than they would have been
otherwise.
Special thanks to Huajun without whom life would become so much lonelier.
Last but not least, I would like to thank my parents and all my family for
all that they have done for me. I can never be grateful enough for their endless
care, support, and love.
This work was supported in part by the Multidisciplinary University Research
Initiative (MURI) under the Office of Naval Research Contract N00014-00-1-
0564, Army Research Laboratory CTA on Communication and Networks under
Grant DAAD19-01-2-0011, the National Science Foundation under Contract CCR-
0311055, TRUST (The Team for Research in Ubiquitous Secure Technology) which
receives support from the National Science Foundation (NSF award number CCF-
0424422), and the National Science Foundation under award CCF-0635070.
TABLE OF CONTENTS

Biographical Sketch  iii
Dedication  iv
Acknowledgements  v
Table of Contents  vii
List of Tables  x
List of Figures  xi

1 Introduction  1
  1.1 Nonparametric Statistical Inference in Wireless Sensor Networks  1
  1.2 Dissertation Outline  2
  1.3 Nonparametric Change Detection and Estimation in 2D Random Sensor Fields  5
    1.3.1 Summary of Results  7
    1.3.2 Related Work  9
  1.4 Information Flow Detection by Timing Analysis  10
    1.4.1 Summary of Results  12
    1.4.2 Related Work  14

2 Nonparametric Change Detection and Estimation  18
  2.1 Outline  18
  2.2 The Problem Statement  18
    2.2.1 The Model  18
    2.2.2 Distance Measures  20
    2.2.3 Detection and Estimation  22
  2.3 Performance Guarantee  24
  2.4 Algorithms  27
    2.4.1 Complete Algorithms  28
    2.4.2 Heuristic Algorithms  39
  2.5 Simulation  47
    2.5.1 Simulation Setup  47
    2.5.2 Detector Sensitivity  48
    2.5.3 Performance  49
  2.6 Extension to Finite-level Sensor Measurements  53
  2.7 Summary  54
  2.A Proof of Chapter 2  57

3 Detecting Information Flows Without Chaff Noise  62
  3.1 Outline  62
  3.2 Problem Formulation  62
    3.2.1 Notations  62
    3.2.2 Flow Models  63
    3.2.3 Hypotheses  65
  3.3 Detecting Information Flows with Bounded Delay  65
  3.4 Detecting Information Flows with Bounded Memory  69
  3.5 Comparing the Algorithms  71
    3.5.1 DMV vs. DA  72
    3.5.2 DM vs. DMV  75
  3.6 Numerical Results  77
  3.7 Summary  79
  3.A Proof of Chapter 3  80
  3.B Algorithms of Chapter 3  85

4 Detecting Information Flows With Chaff Noise  87
  4.1 Outline  87
  4.2 Problem Formulation  87
    4.2.1 Multi-hop Flow Models  88
    4.2.2 Problem Statement  89
  4.3 Flow Detectability  91
  4.4 Detectability of Two-hop Flows  94
    4.4.1 Two-hop Flows with Bounded Delay  94
    4.4.2 Two-hop Flows with Bounded Memory  97
  4.5 Detectability of Multi-hop Flows  99
    4.5.1 Multi-hop Flows with Bounded Delay  100
    4.5.2 Multi-hop Flows with Bounded Memory  104
  4.6 Detector  109
  4.7 Generalization of Poisson Assumption  114
  4.8 Simulations  117
    4.8.1 Synthetic Data  117
    4.8.2 Traces  121
  4.9 Summary  124
  4.A Proof of Chapter 4  125
  4.B Asymptotic CTR of MBMR  140
  4.C Algorithms of Chapter 4  142

5 Distributed Detection of Information Flows  153
  5.1 Outline  153
  5.2 The Problem Formulation  154
    5.2.1 Problem Statement  154
    5.2.2 System Architecture  155
  5.3 Performance Criteria  156
    5.3.1 Level of detectability  156
    5.3.2 Level of Undetectability  158
    5.3.3 General Converse and Achievability  160
  5.4 Quantizer Design  162
  5.5 Detection Algorithms  163
    5.5.1 Case I: Slotted Quantization, Full Side-Information  164
    5.5.2 Case II: Slotted Quantization, Equal Capacity Constraints  167
    5.5.3 Case III: One-Bit Quantization, Full Side-Information  170
    5.5.4 Case IV: One-Bit Quantization, Equal Capacity Constraints  173
  5.6 Analysis and Comparison  175
    5.6.1 Performance Analysis  175
    5.6.2 Numerical Comparison  177
  5.7 Summary  180
  5.A Proof of Chapter 5  181
  5.B Algorithms of Chapter 5  197

6 Conclusions  201
  6.1 Publications  204
  6.2 Future Directions  205

Bibliography  206
LIST OF TABLES

2.1 Time Complexity Comparison  55
2.2 Space Complexity Comparison  55
3.1 Detect-Match (DM)  85
3.2 Alternative Implementation of (*)  85
3.3 Detect-Maximum-Variation (DMV)  86
4.1 Levels of undetectabilities (Poisson null hypothesis)  109
4.2 Parameters for Simulations on Synthetic Data  118
4.3 Parameters for Simulations on Traces  121
4.4 Bounded-Greedy-Match (BGM)  142
4.5 Multi-Bounded-Delay-Relay (MBDR)  143
4.6 Expanded-Multi-Bounded-Delay-Relay (E-MBDR)  144
4.7 Bounded-Memory-Relay (BMR)  145
4.8 Multi-Bounded-Memory-Relay (MBMR)  146
4.9 Detect-Bounded-Delay (DBD)  147
4.10 Detect-Multi-Bounded-Delay (DMBD)  149
4.11 Detect-Bounded-Memory (DBM)  150
4.12 Detect-Multi-Bounded-Memory (DMBM)  152
5.1 Detector for Case I  197
5.2 Detector for Case II  198
5.3 Detector for Case III  199
5.4 Detector for Case IV  200
LIST OF FIGURES

1.1 Reported alarmed sensors (red) in two collections.  6
1.2 Is S communicating with D through R?  10
1.3 Transmission patterns of S, R, and D suggest a communication between S and D through R.  10
1.4 In a wireless network, eavesdroppers are deployed to report the transmission activities of nodes A and B to a detector at the fusion center, which in turn decides whether there are information flows through these nodes. Si (i = 1, 2): a sequence of transmission epochs of node A (or B).  11

2.1 Members of HD and HCD; ◦: sample point in S1, •: sample point in S2.  29
2.2 The set {s1, s2, s3, s4} is shatterable by axis-aligned rectangles, but the set {s1, s2, s3, s4, s5} is not.  31
2.3 Members of HR; ◦: sample point in S1, •: sample point in S2.  32
2.4 The set {s1, s2, s3, s4} is shatterable by A ∪ B.  35
2.5 Members of HV and HH; ◦: sample point in S1, •: sample point in S2.  36
2.6 Members of HDR; ◦: sample point in S1, •: sample point in S2.  44
2.7 Detection threshold as a function of the sample size for different VC-dimensions.  48
2.8 Detection threshold as a function of the detector size for different sample sizes.  49
2.9 Miss detection probability of δdA as a function of the sample size: simulation results. Here p = 0.98, q = 0.02, r = s/12. Use 1000 Monte Carlo runs.  50
2.10 Miss detection probability of δφA as a function of the sample size: simulation results. Here p = 0.98, q = 0.02, r = s/12. Use 10000 Monte Carlo runs.  50
2.11 Detection probability of δdA as a function of detector size, 1000 Monte Carlo runs.  51
2.12 Detection probability of δφA as a function of detector size, 10000 Monte Carlo runs.  51

3.1 Detecting information flows through nodes A and B by analyzing their transmission activities S1 and S2.  63
3.2 Both the solid and the dotted lines denote matchings that are causal and bounded in delay, but the dotted lines also preserve the order of incoming packets.  67
3.3 Finding the match of s1(1): there are three candidates in the ∆-length interval following s1(1).  68
3.4 (a) the cumulative counting functions ni(w) (i = 1, 2); (b) the cumulative difference d(w) and the maximum variation v(w).  70
3.5 The statistic of DA is no larger than that of DMV.  75
3.6 PF(δDA), PF(δDMV), and their bounds; M = 40 packets, 100000 Monte Carlo runs.  78
3.7 PF(δDM) under various rates; ∆ = 10 seconds, 100000 Monte Carlo runs.  79
3.8 PF(δDA), PF(δDMV), and PF(δDM); M = 40 packets, ∆ = 10 seconds, 100000 Monte Carlo runs.  79

4.1 Detecting information flows through nodes R1, R2, . . . , Rn by measuring their transmission activities; dotted lines denote a potential route.  88
4.2 An information flow along the path R1 → . . . → Rn.  88
4.3 BGM: a sequential greedy match algorithm.  95
4.4 Example: •: sk ∈ S1; ◦: sk ∈ S2; M1(k): the statistics calculated by BMR. Initially, M1(0) = 0, indicating that the memory is empty. The first packet is a departure, and it is assigned as chaff because otherwise the memory will be underflowed. The second packet is an arrival, and thus the memory size is increased by one. Such updating occurs at each arrival or departure.  98
4.5 Example: (a) The scheduling obtained by repeatedly using BGM. (b) Another scheduling. It shows that repeatedly using BGM is suboptimal.  101
4.6 MBDR: a recursive greedy match algorithm.  102
4.7 λ = 0.9.  105
4.8 λ = 2.  105
4.9 λ = 4.  105
4.10 MBMR for n = 4 and M = 3 (s = s1 ⊕ · · · ⊕ s4): monitor the memory sizes of the relay nodes and assign a chaff packet if the memory of any node will be underflowed or overflowed. Initially, Mi(0) = 0 (i = 1, 2, 3); at the end of this realization (after the 10th packet), (M1(10), M2(10), M3(10)) = (1, 1, 0).  106
4.11 The level of undetectability βMn and its bounds as functions of n: M = 4; compute βMn on 10000 packets.  108
4.12 The c.d.f. of the CTR of BGM for ∆ = 5: CTR on traces vs. CTR on Poisson processes.  116
4.13 The c.d.f. of the CTR of BMR for M = 20: CTR on traces vs. CTR on Poisson processes.  116
4.14 Generating information flows with bounded memory (⌊M2⌋ = 3): f2 is generated by storing ⌊M2⌋ packets from f1 and randomly releasing these packets during the arrival of the next ⌊M2⌋ packets.  118
4.15 The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, 100 packets per process, 10000 Monte Carlo runs.  119
4.16 The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 40 packets per process, 10000 Monte Carlo runs.  119
4.17 The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, 200 packets in total over all processes, 10000 Monte Carlo runs.  120
4.18 The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 100 packets in total over all processes, 10000 Monte Carlo runs.  120
4.19 PF(δBM), PF(δDAC), PF(δBD), and PF(δS-III) on LBL-PKT-4: M = 20, ∆ = 5, threshold for δBD = 1/14, threshold for δBM = 1/21, tested on 134 × 133 trace pairs.  122
4.20 PM(δBM) and PM(δDAC): M = 20, Nc = 1000, threshold for δBM = 1/21, tested on 4000 bounded memory flows.  123
4.21 PM(δBD) and PM(δS-III): ∆ = 5, Nc = 1000, threshold for δBD = 1/14, tested on 4000 bounded delay flows.  124
4.22 Inserting virtual packets to calculate the delays of chaff packets.  126
4.23 The Markov chain formed by d′(w); p = λ1/(λ1 + λ2), q = 1 − p.  127
4.24 Every relay sequence in P∗ corresponds to a relay sequence in P; solid line: sequences in P; dashed line: sequences in P∗.  130
4.25 Solid lines denote the original relay sequences; dashed lines denote the reorganized relay sequences which preserve the order of packets.  131
4.26 A "batched" arrival process generated from a Poisson process. ■: arrival epochs; ◦: points in the underlying Poisson process; M = 2, period = 5.  134
4.27 The Markov chain of (M1(k), M2(k)), k ≥ 0. All straight lines have transition probability 1/3. All the states are marked with their limiting probabilities, e.g., π(0, 2) = 1/15.  141

5.1 In a wireless network, nodes A and B may serve on one or multiple routes. Eavesdroppers are deployed to collect their transmission activities Si (i = 1, 2), which are then sent to a detector at the fusion center.  154
5.2 A distributed detection system. This system consists of two quantizers q1(t) and q2(t) and a detector δt.  156
5.3 Inserting one chaff packet can destroy the alignment of measurements.  158
5.4 IC-SF: Match s1 with s2 subject to delay bound T + ∆. ◦: directly observed epoch in s2; •: reconstructed epoch in s1.  165
5.5 IC-OF: Backward greedy matching. Each epoch is matched to the first unmatched nonempty slot that is no more than ∆ earlier.  170
5.6 λ = 0.1.  179
5.7 λ = 0.5.  179
5.8 λ = 1.  179
5.9 Construct f1: ◦: original epochs; •: constructed epochs.  183
5.10 Construct (f1, f2) from (x′n, y′n) (T ≥ ∆). The matching found by IC-SE guarantees that x′j = y′j,2 + y′j+1,1.  186
Chapter 1
Introduction
1.1 Nonparametric Statistical Inference in Wireless Sensor
Networks
Wireless sensor networks have become increasingly popular in the past few years.
The development of such networks was originally motivated by military applica-
tions such as battlefield surveillance. Now the use of wireless sensor networks is ex-
tended to many civilian applications, including environment and habitat monitor-
ing, healthcare applications, home automation, process monitoring, traffic control,
etc. These applications all require the collaborative inference of certain physical
or environmental conditions based on information collected by the sensors.
In classical statistical inference, the conditions to be inferred are assumed to
be characterized by a known parametric family, and the problem is reduced to
finding the correct index in this family. This approach corresponds to the case where
there is a thorough understanding of the conditions and their influence on sensor
measurements so that it is possible to formulate a parametric model properly.
Unlike in classical inference, the phenomena to be monitored by wireless sensor
networks are often not known at the time of inference or are too diverse to fit into
specific parametric models. Therefore, it is desirable to consider nonparametric
statistical inference in applications of wireless sensor networks.
The need for nonparametric inference also arises from concerns about network
security. Wireless sensors may be deployed in an open environment and are thus
subject to tampering by malicious intruders. In this case, it has been proposed to use
statistical inference methods to identify misbehaving sensors, but the knowledge
about how compromised sensors will behave is very limited. Recent research interest
has grown in defending against intelligent adversaries, where the intruder can
control compromised sensors to disrupt the inference collaboratively and
intelligently. In the presence of intelligent adversaries, it is highly
desirable that inference methods can guarantee certain performance even in the
worst case.
It is generally impossible to design a single inference method that is optimal
for all possible underlying distributions. Thus insisting on nonparametric
methods will inevitably result in a loss of performance for any specific distribution. In
the presence of intelligent adversaries, there may even be scenarios in which reliable
inference is impossible. Therefore, it is crucial to investigate the performance of
nonparametric inference techniques and their fundamental limits.
1.2 Dissertation Outline
This thesis attempts to study nonparametric statistical inference in wireless sensor
networks from the perspectives of both theoretical analysis and practical algorithm
design. The thesis addresses two problems. The first problem is the nonparamet-
ric detection and estimation of changes in the geographical distribution of alarmed
sensors, where detectors with exponentially decaying error probabilities and consis-
tent estimators are developed. The second problem is the detection of information
flows by timing analysis. The problem is further divided into three subproblems,
which deal with the detection of information flows without chaff noise, the detec-
tion in the presence of chaff noise, and distributed detection. Various detectors are
developed under the assumption of intelligent adversaries, and their asymptotic
performance is evaluated by error exponents (in the case of no chaff) or the maxi-
mum amount of chaff noise to guarantee vanishing error probabilities (otherwise).
In Chapter 2, we consider nonparametric change detection and estimation in
planar random sensor fields. Sensors are deployed to measure a certain underlying
phenomenon and make binary decisions (i.e., alarmed or normal). Given samples
of the locations of alarmed sensors from two data collections, we want to know
whether and where the underlying phenomenon has changed. Assuming that in
each collection, the samples are drawn i.i.d. from an unknown geographical dis-
tribution, we formulate the problem as detecting changes in this distribution be-
tween two collection periods and estimating the location of the maximum change
if changes do occur. Our main contributions include a threshold detector based on
the distance between empirical distributions and uniform upper bounds on its error
probabilities under arbitrary distributions. Polynomial-time algorithms are devel-
oped to implement the detector for several types of empirical distances. Solutions
to the detection problem also give an estimate of the set with the largest change.
We show that under certain regularity conditions, such estimation is consistent.
In Chapter 3, we consider the detection of information flows. Given a wireless
or wired ad hoc network, we want to know if there are flows of information-carrying
packets through the nodes of interest by measuring the transmission activities of
these nodes in timing. Timing analysis has the advantages of robustness against
encryption and padding and easily obtainable measurements (especially in wireless
networks). Its challenges include perturbations imposed by delays, permutations,
etc. and chaff noise, which consists of dummy traffic and unrelated traffic mul-
tiplexed at intermediate nodes. In this chapter, we decompose the detection into
pairwise detection of every two hops of the information flows and only consider
timing perturbations. Assuming that the perturbations are bounded in delay or
memory, and there is no chaff noise, we develop linear-time detectors which have no
miss detection. We show that the proposed detectors outperform existing detectors
in false alarm probability.
In Chapter 4, we generalize the detection of information flows to allow the
insertion of chaff noise. Assuming that nodes can collaboratively perturb timing
and insert chaff noise to evade detection, we show that there exists a threshold
on the fraction of chaff noise, beyond which Chernoff-consistent detection is im-
possible. The threshold is characterized as the minimum chaff noise required for
an information flow to mimic the distribution under the null hypothesis. Optimal
chaff-inserting algorithms are developed to compute the threshold, and closed-
form expressions are obtained under the assumption that traffic under the null
hypothesis can be modelled as independent Poisson processes. Furthermore, we
develop a threshold detector based on the optimal chaff-inserting algorithms, which
can achieve Chernoff-consistent detection in the presence of chaff noise arbitrar-
ily close to the threshold. Therefore, we obtain a tight bound on the fraction of
chaff noise, within which the proposed detector is Chernoff-consistent, and beyond
which there exists an information flow embedded in chaff noise that is statisti-
cally identical with traffic under the null hypothesis so that no detector can be
Chernoff-consistent. We use this bound to characterize the level of detectability of
information flows in chaff noise. Furthermore, we show that joint detection over
multiple hops can greatly increase the level of detectability.
Chapter 5 addresses distributed detection of information flows. We focus on
pairwise detection of bounded delay flows in chaff noise. In distributed detection,
the collection of measurements is subject to capacity constraints in the commu-
nication channels, which makes the problem more applicable to wireless sensor
networks because the wide deployment and limited power supply make it neces-
sary to limit the communication rates. We derive theoretical upper and lower
bounds on the level of detectability as functions of the capacity constraints, and
then direct the focus to the design of practical detection systems. The detec-
tion systems consist of simple slot-based quantizers and threshold detectors based
on optimal chaff-inserting algorithms which compute the minimum chaff noise re-
quired to generate the received (compressed) measurements. Performance of the
proposed detection systems is analyzed and compared to gain heuristics on system
design.
The rest of this chapter elaborates each problem that we have introduced,
including our initial motivation for the problem, a summary of results, and a brief
overview of the related work.
1.3 Nonparametric Change Detection and Estimation in
2D Random Sensor Fields
We consider the detection of changes in an underlying phenomenon over a large-scale,
randomly deployed sensor field. For example, sensors may be designed to detect certain
chemical components. When the sensor measurement exceeds a certain threshold,
the sensor is "alarmed". The state of a sensor depends on where it resides; sensors
in some areas are more likely to be in the alarmed state than others. We are not
interested in the event that certain sensors are alarmed. We are interested instead
in whether there is a change in the geographical distribution of alarmed sensors
from data collections at two different times. Such a change in distribution could
be an indication of abnormality.
Figure 1.1: Reported alarmed sensors (red) in two collections (first and second data collections).
We assume that some (not necessarily all) of the alarmed sensors are reported
to a fusion center, either through the use of a mobile access point (SENMA [36])
or using a certain in-network routing scheme. Suppose that the fusion center ob-
tains reports of the locations of alarmed sensors, as illustrated in Fig. 1.1, from
two separate data collections. In the ith report, let the location of alarmed sen-
sors have some unknown distribution Pi, and each sample Si be a set of locations
drawn independently according to Pi. The change detection problem is one of
testing whether P1 = P2 without making prior assumptions about the data gener-
ating distributions Pi. Note that Pi only specifies the geographical distribution of
alarmed sensors. The joint distribution of alarmed and non-alarmed sensors is not
specified completely. A change in Pi may be caused by a change in the actual
phenomenon or a change in the sensor layout.
Such a general nonparametric assumption comes at the cost of usually requiring
a large sample size, which renders the solution most applicable in large-scale sensor
networks where it is possible to obtain a large amount of sensor data.
There is also a related estimation problem in which, assuming that the detection
of change has been made, we would like to know where in the sensor field the change
has occurred, or where the change is the most significant (in a sense that will be
made precise later).
1.3.1 Summary of Results
We present a number of nonparametric change detection and estimation algorithms
based on an application of Vapnik-Chervonenkis Theory [40]. The basis of this ap-
proach has been outlined in [3] where we provided a mathematical characterization
of changes in distribution. Our focus is on the algorithmic side, aiming at obtain-
ing practical algorithms that scale with the sample size along with a certain level
of performance guarantee.
We first present results that establish a theoretical guarantee of performance.
The nonparametric detection problem considered here depends on the choice of the
distance measure between two probability distributions, and the choice is usually
subjective. We consider two distance measures. The first is the so-called A-
distance (also used in [3]) that measures the maximum change in probability on
A—a collection of measurable sets. The second is called relative A-distance—a
variation from that in [3]—for cases when the change in probability is concentrated
in areas of small probability. With these two distance measures, we apply the
Vapnik-Chervonenkis Theory to obtain exponential bounds (i.e., bounds under which
the error probabilities decay exponentially with the sample size) on detection error
probabilities and establish the consistency results for the proposed detector and
estimator.
Next we derive a number of practical algorithms. The complexity of applying
the Vapnik-Chervonenkis Theory comes from the search among a (possibly infinite)
collection of measurable sets. In particular, given data S being the union of the
samples from the two collections, i.e., S = S1 ∪ S2, the key is to reduce the search
in an infinite collection of sets (e.g., planar disks) to a search in a finite collection
H(S) (a function of S). Here we need a constraint on H(S) such that this reduction
does not affect the performance.
We consider three commonly used geometrical shapes—disks, rectangles, and
stripes—as our choices of measurable sets A. For the A-distance measure, if
M = |S| is the total number of data points in the two collections, we show that
a direct implementation of exhaustive search among the collection of all planar
disks has the complexity O(M^4). We present a suboptimal algorithm, the Search
in sample-Centered Disks (SCD), that has the complexity O(M^2 log M). Under
mild assumptions on Pi, the loss of performance of SCD diminishes as the sample
size increases. For the class of axis-aligned rectangles, we show that the opti-
mal search Search in Axis-aligned Rectangles (SAR) has complexity O(M^3). A
suboptimal approach Search in Diagonal-defined axis-aligned Rectangles (SDR)
reduces the complexity to O(M^2), again with diminishing loss of performance
under mild assumptions. For the collection of stripes, we present two algorithms:
Search in Axis-aligned Stripes (SAS) and Search in Random Stripes (SRS), both
have complexity O(M log M). Similar analysis has also been obtained for the
relative distance metric. See Table 2.1.
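To make the flavor of these searches concrete, the sketch below computes the empirical A-distance when A is restricted to vertical stripes {(x, y) : a ≤ x ≤ b}. It is only an illustration of the O(M log M) order of complexity (sort once, then a single maximum-subarray scan); it is not necessarily the SAS algorithm of Section 2.4, and the function name and array layout are assumptions.

```python
import numpy as np

def stripe_distance(s1, s2):
    """Empirical A-distance over vertical stripes {(x, y): a <= x <= b}.

    Points of S1 carry weight +1/|S1| and points of S2 weight -1/|S2|;
    the difference of empirical measures on a stripe is the sum of the
    weights of the points it contains, so the supremum over stripes is
    a maximum-magnitude contiguous-run sum after sorting by x.
    """
    xs = np.concatenate([s1[:, 0], s2[:, 0]])
    w = np.concatenate([np.full(len(s1), 1.0 / len(s1)),
                        np.full(len(s2), -1.0 / len(s2))])
    w = w[np.argsort(xs, kind="stable")]            # O(M log M)
    best_hi = best_lo = run_hi = run_lo = 0.0
    for v in w:                                     # Kadane-style O(M) scan
        run_hi, run_lo = max(v, run_hi + v), min(v, run_lo + v)
        best_hi, best_lo = max(best_hi, run_hi), min(best_lo, run_lo)
    # Ties in x are not grouped here, which can slightly overestimate the
    # supremum when points from the two samples share an x-coordinate.
    return max(best_hi, -best_lo)
```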
We implement several algorithms and verify their performance through simu-
lation. We also answer some practical questions arising in the implementation of
the detector, e.g., how to decide the detection threshold and how to estimate the
minimum sample size.
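For orientation, the threshold choice is driven by a uniform deviation bound of the following general form (the constants c1 and c2 are illustrative placeholders, not the constants derived in Chapter 2). Writing Si(A) for the empirical probability of A in the ith collection, m for the per-collection sample size, and ΠA for the growth function of A, a VC-type inequality under H0 reads

```latex
\Pr\Bigl\{\,\sup_{A\in\mathcal{A}}\bigl|S_1(A)-S_2(A)\bigr|>\epsilon\,\Bigr\}
  \;\le\; c_1\,\Pi_{\mathcal{A}}(2m)\,e^{-c_2\,m\,\epsilon^2},
\qquad
\Pi_{\mathcal{A}}(2m)=O\bigl(m^{\,d_{\mathrm{VC}}(\mathcal{A})}\bigr),
```

so equating the right-hand side to a target false alarm level and solving for ε yields a threshold that shrinks as the sample size grows and grows with the VC dimension of A, which is the qualitative behavior examined in the simulations of Section 2.5.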
1.3.2 Related Work
The problem of change detection in sensor fields has been considered in different
(mostly parametric) settings [24,27]. The underlying statistical problem belongs to
the category of two-sample nonparametric change detection. A classical approach
is the Kolmogorov-Smirnov two-sample test [13] in which the empirical cumulative
distributions are compared, and the maximum difference between the empirical
cumulative distribution functions is used as the test statistic. In a way, the proposed
methods generalize the idea of the Kolmogorov-Smirnov test to a more general col-
lection of measurable sets using general forms of distance measures. Indeed, the
Kolmogorov-Smirnov two-sample test becomes a special case of the SAR (Search
in Axis-aligned Rectangles) algorithm presented in Section 2.4.1.
There is a wealth of nonparametric change detection techniques for one-
dimensional data sets in which data are completely ordered. Examples include
testing the number of runs (successive sample points from the same collection) such
as Wald-Wolfowitz runs test, or testing the relative order of the sample points, e.g.
median test, control median test, Mann-Whitney U test, and linear rank statistic
tests [13,22,33]. Such techniques, however, do not have natural generalizations for
two-dimensional sensor network applications.
Vapnik-Chervonenkis Theory (VC Theory) is a statistical theory about com-
putational learning processes developed by Vapnik and Chervonenkis [38–40]. The
theory lays the theoretical foundation for learning-based inference methods. The
parts of the theory most related to our problem are the theory of consistency in
learning and the nonasymptotic theory of convergence rates. The theory has been
substantially developed since the original study of Vapnik and Chervonenkis; see
the book chapter by Bousquet et al. in [6] and the references therein.
1.4 Information Flow Detection by Timing Analysis
Consider a wireless ad hoc network illustrated in Fig. 1.2. We want to know if there
is information flowing between nodes S, R, and D. Suppose that we can observe
their transmission epochs (for example, if the nodes use transmitter-directed signaling to communicate, and we know the transmitters' codes, then we can deploy eavesdroppers tuned to the transmitters' codes to detect transmission activities), as shown in Fig. 1.3. Then from the transmission
patterns, we can probably infer that S is communicating with D, and R is the
relay node.
Figure 1.2: Is S communicating with D through R?
Figure 1.3: Transmission patterns of S, R, and D suggest a communication between S and D through R.
This example belongs to the problem of detecting information flows by timing
analysis. Generally, in a wireless ad hoc network illustrated in Fig. 1.4, there
may be information flows along multiple potential routes. We want to decide
whether a particular information flow is going on by eavesdropping on the traffic on
the route. Suppose that we cannot rely on nodes in the network to report the
information flows, and all the packets are encrypted and padded at every hop,
leaving only timing information to be observable. Given eavesdroppers deployed
to record transmission epochs of the nodes of interest, the problem is how to
correlate these transmission epochs to detect whether the corresponding nodes are
transmitting an information flow.
Figure 1.4: In a wireless network, eavesdroppers are deployed to report the transmission activities of nodes A and B to a detector at the fusion center, which in turn decides whether there are information flows through these nodes. Si (i = 1, 2): a sequence of transmission epochs of node A (or B).
Timing measurements are subject to a number of sources of perturbations. For
example, a relay node can hold the incoming packets for random periods of time,
reshuffle them, relay them in batches, etc. Furthermore, traffic on different routes
will multiplex at the intersecting nodes, and relay nodes may selectively drop
certain packets or insert dummy packets. Both traffic multiplexing and packet
dropping/insertion cause our measurements to contain packets that do not belong
to the information flow of interest. We will refer to such packets as chaff noise.
The presence of chaff noise significantly increases the difficulty of the problem.
Another challenge comes from capacity constraints in the uplink channels. In
wide-area networks such as wireless sensor networks, eavesdroppers are often pow-
ered by batteries and have to report to the fusion center with limited power.
Therefore, the uplink channels are subject to limited capacity constraints. The
direct consequence is that the measurements received at the fusion center will not
be identical with the raw measurements of the eavesdroppers, but will be distorted
to a certain extent. An exception is when the detector is located at one of the
eavesdroppers, in which case the detector knows the raw measurements of that
eavesdropper perfectly, referred to as the case of full side-information.
1.4.1 Summary of Results
We consider the detection of information flows through certain nodes of interest
by measuring their transmission activities in timing. We first consider detecting
information flows with exact timing measurements (i.e., centralized detection) and
then add capacity constraints in data collection (i.e., distributed detection). With
transmission activities modelled by point processes, the problem is formulated as a
hypothesis test against point processes conforming to certain flow models. We
consider two types of flow models derived from constraints in reliable communi-
cations: the bounded delay flow and the bounded memory flow. Chaff noise does
not need to satisfy any of the constraints.
For centralized detection of information flows without chaff noise, we develop
pairwise linear-time detection algorithms by packet matching or counting schemes.
We show that these algorithms have no miss detection and exponentially decaying
false alarm probabilities if traffic under the null hypothesis can be modelled as
independent Poisson processes. We compare our algorithms with existing detec-
tion algorithms by both error exponent analysis and numerical simulations. The
comparison shows that our algorithms outperform the existing ones.
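As a rough illustration of the matching idea behind such detectors (this is not the exact DM or DMV algorithm of Chapter 3; the function name and interface are assumptions), the sketch below checks whether a departure sequence could be a causal relay of an arrival sequence under a delay bound ∆, using an earliest-available greedy matching. A pairwise detector of this kind would declare a flow exactly when such a matching exists; with no chaff, unmatched departures would also have to be ruled out, a bookkeeping step omitted here for brevity.

```python
def has_bounded_delay_matching(arrivals, departures, delta):
    """Check whether every arrival epoch can be matched to a distinct
    departure epoch that is no earlier than the arrival and at most
    `delta` later.  Both inputs are assumed sorted in increasing order.
    Matching each arrival to the earliest still-available departure is
    sufficient: if this greedy pass fails, no valid matching exists."""
    j = 0
    for t in arrivals:
        # Departures before t cannot serve t (nor any later arrival).
        while j < len(departures) and departures[j] < t:
            j += 1
        if j == len(departures) or departures[j] > t + delta:
            return False            # arrival t cannot be relayed in time
        j += 1                      # consume the matched departure
    return True
```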
For centralized detection of information flows with chaff noise, we give an ex-
act characterization of the level of detectability of information flows, defined as
the maximum fraction of chaff noise allowed for Chernoff-consistent detection (de-
tailed definition is in Chapter 4). Our contributions include a converse result
and an achievability result. For the converse, we show that there is a bound on
the fraction of chaff noise beyond which Chernoff-consistent detection is impos-
sible. Specifically, the bound is characterized as the minimum fraction of chaff
noise needed to make an information flow statistically identical with traffic under
the null hypothesis. This bound is used to establish a level of undetectability for
information flows. Optimal chaff-inserting algorithms are proposed to calculate
the level of undetectability, and closed-form expressions are derived under the as-
sumption that traffic under the null hypothesis can be modelled as independent
Poisson processes. For the achievability, we develop a detector based on the opti-
mal chaff-inserting algorithms, which claims detection if the fraction of chaff noise
in the measurements computed by these algorithms is bounded by a predeter-
mined threshold. Under Poisson null hypothesis, the proposed detector is proved
to be Chernoff-consistent for all the information flows with fractions of chaff noise
bounded by the level of undetectability. Therefore, the level of detectability is
equal to the level of undetectability, and the proposed detector is optimal. We
show that the level of detectability increases to one as the number of hops in the
information flow increases, indicating that it is impossible to hide information flows
over arbitrarily long paths.
For distributed detection of information flows (with chaff noise), we focus on
the bounded delay flow model and pairwise detection. Our results have both
theoretical and algorithmic elements. In the theoretical aspect, we extend the
notions of detectability and undetectability to the context of distributed detec-
tion. Theoretical upper and lower bounds on the level of detectability are derived.
In the algorithmic aspect, we propose a three-stage detection procedure which
consists of quantization, data transmission, and detection. Quantization is per-
formed based on fixed slot partition. We propose a slotted quantizer and a one-bit
quantizer which compress each slot into the number of epochs and the indicator
of nonempty slot, respectively. Under each quantization, we develop a threshold
detector based on the optimal chaff-inserting algorithm analogous to those in cen-
tralized detection except that its input is quantized. With the performance of a
detection system measured by the maximum fraction of chaff noise such that the
system remains Chernoff-consistent, we compare the proposed detection systems
together with their analytical upper bounds. The comparison shows that slotted
quantization outperforms one-bit quantization for heavy traffic, and the detector
under slotted quantization and full side-information is near optimal. Performance
of the proposed detection systems gives lower bounds on the level of detectability
as a function of capacity constraints.
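The two quantizers themselves are simple to picture; a minimal sketch is given below, in which the slot length and observation horizon are assumed parameters and the function names are placeholders (the actual quantizer/detector pairs are specified in Chapter 5).

```python
import numpy as np

def slotted_quantize(epochs, slot, horizon):
    """Slotted quantization: report the number of epochs in each slot."""
    edges = np.arange(0.0, horizon + slot, slot)
    counts, _ = np.histogram(epochs, bins=edges)
    return counts

def one_bit_quantize(epochs, slot, horizon):
    """One-bit quantization: report only whether each slot is nonempty."""
    return (slotted_quantize(epochs, slot, horizon) > 0).astype(np.int8)
```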
1.4.2 Related Work
Information flow detection is a special case of timing analysis, which in turn belongs
to the family of traffic analysis problems [12]. For wireless networks, the idea of
traffic analysis is especially promising because the shared wireless medium is open
to interception. Most of the existing work on traffic analysis in a wireless context
is experiment-oriented, e.g., [26, 35,50].
The problem of detecting information flows has mainly been addressed in the
framework of intrusion detection in wired networks, especially the Internet. In 1995,
Staniford and Heberlein [34] first considered the problem of stepping-stone detec-
tion. The key problem in stepping-stone detection is to reconstruct the intrusion
path by analyzing various characteristics of the attacking traffic. Related work in
the literature only considers pairwise detection.
Early detection techniques are based on the content of the traffic; see, e.g.,
[34, 46]. To deal with encrypted traffic, timing characteristics are used in detec-
tion, such as the On-Off detection by Zhang and Paxson [51], the deviation-based
detection by Yoda and Etoh [48], and the packet interarrival-based detection by
Wang et al. [45]. The drawback of these approaches is that they are vulnerable to
active timing perturbations by the attacker.
Donoho et al. [11] were the first to consider bounded delay perturbation. They
showed that if packet delays are bounded by a maximum amount, then it is possible
to distinguish traffic containing information flows from independent traffic. Their
work was followed by several practical detectors, including the watermark-based
detector by Wang and Reeves [44] and the counting-based detector by Blum et
al. [5].
The problem becomes much more challenging when chaff can be inserted, with
only incomplete solutions in the literature, e.g., [5, 11, 29, 49]. Donoho et al. [11]
showed that there will be a distinguishable difference between information flows in
chaff and independent traffic if chaff noise is independent of the information flows.
Peng et al. [29] and Zhang et al. [49] separately proposed active and passive packet-
matching schemes which can detect information flows with bounded delay in chaff
if chaff packets only appear in the outgoing traffic of the relay node. Blum et al. [5]
modified their counting-based detector to handle a limited number of chaff packets
at the cost of an increased false alarm probability. These techniques can only deal
with a fixed number of chaff packets if nodes can insert chaff noise intelligently.
The dual problem of information flow detection is how to randomize transmis-
sion activities to maximally conceal information flows. This is a critical problem
in protecting anonymous communications against timing analysis attacks. In the
context of wireless ad hoc networks, Hong et al. in [23] proposed to add random
delays to prevent correlation of specific packets, and Deng et al. in [10] proposed to
randomize sensor transmission epochs within each data collection period to thwart
the detection of routes. At flow level, however, transmission activities of nodes on
the same information flow are still correlated. Zhu et al. in [52] proposed to make
traffic on all the outgoing links from a certain node identical in timing by inserting
chaff noise. Although this approach completely hides the information flow, it is
inefficient in terms of the required amount of chaff noise. More efficient methods
to hide information flows are developed based on the chaff-inserting algorithms
developed in Chapter 4; see [41].
Distributed detection of information flows belongs to hypothesis testing un-
der multiterminal data compression [16]. Solutions in this field can model spatial
correlation across nodes, which generalize the conditional i.i.d. assumption made
in classical distributed detection [37], but they cannot deal with temporal cor-
relation. Specifically, existing work only deals with temporal i.i.d. data, i.e., the
observations (xi, yi) (i = 1, 2, . . .) are drawn i.i.d. from a distribution P , where
P = P0 under H0, and P = P1 under H1. The best error exponent (or its lower
bounds) is derived as a function of data compression rates under the Neyman Pear-
son framework; see [16] and the references therein. Our problem is fundamentally
different because the timing measurements of information flows are not i.i.d. , and
our hypotheses do not have a single-letter characterization.
The problem of compressing Poisson processes has been studied previously
in [31, 42]. Rubin in [31] derived the rate distortion function and practical com-
pression schemes under the absolute-error distortion measure. Verdu in [42] derived
a closed-form expression for the rate distortion function under an asymmetric dis-
tortion measure on interarrival times. One of our quantization schemes, slotted
quantization, is the same as the quantization scheme proposed by Rubin. Although
slotted quantization is near optimal in Rubin’s problem, it is not necessarily near
optimal in our problem since we want to optimize the overall detection performance
whereas Rubin just wanted to reconstruct the processes.
Chapter 2
Nonparametric Change Detection and
Estimation
2.1 Outline
In this chapter, statistical learning based techniques are proposed to detect changes
in a 2D random field without statistical knowledge about the underlying distribu-
tions. Section 2.2 specifies the model and defines the detector and the estimator.
Section 2.3 states the main theorems about the exponential bounds on error prob-
abilities of the detector and the consistency of the estimator. Section 2.4 presents
the detection and estimation algorithms, and Section 2.5 provides simulation re-
sults. The chapter is concluded with comments about the strengths and weaknesses
of the proposed approach.
2.2 The Problem Statement
2.2.1 The Model
Let set Ω denote the sensor field and F the σ-field on Ω. We assume that in each
data collection, we draw i.i.d. samples from the locations of alarmed sensors. Let
Pi (i = 1, 2) be a probability measure on (Ω, F) modelling the drawing in the ith
collection. Drawings in different collections are independent. Let Si denote the set
of locations collected in the ith collection and S = S1 ∪ S2 the set that contains
data from the two collections.
We point out that the joint distribution of sensor location and report, which
is influenced by sensor layout, readings, and sampling strategy, is not completely
specified. Note that although the i.i.d. assumption implies that the decisions of
alarm occur independently, the decisions are not necessarily identically distributed,
and the probability of alarm may vary at different locations. Moreover, the prob-
ability that an alarmed sensor reports to the fusion center may also be different
across sensors. Both of these probabilities can be incorporated into Pi. Note that
how unalarmed sensors are distributed is not specified; we can model arbitrary
correlations among them, and they will not have any impact on our result. This
allows us to model certain types of correlated sensor readings.
The probability measures Pi (i = 1, 2) are not known. Instead of making specific
assumptions on the form of Pi, we introduce a collection A ⊆ F of measurable sets
to model the geographical areas of practical interest and only look for changes in
the probabilities of sets in A. The collection A represents our prior knowledge of
what changes are expected. It does not have to be finite or even countable, and
is part of the algorithm design. For example, if we expect changes in the mean of a symmetric distribution whose density decreases monotonically away from its center, it may be good to choose A as the collection of disks. The choice of A is subjective,
and it depends on the application at hand. We will focus in this chapter on
regular geometrical shapes: disks, rectangles, and stripes. Intuitively, disks and
rectangles are suitable for changes in the location or spread of the probability mass,
and stripes (a special type of rectangle) are better for changes in correlation or marginal distributions. We point out that although a parametric model for Pi is not needed, prior knowledge helps detection by allowing us to choose an A that “fits” the changes best, as discussed after Theorem 1.
Given a pair of samples S1, S2 drawn i.i.d. from distributions P1, P2, and a
collection A ⊆ F , we are interested in whether there is a change in probability
measure on A and, if there is a change, which set in A has the maximum change
of probability. Specifically, the detection problem considered in this chapter is the
test of the following hypotheses on A
H0 : P1 = P2   vs.   H1 : P1 ≠ P2.¹
The estimation problem, conditioned on there being a change, is to estimate the set A∗ ∈ A that has the maximum change. For example, using the absolute difference, we want to estimate
A∗ = arg max_{A∈A} |P1(A) − P2(A)|.
We will also consider a normalized difference measure in Section 2.2.2.
2.2.2 Distance Measures
To measure changes, we need some notion of distance between two probability
distributions. In this chapter, we will consider two distance measures: A-distance
and relative A-distance.
Definition 1 (A-distance and empirical A-distance [3]) Given probability spaces
(Ω,F , Pi) and a collection A ⊆ F , the A-distance between P1 and P2 is defined as
dA(P1, P2) = sup_{A∈A} |P1(A) − P2(A)|.   (2.1)
The empirical A-distance dA(S1, S2) is similarly defined by replacing Pi(A) by the empirical measure
Si(A) ≜ |Si ∩ A| / |Si|,   (2.2)
where |Si ∩ A| is the number of points in both Si and set A.
¹Here H0 means P1(A) = P2(A) for all A ∈ A, and H1 means that ∃A ∈ A s.t. P1(A) ≠ P2(A).
This notion of empirical A-distance dA(S1, S2) is related to the Kolmogorov-
Smirnov two-sample statistic. For the case where the domain set is the real line,
the Kolmogorov-Smirnov test considers
sup_x |F1(x) − F2(x)|,   Fi(x) ≜ Pi({y : y ≤ x}),
as the measure of difference between two distributions. By setting A to be the collection of all one-sided intervals (−∞, x], dA(S1, S2) is the Kolmogorov-Smirnov
statistic.
The A-distance does not take into account the relative significance of the
change. For example, one could argue that changing the probability of a set from
0.99 to 0.999 is less significant than a change from 0.001 to 0.01 because the latter
is a ten-fold increase whereas the former is just an increase by less than 1%. For
applications in which small probability sets are of interest, we introduce the fol-
lowing notion of relative A-distance that takes the relative magnitude of a change
into account.
Definition 2 (Relative and Empirical Relative A-distance) Given proba-
bility spaces (Ω,F , Pi) and a collection A ⊆ F , the relative A-distance between
P1 and P2 is defined as
φA(P1, P2) = sup_{A∈A} fφ(P1(A), P2(A)),   (2.3)
where fφ : [0, 1] × [0, 1] → [0, √2] is defined as
fφ(x, y) = 0 if x = y = 0, and fφ(x, y) = |x − y| / √((x + y)/2) otherwise.   (2.4)
The empirical relative A-distance is defined similarly by replacing Pi(A) with the empirical measure defined in (2.2).
The above definition is slightly different from that used in [3]. It is obvious
that |P1(A) − P2(A)| is a metric. The proof that |P1(A) − P2(A)| / √((P1(A) + P2(A))/2) is a metric follows
from [2]. Note that in general dA(P1, P2) = 0 or φA(P1, P2) = 0 does not imply
P1 = P2, but implies P1(A) = P2(A) for any A ∈ A. If we only care about sets in
A, then dA and φA defined above are pseudo-metrics.
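To make the two distance measures concrete, the following is a minimal Python sketch (not part of the thesis) that computes the empirical A-distance and relative A-distance over a finite collection of candidate sets, each represented here by a membership predicate; the function names and set representation are illustrative assumptions.

```python
import math

def empirical_measure(sample, A):
    """Empirical probability S_i(A) = |S_i ∩ A| / |S_i| for a set A given as a predicate."""
    return sum(1 for s in sample if A(s)) / len(sample)

def f_phi(x, y):
    """Relative difference f_phi(x, y) from (2.4)."""
    if x == 0 and y == 0:
        return 0.0
    return abs(x - y) / math.sqrt((x + y) / 2)

def empirical_distances(S1, S2, collection):
    """Empirical A-distance and relative A-distance over a finite collection of sets."""
    d_A = max(abs(empirical_measure(S1, A) - empirical_measure(S2, A)) for A in collection)
    phi_A = max(f_phi(empirical_measure(S1, A), empirical_measure(S2, A)) for A in collection)
    return d_A, phi_A

# Example: one-sided intervals (-inf, x] on the real line recover the Kolmogorov-Smirnov statistic.
S1, S2 = [0.1, 0.4, 0.7], [0.2, 0.5, 0.9]
intervals = [lambda s, x=x: s <= x for x in sorted(S1 + S2)]
print(empirical_distances(S1, S2, intervals))
```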
2.2.3 Detection and Estimation
With the distance measure defined, we can now specify the class of detectors and
estimators considered in this chapter.
Definition 3 (Detector δ(S1, S2; ǫ)) Given two collections of sample points S1 and S2, drawn i.i.d. from probability distributions P1 and P2 respectively, and threshold ǫ ∈ (0, 1), for hypotheses H0 vs. H1, the detector using the A-distance is defined as²
δdA(S1, S2; ǫ) = 1 if dA(S1, S2) > ǫ, and 0 otherwise.   (2.5)
The detector δφA(S1, S2; ǫ) using the relative A-distance is defined the same way by replacing dA(S1, S2) by φA(S1, S2) and letting ǫ ∈ (0, √2).
Assuming that a change of probability distribution has occurred, we define the
estimator for the event that gives the maximum change in probabilities.
Definition 4 (Estimator A∗(S1, S2)) Given two collections of sample points S1 and S2, drawn i.i.d. from probability distributions P1 and P2 respectively, the estimator for the event that gives the maximum change of probability is defined as³
A∗dA(S1, S2) = arg max_{A∈A} |S1(A) − S2(A)|.
The estimator A∗φA(S1, S2) using the relative A-distance is defined similarly.
The definitions given above require searching in a possibly infinite collection
of sets. At the moment, we only specify what the outcome should be without
addressing the algorithmic procedure generating it. We will address that issue in
Section 2.4.
²We use the convention that the detector gives the value 1 for H1 and 0 for H0. Note that if nǫ is an integer, then dA(S1, S2) will be equal to ǫ with positive probability. Ideally, in the Neyman-Pearson framework, randomization is used when dA(S1, S2) = ǫ to achieve a lower miss probability, although it complicates the analysis. Instead, we stick to such a deterministic detector and derive an explicit expression for the threshold by Vapnik-Chervonenkis inequalities. See the discussions following Theorem 1.
³In the case of a tie, choose any one of the sets achieving the maximum change of empirical probability.
2.3 Performance Guarantee
We present in this section consistency results for the detector and estimator pre-
sented earlier. The results are given in the forms of error exponents.
First let us look at some technical preliminaries from [39]. For a measurable space
(Ω,F), let A ⊆ F . We say a set S ⊂ Ω is shatterable by A if for all B ⊆ S,
∃A ∈ A s.t.
B = A ∩ S.
Definition 5 (VC-Dimension) The Vapnik-Chervonenkis dimension of a col-
lection A of sets is
VC-d(A) = sup{n : ∃S s.t. |S| = n and S is shatterable by A}.
The VC-dimension of a class of sets quantifies its ability to separate sets of points.
Intuitively, the VC-dimension of a class A is the maximum number of free param-
eters needed to specify a set in A. For example, if A is the collection of 2D disks, then we see that at most 3 free parameters are needed (the x, y-coordinates of the center and a radius), and it is shown that the VC-dimension of A is indeed 3 [47].
Note that the VC-dimension of a class may be infinite; e.g., the VC-dimension of the entire σ-field F is ∞ because any set is shatterable by F.
Theorem 1 (Detector Error Exponents) Given probability spaces (Ω,F , Pi)
and a collection A ⊆ F with finite VC-dimension d, let Si ⊂ Ω be a set of n sample
points drawn according to Pi. The false alarm probabilities for the detectors defined
in (2.5) are bounded by
PF(δdA) ≤ 8(2n + 1)^d e^{−nǫ²/32},   (2.6)
PF(δφA) ≤ 2(2n + 1)^d e^{−nǫ²/4}.   (2.7)
Furthermore, if dA(P1, P2) > ǫ and φA(P1, P2) > ǫ, the miss detection probabilities satisfy, respectively,
PM(δdA, P1, P2) ≤ 8(2n + 1)^d e^{−n[dA(P1,P2)−ǫ]²/32},   (2.8)
PM(δφA, P1, P2) ≤ 16(2n + 1)^d e^{−n[φA(P1,P2)−ǫ]²/16}.   (2.9)
Proof: See Appendix 2.A.
A few remarks are in order. First, if the maximum change between P1 and P2
on A exceeds ǫ, the detector detects the change with probability arbitrarily close
to 1 as the sample size goes to infinity. Similarly, if there is no change in Pi on A,
then the probability of false alarm also goes to zero. Notice that the decay rates
of the error probabilities are different when the two different distance measures
are used; from (2.6,2.7), the decay rate of false alarm probabilities for the detector
using φA is eight times that using dA.
Second, the above theorem provides a way of deciding the detection threshold
ǫ for a particular detection criterion. For example, the threshold (not necessarily
optimal) of the Neyman-Pearson detection for a given size α can be obtained from
the bounds on false alarm probabilities. Theorem 1 suggests that we should choose
(n, ǫ) such that
8(2n + 1)^d e^{−nǫ²/32} ≤ α for δdA,   (2.10)
2(2n + 1)^d e^{−nǫ²/4} ≤ α for δφA.   (2.11)
Taking ǫ(n) to make the inequalities hold with equality gives a threshold⁴
ǫ(n) = √((32/n) log(8(2n + 1)^d / α)) for δdA,   and   ǫ(n) = √((4/n) log(2(2n + 1)^d / α)) for δφA.   (2.12)
We shall think of ǫ(n) as a measure of detector sensitivity. From (2.8,2.9)
in Theorem 1, we see that miss detection probability starts to drop exponen-
tially when ǫ(n) < dA(P1, P2) or ǫ(n) < φA(P1, P2). Thus, roughly, ǫ(n) is a
lower bound on the amount of changes in order for the change to be detected
with high probability. Furthermore, the smaller the ǫ(n), the larger the values of
[dA(P1, P2) − ǫ(n)]²/32 and [φA(P1, P2) − ǫ(n)]²/16, and the lower the upper bound on miss detection probability. One should be cautioned that although the error probabilities decay exponentially, the error exponents could be small, and thus a large sample size may be required. For example, for d = 2 and ǫ = 0.1, 10⁵ sample points are required to guarantee a false alarm probability bounded by 5% for the A-distance based detector. We can reduce the sample size to 10⁴ by using the detector based on the relative A-distance.
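As an illustration only (the thesis does not give code), the threshold in (2.12) can be evaluated numerically; the sample sizes 10⁵ and 10⁴ quoted above can be reproduced this way. The function name is a hypothetical choice of this sketch.

```python
import math

def threshold(n, d, alpha, relative=False):
    """Detection threshold eps(n) from (2.12); relative=True gives the phi_A version."""
    if relative:
        return math.sqrt((4.0 / n) * math.log(2 * (2 * n + 1) ** d / alpha))
    return math.sqrt((32.0 / n) * math.log(8 * (2 * n + 1) ** d / alpha))

# For d = 2 and alpha = 0.05: the d_A detector needs roughly 1e5 samples to reach eps = 0.1,
# while the relative-distance detector needs roughly 1e4.
print(threshold(10**5, d=2, alpha=0.05))                 # about 0.1
print(threshold(10**4, d=2, alpha=0.05, relative=True))  # about 0.1
```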
Third, note that the VC-dimension d of A has diminishing effects on the rate
of decay of error probabilities. The selection of A, however, may affect the error
exponent through dA or φA. Furthermore, the selection of A has a significant
impact on the complexity of practically implementable algorithms.
Finally, we should also note that, while we have stated the above theorem under |Si| = n, the results generalize easily to the case when the two collections have different sizes.
⁴All the logs in this thesis are natural logarithms.
The consistency of the estimator is implied by the following theorem.
Theorem 2 Given probability spaces (Ω, F, Pi) (i = 1, 2) and a collection A ⊆ F with finite VC-dimension, if A∗dA ≜ arg max_{A∈A} |P1(A) − P2(A)| is separated from the rest of A in the sense that⁵
|P1(A∗dA) − P2(A∗dA)| − sup_{B∈A\{A∗dA}} |P1(B) − P2(B)| > 0,
then A∗dA(S1, S2) converges to A∗dA in probability. A similar result holds for A∗φA.
Proof: See Appendix 2.A.
2.4 Algorithms
We now turn our attention to practically implementable algorithms and their com-
plexities. The key step is to obtain test statistics within a finite number of oper-
ations, preferably with complexity that scales well with the total number of data points M = |S1 ∪ S2|.
Given sample points S = S1 ∪ S2 and a possibly infinite collection of sets A,
we need to reduce the search in A to a search in a finite collection H(S) ⊂ A, and
replace dA(S1, S2) by dH(S1, S2). If H is not chosen properly, such a reduction of the search domain may lead to a loss of performance. Thus we need the notion of completeness when choosing the search domain.
⁵If the Pi's are continuous, and A∗dA can be approximated arbitrarily well by other sets in A, then this condition will not be satisfied. In that case, we have results on the estimation performance evaluated by the amount of change in the estimated set [20].
Definition 6 (Completeness) Given A being a collection of measurable subsets
of space Ω, and S ⊂ Ω be a set of points in Ω. Let H(S) ⊂ A be a finite sub-
collection of measurable sets which is a function of S. We call the collection H(S)
complete for S with respect to A if ∀A ∈ A, there exists a B ∈ H(S) such that
S ∩ A = S ∩ B.
The significance of the completeness is that, if H(S1∪S2) is complete w.r.t. A,
then dA(S1, S2) = dH(S1, S2) and φA(S1, S2) = φH(S1, S2).
For the choice of A, we consider regular geometric areas, e.g., disks, rectangles, and stripes. We next present six algorithms for different choices of A and sub-collection H. We first present complete algorithms, i.e., the sub-collection H is
complete with respect to A. Next we give a couple of heuristic algorithms which
simplify the computation at the cost of a loss in completeness.
Hereinafter all sets defined are closed sets unless otherwise stated.
2.4.1 Complete Algorithms
Search in Planar Disks (SPD)
Let A be the collection of two-dimensional disks. Let VC-d denote the VC-dimension of a class. The following result is proved in [15]:
Proposition 1
VC-d(A) = 3.
For the set of sample points S ⊆ Ω, consider the finite sub-collection of A defined by
HD(S) ≜ ∪_{(si,sj,sk)∈T} HD(si, sj, sk),   (2.13)
where
T ≜ {(si, sj, sk) ∈ S³ : si, sj, sk are not collinear},
and
HD(si, sj, sk) ≜ {D(si, sj, sk), D(si, sj, sk) \ {si}, D(si, sj, sk) \ {sj}, . . . , D(si, sj, sk) \ {si, sj, sk}},
where D(si, sj, sk) is the disk with si, sj, and sk on its boundary, i.e., HD(si, sj, sk) consists of D(si, sj, sk) and all 7 variations that exclude some of the 3 boundary points. See Figure 2.1.
Figure 2.1: Members of HD and HCD, e.g., D(s1, s2, s3) ∈ HD and D′(s4, s5) ∈ HCD; sample points in S1 and S2 are shown with different markers.
In [18] we have proved the following result:
Proposition 2 Let A be the collection of two dimensional disks. For S1 and S2
drawn from P1 and P2, if P1 and P2 are such that any set with Lebesgue measure 0 has probability 0,⁶ then the finite collection HD(S1 ∪ S2) in (2.13) is complete with respect to A a.s. (almost surely).
With HD(S) defined above, the algorithm SPD(dA) (Search in Planar Disks using distance metric dA) computes
max_{A∈HD} |S1(A) − S2(A)|.
Algorithm SPD(dA) includes three steps: (i) generating elements of HD; (ii) computing ||S1 ∩ A|/|S1| − |S2 ∩ A|/|S2|| by counting |S1 ∩ A| and |S2 ∩ A| for every A ∈ HD; and (iii) finding the maximum.
Algorithm SPD(φA) (Search in Planar Disks using the metric φA) is the same
as SPD(dA) except in step (ii) where the relative empirical measure is computed.
We now analyze the complexity of SPD. The complexities of both SPD(dA) and SPD(φA) are O(M⁴) for sample size M = |S1 ∪ S2|. This is because there are O(M³) disks to consider, and the counting of |S1 ∩ A| and |S2 ∩ A| for each disk takes M steps.
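Purely as an illustration (this is not the thesis's implementation), a brute-force Python sketch of SPD(dA) could look as follows; points are assumed to be (x, y) tuples, and the circumcircle helper and numerical tolerance are details of this sketch.

```python
import itertools

def circumcircle(p, q, r):
    """Center and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None  # collinear
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5

def spd_dA(S1, S2, tol=1e-9):
    """Brute-force SPD(d_A): scan disks through triples of sample points and the
    variations that exclude boundary points (cf. H_D in (2.13))."""
    S = S1 + S2
    best = 0.0
    for p, q, r in itertools.combinations(S, 3):
        circ = circumcircle(p, q, r)
        if circ is None:
            continue
        (ux, uy), rad = circ
        inside = lambda s: (s[0] - ux) ** 2 + (s[1] - uy) ** 2 <= (rad + tol) ** 2
        # the disk itself and all 7 variations that exclude subsets of the 3 boundary points
        for k in range(4):
            for excl in itertools.combinations((p, q, r), k):
                member = lambda s, e=set(excl): inside(s) and s not in e
                diff = abs(sum(map(member, S1)) / len(S1) - sum(map(member, S2)) / len(S2))
                best = max(best, diff)
    return best
```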
Search in Axis-aligned Rectangles (SAR)
We now consider the collection A of axis-aligned rectangles. Then we have the
following property:
⁶This is true if P1, P2 are absolutely continuous, i.e., have a pdf, because any measurable function integrates to 0 on a set of measure 0.
Proposition 3
VC-d(A) = 4.
Proof: It is easy to see that VC-d(A) ≥ 4; see Fig. 2.2. The set {s1, s2, s3, s4} is shatterable by A.
Figure 2.2: The set {s1, s2, s3, s4} is shatterable by axis-aligned rectangles, but the set {s1, s2, s3, s4, s5} is not.
For any set S of more than 4 points, let xmin, xmax, ymin, ymax be the minimum and maximum x, y-coordinates of points in S, and let the points attaining these coordinates be s1, s2, s3, s4 (some of them may coincide). Then any axis-aligned rectangle containing {s1, s2, s3, s4} contains S. The subset {s1, s2, s3, s4} therefore cannot be obtained by shattering S with A, and S is not shatterable. Hence VC-d(A) ≤ 4.
□
Given samples S1 and S2, let S = S1 ∪ S2 = {(x1, y1), · · · , (xM, yM)} where, at the cost of O(M log M), we may assume that x1 ≤ x2 ≤ · · · ≤ xM. Let the finite collection HR(S) be defined by
HR(S) ≜ {R(yi, yj, xm, xn) : (xk, yk) ∈ S, k = i, j, m, n},   (2.14)
where R(yi, yj, xm, xn) is the rectangle defined by the four lines y = yi, y = yj, x =
xm, x = xn. See Figure 2.3.
Figure 2.3: Members of HR, e.g., R(y1, y4, x2, x3) ∈ HR; sample points in S1 and S2 are shown with different markers.
Proposition 4 Let A be the class of two dimensional axis-aligned rectangles.
Given S1 and S2, the finite collection HR(S1∪S2) in (2.14) is complete with respect
to A.
The reason for this proposition is that for any axis-aligned rectangle R and
given S, we can find an axis-aligned rectangle R′ such that R′ ∩ S = R ∩ S and R′ has
at least one sample point on each side of the boundary, where points on different
sides are not necessarily different. Since HR includes all those rectangles, it is
complete w.r.t. A.
Algorithm SAR(dA) computes dHR(S1, S2). Because of the ordering in xi’s, the
collection HR allows a recursive calculation of distance measures. Specifically, for
fixed yi and yj s.t. yi ≤ yj, define
f^k_ij(n) ≜ |Sk ∩ R(yi, yj, x1, xn)|/|Sk|, k = 1, 2,   (2.15)
Fij(n) ≜ f^1_ij(n) − f^2_ij(n).   (2.16)
Then f^k_ij(n) (n = 1, . . . , M) can be computed recursively by
f^k_ij(n) = f^k_ij(n − 1) + 1/|Sk| if yn ∈ [yi, yj] and (xn, yn) ∈ Sk, and f^k_ij(n) = f^k_ij(n − 1) otherwise.
Then find
imax = arg max_n Fij(n),   imin = arg min_n Fij(n),
l ≜ min{imax, imin} + 1,   u ≜ max{imax, imin}.
The optimal rectangle, for fixed yi and yj, is then given by R(yi, yj, xl, xu), and the maximum difference in empirical probabilities is given by Fij(imax) − Fij(imin). Finally, compute
dHR(S1, S2) = max_{i,j: yi≤yj} (Fij(imax) − Fij(imin)).
The pair (i, j) that achieves this maximum gives the best rectangle in HR.
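A rough Python sketch of SAR(dA), assuming points are (x, y) tuples; it implements the x-sorted sweep of (2.15)-(2.16) and the max/min reduction of (2.18)-(2.19) below, but only as an illustration, not the thesis's implementation.

```python
def sar_dA(S1, S2):
    """SAR(d_A) sketch: for each pair of y-boundaries, sweep the x-sorted points and track
    the cumulative difference F = f1_ij - f2_ij; the answer is the largest max-min spread."""
    S = sorted([(x, y, 1) for (x, y) in S1] + [(x, y, 2) for (x, y) in S2])  # sort by x
    ys = sorted({y for (_, y, _) in S})
    n1, n2 = len(S1), len(S2)
    best = 0.0
    for yi in ys:
        for yj in ys:
            if yi > yj:
                continue
            F, Fmax, Fmin = 0.0, 0.0, 0.0
            for (x, y, k) in S:
                if yi <= y <= yj:
                    F += 1.0 / n1 if k == 1 else -1.0 / n2
                Fmax, Fmin = max(Fmax, F), min(Fmin, F)
            best = max(best, Fmax - Fmin)
    return best
```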
Algorithm SAR(φA) computes φHR(S1, S2). For fixed yi and yj (yi ≤ yj), we compute f^1_ij(n) and f^2_ij(n) for n = 1, . . . , M as before. Compute empirical probabilities for every pair xm < xn by
Sk(R(yi, yj, xm, xn)) = f^k_ij(n) − f^k_ij(m), k = 1, 2.   (2.17)
Then optimizing over all pairs of x's and y's,
max_{i,j,m,n: yi≤yj, m<n} |S1(R(yi, yj, xm, xn)) − S2(R(yi, yj, xm, xn))| / √((S1(R(yi, yj, xm, xn)) + S2(R(yi, yj, xm, xn)))/2)
gives φHR(S1, S2) and the best rectangle.
We now analyze the complexity of Algorithm SAR. SAR(dA) has complexity O(M³), and SAR(φA) has complexity O(M⁴). This is because in computing dA we can use the fact that
max_{m,n} |(f^1_ij(n) − f^1_ij(m)) − (f^2_ij(n) − f^2_ij(m))|
= max_{m,n} |(f^1_ij(n) − f^2_ij(n)) − (f^1_ij(m) − f^2_ij(m))|   (2.18)
= max_n (f^1_ij(n) − f^2_ij(n)) − min_m (f^1_ij(m) − f^2_ij(m))   (2.19)
and reduce the two-variable optimization to two one-variable optimizations, which are done in linear time. To compute φA, however, we have to check all the O(M²) (xm, xn) pairs. The search is then repeated for all the O(M²) (yi, yj) pairs. Note
that the VC-dimension of the collection of axis-aligned rectangles is 4 while the
VC dimension of the collection of planar disks is 3, which results in a larger sample
size M for Algorithm SAR as we discuss later.
Search in Axis-aligned Stripes (SAS)
The complexities of algorithms SPD and SAR may be formidable for large M. This motivates a simplified algorithm that deals with axis-aligned stripes. The basic idea is to project sample points onto the x and y
coordinates, and then perform change detection/estimation on each coordinate.
Let A be the collection of vertical stripes, i.e., axis-aligned rectangles with
height equal to the field height. Similarly, let B be the collection of horizontal stripes. The following property holds:
Proposition 5
VC-d(A ∪ B) = 4.
Proof: It is easy to see that VC-d(A ∪ B) ≥ 4; see Fig. 2.4. The set {s1, s2, s3, s4} is shatterable by A ∪ B.
Figure 2.4: The set {s1, s2, s3, s4} is shatterable by A ∪ B.
For any set S of more than 4 points, let sl, sr, su, so be the points with the minimum and maximum x, y-coordinates in S accordingly (not necessarily distinct). Then any vertical stripe containing {sl, sr} contains S, and any horizontal stripe containing {su, so} also contains S. The subset {sl, sr, su, so} cannot be obtained by shattering S with A ∪ B, and thus S is not shatterable by A ∪ B. Hence VC-d(A ∪ B) ≤ 4.
□
Given a collection of sample points S = S1 ∪ S2, consider finite subsets HV(S) ⊂ A and HH(S) ⊂ B defined by
HV(S) ≜ {V(xi, xj) : si = (xi, yi), sj = (xj, yj) ∈ S},   (2.20)
HH(S) ≜ {H(yk, yl) : sk = (xk, yk), sl = (xl, yl) ∈ S},   (2.21)
where V(xi, xj) is the vertical stripe with left and right boundaries xi and xj, and H(yk, yl) is the horizontal stripe with lower and upper boundaries yk and yl. See Figure 2.5.
Figure 2.5: Members of HV and HH, e.g., V(xi, xj) ∈ HV and H(yk, yl) ∈ HH; sample points in S1 and S2 are shown with different markers.
Proposition 6 Let A be the class of vertical stripes and B be the class of horizontal stripes. Given S1 and S2, the finite collection HV(S1 ∪ S2) ∪ HH(S1 ∪ S2) defined in (2.20) and (2.21) is complete with respect to A ∪ B.
The proposition is easy to verify because for any axis-aligned stripe, we can
find another axis-aligned stripe with the same intersection with S and at least one
sample point on each boundary. Thus it suffices to consider stripes with sample
points on the boundary.
Given S, Algorithm SAS(dA) performs the following search:
max_{A∈HV∪HH} |S1(A) − S2(A)|.
The algorithm includes the following steps: (i) project sample points onto the x and y coordinates; (ii) sort the projected sample points into increasing order; (iii) in the x coordinate (so that x1 ≤ x2 ≤ · · · ≤ xM), for i = 1, . . . , M, compute f^k_x(i) ≜ Sk(V(0, xi)) (k = 1, 2) recursively by
f^k_x(i) = f^k_x(i − 1) + 1/|Sk| if si ∈ Sk, and f^k_x(i) = f^k_x(i − 1) otherwise,   (2.22)
and then compute Fx(i) ≜ f^1_x(i) − f^2_x(i); compute Fy(j) ≜ S1(H(0, yj)) − S2(H(0, yj)) similarly; (iv) find
m1 = arg max_i Fx(i),   m2 = arg min_i Fx(i),
n1 = arg max_j Fy(j),   n2 = arg min_j Fy(j).
We then have
max_{A∈HV∪HH} |S1(A) − S2(A)| = max(Fx(m1) − Fx(m2), Fy(n1) − Fy(n2)),   (2.23)
and the estimate of the changed area is V(xm1, xm2) if Fx(m1) − Fx(m2) > Fy(n1) − Fy(n2), or H(yn1, yn2) otherwise.
Algorithm SAS(φA) does the same in steps (i), (ii), and (iii), but (iv) is changed to finding
φHV(S1, S2) = max_{i,j: i<j} |S1(V(xi, xj)) − S2(V(xi, xj))| / √((S1(V(xi, xj)) + S2(V(xi, xj)))/2),   (2.24)
where Sk(V(xi, xj)) is given by f^k_x(j) − f^k_x(i). φHH(S1, S2) is computed similarly. Then
φ_{HV∪HH}(S1, S2) = max(φHV(S1, S2), φHH(S1, S2)),
and the changed area is the stripe on which the maximum is attained.
Now we analyze the complexities of Algorithm SAS(dA) and Algorithm SAS(φA). Given M = |S1 ∪ S2|, the complexity of Algorithm SAS(dA) is O(M log M). This is because after projection we only need to perform two linear-complexity searches, so the dominating part is the sorting of sample points, which takes O(M log M). The complexity of Algorithm SAS(φA) is O(M²) because in the two-variable optimization there are O(M²) (xi, xj) pairs to consider.
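For comparison with the previous sketches, a minimal illustration of SAS(dA) (again assuming points are (x, y) tuples, and again not the thesis's code):

```python
def sas_dA(S1, S2):
    """SAS(d_A) sketch: project points onto each axis, sort, and take the maximum
    variation of the cumulative difference F (cf. (2.22)-(2.23))."""
    n1, n2 = len(S1), len(S2)
    best = 0.0
    for axis in (0, 1):                       # 0: vertical stripes, 1: horizontal stripes
        pts = sorted([(p[axis], 1) for p in S1] + [(p[axis], 2) for p in S2])
        F, Fmax, Fmin = 0.0, 0.0, 0.0
        for (_, k) in pts:
            F += 1.0 / n1 if k == 1 else -1.0 / n2
            Fmax, Fmin = max(Fmax, F), min(Fmin, F)
        best = max(best, Fmax - Fmin)
    return best
```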
Search in Random Stripes (SRS)
Note that in Algorithm SAS the choice of x and y axes for projection is subjective,
and this choice should be part of algorithm design. When we know nothing about
the change, introducing randomness may give more robustness to the algorithms.
For θ randomly selected from [0, π/2], choose Aθ to be the collection of vertical stripes rotated (counter-clockwise) by θ, and Bθ to be the collection of horizontal stripes rotated by θ. Define HθV(S) and HθH(S) to be the members of Aθ, Bθ accordingly, with sample points on the boundary, similarly to definitions (2.20) and (2.21).
We claim similar properties for Aθ ∪ Bθ and HθV(S) ∪ HθH(S), i.e., VC-d(Aθ ∪ Bθ) = 4 and HθV(S) ∪ HθH(S) is complete with respect to Aθ ∪ Bθ. Note that
introducing θ does not increase the VC-dimension to 5 because the projection
direction is randomly chosen but not optimized over.
Algorithm SRS is a randomized variation of Algorithm SAS. It is based on
the same projection and search idea as in Algorithm SAS. The difference is
when performing the projection, we project sample points onto random directions
instead of the fixed directions of x and y axes. The rest of the algorithm is the
same as Algorithm SAS.
Algorithm SRS has the same order of complexity as Algorithm SAS in
computing both dA and φA. The advantage of Algorithm SRS is that it is more
robust than Algorithm SAS. Specifically, as a randomized algorithm, SRS
will perform equally well under a wider range of change patterns (the way change
occurs) while SAS can be affected significantly by the change pattern. For
example, SAS is vulnerable to the pattern where changes always occur along a tilted line of angle 45° or 135°, because in that case the increasing and decreasing parts of the change will largely cancel when projected onto the axes.
A quick comment is in order. Both Algorithm SAS and Algorithm SRS can
be easily generalized to algorithms of multiple projections. By doing multiple
projections and line searches, we can increase the accuracy of the algorithm at the
cost of a constant factor increase in the complexity.
2.4.2 Heuristic Algorithms
Some complete algorithms may be good in performance but too expensive to implement in practice, while the simplified complete algorithms SAS and SRS may not be sensitive enough to detect the changes despite their improved complexities. A
trade-off is heuristic algorithms which have lower complexities than their complete
counterparts and perform reasonably well for certain classes of distributions.
Search in sample-Centered Disks (SCD)
In calculating the distances on HD in SPD, it is difficult to reuse the calculation
since sample-defined disks may overlap in arbitrary ways. We define here a dif-
ferent sub-collection in which disks form nested sets, which allows the recursive
computation of distances.
Let A be the collection of two-dimensional disks. Given sample S = S1 ∪ S2, HCD(S) ⊂ A is the sub-collection of sample-centered disks defined by
HCD(S) ≜ {D′(si, sj) : si, sj ∈ S},   (2.25)
where D′(si, sj) is the disk with si at the center and sj on the boundary. See
Figure 2.1.
Proposition 7
VC-d(HCD) = 2.
Proof:
It is easy to see that VC-d(HCD) ≥ 2 because any set of two points can be
shattered (a singleton also belongs to HCD).
For any set S of 3 points, say S = {s1, s2, s3}, let
|s1s2| = max_{i,j∈{1,2,3}} |sisj|.
Then {s1, s2} cannot be shattered (i.e., obtained by shattering) because the only way to obtain it is by D′(s1, s2) or D′(s2, s1), but both of these contain s3. Hence any such S is not shatterable, and VC-d(HCD) ≤ 2.
□
Unfortunately, HCD is not complete with respect to A. For some classes of
probability distributions, however, it turns out that SCD has the same performance
as SPD asymptotically. For example, if there exists some center point such that any
neighborhood around the center has reasonably high probability, SCD is expected
to perform almost as well as SPD. Generally, if probability measures Pi are such
that any disk with positive Lebesgue measure has positive probability, then the
loss of performance vanishes asymptotically. For a disk and an arbitrary neighborhood of its center, the strong law of large numbers guarantees that, as the sample size goes to infinity, there is a sample point within this neighborhood of the
center almost surely. This implies that as sample size goes to infinity, Algorithm
SCD will give the same output as Algorithm SPD, i.e., the search of SCD is
asymptotically complete.
Algorithm SCD(dA) computes
max_{A∈HCD} |S1(A) − S2(A)|.
The presence of nested disks allows the counting procedure to be incremental, i.e., fix a center and count the number of sample points recursively from the innermost disk to the outermost disk.
Algorithm SCD(dA) does the following:
Fix a center si and define
Fi(j) ≜ S1(D′(si, sj)) − S2(D′(si, sj)),   (2.26)
where Sk(D′(si, sj)), k ∈ {1, 2}, is the empirical probability of D′(si, sj) in Sk. First sort the sample points into increasing order sj1, sj2, . . . according to their distance to si⁷ (sj1 = si), and then set Fi(j0) = 0 and compute Fi(jk) (k = 1, 2, . . . , M) recursively by
Fi(jk) = Fi(jk−1) + 1/|S1| if sjk ∈ S1, and Fi(jk) = Fi(jk−1) − 1/|S2| if sjk ∈ S2.
⁷This sort is at the cost of O(M log M).
Next compute
j∗(i) = arg max_j |Fi(j)|.   (2.27)
The search is repeated for all possible si. Finally, we find the maximum among |Fi(j∗(i))|, ∀i, i.e.,
imax = arg max_i |Fi(j∗(i))|.   (2.28)
Then the optimal disk in HCD for the A-distance is given by D′(simax, sj∗(imax)), and the maximum difference is
max_{A∈HCD} |S1(A) − S2(A)| = |Fimax(j∗(imax))|.
Algorithm SCD(φA) computes
max_{A∈HCD} |S1(A) − S2(A)| / √((S1(A) + S2(A))/2).
Clearly, when computing Fi(j) we can get S1(D′(si, sj)) and S2(D′(si, sj)) by similar updates, so we can compute
Gi(j) = |S1(D′(si, sj)) − S2(D′(si, sj))| / √((S1(D′(si, sj)) + S2(D′(si, sj)))/2).
Then
max_{A∈HCD} |S1(A) − S2(A)| / √((S1(A) + S2(A))/2) = max_{i,j} Gi(j).
The complexities of Algorithm SCD(dA) and Algorithm SCD(φA) are of the same order. Their complexity, compared with the O(M⁴) complexity of Algorithm SPD, is reduced to O(M² log M). The dominating term is the sorting of the sample points according to their distances to a given sample point, which takes O(M log M) for each center and is repeated for M centers.
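A minimal illustrative sketch of SCD(dA) under the same point representation as before; it follows the center-and-sort idea just described but is not the thesis's code.

```python
def scd_dA(S1, S2):
    """SCD(d_A) sketch: for each sample point as a center, sort the remaining points by
    distance and update the empirical difference incrementally over nested disks."""
    S = [(p, 1) for p in S1] + [(p, 2) for p in S2]
    n1, n2 = len(S1), len(S2)
    best = 0.0
    for (c, _) in S:
        order = sorted(S, key=lambda t: (t[0][0] - c[0]) ** 2 + (t[0][1] - c[1]) ** 2)
        F = 0.0
        for (_, k) in order:              # grow the disk D'(c, s) outward
            F += 1.0 / n1 if k == 1 else -1.0 / n2
            best = max(best, abs(F))
    return best
```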
Search in Diagonal-defined axis-aligned Rectangles (SDR)
Algorithm SDR is a heuristic simplification of Algorithm SAR. A major drawback of Algorithm SAR is that it is much slower in computing the φA distance (O(M⁴) compared to O(M³) for computing dA). Aiming at reducing the cost of computing φA for rectangles, we propose a simplified variation of SAR: Algorithm SDR. Inspired by the Kolmogorov-Smirnov two-sample test [13], we reduce the search to the class of axis-aligned rectangles having sample points on diagonal vertices.
Let A be the collection of axis-aligned rectangles. Given sample S = S1 ∪ S2,
consider the following finite subset of A defined by
HDR(S) ≜ {R(yi, yj, xm, xn) : (xm, yi), (xn, yj) ∈ S or (xm, yj), (xn, yi) ∈ S},   (2.29)
where R(yi, yj, xm, xn) is the axis-aligned rectangle defined as in (2.14). See
Fig. 2.6.
Proposition 8
VC-d(HDR) = 2.
Proof: It is easy to see that VC-d(HDR) ≥ 2 because any set of two points can be
shattered (a singleton also belongs to HDR).
Figure 2.6: Members of HDR, e.g., the rectangle R(yi, yj, xm, xn) with sample points s1, s2 on its diagonal, and the regions I–IV used in computing its probability; sample points in S1 and S2 are shown with different markers.
For any set S of 3 points, say S = {s1, s2, s3}: if there is no set in HDR containing S, then S is not shatterable. Otherwise, let s1, s2 be the points defining such a set, i.e., the axis-aligned rectangle with diagonal vertices s1, s2 contains S. Then {s1, s2} cannot be shattered because the only way to shatter it is by the axis-aligned rectangle with s1, s2 as diagonal vertices, but this rectangle also contains s3. Hence VC-d(HDR) ≤ 2.
□
HDR is not complete w.r.t. A. However, by the same argument as in Algorithm
SCD, we see that if the probability distributions are such that any disk with positive
measure has positive probability, the loss of performance vanishes as sample size
goes to infinity.
Algorithm SDR(dA) and Algorithm SDR(φA) share the following steps:
Initially, the algorithm builds two matrices C1 and C2 to store the empirical cdf (cumulative distribution function) of S1 and S2. Specifically, assuming x1 ≤ x2 ≤ . . . ≤ xM and y1 ≤ y2 ≤ . . . ≤ yM, define
Ck(j, i) ≜ |Sk ∩ R(0, yj, 0, xi)|/|Sk|, k = 1, 2.
Construct C1 and C2 recursively:
(i) Sort S by the abscissa and the ordinate respectively. Define the function δk : {1, . . . , M} → {0, 1}, k = 1, 2, by δk(j) = 1 if the sensor with ordinate yj belongs to Sk, and the function g : {1, . . . , M} → {1, . . . , M} by g(j) = i if (xi, yj) ∈ S.
(ii) Compute the first row:
Ck(1, m) = δk(1)/|Sk| if m ≥ g(1), and Ck(1, m) = 0 otherwise,   (2.30)–(2.31)
for k ∈ {1, 2}, m = 1, . . . , M.
(iii) Compute the j-th row, j = 2, . . . , M:
Ck(j, m) = Ck(j − 1, m) + δk(j)/|Sk| if m ≥ g(j), and Ck(j, m) = Ck(j − 1, m) otherwise,   (2.32)–(2.33)
for k ∈ {1, 2}, m = 1, . . . , M.
Then compute empirical probabilities for members of HDR: for every rectangle R(yi, yj, xm, xn) ∈ HDR with i ≤ j, m ≤ n, its empirical probabilities are given by
Sk(R(yi, yj, xm, xn)) = S′k(R(yi, yj, xm, xn)) + δk(i)/|Sk| if (xm, yi) ∈ S, and Sk(R(yi, yj, xm, xn)) = S′k(R(yi, yj, xm, xn)) + (δk(i) + δk(j))/|Sk| otherwise,   (2.34)
where
S′k(R(yi, yj, xm, xn)) = Ck(j, n) − Ck(i, n) − Ck(j, m) + Ck(i, m),   (2.35)
for k ∈ {1, 2}. As seen in Fig. 2.6, the probability of the bold rectangle is the probability of region I minus that of II, minus that of III, plus that of IV, and we need the amendments to take care of boundary points.
Then Algorithm SDR(dA) computes
max_{R∈HDR} |S1(R) − S2(R)|,
and Algorithm SDR(φA) computes
max_{R∈HDR} |S1(R) − S2(R)| / √((S1(R) + S2(R))/2).
Algorithm SDR(dA) and Algorithm SDR(φA) both have complexity O(M²) because constructing the matrices C1 and C2 takes O(M²) steps and the search exhausts the O(M²) rectangles in HDR. Note that this algorithm requires a substantial amount of space, O(M²), due to the storage of C1 and C2.
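The following sketch (an illustration under stated assumptions, not the thesis's implementation) shows the cdf-matrix idea of SDR(dA): it assumes all coordinates are distinct, uses NumPy cumulative sums in place of the row recursion (2.30)-(2.33), and counts closed rectangles directly on the coordinate grid, so the boundary amendments of (2.34) are not needed in this form.

```python
import numpy as np

def sdr_dA(S1, S2):
    """SDR(d_A) sketch: build empirical CDF matrices on the grid of sample coordinates and
    evaluate rectangles with sample points on diagonal vertices by inclusion-exclusion."""
    S = S1 + S2
    xs, ys = sorted(p[0] for p in S), sorted(p[1] for p in S)
    xi = {x: i for i, x in enumerate(xs)}     # coordinate -> grid index (coordinates assumed distinct)
    yi = {y: j for j, y in enumerate(ys)}
    M = len(S)
    C = [np.zeros((M + 1, M + 1)) for _ in range(2)]
    for k, Sk in enumerate((S1, S2)):
        for (x, y) in Sk:
            C[k][yi[y] + 1, xi[x] + 1] += 1.0 / len(Sk)
        C[k] = C[k].cumsum(axis=0).cumsum(axis=1)   # 2D cumulative sum = empirical CDF

    def prob(k, i, j, m, n):
        # closed rectangle with x grid indices in [m, n] and y grid indices in [i, j]
        c = C[k]
        return c[j + 1, n + 1] - c[i, n + 1] - c[j + 1, m] + c[i, m]

    best = 0.0
    for p in S:                 # diagonal vertices are pairs of sample points
        for q in S:
            m, n = sorted((xi[p[0]], xi[q[0]]))
            i, j = sorted((yi[p[1]], yi[q[1]]))
            best = max(best, abs(prob(0, i, j, m, n) - prob(1, i, j, m, n)))
    return best
```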
2.5 Simulation
2.5.1 Simulation Setup
In the simulation, we consider the case when the distribution of collected sensors
is a mixture of 2D uniform distributions, one on an s × s square D and the other
centered at x0 ∈ D with radius r. Specifically, the PDF of the 2D random vector
x is given by
px0(x) = p/(πr²p + (s² − πr²)q) for x ∈ D, ||x − x0|| ≤ r;   px0(x) = q/(πr²p + (s² − πr²)q) for x ∈ D, ||x − x0|| > r;   and px0(x) = 0 otherwise,
where x0, p, q, and r are parameters, 0 < r ≪ s and 0 ≤ q < p ≤ 1.
This model corresponds to the scenario when sensors are uniformly distributed
in D, and a sensor is alarmed with probability p if it is within distance r from x0 ∈
D and q if it falls outside this distance. If we view the disk {x ∈ D : ||x − x0|| ≤ r} as the area where a noiseless sensor measurement should be “alarm” and the area outside this disk as the area where a noiseless measurement should be “non-alarm”, then
1− p is the (uniform) miss detection probability and q is the (uniform) false alarm
probability at sensors.
Under hypothesis H0, two sets of sample points are drawn i.i.d. from the same
px0 ; under H1, one set of sample points are drawn from px0 , and the other set of
sample points are drawn independently from px′0
for some other center x′0.
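As an illustration of this setup (with hypothetical parameter values chosen for this sketch), samples from px0 can be drawn by rejection: generate a uniform location on D and keep it with probability p or q depending on whether it falls in the disk around x0.

```python
import math, random

def sample_px0(n, x0, p, q, r, s):
    """Draw n i.i.d. points from the mixture density p_{x0}: sensors uniform on the s-by-s
    square D, alarmed with probability p within distance r of x0 and q otherwise; the
    collected points are the alarmed-sensor locations (rejection sampling sketch)."""
    points = []
    while len(points) < n:
        x, y = random.uniform(0, s), random.uniform(0, s)
        inside = math.hypot(x - x0[0], y - x0[1]) <= r
        if random.random() < (p if inside else q):   # keep the location if the sensor alarms
            points.append((x, y))
    return points

# H0: both samples from the same center; H1: the second sample uses a shifted center.
S1 = sample_px0(1000, x0=(5.0, 5.0), p=0.98, q=0.02, r=1.0, s=12.0)
S2 = sample_px0(1000, x0=(7.0, 7.0), p=0.98, q=0.02, r=1.0, s=12.0)
```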
2.5.2 Detector Sensitivity
We consider Neyman-Pearson detection with detector size α, and choose detection
threshold according to (2.12) to guarantee that the detector’s false alarm will not
exceed α.
Recalling that ǫ(n) measures detector sensitivity, we examine the relation be-
tween ǫ(n), the VC-dimension and the distance measure. Note that for fixed
false alarm, we need more sample points to achieve the same threshold for a
test searching in a class of larger VC-dimension. For searches in classes of the
same VC-dimension, the test using the relative A-distance needs fewer sample points to achieve the same threshold than the one using the A-distance. See Fig. 2.7.
Figure 2.7: Detection threshold as a function of the sample size for different VC-dimensions (false alarm α = 0.05).
Fig. 2.8 shows that the detection threshold is not sensitive to the maximum
false alarm α. We see that given a certain sample size, a detector with a larger size
would not have a much smaller detection threshold. Hence increasing the sample
size is usually the only way to improve the accuracy of the detector.
Figure 2.8: Detection threshold as a function of the detector size for different sample sizes (VC-dimension = 2).
2.5.3 Performance
We focus on miss detection in our Monte Carlo simulations. Fig. 2.9 and Fig. 2.10
show the miss detection probability vs. sample size. We observe that there is a
threshold sample size beyond which the miss detection probability drops sharply.
This can be explained using Theorem 1, which states that the upper bound on
miss detection probability begins to drop when ǫ(n) < dA(P1, P2) for δdA or ǫ(n) <
φA(P1, P2) for δφA, and once it starts to drop, it drops exponentially. A heuristic
argument on the minimum sample size would be that the sample size n should be such that
ǫ(n) = √((32/n) log(8(2n + 1)^d / α)) ≤ dA(P1, P2)  for δdA,   (2.36)
ǫ(n) = √((4/n) log(2(2n + 1)^d / α)) ≤ φA(P1, P2)  for δφA.   (2.37)
If we know P1 and P2, we can calculate dA(P1, P2) and φA(P1, P2) to obtain a lower bound on n by solving the inequalities (2.36) and (2.37). An observation is that this estimate is close to the minimum sample size required in the simulation. For example, in our simulation setup, the estimated minimum sample sizes for Algorithms SAS and SCD using the A-distance metric are both 2725, and that for SCD using the relative A-distance metric is 53. As indicated in Fig. 2.9 and Fig. 2.10, they all agree well with the sharp drop in miss detection probabilities.
Figure 2.9: Miss detection probability of δdA as a function of the sample size: simulation results for Algorithms SAS(dA), SRS(dA), SCD(dA), and SDR(dA). Here p = 0.98, q = 0.02, r = s/12, α = 0.05; 1000 Monte Carlo runs.
Figure 2.10: Miss detection probability of δφA as a function of the sample size: simulation results for Algorithms SAS(φA), SRS(φA), SCD(φA), and SDR(φA). Here p = 0.98, q = 0.02, r = s/12, α = 0.05; 10000 Monte Carlo runs.
As expected, both the threshold and the miss detection probability are decreasing functions of the sample size, which reflects a trade-off between detection precision on the one hand and sampling time, energy consumption, and data processing expense on the other.
We also plot the detection probability w.r.t. the size of the detector. See
Fig. 2.11 and Fig. 2.12. The plots show that the detection probability does not increase significantly with the increase of the detector size, which is expected because the
size affects detection probability only through the threshold, and the threshold is
not sensitive to the change of size (see Fig. 2.8).
Figure 2.11: Detection probability of δdA as a function of detector size for Algorithms SAS, SRS, SCD, and SDR (sample size 3000), 1000 Monte Carlo runs.
Figure 2.12: Detection probability of δφA as a function of detector size for Algorithms SAS, SRS, SCD, and SDR (sample size 100), 10000 Monte Carlo runs.
Note that by choosing the threshold from the upper bound in (2.38) and (2.41),
we only guarantee that the false alarm is upper bounded by α. Our simulation shows that the actual false alarm probability can be much less than the size of the detector⁸,
which implies that the theoretical threshold is a loose upper bound of the actual
minimum threshold needed to guarantee the required detector size. This is be-
cause of the nonparametric nature of the theoretical threshold. This threshold is
proved to satisfy the size constraint under arbitrary distributions by the Vapnik-
Chervonenkis Theory. Therefore for a given distribution, this threshold may be
loose.
For comparison among the algorithms, an obvious observation is that δφA outperforms δdA in detection probability. This is because, on the one hand, given n and α, using (2.36) and (2.37) to choose the threshold yields an ǫ(n) for φA that is roughly a factor of 1/(2√2) of that for dA; on the other hand, we have φA(S1, S2) ≥ dA(S1, S2). Therefore in our simulation it is easier for algorithms using the statistic φA(S1, S2) to detect a change. However, this is caused by the specific way of choosing the detection threshold, and does not imply that δφA is uniformly better than δdA.
An intuitive guideline in algorithm design is that the better the sets in A separate the probability mass of P1 and P2, and the simpler A is, the better the detector performance; e.g., Algorithm SCD performs better than Algorithms SAS and SRS. Moreover, we can introduce random factors into an algorithm to make it more robust; e.g., we randomize SAS into SRS so as to make it independent of the direction in which the change occurs.
⁸For example, in our simulation of Algorithms SAS and SRS, for sample sizes up to 10,000 using 1000 Monte Carlo runs, we encountered no false alarms at all.
2.6 Extension to Finite-level Sensor Measurements
We have presented our results based on collecting sensor locations of sensors with
the same report (i.e., “alarm”). Extension can be made to applications with finite-
level sensor measurements.
Without loss of generality, let each sensor report either that it is alarmed (say, measurement level 1) or that it is not alarmed (level 0). In such a case, the ith data collection is modelled by a probability space (Ω × {0, 1}, F, Pi) where F is a σ-field on Ω × {0, 1}. Let random variable x ∈ Ω denote the sensor location, and L ∈ {0, 1} denote the sensor report. In the ith collection, (x, L) has joint distribution Pi, and
the location of alarmed sensors has conditional distribution Pi|L=1. It is easy to
see that there are cases when Pi changes but Pi|L=1 does not. Hence by collecting
both types of sensor reports, we are able to detect a wider range of changes.
To apply the algorithms presented previously, choose the class A′ to be the collection of sets from A in either the 0-plane or the 1-plane, i.e., A′ = A × {0, 1}. For instance, the collection of planar disks becomes the collection of planar disks with either
measurement 0 or measurement 1. Algorithms should be applied to both 0-plane
and 1-plane and we choose the larger as the test statistics dA(S1, S2) or φA(S1, S2).
The detection and estimation performance guarantee still holds, but note that the
sample size now becomes the total number of sensor reports collected (rather than
the number of alarms collected). Note that the VC-dimension of such a class A′
remains the same as that of A:
Proposition 9 For a class A of planar sets,
VC-d(A × {0, 1}) = VC-d(A).
Proof:
It is easy to see that VC-d(A × {0, 1}) ≥ VC-d(A).
For any set S, if S contains points from different planes, S is not shatterable because no set in A × {0, 1} contains points from different planes. If S only contains points in one plane, it is shatterable only if |S| ≤ VC-d(A). Therefore, VC-d(A × {0, 1}) ≤ VC-d(A).
□
2.7 Summary
We have presented in this chapter a nonparametric approach to the detection of
changes in the distribution of alarmed sensors. We have provided exponential
bounds for the miss detection and false alarm probabilities. The error exponents of these probabilities provide useful guidelines for determining the number of sample points required.
We have also proposed several nonparametric change detection and estimation
algorithms. Here we have aimed at reducing the computation complexity while
preserving the theoretical performance guarantee by using recursive search strate-
gies that reuse earlier computations, which gives us two near linear-complexity
algorithms SAS and SRS. The more expensive algorithms SCD and SDR also have
their roles, despite their near square cost, especially in detecting changes of highly
clustered distributions. This is because the search classes in Algorithm SCD and
SDR may yield larger distance than the more simplified classes, which in turn gives
54
larger error exponents as indicated in Theorem 1. Moreover, Algorithm SCD is
much more efficient than the exhaustive algorithm SPD with complexity O(M⁴),
and Algorithm SDR also improves the complexity of its exhaustive counterpart
Algorithm SAR significantly. Complexities of different algorithms presented so far
are summed up in the following table.
Table 2.1: Time Complexity Comparison
        dA              φA
SPD     O(M⁴)           O(M⁴)
SCD     O(M² log M)     O(M² log M)
SAR     O(M³)           O(M⁴)
SDR     O(M²)           O(M²)
SAS     O(M log M)      O(M²)
SRS     O(M log M)      O(M²)
Besides running time, one may also care about the amount of storage used for
executing the algorithms. Obviously O(M) space is needed to store S1 and S2,
and the extra space needed scales as follows:
Table 2.2: Space Complexity Comparison
        dA        φA
SPD     O(1)      O(1)
SCD     O(1)      O(1)
SAR     O(1)      O(M)
SDR     O(M²)     O(M²)
SAS     O(1)      O(M)
SRS     O(1)      O(M)
Comparing these tables, one can see the time-space trade-off in algorithm de-
sign. For example, although Algorithm SDR has comparable running time with
Algorithm SCD, it requires much more space to execute, i.e., O(M²) instead of O(1). The choice of algorithm should be a trade-off between running time, space
requirement and detection performance, with the significance of each highly de-
pendent on applications.
One should be further cautioned that the techniques considered in this chapter
typically require a large number of sample points. Since no information about the
distribution is used, and the performance guarantee must hold for all distributions,
bounds derived here are conservative. While we have adhered to the principle of the nonparametric approach, the incorporation of certain prior knowledge about
the distribution, in the selection of A for example, would lead to more effective
detection and estimation schemes in practice.
APPENDIX 2.A
PROOF OF CHAPTER 2
Proof of Theorem 1
We first prove the theorem for detectors using the A-distance metric dA(S1, S2) =
supA∈A |S1(A) − S2(A)|. From [17], we have
Pr{∃A ∈ A : ||P1(A) − P2(A)| − |S1(A) − S2(A)|| > ǫ} ≤ 8(2n + 1)^d e^{−nǫ²/32}.   (2.38)
Under H0, P1 = P2, and the false alarm probability satisfies
PF(δ) = Pr{dA(S1, S2) > ǫ; H0}
 = Pr{∃A ∈ A : |S1(A) − S2(A)| > ǫ; H0}
 = Pr{∃A ∈ A : ||P1(A) − P2(A)| − |S1(A) − S2(A)|| > ǫ; H0}
 ≤ 8(2n + 1)^d e^{−nǫ²/32},   (2.39)
where inequality (2.39) follows from (2.38).
For the miss probability, let A∗ = arg max_{A∈A} |P1(A) − P2(A)|. Then
PM(δ, P1, P2) = Pr{dA(S1, S2) ≤ ǫ; P1, P2}
 ≤ Pr{|S1(A∗) − S2(A∗)| ≤ ǫ; P1, P2}
 ≤ Pr{||P1(A∗) − P2(A∗)| − |S1(A∗) − S2(A∗)|| ≥ ||P1(A∗) − P2(A∗)| − ǫ|; P1, P2}
 ≤ 8(2n + 1)^d e^{−n[|P1(A∗)−P2(A∗)|−ǫ]²/32}.   (2.40)
Now consider the relative distance. The proof for the relative distance metric goes line by line as that for the non-relative metric, replacing inequality (2.38) with the following results from [17]:
P^{2n}(φA(S1, S2) > ǫ) ≤ 2(2n + 1)^d e^{−nǫ²/4},   (2.41)
P^{2n}[|φA(P1, P2) − φA(S1, S2)| > ǫ] ≤ 16(2n + 1)^d e^{−nǫ²/16}.   (2.42)
We have
PF(δ) ≤ 2(2n + 1)^d e^{−nǫ²/4},   (2.43)
PM(δ, P1, P2) ≤ 16(2n + 1)^d e^{−n[φA(P1,P2)−ǫ]²/16}.   (2.44)
□
Proof of Theorem 2
Let VC-d(A) = d < ∞. We first prove the theorem for the A-distance.
Let
A∗dA = arg max_{B∈A} |P1(B) − P2(B)|,
and define η to be
η ≜ |P1(A∗dA) − P2(A∗dA)| − sup_{B∈A, B≠A∗dA} |P1(B) − P2(B)|.
Then η > 0.
By results of [17], we have
Pr{sup_{B∈A} ||P1(B) − P2(B)| − |S1(B) − S2(B)|| ≤ η/3} ≥ 1 − 8(2n + 1)^d e^{−nη²/288}.
So with probability ≥ 1 − 8(2n + 1)^d e^{−nη²/288},
|S1(A∗dA) − S2(A∗dA)| − sup_{B∈A, B≠A∗dA} |S1(B) − S2(B)|
 ≥ |P1(A∗dA) − P2(A∗dA)| − sup_{B∈A, B≠A∗dA} |P1(B) − P2(B)|
   − ||S1(A∗dA) − S2(A∗dA)| − |P1(A∗dA) − P2(A∗dA)||
   − |sup_{B∈A, B≠A∗dA} |S1(B) − S2(B)| − sup_{B∈A, B≠A∗dA} |P1(B) − P2(B)||   (2.45)
 ≥ η − 2 sup_{B∈A} ||P1(B) − P2(B)| − |S1(B) − S2(B)||   (2.46)
 ≥ η/3.   (2.47)
That is,
Pr{A∗dA = arg max_{B∈A} |S1(B) − S2(B)|} ≥ 1 − 8(2n + 1)^d e^{−nη²/288}.
As n → ∞, we see that
lim_{n→∞} Pr{A∗dA = arg max_{B∈A} |S1(B) − S2(B)|} = 1.
For the relative A-distance, let
A∗φA = arg max_{B∈A} |P1(B) − P2(B)| / √((P1(B) + P2(B))/2).
Let
η ≜ fφ(P1(A∗φA), P2(A∗φA)) − sup_{B∈A, B≠A∗φA} fφ(P1(B), P2(B)).
Then η > 0.
In [2] it was proved that fφ(x, y) is a metric on [0, 1]. The rest of the proof is similar to that for the A-distance. By [17] we have
Pr{sup_{B∈A} fφ(Si(B), Pi(B)) ≤ η/5} ≥ 1 − 8(2n + 1)^d e^{−nη²/100},  i = 1, 2.
Thus with probability ≥ [1 − 8(2n + 1)^d e^{−nη²/100}]², we have
fφ(S1(A∗φA), S2(A∗φA)) − sup_{B∈A, B≠A∗φA} fφ(S1(B), S2(B))
 ≥ fφ(P1(A∗φA), P2(A∗φA)) − sup_{B∈A, B≠A∗φA} fφ(P1(B), P2(B))
   − |fφ(P1(A∗φA), P2(A∗φA)) − fφ(S1(A∗φA), S2(A∗φA))|
   − |sup_{B∈A, B≠A∗φA} fφ(P1(B), P2(B)) − sup_{B∈A, B≠A∗φA} fφ(S1(B), S2(B))|   (2.48)
 ≥ η − fφ(P1(A∗φA), S1(A∗φA)) − fφ(P2(A∗φA), S2(A∗φA))
   − sup_{B∈A, B≠A∗φA} fφ(P1(B), S1(B)) − sup_{B∈A, B≠A∗φA} fφ(P2(B), S2(B))   (2.49)
 ≥ η − 2 sup_{B∈A} fφ(P1(B), S1(B)) − 2 sup_{B∈A} fφ(P2(B), S2(B))   (2.50)
 ≥ η/5.   (2.51)
That is,
Pr{A∗φA = arg max_{B∈A} |S1(B) − S2(B)| / √((S1(B) + S2(B))/2)} ≥ [1 − 8(2n + 1)^d e^{−nη²/100}]².
Letting n → ∞ completes the proof.
□
Chapter 3
Detecting Information Flows Without
Chaff Noise
3.1 Outline
This chapter addresses centralized detection of information flows without chaff
noise. The detection procedure is decomposed into pairwise detection of 2-hop
flows. Section 3.2 gives a mathematical definition of the problem. Section 3.3
presents a packet matching algorithm for detecting information flows with bounded
delay. Section 3.4 presents a variation-based algorithm for detecting information
flows with bounded memory. Section 3.5 compares the performance of the pro-
posed algorithms with existing algorithms. Section 3.6 verifies the performance by
simulations. Section 3.7 concludes the chapter with remarks on the application of
the proposed detection schemes.
3.2 Problem Formulation
3.2.1 Notations
For the ease of presentation, we use the convention that boldface letters denote
vectors, plain letters denote scalars, uppercase letters denote random variables (or
stochastic processes), and lowercase letters denote realizations. For example, we
denote a point process by S, its realization by s, the kth epoch in S by S(k),
and the kth epoch in s by s(k). Given a realization s, we use the script letter S to denote the set of elements in this realization. Given two realizations of point processes (a1, a2, . . .) and (b1, b2, . . .), ⊕ is the superposition operator defined as (ak)_{k=1}^∞ ⊕ (bk)_{k=1}^∞ = (ck)_{k=1}^∞, where c1 ≤ c2 ≤ . . . and {ak}_{k=1}^∞ ∪ {bk}_{k=1}^∞ = {ck}_{k=1}^∞.
3.2.2 Flow Models
Consider two nodes of interest, denoted by A and B, in a wireless sensor network,
as illustrated in Fig. 3.1. Let the transmission epochs of each node be represented
by a point process
Si = (Si(1), Si(2), Si(3), . . .), i = 1, 2,
where Si(k) (k ≥ 1) is the kth transmission epoch1 of the node (let i = 1 for node
A, and i = 2 for node B).
Figure 3.1: Detecting information flows through nodes A and B by analyzing their transmission activities S1 and S2.
If A and B are carrying an information flow and not involved in other trans-
missions, then (S1, S2) needs to satisfy the following definition.
Definition 7 A pair of processes (F1, F2) is an information flow if for every
realization, there exists a bijection g : F1 → F2 such that g(s) − s ≥ 0 for all
1Assume no simultaneous transmissions.
s ∈ F1. For an information flow with bounded delay ∆, g(s) − s ≤ ∆ for all
s ∈ F1. For an information flow with bounded memory M , g satisfies that
0 ≤ |F1 ∩ [0, t]| − |F2 ∩ [0, t]| ≤ M, ∀t ≥ 0. (3.1)
The bijection g is a mapping between the transmission and the relay epochs of
the same packets at nodes A and B, allowing permutation of order during the relay.
The condition that g is a bijection imposes a packet-conservation constraint, i.e.,
no packets are generated or dropped at the relay node. The condition g(s)− s ≥ 0
is the causality constraint, which means that a packet cannot leave a node before
it arrives. Communication constraints are additional constraints on g which are
imposed by the requirement of reliable communications. We consider two types of
commonly encountered constraints: bounded delay constraint and bounded mem-
ory constraint. The condition g(s) − s ≤ ∆ is a bounded delay constraint which
implies that the maximum delay at the relay node is uniformly bounded by ∆.
This condition was first proposed by Donoho et al. in [11]. The condition in (3.1)
is a bounded memory constraint which implies that the relay node has a limited
memory that can store at most M relay packets2. The values of ∆ and M are
assumed to be known.
There is a natural correspondence between the bounded delay model and the bounded memory model given by Little's Theorem [4]. If M = λ∆, where λ is the
rate of information flow, then in the bounded delay model, the maximum delay is
bounded by ∆ and the average memory size by M , whereas in the bounded mem-
ory model, the maximum memory size is bounded by M and the average delay by
∆. The two models are fundamentally different. It has been shown that the two
models have very different scaling behavior of the mutual information between F1 and F2 [14]. In Section 3.5.2, we will show that they also have different detection performance with respect to changes in traffic rate.
²A similar requirement on buffer size has been considered by Giles and Hajek in the context of timing channels [14].
3.2.3 Hypotheses
We want to test the following hypotheses:
H0 : S1 and S2 are independent,
H1 : (S1, S2) = (F1, F2)
by observing3 Si (i = 1, 2) for a finite time t (t > 0), where (F1, F2) is an infor-
mation flow (with bounded delay or memory). This is a partially nonparametric
hypothesis testing problem. No statistical assumptions are made for Fi (i = 1, 2)
although under H0, Si (i = 1, 2) are assumed to be Poisson processes in the
analysis.
3.3 Detecting Information Flows with Bounded Delay
In this section, we consider detecting information flows with bounded delay. We
propose a linear-time, packet matching-based algorithm, called “Detect-Match”
(DM).
Given measurements (s1, s2), we want to match the packets in s1 with their
possible relays in s2 subject to the delay bound ∆. Under H1, there must be at
least one way of matching packets that satisfies the causality and the bounded
3Note that under H1, the detector may not observe the beginning of the information flow.
delay constraints, i.e., the matching induced by the mapping g. For pairs of
independent processes, however, such matching is not always possible. Algorithm
DM is designed based on this observation. Note, however, that if we exhaustively
search all the matchings, the complexity of the algorithm will be exponential4.
Instead of looking for arbitrary matching, we prove that it suffices to search for
matchings which preserve the order of incoming packets, as stated in the following
proposition.
Proposition 10 If matching sk with s′k (k ≥ 1) satisfies the causality and the bounded delay constraints, then there exists a permutation (jk)_{k=1}^∞ such that matching sk with s′_{jk} also satisfies the above constraints, and this matching is order-preserving, i.e., sk ≤ sl if and only if s′_{jk} ≤ s′_{jl} for all (k, l).
Proof: As illustrated in Fig. 3.2, if matching sk with s′k (k = 1, 2) satisfies the
causality and the bounded delay constraints, then we can match s1 with s′2 and
s2 with s′1 such that the matching still satisfies the constraints, but the order of
packets is preserved. By induction on the number of out-of-order pairs, we can
reorder matchings of any length into a matching that satisfies the constraints and
is order-preserving.
□
By Proposition 10, it suffices to only consider the matchings that preserve the
order of packets, which reduces the problem to finding the match of the first packet
⁴For example, if there are at most L transmissions during time ∆, then the exhaustive search for a length-n matching has complexity O(L^n).
Figure 3.2: Both the solid and the dotted lines denote matchings that are causal and bounded in delay, but the dotted lines also preserve the order of incoming packets.
in s1. Based on this idea, we develop the following detector⁵:
δDM(s1, s2) = 1 if ∃ m ∈ [l, u] s.t. s2(k + m − 1) − s1(k) ∈ [0, ∆] for k = 1, . . . , n, and δDM(s1, s2) = 0 otherwise,
where n = |S1|, s2(l) is the first epoch in s2 after s1(1), and s2(u) is the last epoch in s2 before s1(1) + ∆ (including boundaries). See Appendix 3.B for its implementation.
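The thesis's implementation is given in Appendix 3.B; purely as an illustration, a direct Python sketch of the DM test (with 0-based indexing and sorted lists of epochs) could look like this.

```python
def detect_match(s1, s2, delta):
    """DM sketch: return 1 (declare a flow) if the order-preserving matching that starts at
    some candidate match of s1[0] within [s1[0], s1[0] + delta] is causal and delay-bounded."""
    n = len(s1)
    candidates = [m for m, t in enumerate(s2) if s1[0] <= t <= s1[0] + delta]
    for m in candidates:
        if m + n <= len(s2) and all(0 <= s2[m + k] - s1[k] <= delta for k in range(n)):
            return 1
    return 0
```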
Now we analyze the performance of DM. We claim that DM can detect all the
information flows with bounded delay ∆, i.e., there is no miss detection. Specif-
ically, we have shown by Proposition 10 that an information flow with bounded
delay ∆ must have an order-preserving matching which satisfies the causality and
the bounded delay constraints. Since the match of the first packet in s1 must be
in the interval [s1(1), s1(1) + ∆], which is equivalent to m ∈ [l, u] (as illustrated
in Fig. 3.3), DM must be able to detect such flows.
Next we examine the false alarm probability of DM.
⁵We use the convention that the detector gives the value 1 for H1 and 0 for H0.
Figure 3.3: Finding the match of s1(1): there are three candidates in the ∆-length interval following s1(1).
Theorem 3 If S1 and S2 are independent Poisson processes of rates λ1 and λ2, respectively, then the false alarm probability of DM satisfies
PF(δDM) ≤ γ^{n−1},
where γ = 1 − e^{−λ1λ2∆/(λ1+λ2)}.
Proof: See Appendix 3.A.
□
Remark: Theorem 3 gives a few insights into the problem. Since γ ≤ 1 − e^{−min(λ1,λ2)∆}, we have γ → 0 if min(λ1, λ2) → 0, i.e., DM almost never falsely detects slow independent traffic. Intuitively, it is easier to match two processes of equal rate. This intuition is strengthened by Theorem 3 because γ ≤ 1 − e^{−λ∆/2}, where λ = max(λ1, λ2), and thus the upper bound for equal rates is larger.
In the Neyman-Pearson framework, we can estimate the sample size required by
DM to achieve a given false alarm probability α by calculating the value n that
makes the upper bound in Theorem 3 equal to α, that is,
n = log α/ log γ + 1.
For example, if λ1 = λ2 = 1 and ∆ = 10, then a match length of 682 suffices to guarantee a false alarm probability bounded by 1%. Note that for this match length, DM needs up to 2n + λ2∆ = 1374 packets on average to find a valid match.
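As a quick check of this arithmetic, the following Python fragment (an illustrative sketch, not part of the proposed detectors) evaluates γ and the resulting match length for the example above:

    import math

    def dm_sample_size(lam1, lam2, delta, alpha):
        # Match length from the Theorem 3 bound: n = log(alpha)/log(gamma) + 1.
        gamma = 1.0 - math.exp(-lam1 * lam2 * delta / (lam1 + lam2))
        return math.log(alpha) / math.log(gamma) + 1

    n = dm_sample_size(1.0, 1.0, 10.0, 0.01)
    print(round(n))           # 682, the match length quoted above
    print(2 * round(n) + 10)  # 1374, the average number of packets needed (2n + lambda2*delta)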
3.4 Detecting Information Flows with Bounded Memory
In this section, we consider the problem of detecting information flows when the
relay node has limited memory. Specifically, assuming that the node’s memory
can hold at most M packets, we use the property that the difference between the
number of incoming packets and the number of departure packets never exceeds
M during any period of time. Based on this property, we derive a counting-based
algorithm—“Detect-Maximum-Variation” (DMV).
A few definitions are needed to present the algorithm. Given realizations s_i (i = 1, 2), let (s_w)_{w≥1} = s1 ⊕ s2. Let n_i(w) (i = 1, 2) be the number of packets in s_i when the total number of packets is w, i.e.,
\[
n_i(w) \triangleq \sum_{j=1}^{w} I_{\{s_j \in S_i\}},
\]
where I_{·} is the indicator function. Sample paths of n1(w) and n2(w) are illustrated in Fig. 3.4(a). Define the cumulative difference between s1 and s2 as
\[
d(w) \triangleq n_1(w) - n_2(w),
\]
and let the maximum variation of d(w) be
\[
v(w) \triangleq \max_{1\le i\le w} d(i) - \min_{1\le i\le w} d(i).
\]
See Fig. 3.4(b) for an illustration of d(w) and v(w).
Figure 3.4: (a) the cumulative counting functions ni(w) (i = 1, 2); (b) the cumulative difference d(w) and the maximum variation v(w).
If (s1, s2) is a realization of an information flow with bounded memory, then the sample path of d(w) will have bounded variation. Specifically, note that
\[
v(w) = \max_{1\le i\le j\le w} |d(j) - d(i)| = \max_{1\le i\le j\le w} |(n_1(j)-n_1(i)) - (n_2(j)-n_2(i))|,
\]
where |(n1(j) − n1(i)) − (n2(j) − n2(i))| is the difference in the numbers of packets in s1 and s2 between the ith and the jth packets. For memory bound M, this difference is bounded by M, i.e., v(w) ≤ M for all w. Algorithm DMV detects information flows with bounded memory based on the maximum variation. The detector is defined as follows:
\[
\delta_{\mathrm{DMV}}(s_1, s_2) = \begin{cases} 1 & \text{if } v(n) \le M, \\ 0 & \text{otherwise}, \end{cases}
\]
where n = |S1| + |S2|. An implementation of the detector can be found in Appendix 3.B.
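For concreteness, a minimal Python sketch of δDMV is given below; it mirrors the pseudocode of Appendix 3.B, and the function and variable names are illustrative only.

    def detect_maximum_variation(s1, s2, M):
        # Sketch of DMV: merge the two epoch sequences, track the cumulative
        # difference d(w), and declare H1 iff its maximum variation stays <= M.
        merged = sorted([(t, +1) for t in s1] + [(t, -1) for t in s2])
        d = d_max = d_min = 0
        for _, step in merged:
            d += step                      # +1 for a packet of s1, -1 for a packet of s2
            d_max, d_min = max(d_max, d), min(d_min, d)
            if d_max - d_min > M:
                return 0                   # variation exceeds M: declare H0
        return 1                           # declare H1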
Since any information flow with bounded memory M will be detected after n
packets, i.e., miss detection is totally avoided, we only need to take care of the
false alarm probability, which is bounded as follows.
Theorem 4 For independent Poisson processes, the false alarm probability of DMV satisfies
\[
P_F(\delta_{\mathrm{DMV}}) \le (M+1)\,\frac{\rho^n}{1-\rho},
\]
where ρ = cos(π/(M+2)). Furthermore, if the two processes have the same rate, then the upper bound is tight with respect to the error exponent, i.e.,
\[
\lim_{n\to\infty} -\frac{1}{n}\log P_F(\delta_{\mathrm{DMV}}) = -\log\rho.
\]
Proof: See Appendix 3.A. □
Remarks: For a given false alarm constraint α, we can guarantee the satisfaction of this constraint by making the upper bound in Theorem 4 equal to α, yielding a sample size
\[
n = \frac{\log[\alpha(1-\rho)] - \log(M+1)}{\log\rho}, \qquad (3.2)
\]
which grows as O(M^2 log(M/α)) as M → ∞ and α → 0. For example, if M = 20, (3.2) says that using 1196 packets will guarantee a false alarm probability no greater than 1%.
3.5 Comparing the Algorithms
We have introduced detection algorithms under the bounded delay and the
bounded memory models. In practice, the information flows of interest may sat-
isfy the assumptions of more than one detection algorithm. This section aims at
comparing the performance of different algorithms when they are both applicable.
3.5.1 DMV vs. DA
Blum et al. [5] consider the detection of information flows that satisfy both the
bounded delay and the bounded peak rate conditions. The underlying idea is that
in interactive information flows, usually not only is the delay bounded, but the peak
rate at which packets are issued is also bounded. Specifically, consider information
flows in which the maximum delay is bounded by ∆, and the maximum number
of arrivals within time t (t ≥ 0) is L(t). The second condition is referred to as the
bounded peak rate condition. The formal definition of such information flows is as
follows.
Definition 8 A pair of processes (F1, F2) is an information flow with bounded
delay ∆ and bounded peak rate L(·) if it is an information flow with bounded delay
∆, and for every realization, |F1 ∩ [s, t]| ≤ L(t − s) for all 0 ≤ s ≤ t.
Information flows with bounded delay and bounded peak rate always have
bounded memory, as stated in the following proposition.
Proposition 11 Given a realization (f1, f2) of an information flow with bounded
delay and bounded peak rate, let ni(a, b) (i = 1, 2) be the number of packets in fi
in the interval [a, b] (a ≤ b). Then
|n1(a, b) − n2(a, b)| ≤ L(∆), ∀a ≤ b.
Proof: See Appendix 3.A. □
By Proposition 11, we see that information flows with bounded delay and
bounded peak rate are also information flows with bounded memory, where the
memory bound is L(∆). Note that the converse is not true, i.e., bounded delay and bounded memory do not imply bounded peak rate.
For information flows with bounded delay and bounded peak rate, Blum et al.
in [5] propose a detection algorithm called “Detect-Attacks” (DA). Algorithm DA
merges the observations and divides the result into groups of 2(M+1)^2 packets. Then it computes the cumulative differences in each group (starting from zero in every group). The algorithm returns H0 if there exists a group with cumulative difference greater than M in absolute value. The detector is defined as
\[
\delta_{\mathrm{DA}}(s_1, s_2) = \prod_{k=1}^{n/(2(M+1)^2)} \delta^{(k)}_{\mathrm{DA}}(s_1, s_2),
\]
where
\[
\delta^{(k)}_{\mathrm{DA}}(s_1, s_2) = \begin{cases} 1 & \text{if } \max_{0\le w\le 2(M+1)^2} |d^{(k)}(w)| \le M, \\ 0 & \text{otherwise}, \end{cases}
\]
where d^{(k)}(w) (w = 0, . . . , 2(M+1)^2) is the cumulative difference for the kth group (d^{(k)}(0) = 0).
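A minimal Python sketch of this group-wise test is given below. It assumes that the merged observations have already been reduced to a ±1 step sequence (as in the DMV sketch earlier) and that the sample size is a multiple of the group size; the names are illustrative only.

    def detect_attacks(steps, M):
        # Sketch of DA: split the +/-1 step sequence into groups of 2(M+1)^2,
        # restart the cumulative difference in each group, and declare H0 as soon
        # as the difference in some group leaves [-M, M].
        group_len = 2 * (M + 1) ** 2
        for start in range(0, len(steps), group_len):
            d = 0
            for step in steps[start:start + group_len]:
                d += step
                if abs(d) > M:
                    return 0
        return 1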
Blum et al. showed that DA has no miss detection. Moreover, they proved that 2(M+1)^2 log(1/α) packets are needed to guarantee a false alarm probability no more than α. We are more interested in the asymptotic detection performance in
more than α. We are more interested in the asymptotic detection performance in
terms of error exponents. Since [5] does not compute the error exponent for the
false alarm probability of DA, we introduce the following lemma.
Lemma 1 For independent Poisson processes,
\[
\Pr\Big\{\max_{i\in\{1,\ldots,m\}} |d(i)| \le M\Big\} \le \frac{\sigma^m}{1-\sigma},
\]
and when m is large enough,
\[
\Pr\Big\{\max_{i\in\{1,\ldots,m\}} |d(i)| \le M\Big\} \ge K\sigma^m,
\]
where σ = cos(π/(2(M+1))) and K = \frac{\sin\frac{\pi}{2(M+1)}}{2(M+1)(1-\sigma)}.
Proof: See Appendix 3.A. □
If M is large, we can apply Lemma 1 to each group of 2(M+1)^2 epochs to obtain the upper and the lower bounds on the false alarm probability of that group. Moreover, it was proved in [5] that the single group false alarm probability is upper bounded by 1/2. Hence the false alarm probability of one group is upper bounded by
\[
\min\left(\frac{\sigma^{2(M+1)^2}}{1-\sigma},\ \frac{1}{2}\right) = \begin{cases} \frac{2+\sqrt{2}}{16} & \text{if } M = 1, \\[1ex] \frac{1}{2} & \text{if } M \ge 2. \end{cases}
\]
Algorithm DA has a false alarm if all the n/[2(M+1)^2] groups have false alarms6. Thus for large M, the total false alarm probability satisfies
\[
\left(K^{\frac{1}{2(M+1)^2}}\sigma\right)^n \le P_F(\delta_{\mathrm{DA}}) \le \left(\frac{1}{2}\right)^{\frac{n}{2(M+1)^2}}. \qquad (3.3)
\]
Therefore, for large M, the false-alarm error exponent of DA is at most −log(K^{1/(2(M+1)^2)} σ) and at least log 2/(2(M+1)^2).
Now we compare DA with DMV under the assumptions of bounded delay and
bounded peak rate. Note that since we have shown in Proposition 11 that such
information flows satisfy the bounded memory condition, DMV also has no miss
detection. It remains to compare their false alarm probabilities.
6In DA, the sample size n is always a multiple of 2(M+1)^2.
We first point out that under H0, DMV always outperforms DA for any realization. The reasons are that v(w) ≥ max_{1≤i≤w} |d(i)| within a group (see Fig. 3.5), and DA restarts its computation from zero at the beginning of each group, whereas DMV keeps accumulating v(w) across groups. Therefore, for every realization, if DMV has a false alarm, DA must have a false alarm too.
Figure 3.5: The statistic of DA is no larger than that of DMV.
Next we compare their false alarm probabilities. In particular, we are interested in whose false alarm probability has a larger error exponent. From Theorem 4 and (3.3), we see that the false-alarm error exponent of DMV is −log ρ, whereas that of DA is at most −log(K^{1/(2(M+1)^2)} σ). By Taylor expansion of the error exponents, we have that as M → ∞,
\[
-\log\rho = \frac{\pi^2}{2(M+2)^2} + o\left(\frac{1}{M^2}\right),
\]
\[
-\log\big(K^{\frac{1}{2(M+1)^2}}\sigma\big) = \frac{\frac{\pi^2}{4} + \log\frac{\pi}{2}}{2(M+1)^2} + o\left(\frac{1}{M^2}\right).
\]
The ratio of the two exponents therefore approaches π^2/(π^2/4 + log(π/2)) ≈ 3.38, i.e., for large M, the false-alarm error exponent of DMV is at least 3.38 times larger than that of DA.
3.5.2 DM vs. DMV
For information flows with bounded memory and bounded delay, both DMV and
DM are applicable. We are interested in which algorithm performs better asymptotically. Note that we need to give DMV and DM the same sample size to make a fair comparison. If we define sample size as the total number of observed packets, then by Theorems 3 and 4, we see that for Poisson processes of equal rate λ, DM is preferable if γ ≤ ρ^2, i.e.,
\[
\lambda \le -\frac{4}{\Delta}\log\left(\sin\frac{\pi}{M+2}\right). \qquad (3.4)
\]
Otherwise, DMV is preferable. For example, for M = 40 and ∆ = 10, the threshold is λ ≤ 1.0375. This threshold phenomenon has an intuitive explanation.
Algorithm DMV only uses the rank statistics, so it does not depend on the rate of
the traffic; on the other hand, DM performs better on lighter traffic and worse on
heavier traffic. The reason for the latter is that if we normalize the maximum delay
with the average interarrival time, then the normalized delay bound λ∆ clearly
satisfies that λ∆ → 0 as λ → 0 and λ∆ → ∞ as λ → ∞, which implies that
for extremely light traffic, almost perfect synchrony is required to raise an alarm,
whereas for extremely heavy traffic, the delay constraint is essentially removed,
causing DM to always raise alarms. Therefore, when the traffic rate is sufficiently
low, DM outperforms DMV, and otherwise DMV outperforms DM.
The comparison suggests that the bounded memory condition is more informative than the bounded delay condition for λ∆ > 4 log((M+2)/π). Since the right hand side grows only as log M, the memory bound can be advantageous even for modest rates and large memory. For example, for M = 10^6 packets and ∆ = 10 seconds, we only need λ > 5.1 packets per second for the bounded memory condition to outweigh the bounded delay condition.
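The threshold in (3.4) is easy to evaluate numerically; the following illustrative fragment reproduces the two examples above.

    import math

    def dm_vs_dmv_threshold(M, delta):
        # Rate threshold from (3.4): DM is preferable when lambda is below this value.
        return -(4.0 / delta) * math.log(math.sin(math.pi / (M + 2)))

    print(dm_vs_dmv_threshold(40, 10))     # about 1.0375 packets per second
    print(dm_vs_dmv_threshold(10**6, 10))  # about 5.1 packets per second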
3.6 Numerical Results
We simulate our algorithms on synthetic data to verify their performance. Since all
the algorithms (DA, DM, and DMV) are free of miss detection, we only simulate
the false alarm probabilities. We generate measurements by independent Poisson
processes of equal rate. We let M = 40 packets, ∆ = 10 seconds, and vary
the sample size (i.e., the total number of packets in both s1 and s2) between
2500 and 5000. Note that since DA requires the sample size to be a multiple of 2(M+1)^2 = 3362 packets, we extend the sample size of DA to 6724. The
performance of DA and DMV does not depend on the traffic rate. For DM, the
rate will be specified when detailed results are presented.
We have shown the advantage of DMV over DA and have quantified their
difference in terms of error exponent as M → ∞ in Section 3.5.1. We now show
how their performance compares for finite M . In Fig. 3.6, we plot the simulated
false alarm probabilities of DMV and DA, together with the upper bound on
PF (δDMV) from Theorem 4 and the asymptotic upper and lower bounds on PF (δDA)
from (3.3). Simulation shows that the asymptotic bounds in (3.3) are valid even
for relatively small M (M = 40). Furthermore, it confirms our claim that the false
alarm probability of DMV decays much faster than that of DA.
We simulate DM for different traffic rates (λ = 3, 3.5, 4, 4.5). The simulation
results are plotted in Fig. 3.7. The upper bounds in Theorem 3 for rates between 3
and 4.5 are close to 1; the actual false alarm probabilities obtained from simulation
are much lower. The plot shows that the upper bound in Theorem 3 is not tight,
but it correctly predicts the fact that PF (δDM) increases with the increase of traffic
rate, as argued in Section 3.5.2.
Figure 3.6: PF(δDA), PF(δDMV), and their bounds; M = 40 packets, 100000 Monte Carlo runs.
Furthermore, we make an overall comparison by plotting the simulated false
alarm probabilities of DA, DMV and DM together in Fig. 3.8. From the plot, it is
clear that the comparison between DM and DMV depends on the traffic rate. In
our simulation, M = 40, ∆ = 10, the threshold rate estimated by (3.4) is about
1.0375. The simulation verifies the existence of such a threshold rate because the
false alarm probability of DM decays faster than that of DMV for λ = 3.5 and
slower for λ = 4.5. Note, however, that in the estimation of the threshold rate we
are conservative about DM. This is because for DMV, Theorem 4 gives the exact
error exponent, whereas for DM, Theorem 3 only characterizes a lower bound on
its error exponent (which is shown to be not tight). Therefore, we expect that
the actual threshold rate is larger than the one estimated by (3.4); e.g., in the simulation the threshold rate is about 4.
Figure 3.7: PF(δDM) under various rates; ∆ = 10 seconds, 100000 Monte Carlo runs.
Figure 3.8: PF(δDA), PF(δDMV), and PF(δDM); M = 40 packets, ∆ = 10 seconds, 100000 Monte Carlo runs.
3.7 Summary
In this chapter, we develop techniques to detect information flows when there is no
chaff noise. These techniques all belong to pairwise detection. If the information
flow of interest involves more than two hops, one can repeatedly apply the proposed
algorithms to detect all the 2-hop pieces and then use existing serialization methods (e.g., see [43]) to construct the flow path.
APPENDIX 3.A
PROOF OF CHAPTER 3
Proof of Theorem 4 and Lemma 1
The proof is based on the theory of random walks. Let {X_n}_{n≥0} be a simple random walk, i.e.,
\[
X_0 = 0, \qquad X_n = Z_1 + Z_2 + \cdots + Z_n \quad (n > 0),
\]
where {Z_i}_{i=1,2,...} are i.i.d. random variables taking values in {−1, 0, 1}. Let p = Pr{Z_i = 1} and q = Pr{Z_i = −1}. Define the hitting time of −b or a (a, b ≥ 0) as
\[
N_{-b,a} = \inf\{n \ge 1 : X_n = -b \text{ or } a\}.
\]
The following lemma is from [8]:
Lemma 2
\[
\Pr\{N_{-b,a} = n\} \le \frac{1}{2}\left(\frac{p}{q}\right)^{a/2}\frac{1}{s_1^{\,n-1}} + \frac{1}{2}\left(\frac{q}{p}\right)^{b/2}\frac{1}{s_1^{\,n-1}}, \qquad (3.5)
\]
where s_1 = 1/(1 − p − q + 2(pq)^{1/2} cos(π/(a+b))). If a = b, then for large n,
\[
\Pr\{N_{-b,a} = n\} \ge \frac{\sin\frac{\pi}{2a}}{2a}\,\frac{1}{s_1^{\,n-1}}. \qquad (3.6)
\]
Moreover, there exist constants c_v (v = 1, . . . , a+b−1) and s_v (v = 2, . . . , a+b−1) not depending on n, s.t.
\[
\Pr\{N_{-b,a} > n\} = \sum_{v=1}^{a+b-1}\frac{c_v}{s_v^{\,n}}, \qquad (3.7)
\]
where |s_1| ≤ |s_v| (v = 2, . . . , a+b−1).
Since
\[
\Pr\{N_{-b,a} > n\} = \sum_{r=n+1}^{\infty} \Pr\{N_{-b,a} = r\},
\]
(3.5) and (3.6) give upper and lower bounds on Pr{N_{−b,a} > n}.
For the proof of Theorem 4, note that for independent Poisson processes, d(w) is a simple random walk. Define the extreme values U_n = max_{i=0,...,n} d(i) and L_n = min_{i=0,...,n} d(i). A false alarm occurs in DMV if and only if U_n − L_n < M + 1. Note that the false alarm probability is largest if d(w) is symmetric (i.e., p = q = 1/2). Then we have
\[
P_F(\delta_{\mathrm{DMV}}) = \Pr\{U_n - L_n < M+1\}
= \Pr\Big\{\bigcup_{a=1}^{M+1}\{U_n < a,\ L_n > -(M+2-a)\}\Big\}
\le \sum_{a=1}^{M+1}\Pr\{U_n < a,\ L_n > -(M+2-a)\} \qquad (3.8)
\]
\[
\le (M+1)\,\frac{\rho^n}{1-\rho}, \qquad (3.9)
\]
where ρ = cos(π/(M+2)). Here (3.8) is by the union bound, and (3.9) is by noticing that
\[
\Pr\{U_n < a,\ L_n > -(M+2-a)\} = \Pr\{N_{-(M+2-a),\,a} > n\},
\]
and then applying (3.5) with p = q = 1/2. Furthermore, by (3.7) it is easy to see that
\[
\lim_{n\to\infty} -\frac{1}{n}\log P_F(\delta_{\mathrm{DMV}}) = -\log\rho.
\]
For the proof of Lemma 1, note that
\[
\Pr\Big\{\max_{i\in\{1,\ldots,n\}} |d(i)| \le M\Big\} = \Pr\{N_{-(M+1),\,(M+1)} > n\}.
\]
Applying (3.5) and (3.6) with a = b = M + 1 and p = q = 1/2 gives the desired result.
□
Proof of Theorem 3
Given a matching {(s_i, s′_i)}_{i=1,2,...}, define Y_i ≜ s′_i − s_i. Algorithm DM has a false alarm if and only if there exists s′_1 such that the order-preserving matching {(s_i, s′_i)}_{i=1,...,n} satisfies 0 ≤ Y_i ≤ ∆ for all i = 1, . . . , n.
For i ≥ 2, define the interarrival times U_i ≜ s_i − s_{i−1} and V_i ≜ s′_i − s′_{i−1}, and let Z_i ≜ V_i − U_i. Then
\[
Y_i = (s'_{i-1} - s_{i-1}) + (s'_i - s'_{i-1}) - (s_i - s_{i-1}) = Y_{i-1} + Z_i.
\]
Therefore, given Y_1, {Y_i}_{i=2}^∞ is a general random walk with steps Z_i (i ≥ 2). We know that V_i and U_i are independent exponential random variables with means 1/λ2 and 1/λ1, respectively, and thus the Z_i's are i.i.d. with distribution function
\[
\Pr\{Z_i \le z\} = \int_{\max(0,-z)}^{\infty} p_{U_i}(u)\,\Pr\{V_i \le u+z\}\,du
= \begin{cases} 1 - \frac{\lambda_1}{\lambda_1+\lambda_2}e^{-\lambda_2 z} & \text{if } z \ge 0, \\[1ex] \frac{\lambda_2}{\lambda_1+\lambda_2}e^{\lambda_1 z} & \text{if } z < 0. \end{cases}
\]
The probability density function (pdf) of Z_i is
\[
p_Z(z) = \begin{cases} \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}e^{-\lambda_2 z} & \text{if } z \ge 0, \\[1ex] \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}e^{\lambda_1 z} & \text{if } z < 0. \end{cases}
\]
The false alarm probability satisfies
\[
P_F(\delta_{\mathrm{DM}}) = \Pr\{\exists\, s'_1 \text{ s.t. } 0 \le Y_1^n \le \Delta\} \le \max_{y_1\in[0,\Delta]} \Pr\{0 \le Y_2^n \le \Delta \mid Y_1 = y_1\}.
\]
Fix a y_1 ∈ [0, ∆]. For n ≥ 2, define
\[
p_n(z)\,dz \triangleq \Pr\{Y_2^{n-1} \in [0,\Delta],\ z < Y_n < z + dz \mid Y_1 = y_1\}.
\]
Define p_1(z) = δ(z − y_1) (the Dirac delta function). In [8] (page 53) it is shown that
\[
p_n(z) = \int_0^{\Delta} p_{n-1}(x)\,p_Z(z-x)\,dx, \qquad n = 2, 3, \ldots
\]
Then we have
\[
\Pr\{0 \le Y_2^n \le \Delta \mid Y_1 = y_1\} = \int_0^{\Delta} p_n(z_n)\,dz_n
= \int_0^{\Delta} p_{n-1}(z_{n-1})\,dz_{n-1}\int_0^{\Delta} p_Z(z_n - z_{n-1})\,dz_n
= \int_0^{\Delta} p_Z(z_2 - y_1)\,dz_2\int_0^{\Delta} p_Z(z_3 - z_2)\,dz_3\cdots\int_0^{\Delta} p_Z(z_n - z_{n-1})\,dz_n.
\]
Let γ ≜ max_{t∈[0,∆]} ∫_{−t}^{∆−t} p_Z(z) dz. A simple calculation yields γ = 1 − e^{−λ1λ2∆/(λ1+λ2)}. Then
\[
\Pr\{0 \le Y_2^n \le \Delta \mid Y_1 = y_1\} \le \gamma^{n-1}.
\]
Since this holds for all y_1 ∈ [0, ∆], we have P_F(δDM) ≤ γ^{n−1}.
□
Proof of Proposition 11
Throughout the proof we write M ≜ L(∆) for brevity. If b − a ≤ ∆, then
\[
|n_1(a,b) - n_2(a,b)| \le \max(n_1(a,b),\, n_2(a,b)) \le M.
\]
For b − a > ∆, let n′_1(a−∆, a) be the number of packets that arrive at the relay node in [a−∆, a) and depart after a, and let n″_1(b−∆, b) be the number of packets that arrive in (b−∆, b] and depart before b. Then
\[
n_1(a,b) = n_1(a, b-\Delta) + n_1(b-\Delta, b), \qquad
n_2(a,b) = n_1(a, b-\Delta) + n'_1(a-\Delta, a) + n''_1(b-\Delta, b).
\]
We have
\[
n_2(a,b) - n_1(a,b) = n'_1(a-\Delta, a) + n''_1(b-\Delta, b) - n_1(b-\Delta, b).
\]
Since n″_1(b−∆, b) ≤ n_1(b−∆, b) and n′_1(a−∆, a) ≤ M, we have
\[
n_2(a,b) - n_1(a,b) \le n'_1(a-\Delta, a) \le M.
\]
Since n′_1(a−∆, a) ≥ 0, n″_1(b−∆, b) ≥ 0, and n_1(b−∆, b) ≤ M, we have
\[
n_2(a,b) - n_1(a,b) \ge -n_1(b-\Delta, b) \ge -M.
\]
□
APPENDIX 3.B
ALGORITHMS OF CHAPTER 3
Algorithm DM
A pseudo code implementation of δDM is presented in Table 3.1.
Table 3.1: Detect-Match (DM).
    Detect-Match(s1, s2, ∆, n):
        l = inf{k : s2(k) ≥ s1(1)};
        u = sup{k : s2(k) ≤ s1(1) + ∆};
        for m = l, . . . , u
            for k = 1, . . . , n
    (*)         if s2(k + m − 1) − s1(k) < 0 or s2(k + m − 1) − s1(k) > ∆ break;
            end
            if k == n + 1 return H1;
        end
        return H0;
To analyze the complexity of DM, note that the inner loop has O(n) operations,
and the number of such loops is O(1). Thus the complexity of DM is O(n). Zhang
et al. in [49] proposed to improve the complexity of DM by replacing step (*) with
the steps in Table 3.2, which enables DM to decide H0 earlier.
Table 3.2: Alternative Implementation of (*).
    if s2(k + m − 1) − s1(k) < 0
        break;
    else if s2(k + m − 1) − s1(k) > ∆
        return H0;
    end
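The following Python sketch combines Table 3.1 with the early-termination step of Table 3.2; it uses 0-indexed lists of epochs, so indices are shifted by one relative to the pseudocode, and the names are illustrative only.

    import bisect

    def detect_match(s1, s2, delta):
        # Sketch of DM: try each candidate match of the first packet of s1 and check
        # whether an order-preserving, causal, bounded-delay matching of all of s1 exists.
        n = len(s1)
        l = bisect.bisect_left(s2, s1[0])           # first epoch of s2 at or after s1[0]
        u = bisect.bisect_right(s2, s1[0] + delta)  # one past the last epoch <= s1[0] + delta
        for m in range(l, u):
            for k in range(n):
                if k + m >= len(s2) or s2[k + m] - s1[k] > delta:
                    return 0                        # too late: no larger m can help either
                if s2[k + m] - s1[k] < 0:
                    break                           # causality violated: try the next m
            else:
                return 1                            # all n packets matched: declare H1
        return 0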
Algorithm DMV
An implementation of δDMV is shown in Table 3.3. This algorithm has complexity
O(n) and uses only constant memory (O(log M), to be precise).
Table 3.3: Detect-Maximum-Variation (DMV).
    Detect-Maximum-Variation(s1, s2, M, n):
        (s_w)_{w=1}^n = s1 ⊕ s2;
        dmax = dmin = d(0) = 0;
        for w = 1 : n
            d(w) = d(w − 1) + 1 if s_w ∈ S1; d(w − 1) − 1 if s_w ∈ S2;
            dmax = max(dmax, d(w));
            dmin = min(dmin, d(w));
            if dmax − dmin > M return H0;
        end
        return H1;
Chapter 4
Detecting Information Flows With Chaff
Noise
4.1 Outline
In this chapter, we address the detection of information flows mixed with chaff
noise. The main contribution is a tight characterization of flow detectability as
the maximum amount of chaff noise allowed for consistent detection. The rest
of the chapter is organized as follows. Section 4.2 defines the problem. Section
4.3 summarizes our results on the detectability of information flows. Sections 4.4
and 4.5 present chaff-inserting algorithms for the optimal embedding. Section 4.6
presents the detector and analyzes its performance. The analysis is supported by
simulation results in Section 4.8. Section 4.7 comments on the generalization of the
Poisson assumption. Then Section 4.9 concludes the chapter with remarks on its
contributions. Appendix 4.A includes all the proofs, and Appendix 4.C contains
pseudo code implementations of all the proposed algorithms.
4.2 Problem Formulation
We use the same convention for notations as in Chapter 3.
4.2.1 Multi-hop Flow Models
The two-hop flow models in Section 3.2.2 can be extended, in a natural way, to
flows over multiple hops. Suppose that we are interested in detecting information
flows through n (n ≥ 2) nodes, as illustrated in Fig. 4.1. Let Si (i = 1, . . . , n) be
the process of transmission epochs of node Ri, i.e.,
Si = (Si(1), Si(2), Si(3), . . .), i = 1, 2, . . . , n,
where Si(k) (k ≥ 1) is the kth transmission epoch of Ri.
Figure 4.1: Detecting information flows through nodes R1, R2, . . . , Rn by measuring their transmission activities; dotted lines denote a potential route.
Figure 4.2: An information flow along the path R1 → . . . → Rn.
If (S_i)_{i=1}^n contains an information flow, then it can be decomposed into an information-carrying part (F_i)_{i=1}^n and a chaff part (W_i)_{i=1}^n:
\[
S_i = F_i \oplus W_i, \qquad i = 1, \ldots, n, \qquad (4.1)
\]
where the information-carrying part consists of packets sent by R1 and relayed sequentially by R_i (i = 2, . . . , n) as illustrated in Fig. 4.2. Note that chaff noise is not subject to any constraints on information flows and can be correlated with the information flows.
We extend the definition of information flows from two hops to arbitrary hops
as follows.
Definition 9 A sequence of processes (F1, . . . , Fn) is an information flow if for every realization f_i (i = 1, . . . , n), there exist bijections g_i : F_i → F_{i+1} (i = 1, . . . , n−1) such that g_i(s) − s ≥ 0 for all s ∈ F_i. For an information flow with bounded delay ∆, g_i(s) − s ≤ ∆ for all s ∈ F_i; for an information flow with bounded memory M, g_i satisfies
\[
0 \le |F_i \cap [0, t]| - |F_{i+1} \cap [0, t]| \le M \qquad (4.2)
\]
for any t ≥ 0.
The bijection gi is a mapping between the transmission epochs of the same
packets at nodes Ri and Ri+1. For explanation of this definition, we refer to the
comments after Definition 7. Although in this definition, we have assumed equal
delay or memory constraint at every relay node, it can be easily generalized to
unequal constraints. Again, the constants ∆ and M are assumed to be known.
4.2.2 Problem Statement
We are interested in testing the following hypotheses:
H0 : S1, S2, . . . , Sn are jointly independent;
H1 : (Si)ni=1 contains an information flow,
by observing Si (i = 1, . . . , n) for some time t (t > 0). No statistical assumptions
are made for Fi and Wi (i = 1, . . . , n) under H1, but the distributions of Si
(i = 1, . . . , n) are assumed to be known under H0 (they are assumed to be Poisson
processes in our analysis). We point out that although Poisson assumption is
needed to obtain explicit expressions, the idea of detection is applicable for general
point processes.
Remark: The above is a test of independent traffic against end-to-end informa-
tion flows. Since the complement of H0 is not H1, one should view this test as part
of an overall detection scheme. For example, if we observe realizations s1, . . . , sN ,
and we want to find out whether a subset of the processes contains an information
flow, we can first apply the above hypothesis testing to every pair of realizations
(s_i, s_j) (i, j ∈ {1, . . . , N}) to test whether this pair contains an information flow, and then, if there is no detection on pairs, we extend the scope to every triple, etc. That is, we can sequentially test H0 versus H1 on every subset (s_i)_{i∈I} (I ⊆ {1, . . . , N}) for |I| = 2, . . . , N. This procedure helps us to simplify the detection of partial
information flows which may only go through a subset of the monitored nodes to
the detection of end-to-end flows.
To characterize the amount of chaff noise, we introduce the following definition.
Definition 10 Given realizations of an information flow (f_i)_{i=1}^n and chaff noise (w_i)_{i=1}^n, the chaff-to-traffic ratio (CTR) is defined as
\[
\mathrm{CTR}(t) \triangleq \frac{\sum_{i=1}^{n} |W_i \cap [0,t]|}{\sum_{i=1}^{n} |S_i \cap [0,t]|}, \qquad \mathrm{CTR} \triangleq \limsup_{t\to\infty}\mathrm{CTR}(t). \qquad (4.3)
\]
In words, CTR(t) is the fraction of chaff packets in the first t period of time and CTR its asymptotic value. We are interested in the asymptotic detection performance with respect to CTR.
Since we consider a nonparametric alternative hypothesis in which the distributions of F_i and W_i (i = 1, . . . , n) are unknown, we borrow the notion of Chernoff-consistency in [32] to introduce the following performance measure.
Definition 11 A detector δ_t is called r-consistent (r ∈ [0, 1]) if it is Chernoff-consistent for all the information flows with CTR bounded by r a.s.1, that is, the false alarm probability P_F(δ_t) and the miss probability P_M(δ_t) satisfy
1. lim_{t→∞} P_F(δ_t) = 0 for any (S_i)_{i=1}^n under H0;
2. sup_{(S_i)_{i=1}^n ∈ P} lim_{t→∞} P_M(δ_t) = 0, where
\[
\mathcal{P} = \Big\{(S_i)_{i=1}^n : (S_i)_{i=1}^n \text{ contains an information flow and } \limsup_{t\to\infty}\mathrm{CTR}(t) \le r \text{ a.s.}\Big\}.
\]
The consistency of a detector is defined as the supremum of r such that the detector is r-consistent.
1Here a.s. means "almost surely".
4.3 Flow Detectability
We first give the general detectability result, starting with the following definitions.
Definition 12 For n-hop information flows with bounded delay ∆, the level of weak detectability, denoted by $\overline{\alpha}^{\Delta}_n$, is defined as
\[
\overline{\alpha}^{\Delta}_n \triangleq \sup\big\{r : \forall\,(S_i)_{i=1}^n \text{ containing an information flow with bounded delay } \Delta,\ \text{if } \limsup_{t\to\infty}\mathrm{CTR}(t)\le r \text{ a.s., then } \exists \text{ a Chernoff-consistent detector for } (S_i)_{i=1}^n\big\}.
\]
The level of strong detectability, denoted by $\underline{\alpha}^{\Delta}_n$, is defined as
\[
\underline{\alpha}^{\Delta}_n \triangleq \sup\{r : \exists\,\delta_t \text{ s.t. } \delta_t \text{ is } r\text{-consistent}\}.
\]
For information flows with bounded memory, the levels of weak and strong detectabilities, denoted by $\overline{\alpha}^{M}_n$ and $\underline{\alpha}^{M}_n$, are defined similarly.
By definition, the weak detectability allows the detector to depend on the distribution of information flows, whereas the strong detectability does not. Thus the level of weak detectability is no lower than that of strong detectability, i.e., $\underline{\alpha}^{j}_n \le \overline{\alpha}^{j}_n$ (j = ∆, M).
With a sufficient amount of chaff noise, the nodes can make traffic containing
an information flow mimic arbitrary traffic patterns, including the traffic patterns
under H0. Therefore, there must be some limits on the amount of chaff noise be-
yond which information flows are no longer detectable. A basic limit is the amount
of chaff noise sufficient to make an information flow statistically identical with in-
dependent traffic. Specifically, we define a notion of the level of undetectability as
follows.
Given H0, define the level of undetectability as2
\[
\beta^{\Delta}_n \triangleq \inf\big\{r \in [0,1] : \exists\,(F_i)_{i=1}^n, (W_i)_{i=1}^n \text{ satisfying: }
1)\ (F_i \oplus W_i)_{i=1}^n \stackrel{d}{=} (S_i)_{i=1}^n \text{ for some } (S_i)_{i=1}^n \text{ under } H_0;\
2)\ (F_i)_{i=1}^n \text{ is an information flow with bounded delay } \Delta;\
3)\ \limsup_{t\to\infty}\mathrm{CTR}(t) \le r \text{ a.s.}\big\}. \qquad (4.4)
\]
2Here "$\stackrel{d}{=}$" means equal in distribution.
That is, β^∆_n is the minimum CTR for an n-hop information flow with bounded delay ∆ to be equal in distribution to traffic under H0. The corresponding quantity β^M_n for bounded memory flows is defined similarly.
Our main results are the following relationships among the levels of weak and
strong detectabilities and the level of undetectability.
Theorem 5 If S_i (i = 1, . . . , n) are Poisson processes of bounded rates under H0, then
\[
\underline{\alpha}^{j}_n = \overline{\alpha}^{j}_n = \beta^{j}_n, \qquad j = \Delta, M.
\]
Remark: This theorem states that for a Poisson null hypothesis, the levels of weak and strong detectabilities are equal, and both equal the minimum fraction of chaff needed to mimic the null hypothesis. For CTR less than β^j_n (j = ∆, M), any information flow can be detected consistently by the same detector; for CTR above or equal to β^j_n, there is a method to hide the information flow among chaff noise such that consistent detection is impossible. We will give explicit expressions for β^j_n or its bounds later.
Proof: The proof contains a converse part and an achievability part. For the converse part, we need to show that $\overline{\alpha}^{j}_n \le \beta^{j}_n$ (j = ∆, M). By the definition of β^j_n, there exists (S_i)_{i=1}^n such that it contains an information flow with a β^j_n fraction of chaff, and S1, . . . , Sn are truly independent Poisson processes. Thus, it is impossible to have a Chernoff-consistent detector for this information flow, which implies that β^j_n is an upper bound on the level of weak detectability.
For the achievability part, we need to show that $\underline{\alpha}^{j}_n \ge \beta^{j}_n$ (j = ∆, M). The approach is to design a detector which is r-consistent for r arbitrarily close to β^j_n. The detector is presented later in Definition 13, and the analysis of its consistency is in Theorems 11 and 12. Combining the converse and the achievability results with the fact that $\underline{\alpha}^{j}_n \le \overline{\alpha}^{j}_n$ (j = ∆, M) gives Theorem 5.
□
In the following sections, we will explain how to compute βjn (j = ∆, M) and
how to do the detection.
4.4 Detectability of Two-hop Flows
In this section, we consider 2-hop information flows (i.e., n = 2). Given the
distribution of (S1, S2) under H0, we aim at characterizing the value of βj2 (j =
∆, M).
Our approach is to first find the algorithms which optimally partition Si (i =
1, 2) into Fi and Wi such that (F1, F2) is an information flow, and the CTR is
minimized, and then calculate βj2 by analyzing the CTR of these algorithms under
H0. Such algorithms are called chaff-inserting algorithms, and the CTR of these
algorithms is defined as the CTR of the partitioned traffic.
4.4.1 Two-hop Flows with Bounded Delay
Suppose that nodes R1 and R2 want to send a 2-hop information flow with bounded
delay ∆, and they are allowed to design the insertion of chaff noise. The question
is how to insert the minimum amount of chaff noise such that S1 and S2 become
statistically independent.
To answer this question, Blum et al. in [5] proposed a greedy algorithm called
“Bounded-Greedy-Match” (BGM) which works as follows: given a realization
(s1, s2),
1. match every packet transmitted at time s in the first process s1 with the first
unmatched packet transmitted in [s, s + ∆] in the second process s2;
2. label all the unmatched packets in s1 and s2 as chaff.
See Fig. 4.3 for an illustration of BGM. It is easy to see that BGM has complexity
O(|S1| + |S2|). For a pseudo code implementation of BGM, see Appendix 4.C.
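A minimal Python sketch of BGM (illustrative names; it returns the number of chaff packets found in a realization) is given below.

    def bounded_greedy_match(s1, s2, delta):
        # Sketch of BGM: greedily match each packet of s1 with the first unmatched
        # packet of s2 in [s, s + delta]; every packet left unmatched is chaff.
        matched = 0
        j = 0                                  # first not-yet-considered index in s2
        for s in s1:
            while j < len(s2) and s2[j] < s:   # these s2 packets can no longer be matched
                j += 1
            if j < len(s2) and s2[j] <= s + delta:
                matched += 1                   # match s with s2[j]
                j += 1
        return len(s1) + len(s2) - 2 * matched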
Figure 4.3: BGM: a sequential greedy match algorithm.
Algorithm BGM has been shown in [5] to be the optimal chaff-inserting algo-
rithm for 2-hop information flows with bounded delay, as stated in the following
proposition.
Proposition 12 ( [5]) For any realization (s1, s2), BGM inserts the minimum
number of chaff packets in transmitting an information flow with bounded delay ∆.
The optimality of BGM allows us to characterize the minimum chaff needed
to mimic completely independent traffic by analyzing the CTR of BGM. If, in
particular, the independent traffic can be modelled as Poisson processes, then we
prove the following results.
Theorem 6 If S1 and S2 are independent Poisson processes of rates λ1 and λ2, respectively, then with probability one, the CTR of BGM satisfies
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{BGM}}(t) = \begin{cases}
\dfrac{(\lambda_2-\lambda_1)\left(1+\left(\frac{\lambda_1}{\lambda_2}\right)e^{\Delta(\lambda_1-\lambda_2)}\right)}{(\lambda_1+\lambda_2)\left(1-\left(\frac{\lambda_1}{\lambda_2}\right)e^{\Delta(\lambda_1-\lambda_2)}\right)} & \text{if } \lambda_1 \ne \lambda_2,\\[2ex]
\dfrac{1}{1+\lambda_1\Delta} & \text{if } \lambda_1 = \lambda_2.
\end{cases}
\]
Proof: See Appendix 4.A. □
It is easy to show that if λi ≤ λ (i = 1, 2), then the CTR of BGM is lower
bounded by 1/(1 + λ∆). By the optimality of BGM, we see that the following
result holds.
Corollary 1 If under H0, S1 and S2 are independent Poisson processes with max-
imum rate λ, then the level of undetectability β∆2 = 1/(1 + λ∆).
With 1/(1+λ∆) fraction of chaff noise, the 2-hop traffic containing an informa-
tion flow with bounded delay can be made identical with traffic under H0 so that
no detector can detect this flow consistently. Note that as λ∆ → ∞, the value of
β∆2 will decrease to zero, implying that it is easy to mimic H0 if the traffic load is
heavy (large λ) or the delay bound is loose (large ∆).
4.4.2 Two-hop Flows with Bounded Memory
Consider the transmission of a 2-hop information flow with bounded memory M .
We want to find a method that schedules transmissions according to independent
traffic while inserting the minimum amount of chaff noise.
The bounded memory constraint requires that the memory size used at the
relay node to store relay packets is always bounded between 0 and M . Thus, a
feasible scheduling is to keep updating the memory size for each arrival (i.e., a
packet in S1) or departure (i.e., a packet in S2), and assign that packet to be chaff
if the memory is overflowed or underflowed. Based on this idea, we develop a chaff-
inserting algorithm called “Bounded-Memory-Relay” (BMR). Given a realization
(s1, s2) and (s_k)_{k=1}^∞ ≜ s1 ⊕ s2, let M1(k) be the memory size after the transmission of the kth packet in s1 ⊕ s2. Algorithm BMR does the following: for k = 1, 2, . . .,
1. label a packet s_k as chaff if and only if this packet would cause a memory overflow, i.e., s_k ∈ S1 and M1(k − 1) = M, or underflow, i.e., s_k ∈ S2 and M1(k − 1) = 0; initially, M1(0) = 0;
2. compute M1(k) by3
\[
M_1(k) = \begin{cases} M_1(k-1) & \text{if } s_k = \text{chaff}, \\ M_1(k-1) + I_{\{s_k\in S_1\}} - I_{\{s_k\in S_2\}} & \text{otherwise}. \end{cases}
\]
A sample path of M1(k) (k ≥ 1) is shown in Fig. 4.4.
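A minimal Python sketch of BMR (illustrative names; it returns the positions of the chaff packets in the merged sequence) is given below.

    def bounded_memory_relay(s1, s2, M):
        # Sketch of BMR: walk the merged epoch sequence, track the relay's memory
        # occupancy, and label as chaff any packet that would overflow or underflow it.
        merged = sorted([(t, 1) for t in s1] + [(t, 2) for t in s2])
        mem, chaff = 0, []
        for k, (_, src) in enumerate(merged):
            if (src == 1 and mem == M) or (src == 2 and mem == 0):
                chaff.append(k)               # overflow/underflow: this packet is chaff
            else:
                mem += 1 if src == 1 else -1  # a relayed arrival or departure
        return chaff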
The complexity of BMR is O(|S1| + |S2|). See Appendix 4.C for an imple-
mentation of BMR. Note that unlike BGM, BMR does not specify the mapping
3Here I_{·} is the indicator function.
Figure 4.4: Example: •: s_k ∈ S1; ◦: s_k ∈ S2; M1(k): the statistic calculated by BMR. Initially, M1(0) = 0, indicating that the memory is empty. The first packet is a departure, and it is assigned as chaff because otherwise the memory would be underflowed. The second packet is an arrival, and thus the memory size is increased by one. Such updating occurs at each arrival or departure.
between packets in the two processes because as long as the memory constraint is
satisfied, the order of transmission is irrelevant.
The optimality of BMR is guaranteed by the following proposition.
Proposition 13 For any realization (s1, s2), BMR inserts the minimum number
of chaff packets in transmitting an information flow with bounded memory M .
Proof: See Appendix 4.A. □
Since BMR is optimal, we can characterize βM2 by the CTR of BMR, as stated
in the following theorem.
Theorem 7 If S1 and S2 are independent Poisson processes of rates λ1 and λ2, respectively, then with probability one, the CTR of BMR satisfies
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{BMR}}(t) = \begin{cases}
\dfrac{(\lambda_2-\lambda_1)\left(1+\left(\frac{\lambda_1}{\lambda_2}\right)^{M+1}\right)}{(\lambda_1+\lambda_2)\left(1-\left(\frac{\lambda_1}{\lambda_2}\right)^{M+1}\right)} & \text{if } \lambda_1 \ne \lambda_2,\\[2ex]
\dfrac{1}{1+M} & \text{if } \lambda_1 = \lambda_2.
\end{cases}
\]
Proof: See Appendix 4.A. □
It can be shown that the CTR is minimized when λ1 = λ2, based on which we
have the following result.
Corollary 2 If under H0, S1 and S2 are independent Poisson processes, then the
level of undetectability βM2 = 1/(1 + M).
If nodes can insert at least 1/(1+M) fraction of chaff noise, then BMR gives a
feasible transmission schedule for an information flow with bounded memory such
that the overall traffic is statistically the same as traffic under H0. Therefore,
1/(1 + M) establishes a limit on the maximum amount of chaff noise under the
requirement of Chernoff-consistent detection. If M ≫ 1, then very little chaff noise
suffices to hide the information flows.
4.5 Detectability of Multi-hop Flows
The results in Section 4.4 suggest that pairwise detection of information flows
is vulnerable to chaff noise because a relatively small amount of chaff noise can
make the information flow undetectable. These results indeed reveal the weakness
of pairwise detection. As the number of hops increases, however, we see that
the constraints imposed on information-carrying packets become tighter because
only the packets satisfying the constraints at every hop can successfully reach the
destination. This observation motivates us to extend the results in Section 4.4 to
information flows over multiple hops. Specifically, we will show that the fraction of
chaff noise needed to make a multi-hop information flow mimic jointly independent
traffic increases to one as the number of hops increases, which implies that joint
detection may significantly improve the performance against chaff noise.
4.5.1 Multi-hop Flows with Bounded Delay
Consider the transmission of an n-hop (n ≥ 2) information flow with bounded
delay ∆ according to certain processes. Given a sequence of processes (Si)ni=1,
we want to decompose Si (i = 1, . . . , n) into Fi and Wi such that (Fi)ni=1 is an
information flow with bounded delay, and the CTR is minimized.
Given the 2-hop chaff-inserting algorithm BGM, one might think that we can
sequentially apply BGM to every pair of processes to obtain (Fi)ni=1. Such an ap-
proach, however, does not give the optimal decomposition. For example, consider
the realizations shown in Fig. 4.5. If we use BGM to match packets in s1 and s2,
and then repeat BGM to match the matched packets in s2 with s3, we only find
one sequence of matched packets (as shown in (a)). There is, however, another way
of matching that gives two sequences of matched packets (as shown in (b)). The
implication is that for n > 2, a hop-by-hop greedy match is not sufficient. We have
to jointly consider all the subsequent hops to find the optimal packet matching.
Figure 4.5: Example: (a) The scheduling obtained by repeatedly using BGM. (b) Another scheduling. It shows that repeatedly using BGM is suboptimal.
To solve this problem, we develop an algorithm called “Multi-Bounded-Delay-
Relay” (MBDR). The idea of MBDR is that a packet at time t1 in s1 can be
matched with a packet at t2 ∈ [t1, t1 + ∆] in s2 only if t2 has matched packets
in si for all i = 3, . . . , n. The matching of t2 and its matched packets is done by
recursions. Such recursions allow us to consider all the processes simultaneously
and achieve a smaller CTR than repeatedly applying BGM. Specifically, MBDR
works as follows: given a realization (si)ni=1,
1. match every packet at time t1 in s1 with the first unmatched packet t2 in [t1, t1 + ∆] in s2, provided that t2 has a match in s3;
2. for i = 2, . . . , n − 1, match the packet t_i in s_i with the first unmatched packet t_{i+1} in [t_i, t_i + ∆] in s_{i+1}, provided that t_{i+1} has a match in s_{i+2} (assume every packet in s_n has a match);
3. after trying to match all the packets in s1, label all the unmatched packets
as chaff.
For example, consider the 3-hop information flow illustrated in Fig. 4.6. To match
t1 ∈ S1, MBDR first tries to find a match for t2. Since t2 can be matched with
t3 ∈ S3, t1 is matched with t2. If t2 does not have a match in s3, MBDR will try
to match t1 with the next unmatched packet in [t1, t1 + ∆] in s2. If there is no
more packet left, MBDR will label t1 as chaff.
Figure 4.6: MBDR: a recursive greedy match algorithm.
A direct implementation of MBDR has complexity O((λ∆)^n |S1|), where λ is the maximum rate of S1, . . . , Sn. The complexity can be reduced to O(n^2 |S1|) by expanding the recursions (see Appendix 4.C). Note that MBDR reduces to BGM when n = 2.
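The direct recursive implementation mentioned above can be sketched in Python as follows (illustrative only; the reduced-complexity version of Appendix 4.C expands the recursion). It returns the fraction of chaff found, i.e., the CTR of MBDR on the realization.

    def mbdr(processes, delta):
        # Sketch of MBDR: greedily match each packet of the first process forward
        # through all n hops; a candidate epoch at hop i is accepted only if it can
        # itself be matched at hop i+1 (checked recursively). Unmatched packets are chaff.
        n = len(processes)
        used = [set() for _ in range(n)]       # epochs already claimed at each hop

        def match_forward(i, t):
            if i == n - 1:                     # last hop: nothing further to relay
                return True
            for j, u in enumerate(processes[i + 1]):
                if u > t + delta:              # epochs are sorted: no later candidate fits
                    break
                if u >= t and j not in used[i + 1] and match_forward(i + 1, u):
                    used[i + 1].add(j)
                    return True
            return False

        flows = sum(1 for t in processes[0] if match_forward(0, t))
        total = sum(len(p) for p in processes)
        return (total - n * flows) / total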
It is easy to verify that if we transmit information-carrying packets according
to the matching found by MBDR, the transmissions will satisfy the bounded delay
constraint at every hop. Moreover, such a transmission schedule preserves the order
of incoming packets. The following proposition states that MBDR is optimal.
Proposition 14 For any realization (si)ni=1, MBDR inserts the minimum number
of chaff packets in transmitting an n-hop information flow with bounded delay ∆.
Proof: See Appendix 4.A. □
By arguments similar to those in the proof of Theorem 6, one can show that the CTR of MBDR converges a.s. It is difficult to compute the exact limit4. Instead, we give the following bound.
Theorem 8 If S_i (i = 1, . . . , n) are independent Poisson processes of maximum rate λ, then
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{MBDR}}(t) \ge 1 - \kappa_n \quad \text{a.s.},
\]
where
\[
\kappa_n = \min\left((\lambda\Delta)^{n-2}\left(1-e^{-\lambda\Delta}\right),\ \prod_{i=1}^{n-1}\left(1-e^{-i\lambda\Delta}\right)\right).
\]
Proof: See Appendix 4.A. □
By Theorem 8, we see that the CTR of MBDR goes to one exponentially with the increase of n if λ∆ < 1. It can be shown that if we repeatedly apply BGM, then the CTR is lower bounded by 1 − (1 − e^{−λ∆})^{n−1} a.s., which always converges to one exponentially.
Although in Definition 9 we have assumed identical delay bounds at all the relay nodes, MBDR can be easily extended to different delay bounds, and κ_n in Theorem 8 becomes
\[
\min\left(\left(1-e^{-\lambda\Delta_{n-1}}\right)\prod_{i=1}^{n-2}(\lambda\Delta_i),\ \prod_{i=1}^{n-1}\left(1-e^{-i\lambda\Delta_i}\right)\right),
\]
where ∆_i is the maximum delay at the ith relay node.
4For example, for independent Poisson processes, computing the CTR of MBDR involves computing the limiting distribution of an (n − 1)-dimensional continuous state space Markov process.
The optimality of MBDR allows us to have the following result.
Corollary 3 If under H0, S1, . . . , Sn are independent Poisson processes of rates
bounded by λ, then β∆n ≥ 1 − κn.
By this result, we see that for sufficiently light traffic or small delay bound
(i.e., λ∆ < 1), β∆n converges to one exponentially fast as n increases. Numerical
calculation shows that β∆n still converges to one for λ∆ > 1, but the convergence
is slower than exponential. If we calculate the maximum rate of the information
flow by λ(1 − β∆n ), then this rate will go to zero with the increase of n, implying
that it is almost impossible to hide information flows over arbitrarily long paths.
See Fig. 4.7–4.9 for numerically computed information rate 1 − β∆n as a function
of n. From the plots, it is clear that the information rate decays exponentially at
λ∆ < 1 (Fig. 4.7) and subexponentially at λ∆ > 1 (Fig. 4.8, 4.9).
4.5.2 Multi-hop Flows with Bounded Memory
Suppose that we want to transmit an n-hop information flow with bounded memory
M according to certain processes. We generalize BMR to an algorithm called
“Multi-Bounded-Memory-Relay” (MBMR) to insert chaff noise in this case.
Algorithm MBMR borrows the idea of monitoring memory size in BMR. Specif-
ically, let Mi(k) (i = 1, . . . , n − 1) denote the memory size of Ri+1 after the
kth packet in the total traffic. Algorithm MBMR keeps updating (Mi(k))n−1i=1 for
k = 1, 2, . . . and assigns chaff packets if memory underflow or overflow occurs.
Figures 4.7–4.9: the normalized rate of information flow 1 − β^∆_n as a function of n (∆ = 1); solid line: 1 − β^∆_n computed for 1000 packets per process; dashed line: κ_n. Figure 4.7: λ = 0.9. Figure 4.8: λ = 2. Figure 4.9: λ = 4.

Given a realization (s_i)_{i=1}^n and (s_k)_{k=1}^∞ ≜ s1 ⊕ · · · ⊕ s_n, MBMR works as follows: for k = 1, 2, . . .,
1. label s_k ∈ S_i as chaff if and only if M_{i−1}(k − 1) = 0 or M_i(k − 1) = M (initially, M_j(0) = 0 for j = 1, . . . , n − 1; M_0(0) = ∞; M_n(0) = −∞);
2. compute (M_j(k))_{j=1}^{n−1} by
\[
M_{i-1}(k) = \begin{cases} M_{i-1}(k-1) - 1 & \text{if } s_k \ne \text{chaff}, \\ M_{i-1}(k-1) & \text{otherwise}, \end{cases}
\qquad
M_i(k) = \begin{cases} M_i(k-1) + 1 & \text{if } s_k \ne \text{chaff}, \\ M_i(k-1) & \text{otherwise}, \end{cases}
\]
and M_j(k) = M_j(k − 1) for j = 1, . . . , i − 2, i + 1, . . . , n − 1.
See Fig. 4.10 for an example of MBMR.
Figure 4.10: MBMR for n = 4 and M = 3 (s = s1 ⊕ · · · ⊕ s4): monitor the memory sizes of the relay nodes and assign a chaff packet if the memory of any node would be underflowed or overflowed. Initially, M_i(0) = 0 (i = 1, 2, 3); at the end of this realization (after the 10th packet), (M1(10), M2(10), M3(10)) = (1, 1, 0).
Algorithm MBMR has complexity O(Σ_{i=1}^{n} |S_i|). See Appendix 4.C for its implementation. Note that MBMR reduces to BMR when n = 2. If we sequentially
match the non-chaff packets found by MBMR, then we will have a transmission
schedule that satisfies the bounded memory constraint. The optimality of MBMR
is provided by the following proposition.
Proposition 15 For any realization (si)ni=1, MBMR inserts the minimum number
of chaff packets to schedule the transmission of an n-hop information flow with
bounded memory M .
Proof: The proof follows the same arguments as in the proof of Proposition 13.
¥
We can now characterize β^M_n by the CTR of MBMR. If S1, . . . , Sn are independent Poisson processes, then the CTR of MBMR converges almost surely, and the
limit can be calculated by the limiting distribution of a Markov chain, as shown
in Appendix 4.B.
It is difficult to give a closed-form expression for the exact CTR of MBMR. Alternatively, we derive the following upper and lower bounds. Let
\[
\mathcal{A} \triangleq \{(S_i)_{i=1}^n : S_1, \ldots, S_n \text{ are independent Poisson processes}\}.
\]
We have the following theorem.
Theorem 9 For any (S_i)_{i=1}^n ∈ A,
\[
\lim_{t\to\infty}\mathrm{CTR}_{\mathrm{MBMR}}(t) \ge 1 - u_n \quad \text{a.s.}
\]
Furthermore,
\[
\inf_{\mathcal{A}}\ \lim_{t\to\infty}\mathrm{CTR}_{\mathrm{MBMR}}(t) \le 1 - l_n \quad \text{a.s.}
\]
Here l_n and u_n are given by
\[
l_{n+1} = \frac{l_n\left(1 - l_n^{M}\right)}{1 - l_n^{M+1}} \qquad \text{and} \qquad u_{n+1} = u_n\left(1 - \frac{1}{M+1}\,2^{-M/u_n}\right)
\]
for n ≥ 2, and l_2 = u_2 = M/(M + 1).
Proof: See Appendix 4.A. □
Although identical memory constraints have been assumed in Definition 9, MBMR can be easily modified to allow different memory constraints, and it can be shown that the CTR is bounded between 1 − u′_n and 1 − l′_n, where
\[
l'_{n+1} = \frac{l'_n\left(1 - {l'_n}^{K_n}\right)}{1 - {l'_n}^{K_n+1}} \qquad \text{and} \qquad u'_{n+1} = u'_n\left(1 - \frac{1}{K_n+1}\,2^{-K_n/u'_n}\right)
\]
for n ≥ 2, and l′_2 = u′_2 = K_1/(K_1 + 1). Here K_i (i = 1, . . . , n − 1) is the memory constraint at the ith relay node.
Based on Theorem 9 and the optimality of MBMR, we have the following result.
Corollary 4 If under H0, S1, . . . , Sn are independent Poisson processes, then
1 − un ≤ βMn ≤ 1 − ln.
The bounds in Corollary 4 are not far from the actual value of βMn at small n;
see the numerical results in Fig. 4.11.
Figure 4.11: The level of undetectability β^M_n and its bounds as functions of n: M = 4; β^M_n computed on 10000 packets.
Another interpretation of Corollary 4 is that the normalized maximum rate of an information flow, calculated as 1 − β^M_n, is bounded between l_n and u_n. Numerical calculation shows that l_n and u_n both decay polynomially: l_n decays at approximately Θ(n^{−1/M}) and u_n at Θ(n^{−1/(2M−2)}). Furthermore, numerical comparison shows that if λ∆ = M, β^M_n increases more slowly than β^∆_n as n → ∞, suggesting that it is relatively easier to hide information flows with bounded memory.
4.6 Detector
In Section 4.4 and 4.5, we have characterized the levels of undetectabilities for infor-
mation flows with bounded delay or bounded memory. The results are summarized
in Table 4.1. These results provide upper bounds on the level of detectability.
Table 4.1: Levels of undetectabilities (Poisson null hypothesis).
    β^∆_2 = 1/(1 + λ∆)          β^M_2 = 1/(1 + M)
    β^∆_n ≥ 1 − κ_n             1 − u_n ≤ β^M_n ≤ 1 − l_n
In this section, we will present an explicit detector whose consistency can approximate β^j_n (j = ∆, M) arbitrarily closely. Our main theorem is stated as follows.
Theorem 10 For any ε > 0, there exists a detector such that its consistency is no smaller than β^j_n − ε (j = ∆, M).
Remark: The theorem states that as ε → 0, there exists a sequence of detectors with consistency approaching β^j_n (j = ∆, M). Therefore, the level of strong detectability is no smaller than β^j_n, i.e., $\underline{\alpha}^{j}_n \ge \beta^{j}_n$ (j = ∆, M).
The proof of Theorem 10 is by constructing a detector and showing that its
consistency approximates βjn (j = ∆, M) arbitrarily. Ideally, we would like to know
what strategy is used to perturb timing and insert chaff noise so that we can design
a detector accordingly. The difficulty here is that we do not know what strategy
is going to be used when information flows are transmitted, and therefore our goal
is to design a single detector which has good performance for a wide variety of
information flows.
The key idea is to design the detector based on the amount of chaff noise needed
by the optimal chaff-inserting algorithms. If the detector is designed to guarantee
that even the optimal algorithms need a sufficiently large amount of chaff to evade
detection, then any other chaff-inserting algorithm would have to insert no less
chaff noise to evade detection. Therefore, we can make sure that the detector is
r-consistent against fractions of chaff up to a certain level. Specifically, we propose
the following detector.
Definition 13 Given observations5 (s_i)_{i=1}^n (n ≥ 2), the detector is defined as
\[
\delta_t((s_i)_{i=1}^n; \tau_n) = \begin{cases} 1 & \text{if } \mathrm{CTR}(t) \le \tau_n, \\ 0 & \text{otherwise}, \end{cases}
\]
where τ_n is a predetermined threshold, and CTR(t) is the minimum fraction of chaff in the measurements.
Remark: The statistic CTR(t) is computed by the optimal chaff-inserting algorithm followed by certain adjustments. Specifically, it is calculated by the following procedure:
5To be precise, the detector is only given the part of s_i (i = 1, . . . , n) that falls into the length-t observation interval.
1. compute $\mathcal{C}$, the set of chaff packets found by the optimal chaff-inserting algorithm (MBDR for bounded delay flows or MBMR for bounded memory flows);
2. calculate a number C by
\[
C = \left|\mathcal{C} \setminus \left(\bigcup_{i=1}^{n} S_i \cap [0, (i-1)\Delta)\right)\right|
\]
for bounded delay flows, or
\[
C = |\mathcal{C}| + \min_{0 \le k \le w^*} d(k)
\]
for bounded memory flows, where d(k) is the cumulative difference defined as
\[
d(k) \triangleq \sum_{j=1}^{k}\left(I_{\{s_j\in S_1\}} - I_{\{s_j\in S_2\}}\right), \qquad (4.5)
\]
d(0) = 0, and w* is the first time that d(k) varies by M, i.e., w* ≜ inf{w : max_{0≤k≤w} d(k) − min_{0≤k≤w} d(k) = M};
3. compute CTR(t) = C/N, where N = Σ_{i=1}^{n} |S_i|.
For implementation details, we refer to Appendix 4.C. We point out that for large
N , the influence of the adjustment in step (2) on CTR(t) is negligible.
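Putting the pieces together, the detector itself is a simple threshold test on that statistic; a sketch (reusing, e.g., the MBMR sketch given earlier and omitting the boundary adjustment of step 2) is given below.

    def flow_detector(processes, tau, chaff_fraction):
        # Sketch of the detector of Definition 13 with the boundary adjustment omitted:
        # declare an information flow iff the minimum chaff fraction found by the
        # optimal chaff-inserting algorithm does not exceed the threshold tau.
        return 1 if chaff_fraction(processes) <= tau else 0

    # e.g., flow_detector(observed, tau_n, lambda p: mbmr(p, M)) for bounded-memory flows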
The reason for CTR(t) to be the minimum fraction of chaff in the measurements
hinges on two facts. The first is the optimality of the chaff-inserting algorithm used
to find C . The second is the adjustment in step (2). The adjustment is needed
because the detector may not observe the beginning of the information flow. At the
time the detector starts, there may have been packets stored at the relay nodes, and
when these packets are relayed, the relay packets may appear to be chaff noise from
the detector’s perspective since they do not correspond to any observed packets.
We solve this problem by ignoring certain chaff packets found at the beginning of
the measurements. For bounded delay flows, these are the packets in [0, (i− 1)∆)
in si (i = 1, . . . , n). For bounded memory flows, these are the packets which may
be relays of packets stored in the memory initially. Detailed explanations can be
found in Appendix 4.C.
Now that CTR(t) is the minimum CTR in the measurements, we can guarantee
detection as follows.
Theorem 11 The detector in Definition 13 has vanishing miss probability for all
the information flows with CTR bounded by τn a.s.
Theorem 11 is a direct implication of the fact that CTR(t) is the minimum
fraction of chaff packets in the measurements. Actually, a stronger statement holds,
which is that the detector has no miss detection for all realizations of information
flows with no more than τn fraction of chaff packets.
The threshold value needs to be carefully chosen such that the detector satisfies
certain false alarm constraint. Specifically, under the assumption that Si’s are
independent Poisson processes of maximum rate λ under H0, we have the following
theorems on the false alarm probabilities.
Theorem 12 If τ_n < β^j_n (j = ∆, M), then the false alarm probability satisfies
\[
\lim_{N\to\infty} \frac{1}{N}\log P_F(\delta_t) \le -\Gamma_n(\tau_n; \lambda, \Delta) < 0
\]
for bounded delay flows, and
\[
\lim_{N\to\infty} \frac{1}{N}\log P_F(\delta_t) \le -\Gamma_n(\tau_n; M) < 0
\]
for bounded memory flows, where N = Σ_{i=1}^{n} |S_i|.
Proof: See Appendix 4.A. □
The theorem states that the false alarm probability of the proposed detector
decays exponentially as long as the threshold is less than βjn (j = ∆, M). The
functions Γn(τn; λ, ∆) and Γn(τn; M) give lower bounds on the error exponents;
see the proof for their definitions. We point out that Γn(τn; λ, ∆) and Γn(τn; M)
are positive for all τn < βjn (j = ∆, M), and they are both decreasing functions of
τn.
Combining Theorems 11 and 12 yields the following result.
Corollary 5 If τ_n < β^j_n (j = ∆, M), then the proposed detector is τ_n-consistent.
Remark: As τn → βjn, the consistency of the proposed detector converges to βj
n,
which proves that the level of strong detectability is lower bounded by βjn. From
Corollary 5, we see that the proposed detector is optimal in terms of consistency.
In particular, since βjn → 1 as n increases, the proposed detector can detect almost
all the long-lasting information flows with sufficiently long paths.
The threshold τn represents a tradeoff between the consistency and the false
alarm probability. A larger τn enables consistent detection against more chaff
noise at the cost of a higher false alarm probability, whereas a smaller τn leads to
a smaller false alarm probability but less consistency against chaff noise.
4.7 Generalization of Poisson Assumption
We have assumed that the node transmission epochs can be modelled as indepen-
dent Poisson processes under H0. Poisson assumption allows us to obtain clean
analytical results, but it is known that wide-area traffic such as internet traffic does
not fit the Poisson model. It, however, can be argued that Poisson processes are
less bursty than real-world traffic, and therefore, our results provide lower bounds
on the levels of detectability of actual information flows.
Specifically, suppose that traffic under H0 can be modelled as independent
renewal processes with Pareto interarrival distributions [28]. It was shown in [28]
that Pareto distribution fits experimental data over many time scales. We show
that such processes are more difficult to mimic than independent Poisson processes,
as stated in the following theorem.
Theorem 13 Let CTR′_BMR(t) denote the chaff-to-traffic ratio found by BMR in independent renewal processes with Pareto interarrival distributions and CTR_BMR(t) that in independent Poisson processes of the same rates. Then
\[
\liminf_{t\to\infty}\mathrm{CTR}'_{\mathrm{BMR}}(t) \ge \lim_{t\to\infty}\mathrm{CTR}_{\mathrm{BMR}}(t) \quad \text{a.s.}
\]
A similar statement holds for the CTR of BGM.
Proof: See Appendix 4.A. □
By this theorem, we see that it requires more chaff noise to mimic the null
hypothesis under Pareto interarrival distributions. The results can be generalized
to MBDR and MBMR.
If traffic under H0 has Pareto interarrival distributions, the idea in prov-
ing Theorem 5 is still applicable. Specifically, let CTR′(t) be the fraction of
chaff packets inserted by the optimal chaff-inserting algorithm (i.e., MBDR or
MBMR) in the interval [0, t] under Pareto interarrival distribution. Then the up-
per bound on the level of weak detectability is the minimum r (r ∈ [0, 1]) such that limsup_{t→∞} CTR′(t) ≤ r a.s., and the lower bound on the level of strong detectability is the maximum r such that liminf_{t→∞} CTR′(t) ≥ r a.s.6 We see that the levels of detectabilities under the Pareto distribution are no smaller than those under the exponential distribution.
To verify the claim that Poisson assumption provides lower bounds on the ac-
tual detection performance, we simulate BGM and BMR on the traces LBL-PKT-
4, which contains an hour’s worth of all wide-area traffic between the Lawrence
Berkeley Laboratory and the rest of the world7. We compute the CTR of pairs of
different traces8, and then compare the empirical cumulative distribution function
(c.d.f.) of the computed CTR with the c.d.f. of the CTR predicted by Theorems
6 and 7 for independent Poisson processes of the same rates as the empirical rates
of the traces. See Fig. 4.12 and 4.13. From these plots, it is clear that at the
same threshold, the traces have much lower false alarm probabilities than Poisson
processes.
We point out that the results in Theorem 13 also apply to renewal processes
with other interarrival distributions which have the heavy-tailed property [21].
6Note that for Pareto interarrival distributions, the upper and the lower bounds may not meet.
7The traces were made by Paxson and were first used in his paper [28].
8We extract 134 TCP traces from the data, each of which is truncated to 1000 packets.
Figure 4.12: The c.d.f. of the CTR of BGM for ∆ = 5: CTR on traces vs. CTR on Poisson processes.
Figure 4.13: The c.d.f. of the CTR of BMR for M = 20: CTR on traces vs. CTR on Poisson processes.
On the other hand, if the interarrival distributions are light-tailed such as the
uniform distribution, it can be shown that the opposite results hold. In terms of
tailweight, we have analyzed a popular medium-tailed distribution, the exponential
distribution, and our results should be viewed as a benchmark for other tailweights.
4.8 Simulations
In this section, we simulate the proposed detectors on both synthetic Poisson traffic
and internet traces. The simulations on Poisson traffic are meant to verify our
analysis and examine properties of the proposed detectors, whereas the simulations
on traces are mainly used to verify the performance on actual traffic and show the
relative advantages of our detectors compared with existing flow detectors.
4.8.1 Synthetic Data
For synthetic data, (S1, . . . , Sn) is a sequence of independent Poisson processes of
rate λ under H0. Under H1, it is the mixture of an information flow (F1, . . . , Fn)
of rate (1− fc)λ (for some fc ∈ (0, 1)) and chaff traffic (W1, . . . , Wn), where Wi
(i = 1, . . . , n) are independent Poisson processes of rate fcλ. Here the parameter
fc is the CTR. The process F1 is a Poisson process of rate (1− fc)λ, and its relays
Fi (i > 1) are generated as follows. For information flows with bounded delay,
Fi = sort(Fi−1(1) + D1, Fi−1(2) + D2, . . .), i > 1,
where Fi−1 = (Fi−1(1), Fi−1(2), . . .), and D1, D2, . . . are i.i.d. delays uniformly
distributed in [0, ∆]. For information flows with bounded memory, we partition
the epochs of Fi−1 into groups of size ⌊M/2⌋, where the jth group is
(Fi−1((j − 1)⌊M/2⌋), . . . , Fi−1(j⌊M/2⌋ − 1)).
Then Fi is generated by selecting ⌊M/2⌋ epochs independently and uniformly from
the interval [Fi−1((j − 1)⌊M/2⌋), Fi−1(j⌊M/2⌋)) for each j ≥ 2. As illustrated in
Fig. 4.14, if we match epochs in the generated realizations fi−1 and fi (i ≥ 2)
sequentially, then the matching satisfies the bounded memory constraint.
Figure 4.14: Generating information flows with bounded memory (⌊M/2⌋ = 3): f2 is generated by storing ⌊M/2⌋ packets from f1 and randomly releasing these packets during the arrival of the next ⌊M/2⌋ packets.
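To make the generation procedure concrete, the following Python sketch produces synthetic bounded delay flows with superposed chaff in the manner described above; it is only an illustration of this section's model, and the helper names (poisson_on_interval, synthetic_bounded_delay), the horizon parameter, and the example values are ours, not part of the original simulation code.

import random

def poisson_on_interval(rate, horizon, rng):
    # Epochs of a Poisson process of the given rate on [0, horizon).
    t, epochs = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return epochs
        epochs.append(t)

def synthetic_bounded_delay(n_hops, lam, f_c, delta, horizon, seed=0):
    # (S_1, ..., S_n) under H1 for bounded delay flows: F_1 is Poisson of rate
    # (1 - f_c) * lam, each relay adds an i.i.d. Uniform[0, delta] delay, and
    # independent Poisson chaff of rate f_c * lam is superposed on every hop,
    # so that the fraction of chaff (the CTR) is approximately f_c.
    rng = random.Random(seed)
    flows = [poisson_on_interval((1.0 - f_c) * lam, horizon, rng)]
    for _ in range(1, n_hops):
        flows.append(sorted(t + rng.uniform(0.0, delta) for t in flows[-1]))
    return [sorted(f + poisson_on_interval(f_c * lam, horizon, rng)) for f in flows]

# e.g. the setting of Fig. 4.15: lam = 4, delta = 1, f_c = 0.2
s1, s2, s3 = synthetic_bounded_delay(n_hops=3, lam=4.0, f_c=0.2, delta=1.0, horizon=25.0)

Bounded memory flows can be generated analogously by replacing the per-packet delay with the group-wise resampling illustrated in Fig. 4.14.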
Explanations of the parameters used in this simulation are summarized in Ta-
ble 4.2. We are mainly interested in the influence of changing n on the detection
performance. Since it can be shown that increasing n has opposite effects on the
false alarm and the miss probabilities, we plot the receiver operating characteristics
(ROCs) [30] for different n.
Table 4.2: Parameters for Simulations on Synthetic Data.
n    the number of processes
λ    the rate of Si (i = 1, . . . , n)
∆    maximum delay
M    maximum memory size
fc   CTR
We first fix the sample size per process and vary the threshold to plot the ROCs
for bounded delay flows and bounded memory flows; see Fig. 4.15, 4.16. From
the plots, we see that the ROCs approach the upper left corner (i.e., zero error
probabilities) as n increases, implying that the detector has better performance
as the number of processes increases. This is as expected because as n increases,
the detector has more observations, and thus the detection performance should be
improved.
We then fix the total sample size and plot the ROCs for different n; see Fig. 4.17,
Figure 4.15: The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, 100 packets per process, 10000 Monte Carlo runs.
Figure 4.16: The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, 40 packets per process, 10000 Monte Carlo runs.
4.18. We want to find out whether, given a total sample size, we should do pairwise
detection or joint detection over multiple hops. As illustrated in Fig. 4.17, given
a total sample size of 200, the ROC for n = 3 outer-bounds the ROCs for both n = 2 and n = 4, . . . , 6. Similar observations can be obtained from Fig. 4.18.
The observations suggest that if the total sample size is constrained, then there
is an optimal n such that joint detection over n nodes optimizes the performance.
Intuitively, this is because as n increases, the sample size per process decreases,
making the detection more difficult, but the constraints on information flows be-
come tighter, making the detection easier. These contradictory effects lead to a
tradeoff between sample size and path length, which results in the optimal n. Note
that here we have assumed the flow path to be sufficiently long.
Figure 4.17: The ROCs for detecting bounded delay flows: λ = 4, ∆ = 1, fc = 0.2, n = 2, . . . , 6, a total of 200 packets over all processes, 10000 Monte Carlo runs.
Figure 4.18: The ROCs for detecting bounded memory flows: λ = 4, M = 4, fc = 0.2, n = 2, . . . , 6, a total of 100 packets over all processes, 10000 Monte Carlo runs.
Note that the levels of detectability at n = 2 for bounded delay flows and
bounded memory flows are both equal to 0.2. By the discussions following Corol-
laries 1 and 2, we know that with 20% chaff noise, we can generate information
flows which make the detection no better than random guessing. In the simulation
(Fig. 4.15–4.18), however, the detection is clearly much better than random guess-
ing. This observation shows that the flow-generating models used in the simulation
are not optimal. If we compare the ROCs for bounded delay flows with those for
bounded memory flows9 (i.e., Fig. 4.15 vs. Fig. 4.16 and Fig. 4.17 vs. Fig. 4.18),
we see that the detector of bounded memory flows outperforms that of bounded
delay flows for these flow-generating models.
4.8.2 Traces
For simulation on traces, we use the TCP traces in LBL-PKT-4 referenced in
Section 4.7. We extract 134 flows from the TCP packets in LBL-PKT-4. Each
flow has at least 1000 packets, and 4 of them have at least 10000 packets. Only
pairwise detection is simulated due to the limited data. Under H0, (S1, S2) is a
pair of different traces of size 1000. Under H1, Si = Fi ⊕ Wi (i = 1, 2), where
Wi consists of Nc packets i.i.d. uniformly distributed on the range of Fi. The
process F1 is a trace of size 10000, and F2 is generated by bounded delay or
bounded memory perturbations as those in Section 4.8.1. Parameters used in this
simulation are explained in Table 4.3.
Table 4.3: Parameters for Simulations on Traces.
N    total number of epochs
∆    maximum delay
M    maximum memory size
Nc   number of chaff packets per process
We compare the proposed detectors for bounded delay flows (denoted by δBD)
9 We have made M = λ∆ for a fair comparison.
and bounded memory flows (denoted by δBM) with the detector δDAC using algorithm
“Detect-Attacks-Chaff” (DAC) for bounded memory flows10 (by Blum et al. in [5])
and the detector δS-III using algorithm S-III for bounded delay flows (by Zhang et
al. in [49]). We first simulate the false alarm probabilities; see Fig. 4.19. We
choose the thresholds of δBD and δBM such that their false alarm probabilities are
comparable with that of δDAC. The false alarm probabilities of these three detectors
level off after the sample size 1000; the false alarm probability of δS-III, however,
keeps decreasing to a much smaller value. From the plot, we see that the false
alarm probabilities of δBM, δBD, and δDAC do not decay exponentially. It is possible
that the false alarm probability of δS-III decays exponentially, but we do not have
enough data in these traces to verify that.
Figure 4.19: PF(δBM), PF(δDAC), PF(δBD), and PF(δS-III) on LBL-PKT-4: M = 20, ∆ = 5, threshold for δBD = 1/14, threshold for δBM = 1/21, tested on 134 × 133 trace pairs.
We then simulate the miss probabilities of δBM and δDAC; see Fig. 4.20. For
each of the 4 traces of size 10000, we generate 1000 bounded memory flows inde-
pendently. The simulation shows that δBM has much lower miss probability than
δDAC. In fact, δBM detects all the information flows, whereas δDAC has up to 27.58%
10 Originally, DAC was proposed for information flows with bounded delay and bounded peak rate, but it is applicable to bounded memory flows as discussed in Section 3.5.1.
misses by sample size 22000. The plot also shows that the miss probability of δDAC
increases with the sample size. This is because as the sample size increases, the
average number of chaff packets also increases, and δDAC can only handle a fixed
number of chaff packets. Note that although our analysis says that δBM is only
consistent for CTR up to 0.0476, δBM survives CTR = 0.1 in the simulation, which
implies that the uniform chaff insertion is not optimal for bounded memory flows.
Figure 4.20: PM(δBM) and PM(δDAC): M = 20, Nc = 1000, threshold for δBM = 1/21, tested on 4000 bounded memory flows.
Next we simulate the miss probabilities of δBD and δS-III; see Fig. 4.21. We
generate 1000 bounded delay flows independently from each of the traces of size
10000. The plot confirms that δBD has a much smaller miss probability than δS-III;
actually, in the simulation, δBD has no miss for almost all the sample sizes11. This
is because δBD can tolerate a certain fraction of chaff packets no matter where
they are inserted, whereas δS-III is vulnerable to chaff packets in S1. As in the case
of bounded memory flows, we see that δBD handles much more chaff noise than
predicted by the analysis, which shows that uniform chaff insertion is also not
optimal for bounded delay flows. Moreover, comparing Fig. 4.20 and Fig. 4.21, we
see that δDAC is more robust to chaff noise than δS-III.
11 The exception is at the sample size 3000, where we have 6 misses out of 4000 information flows.
Figure 4.21: PM(δBD) and PM(δS-III): ∆ = 5, Nc = 1000, threshold for δBD = 1/14, tested on 4000 bounded delay flows.
4.9 Summary
This chapter addresses timing-based detection of information flows in the pres-
ence of active perturbations and chaff noise. It characterizes the detectability
of information flows in terms of the maximum amount of chaff noise that allows
consistent detection and shows how to design the detector to achieve consistent de-
tection based on knowledge of the null hypothesis. The Poisson assumption under
the null hypothesis makes our results lower bounds on the detection performance
of practical information flows. The proposed detector coupled with capacity con-
straints between neighbor nodes can capture all the long-lived information flows
with positive rates and sufficiently long paths.
APPENDIX 4.A
PROOF OF CHAPTER 4
Proof of Theorem 6
Let Yj be the jth packet delay, i.e., Yj = S2(j) − S1(j). Define
Zj ≜ Yj − Yj−1 = (S2(j) − S2(j − 1)) − (S1(j) − S1(j − 1)).
We see that the Zj's are i.i.d. random variables; each Zj is the difference between two independent exponential random variables with means 1/λ2 and 1/λ1, respectively. The process {Yj}∞j=1 is a general random walk with step Zj. Define Y0 = 0.
Now for every chaff packet inserted at t in S2, we insert a virtual packet at t
in S1; for every chaff packet at s in S1, we insert a virtual packet at s + ∆ in S2,
as illustrated in Fig. 4.22. Let the new packet delays after the insertion of virtual packets be {Y′j}∞j=0. It can be shown that {Y′j}∞j=0 is also a random walk with step Zj, but it has two reflecting barriers at 0 and ∆, i.e.,
Y′j = min(max(Y′_{j−1} + Zj, 0), ∆).
Since it is almost surely impossible for Y′_{j−1} + Zj to be exactly equal to 0 or ∆, each time Y′j = 0 corresponds to a chaff packet in S2, and Y′j = ∆ corresponds to a chaff packet in S1. Thus, the limiting probability for a packet to be chaff is h∆/(1 − h0) in S1, and h0/(1 − h∆) in S2, where h0 = lim_{j→∞} Pr{Y′j = 0} and h∆ = lim_{j→∞} Pr{Y′j = ∆}. The overall probability for a packet in S1 ⊕ S2 to be chaff is the weighted sum
λ1h∆/((λ1 + λ2)(1 − h0)) + λ2h0/((λ1 + λ2)(1 − h∆)).     (4.6)
Figure 4.22: Inserting virtual packets to calculate the delays of chaff packets.
By ergodicity of {Y′j}∞j=0, the CTR of BGM converges to the limiting probability in (4.6) almost surely.
Now we calculate h0 and h∆. Let the equilibrium distribution function of Y′j be H(x), i.e., H(x) = lim_{j→∞} Pr{Y′j ≤ x}. It is shown in Example 2.16 in [8] that
h0 = H(0) = (1 − λ1/λ2) / (1 − (λ1/λ2)^2 e^{∆(λ1−λ2)})  if λ1 ≠ λ2, and 1/(2 + λ1∆) otherwise,
and
h∆ = 1 − H(∆−) = (λ1/λ2) e^{∆(λ1−λ2)} (1 − λ1/λ2) / (1 − (λ1/λ2)^2 e^{∆(λ1−λ2)})  if λ1 ≠ λ2, and 1/(2 + λ1∆) otherwise.
Therefore, by (4.6), we have that the CTR of BGM satisfies
lim_{t→∞} CTR_BGM(t) = (λ2 − λ1)(1 + (λ1/λ2) e^{∆(λ1−λ2)}) / ((λ1 + λ2)(1 − (λ1/λ2) e^{∆(λ1−λ2)}))  if λ1 ≠ λ2, and 1/(1 + λ1∆) if λ1 = λ2,
almost surely.
■
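As an informal check of the closed form above, the following Python sketch simulates two independent Poisson processes of equal rate, applies the greedy matching rule of BGM, and compares the empirical fraction of unmatched (chaff) packets with 1/(1 + λ∆). The function name and the parameter values are ours and purely illustrative.

import random

def simulated_bgm_ctr(lam1, lam2, delta, n_pkts=200000, seed=1):
    # Generate two independent Poisson processes and apply the BGM matching rule;
    # the returned value is the fraction of packets left unmatched (the CTR).
    rng = random.Random(seed)
    s1, s2, t1, t2 = [], [], 0.0, 0.0
    for _ in range(n_pkts):
        t1 += rng.expovariate(lam1); s1.append(t1)
        t2 += rng.expovariate(lam2); s2.append(t2)
    m = n = matched = 0
    while m < n_pkts and n < n_pkts:
        d = s2[n] - s1[m]
        if d < 0:
            n += 1                      # s2(n) arrives too early: chaff
        elif d > delta:
            m += 1                      # s1(m) cannot be relayed in time: chaff
        else:
            matched += 1; m += 1; n += 1
    return 1.0 - matched / n_pkts

lam, delta = 2.0, 1.0
print(simulated_bgm_ctr(lam, lam, delta), 1.0 / (1.0 + lam * delta))  # both close to 1/3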
Proof of Proposition 13
Algorithm BMR is feasible since the non-chaff part (f1, f2) satisfies the bounded
memory constraint. It remains to show the optimality.
Assume that C∗ is an optimal chaff-inserting algorithm. If M1(k − 1) = M
and sk ∈ S1, then node R2 has an arrival when the memory is full, and C∗ has to
drop at least one arriving packet at or before sk to prevent memory overflow. If
M1(k − 1) = 0 and sk ∈ S2, then R2 has a departure when the memory is empty,
so C∗ has to insert at least one dummy packet at or before sk in s2 to prevent
memory underflow. Therefore, BMR inserts no more chaff than C∗.
■
Proof of Theorem 7
If S1 and S2 are independent Poisson processes of rates λ1 and λ2 respectively,
then it is known that the cumulative differences d(w) defined in (4.5) form
a simple random walk. Algorithm BMR assigns chaff such that the cumulative
differences d′(w) of the processes F1 and F2 satisfy 0 ≤ d′(w) ≤ M for all w.
By the memoryless property of exponential interarrival times, it is easy to see that
d′(w) is a random walk with reflecting barriers at 0 and M (i.e., a Markov chain
with state space 0, . . . , M). Its transition probabilities are shown in Fig. 4.23.
Figure 4.23: The Markov chain formed by {d′(w)}: p = λ1/(λ1 + λ2), q = 1 − p.
It is easy to see that {d′(w)} is an irreducible, aperiodic, and positive recurrent Markov chain, and thus has a limit distribution (π0, . . . , πM). Since the limit distribution satisfies
πi = (λ1/λ2) π_{i−1},  i = 1, . . . , M,
we have
π0 = (1 − λ1/λ2) / (1 − (λ1/λ2)^{M+1})  if λ1 ≠ λ2, and 1/(1 + M) otherwise,
πM = (λ1/λ2)^M (1 − λ1/λ2) / (1 − (λ1/λ2)^{M+1})  if λ1 ≠ λ2, and 1/(1 + M) otherwise.
The physical meaning of d′(w) is the memory size after the transmission of the wth packet in S1 ⊕ S2. The self-loop at state 0 corresponds to chaff packets in S2 because these transmissions occur when the memory is empty (so they have to be dummy packets); the self-loop at state M corresponds to chaff in S1 because these transmissions occur when the memory is full (so the packets will be dropped). By ergodicity of {d′(w)}, as w → ∞, the CTR of BMR converges to the limiting probability of self-loops almost surely. This limiting probability is the weighted sum π0q + πMp, which is equal to
(λ2 − λ1)(1 + (λ1/λ2)^{M+1}) / ((λ1 + λ2)(1 − (λ1/λ2)^{M+1}))  if λ1 ≠ λ2, and 1/(1 + M) if λ1 = λ2.
■
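As a quick numerical illustration of the equal-rate case (our own remark, not part of the original proof): for λ1 = λ2 and M = 20,
lim_{t→∞} CTR_BMR(t) = 1/(1 + M) = 1/21 ≈ 0.0476,
which is precisely the threshold 1/21 used for δBM and the consistency level 0.0476 quoted in the trace simulations of Section 4.8.2.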
Proof of Proposition 14
By expanding the recursions of MBDR, it can be shown that MBDR is equivalent
to an algorithm which finds the earliest sequence of relay epochs for each packet
in s1. That is, for s ∈ S1, if p = (s, t2, . . . , tn) (ti ∈ Si) denotes a sequence of relay epochs for s, then MBDR finds the sequence p̄ = (s, t̄2, . . . , t̄n) such that
1. p̄ satisfies the causality and the bounded delay constraints;
2. t̄i ≤ ti (i = 2, . . . , n) for any other sequence of relay epochs p = (s, t2, . . . , tn) that satisfies these constraints.
We will refer to a sequence of relay epochs as a relay sequence.
A set of relay sequences preserves the order of packets if for any two sequences (t_i)_{i=1}^{n} and (t′_i)_{i=1}^{n} in the set, t1 ≤ t′1 implies ti ≤ t′i for all i = 2, . . . , n. We will use the following result.
Lemma 3 Among all sets of relay sequences satisfying the constraints of causality,
packet-conservation, and bounded delay, there always exists a set which has the
largest size and preserves the order of packets.
By this lemma, it suffices to search among order-preserving sets of relay se-
quences. It remains to show that it is optimal to find the earliest relay sequences.
Let P be the set of relay sequences found by MBDR, and P∗ be the largest
and order-preserving set of relay sequences. Suppose s1 ∈ S1 has a relay sequence
p∗1 ∈ P∗ but not in P, as illustrated in Fig. 4.24. Then there must be relay
sequences in P which start earlier than s1 and partly overlap with p∗1 (otherwise,
MBDR would have chosen p∗1 or a sequence earlier than p∗1 for s1). Let the earliest
of these sequences be p1, with starting epoch s2 ∈ S1. For j = 2, 3, . . ., do the
following.
i) If sj does not have a relay sequence in P∗, we stop searching; otherwise,
suppose that sj has a relay sequence p∗j ∈ P∗.
ii) The sequence p∗j is at least partly earlier than pj−1 because p∗j is earlier than
p∗j−1 and p∗j−1 partly overlaps with pj−1. Since MBDR has not chosen the
earlier part of p∗j , it implies that there must be sequences in P earlier than
pj−1, which partly overlap with p∗j . Let the earliest of these sequences be pj,
with starting epoch sj+1 ∈ S1. Continue with i).
Figure 4.24: Every relay sequence in P∗ corresponds to a relay sequence in P. Solid lines: sequences in P; dashed lines: sequences in P∗.
When we stop searching, we will either find an epoch in s1 which has a relay
sequence in P but not in P∗, or reach a relay sequence pm ∈ P which starts before
all the relay sequences in P∗. Therefore, for every relay sequence in P∗, we can
find a different sequence in P, which implies that the size of P is no smaller than
that of P∗.
■
Proof of Lemma 3
The proof is by induction. As illustrated in Fig. 4.25, suppose that
(s1(1), s2(2), s3(1)) and (s1(2), s2(1), s3(2)) are relay sequences satisfying causal-
ity, packet-conservation, and bounded delay. By switching the intersected part, we
obtain two sequences (s1(1), s2(1), s3(1)) and (s1(2), s2(2), s3(2)) which satisfy
these constraints and also preserve the packet order. By repeatedly applying such
switching, we can reorganize any set of relay sequences into an order-preserving
set and maintain satisfaction of the constraints.
Figure 4.25: Solid lines denote the original relay sequences; dashed lines denote the reorganized relay sequences which preserve the order of packets.
■
Proof of Theorem 8
We bound the CTR of MBDR by deriving upper bounds on the probability for an
arbitrary packet in S1 to have a match. Then the result of Theorem 8 holds by
ergodicity. Compared with the first packet, subsequent packets are more difficult
to match because some of their relay epochs may have been used to relay previous
packets. Thus, it suffices to upper bound the probability for the first packet to
have a match. Denote this probability by Pn.
First, note that a necessary condition for the first packet at time t to have a
match is that the corresponding intervals [t, t + (i − 1)∆] in Si (i = 2, . . . , n)
in which the packet can be relayed are all nonempty. The probability for this
event is at most ∏_{i=1}^{n−1} (1 − e^{−iλ∆}) (achievable if all the processes have rate λ). Thus,
Pn ≤ ∏_{i=1}^{n−1} (1 − e^{−iλ∆}).
Next, we prove by induction that Pn is also upper bounded by (λ∆)^{n−2}(1 − e^{−λ∆}). For n = 2, this bound is the same as the upper bound derived above. Assume that the result holds for P_{n−1} (n ≥ 3). By writing Pn in parts with respect to the number of epochs within delay ∆ in S2, we have
Pn ≤ ∑_{k=1}^{∞} ((λ∆)^k / k!) e^{−λ∆} · Pr{at least one of the k epochs has a match}
   ≤ ∑_{k=1}^{∞} ((λ∆)^k / k!) e^{−λ∆} · k P_{n−1}     (4.7)
   = λ∆ P_{n−1},
where the union bound is used to obtain (4.7). Hence, we have shown that Pn ≤ (λ∆)^{n−2}(1 − e^{−λ∆}).
Combining these two bounds, we have that the CTR of MBDR is lower bounded by
1 − min( ∏_{i=1}^{n−1} (1 − e^{−iλ∆}), (λ∆)^{n−2}(1 − e^{−λ∆}) )  a.s.
■
Proof of Theorem 9
We prove the theorem by induction.
For n = 2, we have seen from Theorem 7 that the minimum CTR of MBMR is
1/(1 + M).
Assume the result holds up to n (n ≥ 2). For (n + 1)-hop flows, it suffices to show that 1 − u_{n+1} ≤ lim_{t→∞} CTR_MBMR(t) ≤ 1 − l_{n+1} a.s. when the Si's have equal rate. This is because equal rate is the case that minimizes the CTR (which can be shown by arguments similar to Theorem 7). We prove the result by showing that the asymptotic fraction of non-chaff packets (i.e., 1 − CTR) is bounded between l_{n+1} and u_{n+1}.
Note that the output of a relay node is no longer a Poisson process. This
is because the probability of finding another information-carrying packet af-
ter an information-carrying packet is greater than the probability of finding an
information-carrying packet after a chaff packet. The precise model to decide
whether a packet is chaff or not is the Markov chain shown in Appendix 4.B. As a
result, the arrival process at node Rn+1 is more regular than a Poisson process of
the same rate.
For the lower bound, assuming Si’s all have rate λ, we substitute the arrival
process at node Rn+1 with a Poisson process of rate λln. Since we destroy the
regularity and may also reduce the rate (because λln is a lower bound on the
rate), this substitution gives us a lower bound on the fraction of non-chaff packets.
For this arrival process and an independent Poisson process of rate λ which is
the departing process of Rn+1, we know from the proof of Theorem 7 that the
asymptotic fraction of chaff packets in the departing process is
π0 = (1 − λ1/λ2) / (1 − (λ1/λ2)^{M+1}),
where λ1 = λ·l_n and λ2 = λ. Therefore, we have that the asymptotic fraction of non-chaff packets is lower bounded by
1 − π0 = 1 − (1 − l_n)/(1 − l_n^{M+1}),
which is equal to l_{n+1}.
For the upper bound, we consider the following arrival process at node Rn+1.
The process is generated by dividing points in a Poisson process of rate λ into
consecutive groups of size M/un and selecting M consecutive points from the
beginning of each group. Analogous to conventional batched processes, we refer to
the group size M/un as the period, and M as the batch size. A realization of such
a process is drawn in Fig. 4.26.
Figure 4.26: A “batched” arrival process generated from a Poisson process. Filled marks: arrival epochs; open marks: points in the underlying Poisson process; M = 2, period = 5.
We consider such a batched process because it maximizes the time between the
(kM)th arrival and the (kM + 1)th arrival (k ∈ N) so that it is least likely for
the memory to be overflowed. Moreover, we choose the period to make the arrival
rate equal to λun (which may be higher than the actual rate). Therefore, using
this arrival process allows us to obtain an upper bound on the fraction of non-chaff
packets.
Consider such an arrival process and an independent Poisson process of rate λ. After the first arrival in a period, with probability 2^{−M/u_n}, there will be no departure until the first arrival in the next period. In this case, there are M + 1 consecutive arrivals, and thus at least one packet will be dropped. Hence, the fraction of dropped packets at node R_{n+1} is lower bounded by 2^{−M/u_n}/(M + 1), i.e., at most a 1 − 2^{−M/u_n}/(M + 1) fraction of the information-carrying packets arriving at R_{n+1} can be successfully relayed. Since at most a u_n fraction of the incoming packets of R_{n+1} is carrying information, the overall fraction of information-carrying packets relayed by R_{n+1} is upper bounded by
u_n (1 − 2^{−M/u_n}/(M + 1)),
which is equal to u_{n+1}.
■
Proof of Theorem 12
We prove the theorem for bounded delay flows and bounded memory flows sepa-
rately. Here we present the proof for n = 2; the proof for n > 2 is analogous.
Proof for Bounded Memory Flows
By Theorem 7, we know that the false alarm probability is maximized when λ1 =
λ2, where λi (i = 1, 2) is the rate of Si. Consider this equal rate case.
Define T1 to be the number of packets in S1
⊕
S2 until the first chaff packet,
including the first chaff packet, and Ti (i > 1) the number of packets between the
(i − 1)th and ith chaff packets, excluding the (i − 1)th chaff packet but including
the ith. Let C be the number of chaff packets found by BMR. Then the false alarm
probability can be written as
PF(δt) = Pr{C ≤ τ2N}
       = Pr{ ∑_{i=1}^{τ2N} Ti ≥ N }
       = Pr{ (1/(τ2N)) ∑_{i=1}^{τ2N} Ti ≥ 1/τ2 }.     (4.8)
It is known that for Poisson processes, the cumulative differences {d(k)}_{k=1,2,...} defined in (4.5) form a simple random walk with Pr{d(k) = d + 1 | d(k − 1) = d} = 1/2. The Markovian property implies that T1, T2, . . . are independent, and for i ≥ 2, Ti has the same distribution as N_{−1, M+1} defined by
N_{−1, M+1} ≜ inf{k : d(k) = −1 or M + 1 | d(0) = 0}.     (4.9)
By Theorem 7, we know that the ratio C/N will almost surely converge to 1/(1 + M) as N → ∞, i.e., lim_{c→∞} c/(∑_{i=1}^{c} Ti) = 1/(1 + M) almost surely. It implies that lim_{c→∞} (1/c) ∑_{i=1}^{c} Ti = 1 + M almost surely, and thus E[Ti] = 1 + M (i ≥ 2).
Now that the Ti's (i ≥ 2) are i.i.d., by Sanov's Theorem in [7], we have
lim_{N′→∞} (1/N′) log Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 } = − min_{W: E[W] ≥ 1/τ2} D(W||T2),
where N′ = τ2N. By (4.8), we obtain that
lim_{N→∞} (1/N) log PF(δt) = −τ2 min_{W: E[W] ≥ 1/τ2} D(W||T2) ≜ −Γ2(τ2; M).
It is difficult to compute Γ2(τ2; M) directly, but the computation can be reduced
to an optimization over a single variable by Cramer’s Theorem [9]. Nevertheless, as
long as 1/τ2 > 1 + M , we have that E[W ] > E[T2], and thus Γ2(τ2; M) is positive.
By the definition of Γ2(τ2; M), it is easy to see that it is a decreasing function of
τ2.
Proof for Bounded Delay Flows
The proof for bounded delay flows is similar to that for bounded memory flows.
By Theorem 6, we see that the false alarm probability is maximized when S1 and
S2 both have the maximum rate λ. Consider this case.
Let Ti (i ≥ 1) be defined the same as in the proof for bounded memory flows.
Then the false alarm probability can be written as
PF(δt) = Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 },     (4.10)
where N′ = τ2N.
Let Yj be defined as in the proof of Theorem 6. We have shown that the
process {Yj}_{j=1,2,...} is a general random walk. For i ≥ 2, the Ti's are i.i.d. with the same distribution as
2 · inf{j : Yj < 0 or Yj > ∆ | Y0 = 0} − 1.     (4.11)
Let C be the number of chaff packets found by BGM. By Theorem 6, we have lim_{N→∞} C/N = 1/(1 + λ∆) almost surely. Thus, lim_{c→∞} (1/c) ∑_{i=1}^{c} Ti = 1 + λ∆ almost surely, which implies that E[T2] = 1 + λ∆.
By Sanov's Theorem [7], we have
lim_{N′→∞} (1/N′) log Pr{ (1/N′) ∑_{i=1}^{N′} Ti ≥ 1/τ2 } = − min_{W: E[W] ≥ 1/τ2} D(W||T2).
Plugging in (4.10) yields that
lim_{N→∞} (1/N) log PF(δt) = −τ2 min_{E[W] ≥ 1/τ2} D(W||T2) ≜ −Γ2(τ2; λ, ∆).
For 1/τ2 > 1 + λ∆, we have that E[W] > E[T2], and therefore Γ2(τ2; λ, ∆) > 0. As τ2 increases, the minimization is over a larger set, and thus Γ2(τ2; λ, ∆) decreases. This completes the proof.
■
Proof of Theorem 13
The proof utilizes the ideas in the proofs of Theorems 6 and 7.
The classical Pareto distribution (see [28]) with shape parameter β (β ≥ 0) and location parameter α (α ≥ 0) has the probability density function
p(x) = βα^β x^{−β−1},  x ≥ α.
This distribution has a property that the conditional expectation E[X − x|X ≥ x]
is an increasing function of x.
For information flows with bounded memory, consider the cumulative differ-
ences d′(w)∞w=0 between the processes of matched epochs found by BMR, as
defined in the proof of Theorem 7. The CTR of BMR is the frequency of self-loops
in {d′(w)}∞w=0, as illustrated in Fig. 4.23. Unlike the exponential distribution, the Pareto distribution has memory, and the resulting {d′(w)}∞w=0 is not Markovian. Note,
however, that the memory of interarrival times makes it easier to reach the states
0 and M and generate self-loops. This is because whenever d′(w) increases by 1,
the next arrival is more likely to be in S1 (since the arrival in S2 has waited for
some time and it is likely to wait even longer), and thus d′(w) is likely to keep
increasing. Hence the average time to reach 0, M is shorter than that for Pois-
son processes. At the state 0 (or M), the same argument implies that it is more
likely to take more self-loops after a self-loop. Therefore, BMR inserts more chaff
noise in independent renewal processes with Pareto interarrival distributions than
in independent Poisson processes.
For information flows with bounded delay, similar arguments hold. The process
Y ′j ∞j=0 defined in the proof of Theorem 6 is no longer a Markov process under
Pareto interarrival distributions, but we can show that the endpoints 0 and ∆ are
visited more frequently and therefore produce more chaff noise.
■
APPENDIX 4.B
ASYMPTOTIC CTR OF MBMR
Here we show how to calculate the CTR of MBMR by a Markov chain. In
particular, we are interested in computing βMn (n ≥ 2). Assume the processes are
independent and Poisson under H0.
If S1, . . . , Sn are independent Poisson processes, then the vectors
(Mi(k))n−1i=1 ∞k=0 computed by MBMR form an (n − 1)-dimensional homogeneous
Markov chain. By arguments similar to that in the proof of Theorem 7, it can be
shown that the CTR is minimized when all Si’s have equal rate, in which case the
CTR of MBMR is βMn . We will focus on the equal rate case although the method
is easily generalizable to arbitrary rates.
If Si (i = 1, . . . , n) have equal rate, then the transition probabilities of {(Mi(k))_{i=1}^{n−1}} are as follows. Denote the transition probability by Pr{m_1^{n−1} | m′_1^{n−1}}, where m_1^{n−1}, m′_1^{n−1} ∈ {0, . . . , M}^{n−1}, and write (mi, . . . , mj) (i ≤ j) as m_i^j. For 2 ≤ i ≤ n − 1, m_{i−1} > 0, and m_i < M,
Pr{(m_1^{i−2}, m_{i−1} − 1, m_i + 1, m_{i+1}^{n−1}) | m_1^{n−1}} = 1/n;
for m_1 < M,
Pr{(m_1 + 1, m_2^{n−1}) | m_1^{n−1}} = 1/n;
for m_{n−1} > 0,
Pr{(m_1^{n−2}, m_{n−1} − 1) | m_1^{n−1}} = 1/n;
moreover,
Pr{m_1^{n−1} | m_1^{n−1}} = (1/n) · ( I{m_1 = M} + ∑_{i=2}^{n−1} I{m_{i−1} = 0 ∨ m_i = M} + I{m_{n−1} = 0} ).
According to MBMR, each self-loop corresponds to a chaff packet, and therefore
the CTR is equal to the probability of self-loops in the equilibrium distribution.
That is, if π is the equilibrium distribution of {(Mi(k))_{i=1}^{n−1}}, then the CTR of MBMR converges to the limiting probability of self-loops, denoted by ηn, almost surely, where
ηn = ∑_{m_1^{n−1} ∈ {0,...,M}^{n−1}} π(m_1^{n−1}) Pr{m_1^{n−1} | m_1^{n−1}}.
For example, for n = 3 and M = 2, {(M1(k), M2(k))}_{k≥0} follows the Markov chain in Fig. 4.27. Here
η3 = (1/3)(1/15 + 2·(4/45) + 2·(1/9)) + (2/3)(2·(4/45) + 2/9) = 19/45.
This is the CTR of MBMR for 3-hop information flows with memory sizes bounded by 2, i.e., β_3^2 = 19/45.
Figure 4.27: The Markov chain of {(M1(k), M2(k))}∞k=0. All straight lines have transition probability 1/3. All the states are marked with their limiting probabilities, e.g., π(0, 2) = 1/15.
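As an informal cross-check of the example above, the following Python sketch (ours, not part of the original appendix) builds the transition matrix of the chain in Fig. 4.27 for n = 3 and M = 2, computes its stationary distribution, and recovers η3 = 19/45 ≈ 0.4222.

import itertools
import numpy as np

M, n = 2, 3                                   # memory bound and number of hops
states = list(itertools.product(range(M + 1), repeat=n - 1))
index = {s: k for k, s in enumerate(states)}
P = np.zeros((len(states), len(states)))

for s in states:
    # packet from S1: a new packet enters R2's buffer if there is room, else self-loop (chaff)
    t = list(s)
    if s[0] < M:
        t[0] += 1
    P[index[s], index[tuple(t)]] += 1.0 / n
    # packet from Si (1 < i < n): node Ri forwards a buffered packet to Ri+1 if possible
    for i in range(1, n - 1):
        t = list(s)
        if s[i - 1] > 0 and s[i] < M:
            t[i - 1] -= 1
            t[i] += 1
        P[index[s], index[tuple(t)]] += 1.0 / n
    # packet from Sn: Rn releases a buffered packet if it has one, else self-loop (chaff)
    t = list(s)
    if s[-1] > 0:
        t[-1] -= 1
    P[index[s], index[tuple(t)]] += 1.0 / n

# stationary distribution = left eigenvector of P associated with eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

eta3 = sum(pi[index[s]] * P[index[s], index[s]] for s in states)
print(eta3, 19.0 / 45.0)                      # both approximately 0.4222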
APPENDIX 4.C
ALGORITHMS OF CHAPTER 4
Chaff-inserting Algorithm for Two-hop Bounded Delay
Flows
For the algorithm BGM presented in Section 4.4.1, we combine the insertion of
chaff and the matching of information-carrying packets into the implementation
presented in Table 4.4.
Table 4.4: Bounded-Greedy-Match (BGM).

Bounded-Greedy-Match(s1, s2, ∆):
  m = n = 1;
  while m ≤ |S1| and n ≤ |S2|
    if s2(n) − s1(m) < 0
      s2(n) = chaff; n = n + 1;
    else if s2(n) − s1(m) > ∆
      s1(m) = chaff; m = m + 1;
    else
      match s1(m) with s2(n);
      m = m + 1; n = n + 1;
    end
  end
This implementation of BGM uses two pointers m and n to record the current
epochs examined in s1 and s2, and keeps updating m and n depending on whether
the match is successful or not. Its complexity is O(|S1| + |S2|).
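For reference, here is a direct Python transcription of the pseudocode in Table 4.4; the function and variable names are ours, and the epoch lists are assumed to be sorted.

def bounded_greedy_match(s1, s2, delta):
    # Greedily match epochs of s1 to epochs of s2 within delay [0, delta].
    # Returns (matches, chaff1, chaff2): matched index pairs and the indices
    # of unmatched (chaff) epochs in each process.
    m, n = 0, 0
    matches, chaff1, chaff2 = [], [], []
    while m < len(s1) and n < len(s2):
        d = s2[n] - s1[m]
        if d < 0:            # s2(n) is too early to be a relay of s1(m)
            chaff2.append(n); n += 1
        elif d > delta:      # s1(m) cannot be relayed within the delay bound
            chaff1.append(m); m += 1
        else:                # causal and within delay: match the pair
            matches.append((m, n)); m += 1; n += 1
    chaff1.extend(range(m, len(s1)))   # leftovers are chaff
    chaff2.extend(range(n, len(s2)))
    return matches, chaff1, chaff2

The two pointers advance exactly as in Table 4.4, so the running time remains linear in |S1| + |S2|.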
Chaff-inserting Algorithm for Multi-hop Bounded Delay
Flows
Implementation of the algorithm MBDR presented in Section 4.5.1 is presented in
Table 4.5. The complexity of such a direct implementation is O((λ∆)^n |S1|) (λ is the maximum rate of S1, . . . , Sn).
Table 4.5: Multi-Bounded-Delay-Relay (MBDR).

Multi-Bounded-Delay-Relay(s1, . . . , sn, ∆):
  for k = 1 : |S1|
    match of s1(k) = MBDR1(s1(k), 1, s1, . . . , sn, ∆);
    if match of s1(k) = ∅
      s1(k) = chaff;
    end
  end

MBDR1(s, i, s1, . . . , sn, ∆):
  if i = n
    return s;
  end
  for t ∈ Si+1 ∩ [s, s + ∆]
    match of t = MBDR1(t, i + 1, s1, . . . , sn, ∆);
    if match of t = ∅
      t = chaff;
    else
      return t;
    end
  end
  return ∅;
Performance of recursive algorithms can often be improved by expanding recursions. An implementation of expanded MBDR is shown in Table 4.6. The complexity of this implementation is12 O(n²|S1|).
12 The dominating step is the recursive computation of the Ci,j's. Suppose that the maximum rate of S1, . . . , Sn is λ; then there are at most (i − 1)λ∆ points in Ci,j on average. The selection of these points takes (2i − 3)λ∆ steps. The total complexity can be calculated by |S1| ∑_{i=2}^{n} (2i − 3)λ∆ = λ∆(n − 1)² |S1|.
Table 4.6: Expanded-Multi-Bounded-Delay-Relay (E-MBDR).

Expanded-Multi-Bounded-Delay-Relay(s1, . . . , sn, ∆):
  (p1, . . . , pn) = (0, . . . , 0);
  for j = 1 : |S1|
    C1,j = {s1(j)};
    for i = 1 : n − 1
      for all s ∈ Ci,j in increasing order
        for all t ∈ Si+1 ∩ [s, s + ∆] with t > pi+1 and t ∉ Ci+1,j
          t.predecessor = s;
          add t to the set Ci+1,j;
        end
      end
    end
    if Cn,j ≠ ∅
      tn = min(Cn,j);
      for i = n − 1 : −1 : 1
        ti = ti+1.predecessor;
      end
      (t1, . . . , tn) is the sequence of relay epochs for s1(j);
      (p1, . . . , pn) = (t1, . . . , tn);
    end
  end
  for all s ∈ S1 ∪ · · · ∪ Sn such that s is not a selected relay epoch
    s = chaff;
  end
Chaff-inserting Algorithm for Two-hop Bounded Memory
Flows
A pseudocode implementation of BMR (presented in Section 4.4.2) is given in Table 4.7.
Note that once BMR marks out the chaff packets, the order in which
information-carrying packets are transmitted is irrelevant as far as the memory
constraint is concerned. The complexity of BMR is only O(|S1| + |S2|).
Table 4.7: Bounded-Memory-Relay (BMR).

Bounded-Memory-Relay(s1, s2, M):
  s = s1 ⊕ s2;
  d = 0;
  for w = 1 : |S|
    if (d = M and s(w) ∈ S1) or (d = 0 and s(w) ∈ S2)
      s(w) = chaff;
    else if s(w) ∈ S1
      d = d + 1;
    else
      d = d − 1;
    end
  end
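A Python transcription of the pseudocode in Table 4.7, under the same assumptions (sorted epoch lists, distinct epochs); the function and variable names are ours.

def bounded_memory_relay(s1, s2, M):
    # Mark the minimum number of chaff packets so that the remaining epochs of
    # s1 and s2 can be relayed through a node with memory for at most M packets.
    # Returns (chaff1, chaff2): indices of chaff packets in s1 and s2.
    merged = sorted([(t, 1, k) for k, t in enumerate(s1)] +
                    [(t, 2, k) for k, t in enumerate(s2)])
    d = 0                      # number of packets currently buffered
    chaff1, chaff2 = [], []
    for t, proc, k in merged:
        if proc == 1:
            if d == M:         # arrival while the buffer is full: drop as chaff
                chaff1.append(k)
            else:
                d += 1
        else:
            if d == 0:         # departure while the buffer is empty: dummy packet (chaff)
                chaff2.append(k)
            else:
                d -= 1
    return chaff1, chaff2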
Chaff-inserting Algorithm for Multi-hop Bounded Memory
Flows
Algorithm MBMR is a direct generalization of BMR. Its implementation is given
in Table 4.8.
Algorithm MBMR has complexity O(∑_{i=1}^{n} |Si|). It uses Mi (i = 1, . . . , n − 1) to
record the number of packets stored in node Ri+1. The algorithm keeps updating
Mi’s and guarantees that each Mi is always between 0 and M , which implies that
the scheduling found by MBMR satisfies the bounded memory constraint.
Detection Algorithm for Two-hop Bounded Delay Flows
Algorithm “Detect-Bounded-Delay” (DBD) is derived to detect 2-hop information
flows with bounded delay. It does detection with the help of the optimal chaff-
Table 4.8: Multi-Bounded-Memory-Relay (MBMR).

Multi-Bounded-Memory-Relay(s1, . . . , sn, M):
  s = s1 ⊕ · · · ⊕ sn;
  (M1, . . . , Mn−1) = (0, . . . , 0);
  for w = 1 : |S|
    if s(w) ∈ S1
      if M1 < M
        M1 = M1 + 1;
      else
        s(w) = chaff;
      end
    else if s(w) ∈ Sn
      if Mn−1 > 0
        Mn−1 = Mn−1 − 1;
      else
        s(w) = chaff;
      end
    else
      let i (1 < i < n) be such that s(w) ∈ Si;
      if Mi−1 > 0 and Mi < M
        Mi−1 = Mi−1 − 1; Mi = Mi + 1;
      else
        s(w) = chaff;
      end
    end
  end
inserting algorithm BGM.
Given measurements (s1, s2), DBD
1. calculates C, the number of chaff packets assigned by BGM in s1
⊕
s2 but
excluding chaff in S2 ∩ [0, ∆);
2. returns H1 if the ratio of C and the total sample size is less than or equal to
1/(1 + λ′∆) (λ′ is a design parameter), and otherwise returns H0.
Implementation of DBD is presented in Table 4.9. The complexity of DBD is
O(N), where N is the joint sample size, i.e., the total number of examined packets
in s1
⊕
s2.
Table 4.9: Detect-Bounded-Delay (DBD).

Detect-Bounded-Delay(s1, s2, ∆, N, λ′):
  i = j = 1; C = 0;
  while i + j ≤ N
    if s2(j) − s1(i) < 0
      if s2(j) ≥ ∆
        C = C + 1;
      end
      j = j + 1;
    else if s2(j) − s1(i) > ∆
      C = C + 1; i = i + 1;
    else
      i = i + 1; j = j + 1;
    end
  end
  return H1 if C/N ≤ 1/(1 + λ′∆), H0 otherwise;
Suppose H1 is true. Then the actual number of chaff packets in s1
⊕
s2 has to
be no smaller than C because BGM is optimal, and chaff packets in [0, ∆) in s2 have
been ignored (because they may be the relay packets of packets arriving before the
detector starts). It means that the actual CTR has to be more than 1/(1 + λ′∆)
to evade DBD. Therefore, DBD has no miss detection for CTR≤ 1/(1 + λ′∆).
Detection Algorithm for Multi-hop Bounded Delay Flows
We extend DBD to multiple hops by utilizing the multi-hop chaff-inserting al-
gorithm MBDR. The algorithm, called “Detect-Multi-Bounded-Delay” (DMBD),
works as follows.
Given measurements (s1, . . . , sn), DMBD
1. calculates C, the number of chaff packets found by MBDR, excluding chaff
packets in the beginning (i − 1)∆ period of si for i = 1, . . . , n;
2. returns H1 if the ratio between C and the total sample size is bounded by
τn (τn is a design parameter); otherwise, returns H0.
See Table 4.10 for an implementation of DMBD based on the extended version of MBDR. The complexity of this implementation is O(nN).
Since MBDR inserts the minimum number of chaff packets, and chaff packets
at the beginning of si are ignored (because they may be the relay packets of
information-carrying packets sent before the detector starts), C is always a lower
bound on the actual number of chaff packets, which means that the actual CTR
has to be larger than τn to evade DMBD. Therefore, DMBD has no miss detection
for CTR≤ τn.
Detection Algorithm for Two-hop Bounded Memory Flows
Algorithm “Detect-Bounded-Memory” (DBM) detects 2-hop information flows
with bounded memory based on the chaff-inserting algorithm BMR.
Table 4.10: Detect-Multi-Bounded-Delay (DMBD).

Detect-Multi-Bounded-Delay(s1, . . . , sn, ∆, N, τn):
  C = 0; K1 = 0;
  (J1, . . . , Jn) = (I1, . . . , In) = (0, . . . , 0);
  for i = 2 : n
    Ki = sup{k : si(k) < (i − 1)∆};
  end
  j = 1;
  while ∑_{i=1}^{n} Ji < N and j ≤ |S1|
    C1,j = {s1(j)};
    for i = 1 : n − 1
      for all s ∈ Ci,j in increasing order
        for all t ∈ Si+1 ∩ [s, s + ∆] with t > si+1(Ji+1) and t ∉ Ci+1,j
          t.predecessor = s; add t to the set Ci+1,j;
        end
      end
    end
    if Cn,j ≠ ∅
      In = min{k : sn(k) ∈ Cn,j};
      for i = n − 1 : −1 : 1
        Ii is such that si(Ii) = si+1(Ii+1).predecessor;
      end
      C = C + ∑_{i=1}^{n} (Ii − max(Ji, Ki) − 1);
      (J1, . . . , Jn) = (I1, . . . , In);
    end
    j = j + 1;
  end
  C = C + max(N − ∑_{i=1}^{n} max(Ji, Ki), 0);
  N = max(∑_{i=1}^{n} Ji, N);
  return H1 if C/N ≤ τn, H0 otherwise;
Given measurements (s1, s2), DBM
1. calculates d(w) (w = 1, 2, . . .), the cumulative difference between s1 and s2
defined in (4.5) (d(0) = 0);
2. if v(w) ≜ max_{0≤k≤w} d(k) − min_{0≤k≤w} d(k) is less than M for all w, returns H1; otherwise, computes the smallest index w∗ such that v(w∗) = M; let du = max_{0≤k≤w∗} d(k) and dl = min_{0≤k≤w∗} d(k);
3. calculates C, the number of chaff packets assigned by BMR to keep the
variable d between dl and du (the original BMR keeps d between 0 and M);
4. returns H1 if the ratio of C and the total sample size is bounded by 1/(1+M ′)
(M ′ is a design parameter); otherwise, returns H0.
Implementation of DBM is given in Table 4.11. Algorithm DBM has complexity
O(N).
Table 4.11: Detect-Bounded-Memory (DBM).

Detect-Bounded-Memory(s1, s2, M, N, M′):
  s = s1 ⊕ s2;
  d = dmax = dmin = 0; C = 0;
  for w = 1 : N
    if (s(w) ∈ S1 and d − dmin = M) or (s(w) ∈ S2 and dmax − d = M)
      C = C + 1;
    else
      d = d + 1 if s(w) ∈ S1, and d = d − 1 if s(w) ∈ S2;
      dmax = max(dmax, d); dmin = min(dmin, d);
    end
  end
  return H1 if C/N ≤ 1/(1 + M′), H0 otherwise;
It is shown in [19] that the actual number of chaff packets in s1
⊕
s2 is lower
bounded by C. It implies that DBM has no miss detection for realizations of
information flows with CTR up to 1/(1 + M ′).
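The following Python sketch mirrors Table 4.11 and the four steps above; it is our illustration, with names and the return convention (1 for H1, 0 for H0) chosen by us.

def detect_bounded_memory(s1, s2, M, N, M_prime):
    # DBM detector: estimate a lower bound C on the number of chaff packets
    # and declare H1 when C/N is suspiciously small.
    # s1, s2: sorted epoch lists; N: number of merged packets to examine.
    merged = sorted([(t, 1) for t in s1] + [(t, 2) for t in s2])[:N]
    d = d_max = d_min = 0      # cumulative difference and its running extremes
    C = 0
    for _, proc in merged:
        # a packet is chaff if relaying it would require a buffer larger than M
        if (proc == 1 and d - d_min == M) or (proc == 2 and d_max - d == M):
            C += 1
        else:
            d += 1 if proc == 1 else -1
            d_max = max(d_max, d)
            d_min = min(d_min, d)
    return 1 if C / N <= 1.0 / (1 + M_prime) else 0   # 1 = H1, 0 = H0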
Detection Algorithm for Multi-hop Bounded Memory Flows
We extend DBM to a joint detection algorithm called “Detect-Multi-Bounded-Memory” (DMBM) based on the chaff-inserting algorithm MBMR.
Given measurements (s1, . . . , sn), DMBM
1. for i = 1, . . . , n − 1, calculates vi(w) ≜ max_{0≤k≤w} di(k) − min_{0≤k≤w} di(k) for w = 1, 2, . . ., where di(k) is the cumulative difference between si and si+1;
2. if vi(w) < M for all w, Ui = ∞ and Li = −∞; otherwise, let Ui = max_{0≤k≤w∗} di(k) and Li = min_{0≤k≤w∗} di(k), where w∗ = inf{w : vi(w) ≥ M};
3. calculates C, the number of chaff packets assigned by MBMR to keep the
variable Mi (i = 1, . . . , n − 1) between Li and Ui (originally, MBMR keeps
Mi between 0 and M);
4. returns H1 if the ratio of C and the total sample size is bounded by τn (τn
is a design parameter); otherwise, returns H0.
Implementation of DMBM is presented in Table 4.12. The algorithm has com-
plexity O(N). The value of C is the number of times that memory overflow or
underflow would have occurred if chaff packets had not been inserted. Since the
actual number of chaff packets is at least C, the actual CTR has to be larger than
τn to evade DMBM.
Table 4.12: Detect-Multi-Bounded-Memory (DMBM).

Detect-Multi-Bounded-Memory(s1, . . . , sn, M, N, τn):
  s = s1 ⊕ · · · ⊕ sn;
  (M1, . . . , Mn−1) = (U1, . . . , Un−1) = (L1, . . . , Ln−1) = (0, . . . , 0);
  C = 0;
  for w = 1 : N
    i is such that s(w) ∈ Si;
    if i = 1
      if M1 − L1 < M
        M1 = M1 + 1; U1 = max(U1, M1);
      else
        C = C + 1;
      end
    else if i = n
      if Un−1 − Mn−1 < M
        Mn−1 = Mn−1 − 1; Ln−1 = min(Ln−1, Mn−1);
      else
        C = C + 1;
      end
    else if Ui−1 − Mi−1 < M and Mi − Li < M
      Mi−1 = Mi−1 − 1; Mi = Mi + 1;
      Li−1 = min(Li−1, Mi−1); Ui = max(Ui, Mi);
    else
      C = C + 1;
    end
  end
  return H1 if C/N ≤ τn, H0 otherwise;
Chapter 5
Distributed Detection of Information
Flows
5.1 Outline
In the previous chapters, precise timing measurements have been used for de-
tection. In wide-area networks (e.g., wireless sensor networks), there are usually
constraints on the communication rates between the points of measurements and
the detector. This chapter addresses this issue in the framework of distributed
detection. The rest of the chapter is organized as follows. Section 5.2 formulates
the problem. Section 5.3 defines the performance criteria and gives some theo-
retical results on the performance of general detection systems. Sections 5.4–5.6
are dedicated to practical detection systems, where Section 5.4 defines two simple
quantizers, Section 5.5 presents optimal chaff-inserting and detection algorithms
for each quantizer, and Section 5.6 analyzes and compares the performance of the
proposed detection systems. Then Section 5.7 concludes the chapter with a few
remarks.
5.2 The Problem Formulation
5.2.1 Problem Statement
In a wireless ad hoc network as illustrated in Fig. 5.1, nodes A and B may be car-
rying an information flow. If the nodes are transmitting an information flow, then
their transmission activities Si (i = 1, 2) can be decomposed into an information
flow (F1, F2) and chaff noise Wi, i.e., Si = Fi ⊕Wi (referred to as containing an
information flow). As in Chapter 4, we allow Wi to be any process, and it may
be correlated with Fi. In this chapter, we only consider information flows with
bounded delay ∆ defined in Definition 7.
Figure 5.1: In a wireless network, nodes A and B may serve on one or multiple routes. Eavesdroppers are deployed to collect their transmission activities Si (i = 1, 2), which are then sent to a detector at the fusion center.
We are interested in testing the following hypotheses:
H0 : S1, S2 are independent,
H1 : (S1, S2) contains an information flow,     (5.1)
by observing measurements compressed from Si (i = 1, 2). Assume that the
maximum delay ∆ is known. Moreover, assume that the marginal distributions of
Si (i = 1, 2) are known, and they are the same under both hypotheses (detailed
analysis is done for Poisson processes). This is a partially nonparametric hypothesis test because no statistical assumptions are imposed on the correlation of S1 and
S2 under H1.
We point out that the assumption that Si (i = 1, 2) have the same distributions
under both hypotheses is not a limiting assumption. This is because otherwise, an
eavesdropper can independently make a decision based on its own measurements
(e.g., by the Anderson-Darling test [28]) and send the result (a 1-bit message) to
the fusion center, and the error probabilities can be made arbitrarily small if there
are enough measurements.
5.2.2 System Architecture
The capacity constraints in the uplink channels make it necessary to incorporate
quantizers q_i^{(t)} (i = 1, 2) at the eavesdroppers, where t is the duration of the observation. As illustrated in Fig. 5.2, the processes Si (i = 1, 2) are compressed into q_i^{(t)}(Si), which are delivered to the fusion center, and then the detector makes a decision in the form of
θt = δt(q_1^{(t)}(S1), q_2^{(t)}(S2)),
where1 θt ∈ {0, 1}. The capacity constraints are expressed as2
||q_i^{(t)}|| ≤ e^{tRi},  i = 1, 2,     (5.2)
1 The value 0 denotes H0, and 1 denotes H1.
2 The unit of Ri (i = 1, 2) is nats per unit time.
for sufficiently large t, where ||q_i^{(t)}|| is the alphabet size of the output of q_i^{(t)}. Generally, R1, R2 < ∞, but if the detector is located at one of the eavesdroppers, e.g., the eavesdropper of node B, then R2 = ∞, which is called the case of full side-information.
Figure 5.2: A distributed detection system, consisting of two quantizers q_1^{(t)} and q_2^{(t)} and a detector δt.
Given (R1, R2), the problem is to design q(t)i (i = 1, 2) and δt such that the
overall detection performance is optimized.
5.3 Performance Criteria
In this section, we define the criteria for evaluating detection performance and
present theoretical results on the optimal performance under the proposed criteria.
5.3.1 Level of detectability
The detection performance in classical multiterminal hypothesis testing is usually
evaluated by the error exponents [16]. In our problem, the alternative hypothesis
is nonparametric, which makes it improper to adopt the error exponent criterion.
Instead, we measure the performance by the notion of consistency defined in Defini-
tion 11. The optimal performance establishes a level of detectability of information
flows as follows.
Given capacity constraints (R1, R2), we characterize the extent to which infor-
mation flows are detectable by a notion called the level of detectability, denoted by
α(R1, R2), which is defined as
α(R1, R2) ≜ sup{ r ∈ [0, 1] : ∃ (q_1^{(t)}, q_2^{(t)}, δt) such that
  1) δt is r-consistent;
  2) lim sup_{t→∞} (1/t) log ||q_i^{(t)}|| ≤ Ri, i = 1, 2 }.     (5.3)
That is, α(R1, R2) is the maximum consistency of all the detection systems under
capacity constraints (R1, R2). If Ri = ∞ (i = 1, 2), then this definition is reduced
to the level of strong detectability in centralized detection (see Definition 12). Our
goal is to design quantizers and detectors to achieve α(R1, R2).
Before concluding the introduction to our performance measure, we would like
to show an example which explains why our approach deviates from the classical
approaches.
Example Consider an alternative formulation where we assume Si (i = 1, 2)
are renewal processes under both hypotheses, i.e., the interarrival times Kj ≜ S1(j + 1) − S1(j) and Lj ≜ S2(j + 1) − S2(j) (j = 1, 2, . . .) are i.i.d., respectively. Moreover, assume that the process of epoch pairs {(S1(j), S2(j))}∞j=1 is also renewal, i.e., (Kj, Lj) (j ≥ 1) are i.i.d. with some distribution PKL. The testing hypotheses are
H0 : PKL = PK PL,   H1 : PKL ≠ PK PL,     (5.4)
where PK PL is the product distribution with the same marginals as PKL, defined
by PK PL(k, l) = PK(k)PL(l) (i.e., Kj and Lj are independent). This is a
testing against dependence problem under multiterminal data compression. By
similar techniques as in the testing against independence problem in [1], one can
develop the optimal test of (5.4) to minimize the error probabilities. The problem
is, however, that this is not the problem we want to solve in information flow
detection. Specifically, there are simple strategies to manipulate the information
flows such that the optimal test of (5.4) fails. For example, consider the scenario
in Fig. 5.3, where S2 is an identical copy of S1 except that a chaff packet is inserted at the beginning. Then the subsequent observations of interarrival times will be
misaligned. In particular, for j ≥ 3, the jth pair of interarrival times becomes
(Kj, Lj) = (Kj, Kj−1). Since Kj’s are independent, the test of (5.4) will fail to
detect such an obvious information flow.
Figure 5.3: Inserting one chaff packet can destroy the alignment of measurements.
The notion of consistency prevents obvious mistakes as in the above example
by guaranteeing that it is possible to have non-vanishing miss probability only if
a sufficient amount of chaff noise is inserted.
5.3.2 Level of Undetectability
Since the eavesdroppers cannot distinguish chaff noise from information flows, there
is a limit on the amount of chaff noise beyond which an information flow can be
made statistically identical with traffic under H0. We use this limit to measure
the level of undetectability. For centralized detection, the level of undetectability
is defined as the minimum CTR for an information flow to mimic the distribu-
tions under H0; see (4.4). For distributed detection, the distributions seen by the
detector depend on the quantizers, and so does the level of undetectability.
For deterministic quantizers3 qi (i = 1, 2), the level of undetectability is defined
as the minimum CTR required to mimic H0 after quantization, i.e.,
φ(H0; q1, q2) ≜ inf{ r ∈ [0, 1] : ∃ Fi, Wi (i = 1, 2) such that
  1) Fi ⊕ Wi =d Si, i = 1, 2, and (qi(Fi ⊕ Wi))_{i=1}^{2} =d (qi(Si))_{i=1}^{2} for some (Si)_{i=1}^{2} under H0;
  2) (F1, F2) is an information flow;
  3) lim sup_{t→∞} CTR(t) ≤ r a.s. }.     (5.5)
With proper perturbations and φ(H0; q1, q2) fraction of chaff noise, an information
flow can appear to be the same as traffic under H0 to both the eavesdroppers and
the detector. Therefore, the maximum consistency under quantizers (q1, q2) is
upper bounded by φ(H0; q1, q2).
Generally, the quantization schemes may involve randomization. A random-
ized quantizer of S1 is a set of conditional distributions Q1(x|s1), where s1 is a
realization of S1, x ∈ X ∞ for a finite or countable alphabet X , and Q1(x|s1) is
the probability of quantizing s1 to x. A randomized quantizer Q2(y|s2) of S2 is
3 Each qi can be viewed as the limit of a sequence of deterministic quantizers {q_i^{(t)}}_{t≥0} as t increases.
defined similarly. Given (Q1, Q2), the level of undetectability is defined as
φ(H0; Q1, Q2) ≜ inf{ r ∈ [0, 1] : ∃ Fi, Wi (i = 1, 2) such that
  1) Fi ⊕ Wi =d Si, i = 1, 2, and (X, Y)|(Fi ⊕ Wi)_{i=1}^{2} =d (X, Y)|(Si)_{i=1}^{2} for some (Si)_{i=1}^{2} under H0;
  2) (F1, F2) is an information flow;
  3) lim sup_{t→∞} CTR(t) ≤ r a.s. },     (5.6)
where (X, Y)|(Si)_{i=1}^{2} is the marginal of (X, Y) in (X, Y, S1, S2) specified by the distribution of (S1, S2) and the conditional distribution Q(X, Y|S1, S2) = Q1(X|S1)Q2(Y|S2). Note that we can write the conditional distribution in product form because the quantization of the two processes is independent, i.e., X → S1 → S2 → Y forms a Markov chain. Similar to φ(H0; q1, q2), φ(H0; Q1, Q2) gives an upper bound on the
maximum consistency under (Q1, Q2).
5.3.3 General Converse and Achievability
Given capacity constraints (R1, R2), we are interested in finding the value of
α(R1, R2) and designing detection systems to achieve it. In Chapter 4, we have
answered these questions for infinite capacities. Now we provide high-level answers
for finite capacities.
Theorem 14 For any Ri ≥ 0 (i = 1, 2),
α(R1, R2) ≤ max_{P1} φ(H0; Q1, Q2),     (5.7)
where4
P1 = { (Q1(X|S1), Q2(Y|S2)) : lim sup_{t→∞} (1/t) I(S1; X) ≤ R1, lim sup_{t→∞} (1/t) I(S2; Y) ≤ R2 }.
Furthermore, let Q∗1 and Q∗2 achieve the maximum in (5.7), and let (F∗i, W∗i) (i = 1, 2) achieve φ(H0; Q∗1, Q∗2) as defined in (5.6) without the requirement that Fi ⊕ Wi =d Si (i = 1, 2). If Q∗1 and Q∗2 are deterministic, and the CTR of (F∗i ⊕ W∗i)_{i=1}^{2} converges a.s. to some value α∗(R1, R2), then
α(R1, R2) ≥ α∗(R1, R2).
Proof: See Appendix 5.A. ■
Remark: The theorem contains a converse result and an achievability result.
The converse result states that the level of detectability under certain capacity
constraints is no more than the maximum level of undetectability over all the
quantizers satisfying the capacity constraints. The achievability result gives a lower
bound on the level of detectability by constructing a specific detection system with
consistency equal to α∗(R1, R2).
It can be shown that solving the maximization in (5.7) is equivalent to computing a distortion-rate function with distortion measure
φ(H0) − φ(H0; Q1, Q2),
4 Note that P1 is well-defined because Si (i = 1, 2) have the same distributions under both hypotheses.
which characterizes the performance loss due to quantization by Qi (i = 1, 2). How
to compute this distortion rate function is an open problem because the distortion
measure is not single-letter (and it is a function of distributions). Instead, we will
develop practical detection systems and analyze their performance to give lower
bounds on α(R1, R2).
5.4 Quantizer Design
The design of quantizers q(t)i (i = 1, 2) is complicated by the dependency on t.
To simplify design, we partition the observation into n slots of equal length T
(T = t/n) and perform independent and identical quantization in each slot. We
propose the following quantizers based on the counting measure.
Definition 14 Given a point process S, a slotted quantizer with slot length T is defined as γ(S) ≜ (Z1, Z2, . . .), where Zj (j ≥ 1) is the number of points in the jth slot (i.e., the interval [(j − 1)T, jT)) of S.
The slotted quantizer was first used to compress Poisson processes by Rubin in
[31], where combined with proper reconstruction methods, it was shown to achieve
compression performance close to the optimal predicted by the rate distortion
function under the single-letter absolute-error fidelity criterion. Note that it does
not imply that the slotted quantizer is optimal or near-optimal in our problem because our fidelity criterion is different. We refer to the quantization by a slotted quantizer
as slotted quantization. It is easy to see that the above definition is equivalent to
the point-wise quantizer γ(t) = ⌊t/T ⌋, where t ∈ R+.
For applications requiring extremely low rate, it may be desirable to further
compress the results of slotted quantization. To this end, we propose the following
quantizer.
Definition 15 Given a point process S, a one-bit quantizer is a binary quantization of the output of a slotted quantizer, defined as
γ̄(S) = (I{Zj > 0})∞j=1,
where Z = γ(S), and I{·} is the indicator function.
Quantization by a one-bit quantizer is called one-bit quantization. The rate of the one-bit quantizer decays as O(1/T) as T → ∞.
Hereafter, we will refer to the quantization results of S1 and S2 by Xn = (Xj)nj=1
and Yn = (Yj)nj=1, respectively, the meaning of which will depend on the quantizers
used. For the full side-information case (i.e., R2 = ∞), we will use Y (s, t) to denote
the number of epochs in S2 in the interval [s, t).
If Si (i = 1, 2) are Poisson processes, then the Xj's and Yj's are i.i.d., and it is known that they can be delivered almost perfectly under the capacity constraints in (5.2) if and only if
H(X1)/T ≤ R1,   H(Y1)/T ≤ R2.     (5.8)
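A minimal Python sketch of the two quantizers in Definitions 14 and 15, assuming the observation starts at time 0; the function names are ours.

import math

def slotted_quantizer(epochs, T):
    # Slotted quantizer (Definition 14): Z_j = number of epochs in slot j.
    if not epochs:
        return []
    n_slots = int(math.floor(max(epochs) / T)) + 1
    Z = [0] * n_slots
    for t in epochs:
        Z[int(math.floor(t / T))] += 1
    return Z

def one_bit_quantizer(epochs, T):
    # One-bit quantizer (Definition 15): indicator of a nonempty slot.
    return [1 if z > 0 else 0 for z in slotted_quantizer(epochs, T)]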
5.5 Detection Algorithms
In this section, we will present detectors for each of the quantization schemes pro-
posed in Section 5.4 and analyze their consistency. The detectors borrow the idea
in centralized detection, i.e., the detector should compute the minimum fraction
of chaff noise needed to generate the received measurements and make detection if
this fraction is suspiciously small. Optimal chaff-inserting algorithms are developed
to compute the minimum fraction of chaff.
In the rest of this section, we will discuss the following four cases: I) q1 is
a slotted quantizer, q2 is an identity function (full side-information); II) q1 and
q2 are both slotted quantizers; III) q1 is a one-bit quantizer, q2 is an identity
function; IV) q1, q2 are both one-bit quantizers. In Cases II and IV, equal capacity
constraints (R1 = R2) are considered for simplicity, although the idea of detection
is generalizable to unequal constraints. Since the level of detectability in the high-capacity regime is already known, our analysis will focus on the low-capacity (i.e., large slot length) regime.
5.5.1 Case I: Slotted Quantization, Full Side-Information
Consider the case when q1 is a slotted quantizer and q2 is an identity function.
Assume that the capacities are sufficient to permit reliable delivery of quantized
measurements. Then the detector needs to make a decision based on the measure-
ments xn and s2.
We want to insert the minimum chaff noise to mimic a given (xn, s2), i.e., we want to find realizations of an information flow (fi)_{i=1}^{2} and chaff noise wi (i = 1, 2) such that i) xn = γ(f1 ⊕ w1) and s2 = f2 ⊕ w2, and ii) the CTR is minimized. If both s1 and s2 are given, then the optimal chaff-inserting algorithm is BGM presented in Section 4.4.1. Now that we only know xn and s2, the idea is to reconstruct s1 from xn and apply BGM to the reconstructed processes. Based on this idea, we develop
a chaff-inserting algorithm called “Insert Chaff: Slotted, Full side-information”
(IC-SF) as follows. Given (xn, s2), IC-SF does the following:
1. construct a point process s̄1 as bursts of xj simultaneous epochs at (j − 1)T (j ≥ 1), as illustrated in Fig. 5.4;
2. run BGM on (s̄1, s2) with delay bound T + ∆.
Figure 5.4: IC-SF: match s̄1 with s2 subject to delay bound T + ∆. Open marks: directly observed epochs in s2; filled marks (•): reconstructed epochs in s̄1.
The optimality of IC-SF is provided by the following proposition.
Proposition 16 Algorithm IC-SF inserts the minimum number of chaff packets
to make an information flow mimic any (xn, s2) under the quantization in Case I.
Proof: See Appendix 5.A.
■
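A minimal sketch of IC-SF, reusing the bounded_greedy_match function sketched in Appendix 4.C; the reconstruction of the bursts and the names are ours, and the slot counts are assumed to start at slot 1.

def insert_chaff_slotted_full(x, s2, T, delta):
    # IC-SF sketch: reconstruct bursts from the slot counts x and run BGM
    # with the relaxed delay bound T + delta.
    # x: list of slot counts for S1; s2: sorted epochs of S2.
    # Returns the number of chaff packets assigned.
    s1_bar = [j * T for j, cnt in enumerate(x) for _ in range(cnt)]  # bursts at slot starts
    matches, chaff1, chaff2 = bounded_greedy_match(s1_bar, s2, T + delta)
    return len(chaff1) + len(chaff2)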
Since IC-SF is optimal, we can compute the minimum number of chaff packets
to mimic the measurements (xn, s2) using IC-SF. This idea leads to the following
detector.
Given (x^n, s_2), define a detector δ_I as

    δ_I(x^n, s_2) = 1 if C_I/N ≤ τ_I,  0 o.w.,

where N = Σ_{j=1}^n x_j + |S_2|, and C_I is the number of chaff packets found by IC-SF in (x^n, s_2), excluding chaff packets in⁵ S_2 ∩ [0, ∆). The implementation of δ_I can be found in Appendix 5.B.

The actual number of chaff packets has to be at least C_I; therefore, δ_I has vanishing miss probability for all information flows with CTR bounded by τ_I a.s.
The false alarm probability of δI is guaranteed by the following theorem.
Theorem 15 If under H_0, S_1 and S_2 are independent Poisson processes of rates bounded by λ, and T is large, then for any τ_I < 1/√(πλT) − ∆/(4T), the false alarm probability of δ_I decays exponentially with n.
Proof: See Appendix 5.A. ∎
Theorem 15 tells us how to choose τI to have exponentially decaying false alarm
probability. Combining the theorem with our discussion on miss probability, we see
that a proper choice of threshold will enable δI to be r-consistent for r arbitrarily
close to

    α_I(T) ≜ 1/√(πλT) − ∆/(4T) ≈ 1/√(πλT)    (5.9)

fraction of chaff noise, i.e., the consistency of δ_I is lower bounded by α_I(T). As expected, α_I(T) is a decreasing function of T.

⁵This is because packets in this interval may be relays of packets transmitted before the detector starts taking observations.
If we fix the quantization scheme as slotted quantization, then the capacity constraints affect detection only through T. It is known that for Poisson processes of maximum rate λ, the rate⁶

    R_I(T) ≜ H(Poi(λT))/T    (5.10)

suffices to reliably deliver X^n for large n. By Gaussian approximation, H(Poi(λT))/T ≈ log(2πeλT)/(2T) for large T, i.e., the required rate under slotted quantization decreases as O(log T / T).

Combining Theorem 15 and (5.10) gives an achievable rate-consistency pair for each T, denoted by (R_I(T), α_I(T)). Given a capacity constraint R, the achievable consistency-rate function can be obtained as α_I(R_I^{-1}(R)). The consistency-rate functions for the other cases (Cases II–IV) are characterized similarly.

⁶Here H(Poi(λT)) is the entropy of the Poisson distribution with mean λT.
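One way to trace out the achievable pairs (R_I(T), α_I(T)) parametrically in T is sketched below, assuming rates are measured in nats; the helper poisson_entropy and the chosen values of λ and ∆ are illustrative, not from the thesis.

    import math

    def poisson_entropy(mu: float, tol: float = 1e-12) -> float:
        """Entropy (in nats) of a Poisson distribution with mean mu,
        summed until the remaining tail is negligible."""
        h, p, k, cum = 0.0, math.exp(-mu), 0, 0.0
        while cum < 1.0 - tol and k < 10000:
            if p > 0:
                h -= p * math.log(p)
            cum += p
            k += 1
            p *= mu / k          # pmf recursion: p_k = p_{k-1} * mu / k
        return h

    lam, delta = 0.5, 1.0
    for T in [5.0, 10.0, 20.0, 40.0]:
        R_exact = poisson_entropy(lam * T) / T
        R_gauss = 0.5 * math.log(2 * math.pi * math.e * lam * T) / T    # Gaussian approximation
        alpha_I = 1.0 / math.sqrt(math.pi * lam * T) - delta / (4 * T)  # lower bound (5.9)
        print(f"T={T:5.1f}  R_I={R_exact:.4f} (Gauss {R_gauss:.4f})  alpha_I={alpha_I:.4f}")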
5.5.2 Case II: Slotted Quantization, Equal Capacity Constraints
Consider equal capacity constraints (R_1 = R_2 = R < ∞), and suppose q_i (i = 1, 2) are both slotted quantizers with the same slot length T. We follow the procedure in Case
I to develop a detector for this scenario.
We develop an optimal chaff-inserting algorithm called “Insert Chaff: Slotted,
Equal capacities" (IC-SE) based on ideas similar to IC-SF. Given (x^n, y^n), IC-SE works as follows:

1. construct point processes s_i (i = 1, 2) as bursts of x_j (or y_j) simultaneous points at (j − 1)T for j ≥ 1;

2. run BGM on (s_1, s_2) with delay bound ⌈∆/T⌉T.
Algorithm IC-SE is optimal in minimizing the number of chaff packets, as stated
in the following proposition.
Proposition 17 For any (xn, yn), IC-SE inserts the minimum number of chaff
packets to make an information flow mimic these observations under the quanti-
zation in Case II.
Proof: See Appendix 5.A. ∎
Algorithm IC-SE provides a method to compute the minimum amount of chaff
noise in the measurements, based on which we design a detector as follows.
Given (x^n, y^n), define a detector δ_II as

    δ_II(x^n, y^n) = 1 if C_II/N ≤ τ_II,  0 o.w.,

where N = Σ_{j=1}^n (x_j + y_j), and C_II is the number of chaff packets found by IC-SE in (x^n, y^n), except for chaff packets in⁷ S_2 ∩ [0, ⌈∆/T⌉T). See Appendix 5.B for an implementation of δ_II.

⁷As in the computation of C_I, this adjustment is needed because packets at the beginning of s_2 may be relays of packets transmitted before the detector starts.
Under H1, the optimality of IC-SE implies that the actual number of chaff
packets in the measurements is no smaller than CII. Therefore, a CTR larger
than τII is required to evade δII, i.e., δII has vanishing miss probability for all the
information flows with CTR bounded by τII a.s.
Under H0, the following theorem guarantees the false alarm probability of δII.
Theorem 16 If S_1 and S_2 are independent Poisson processes of maximum rate λ, and T is large, then for any τ_II < (c_1/(2√(λT))) e^{−λT/6}, where c_1 = 0.0014, the false alarm probability of δ_II decays exponentially with n.
Proof: See Appendix 5.A. ∎
By Theorem 16, δ_II can achieve Chernoff-consistent detection for arbitrarily close to

    α_II(T) ≜ (c_1/(2√(λT))) e^{−λT/6}    (5.11)

fraction of chaff noise, and thus its consistency is at least α_II(T). Note that as T increases, α_II(T) decays exponentially at the rate O(e^{−λT/6}); compared with the O(1/√T) decay of α_I(T), this suggests that the consistency in Case II decays much faster than that in Case I due to the quantization of S_2. The pair (R_II(T), α_II(T)), where R_II(T) = R_I(T), gives an achievable rate-consistency pair.
5.5.3 Case III: One-Bit Quantization, Full Side-Information
Consider the scenario when S1 is compressed by one-bit quantization, and S2 is
fully available.
This case is similar to Case I in Section 5.5.1 except that the observations are
indicators instead of the exact counts. Clearly, more information is lost after one-
bit quantization because when xj = 1, there can be one or more epochs in slot
j in s1. To overcome this difficulty, we use a backward matching, i.e., matching
epochs in s2 with nonempty slots in s1. Specifically, we develop a chaff-inserting
algorithm called “Insert Chaff: One-bit, Full side-information” (IC-OF) which
works as follows. Given (xn, s2), IC-OF:
1. match every epoch in s2 with the earliest unmatched nonempty slot within
delay ∆, as illustrated in Fig. 5.5;
2. unmatched epochs become chaff; each unmatched nonempty slot contains a
chaff packet.
Figure 5.5: IC-OF: backward greedy matching. Each epoch is matched to the first unmatched nonempty slot that is no more than ∆ earlier.
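A sketch of this backward matching follows, under the assumption (mine, consistent with causality and the delay bound) that an epoch of s_2 at time t may be matched to a nonempty slot j of s_1 whenever (j − 1)T ≤ t < jT + ∆; the function names and the example are illustrative.

    from math import floor
    from typing import List

    def ic_of_chaff(x: List[int], s2: List[float], T: float, delta: float) -> int:
        """IC-OF sketch: match each epoch of s2 (in time order) to the earliest
        unmatched nonempty slot of s1 that could have produced it; unmatched
        epochs and unmatched nonempty slots are counted as chaff."""
        n = len(x)
        slot_used = [False] * n
        unmatched_epochs = 0
        for t in s2:
            lo = max(0, floor((t - delta) / T))   # earliest slot whose window can still cover t
            hi = min(n - 1, floor(t / T))         # latest slot starting no later than t
            for j in range(lo, hi + 1):
                if x[j] == 1 and not slot_used[j] and t < (j + 1) * T + delta:
                    slot_used[j] = True
                    break
            else:
                unmatched_epochs += 1
        unmatched_slots = sum(1 for j in range(n) if x[j] == 1 and not slot_used[j])
        return unmatched_epochs + unmatched_slots

    x = [1, 0, 1, 1]                 # one-bit measurements of S1
    s2 = [0.6, 1.4, 3.9]             # fully observed epochs of S2
    print(ic_of_chaff(x, s2, T=1.0, delta=0.5))   # -> 2 (one chaff epoch, one chaff slot)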
Algorithm IC-OF is the optimal chaff-inserting algorithm in Case III, as stated
in the following proposition.
Proposition 18 Algorithm IC-OF inserts the minimum number of chaff packets
to make an information flow generate any given observations (xn, s2) under the
quantization in Case III.
Proof: See Appendix 5.A. ∎
Based on IC-OF, we develop the following detector. Given (x^n, s_2), the detector δ_III is defined as

    δ_III(x^n, s_2) = 1 if C_III/(n N_1 + |S_2|) ≤ τ_III,  0 o.w.,

where C_III is the number of chaff packets found by IC-OF in (x^n, s_2), excluding chaff packets in S_2 ∩ [0, ∆), and N_1 = − log(1 − x̄) for x̄ = (1/n) Σ_{j=1}^n x_j. Here N_1 is the maximum likelihood estimate of the mean number of epochs per slot in S_1 under the assumption that S_1 is Poisson. See Appendix 5.B for an implementation of δ_III.
Under H1, Proposition 18 guarantees that the actual number of chaff packets
is no smaller than CIII. Moreover, under the Poisson assumption, N1 converges to
the average traffic size per slot in S1 a.s. Thus the statistic CIII/(nN1 + |S2|) is
upper bounded by the actual CTR a.s. as n → ∞, implying that δIII has vanishing
miss probability for CTR bounded by τIII a.s.
Under H0, the performance of δIII is guaranteed by the following theorem.
Theorem 17 If S_1 and S_2 are independent Poisson processes of maximum rate λ, and T is large, then for any τ_III < (1/2) e^{−λT}, the false alarm probability of δ_III decays exponentially with n.
Proof: See Appendix 5.A. ∎
By this theorem, the consistency of δ_III is lower bounded by

    α_III(T) ≜ (1/2) e^{−λT}.    (5.12)

For the S_1 considered in Theorem 17, delivering X^n reliably for large n requires a rate of

    R_III(T) ≜ log 2 / T           if λT ≥ log 2,
               h(e^{−λT}) / T      o.w.,    (5.13)

where h(p) is the binary entropy function h(p) = −p log p − (1 − p) log(1 − p). Therefore, (R_III(T), α_III(T)) is an achievable rate-consistency pair. As T increases, α_III(T) decays exponentially with exponent λ. Note that this decay is much faster than the O(1/√T) decay of α_I(T), indicating that for the same slot length, one-bit quantization significantly reduces consistency compared with slotted quantization. This does not, however, imply that slotted quantization is better, because one-bit quantizers can use a much smaller slot length than slotted quantizers under the same capacity constraints.
5.5.4 Case IV: One-Bit Quantization, Equal Capacity Constraints
Suppose that one-bit quantizers with the same slot length T are used for both S1
and S2. This case is similar to Case II in Section 5.5.2, except that the measure-
ments (xn, yn) are binary vectors instead of exact packet counts.
To match the epochs, we can still use the idea of IC-SE, but since the number of
epochs in a nonempty slot can be one or more, we assume it to be the number that
minimizes the number of chaff packets over all positive integers. The amount of
chaff noise can be computed by an algorithm called “Insert Chaff: One-bit, Equal
capacities” (IC-OE) as follows. Given (xn, yn), IC-OE inserts a chaff packet in
slot j if
xj >
j+⌈∆/T ⌉∑
k=j
yk, or yj >
j∑
k=j−⌈∆/T ⌉xk
for j = 1, . . . , n.
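Because IC-OE reduces to a purely slot-level rule, it admits a direct sketch. The Python below is an illustration with names of my choosing; it handles the boundary slots crudely by treating missing slots as empty.

    from math import ceil
    from typing import List

    def ic_oe_chaff(x: List[int], y: List[int], T: float, delta: float) -> int:
        """IC-OE sketch: a chaff packet is inserted in slot j whenever a nonempty slot
        in one process cannot be explained by any nonempty slot of the other process
        within the slot-quantized delay bound."""
        n, w = len(x), ceil(delta / T)
        chaff = 0
        for j in range(n):
            cond1 = x[j] > sum(y[j : min(n, j + w + 1)])      # x_j unexplainable by S2
            cond2 = y[j] > sum(x[max(0, j - w) : j + 1])      # y_j unexplainable by S1
            if cond1 or cond2:
                chaff += 1
        return chaff

    x = [1, 0, 0, 1]
    y = [0, 0, 1, 1]
    print(ic_oe_chaff(x, y, T=1.0, delta=0.5))   # -> 2 with ceil(delta/T) = 1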
Algorithm IC-OE computes the minimum amount of chaff noise as stated in
the following proposition.
Proposition 19 Algorithm IC-OE inserts the minimum number of chaff pack-
ets to make an information flow mimic given binary vectors (xn, yn) under the
quantization in Case IV.
Proof: See Appendix 5.A. ∎
Based on IC-OE, we develop a detector δ_IV as follows. Given (x^n, y^n), the detector is defined as

    δ_IV(x^n, y^n) = 1 if C_IV/[n(N_1 + N_2)] ≤ τ_IV,  0 o.w.,

where C_IV is the number of chaff packets inserted by IC-OE in (x^n, y^n), excluding chaff packets in S_2 ∩ [0, ⌈∆/T⌉T), and N_i (i = 1, 2) are defined as in δ_III as functions of x^n and y^n, respectively. See Appendix 5.B for its implementation.
Under H1, δIV has vanishing miss probability as long as the CTR is bounded
by τIV a.s. because of Proposition 19 and arguments similar to those in Section
5.5.3.
Under H0, the following theorem tells us how to choose the threshold to guar-
antee vanishing false alarm probability.
Theorem 18 If S_1 and S_2 are independent Poisson processes of maximum rate λ, and T is large, then for any τ_IV < ((1 − e^{−λT})/(2λT)) e^{−2λT}, the false alarm probability of δ_IV decays exponentially with n.
Proof: See Appendix 5.A. ∎
By this theorem, the consistency of δ_IV is lower bounded by

    α_IV(T) ≜ ((1 − e^{−λT})/(2λT)) e^{−2λT}.    (5.14)

Thus we have an achievable rate-consistency pair (R_IV(T), α_IV(T)), where R_IV(T) = R_III(T). The value of α_IV(T) decays exponentially in T with exponent 2λ. Comparing this with the O(e^{−λT/6}) decay of α_II(T), the analysis suggests that for the same T, the consistency under one-bit quantization decays 12 times faster than that under slotted quantization. Again, this does not mean that slotted quantization is better, because the slot lengths under the two quantization schemes differ.
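To see the different decay rates side by side, the closed-form lower bounds (5.9), (5.11), (5.12), and (5.14) can simply be tabulated; the sketch below does this for a few illustrative values of λ, ∆, and T (my own choices, not from the thesis).

    import math

    def alpha_I(lam, T, delta): return 1/math.sqrt(math.pi*lam*T) - delta/(4*T)             # (5.9)
    def alpha_II(lam, T):       return 0.0014/(2*math.sqrt(lam*T)) * math.exp(-lam*T/6)      # (5.11)
    def alpha_III(lam, T):      return 0.5*math.exp(-lam*T)                                  # (5.12)
    def alpha_IV(lam, T):       return (1 - math.exp(-lam*T))/(2*lam*T) * math.exp(-2*lam*T) # (5.14)

    lam, delta = 0.5, 1.0
    for T in [2.0, 5.0, 10.0, 20.0]:
        print(f"T={T:5.1f}  aI={alpha_I(lam,T,delta):.4f}  aII={alpha_II(lam,T):.2e}"
              f"  aIII={alpha_III(lam,T):.2e}  aIV={alpha_IV(lam,T):.2e}")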
5.6 Analysis and Comparison
Recall that we have taken a separation-based approach, breaking the distributed detection process into three steps: quantization, data transmission, and detection. In this section, we analyze the consistency of the proposed detectors and then compare them to gain insights into quantizer design.
5.6.1 Performance Analysis
Assume that S1 and S2 are independent Poisson processes of maximum rate λ
under H0. We will analyze the consistency of the detectors proposed in Section
5.5 and give bounds on the maximum consistency in each of the four cases.
Conceptually, we can calculate the exact consistency of the proposed detectors
as follows.
Theorem 19 There exist functions α∗i (T ) (i = I, . . . , IV) such that δi has van-
ishing false alarm probability if and only if τi < α∗i (T ).
Proof: See Appendix 5.A. ∎
The theorem implies that the consistency of δi (i = I, . . . , IV) is equal to α∗i (T ).
The definition of α∗i (T ) can be found in the proof. Their computation is rather
involved; instead, we resort to closed-form lower bounds that will also guarantee
vanishing false alarm probabilities, which leads to αi(T ) in (5.9, 5.11, 5.12, 5.14).
Fixing quantization schemes as in Case i (i = I, . . . , IV), we provide a converse
result in the following theorem.
Theorem 20 The level of undetectability in Case i (i = I, . . . , IV) is bounded by

    φ(H_0; q_1, q_2) ≤ E[|X − Y|]/(2λT),

where q_j (j = 1, 2) are the quantizers in Case i, and X and Y are independent Poisson random variables with mean λT.
Proof: See Appendix 5.A. ∎
Note that detectors δi (i = I, . . . , IV) are not necessarily optimal because the
chaff-inserting algorithms used in these detectors only make an information flow
mimic the joint distribution of quantized processes under H0. The marginal distri-
butions are different from those under H_0 (e.g., the process constructed by IC-SF is not Poisson) and can still be used to distinguish the two hypotheses. In the proof of Theorem 20, we give a method to mimic both the marginal and the joint distributions under H_0 and analyze the CTR of that method to obtain the upper bound on the level of undetectability.
of undetectability.
Combining Theorems 19 and 20 yields the following result.
Corollary 6 The maximum consistency in Case i (i = I, . . . , IV) is lower bounded
by α∗i (T ) and upper bounded by E[|X − Y |]/(2λT ), where X and Y are defined as
in Theorem 20.
The relationship of the quantities discussed so far regarding the consistency in
Case i (i = I, . . . , IV) can be summarized as follows:
    α_i(T) ≤ α*_i(T) ≤ maximum consistency in Case i ≤ φ(H_0; q_1, q_2) ≤ E[|X − Y|]/(2λT),
where qj (j = 1, 2) are the quantizers in Case i.
5.6.2 Numerical Comparison
We now give some heuristics on quantizer design by comparing the consistency
of δi (i = I, . . . , IV) as functions of capacity constraints. Specifically, let the
capacity constraints be (R, ∞) in Cases I and III, and (R, R) in Cases II and IV. The consistency-rate functions are computed as α*_i(R_i^{-1}(R)) (i = I, . . . , IV). Since the form of α*_i(T) is not explicit, we calculate it numerically as the CTR of the optimal chaff-inserting algorithms (i.e., IC-SF, -SE, -OF, -OE) on independent Poisson processes of rate λ. In addition, we compare the computed consistency-rate functions with the upper bound u(R) ≜ E[|X − Y|]/(2λT) for T = R_I^{-1}(R), where X and Y are defined in Theorem 20 (it can be shown that the upper bound for T = R_III^{-1}(R) is much looser, and thus that bound is omitted). For algorithmic simplicity, we choose the range of R to guarantee that R_i^{-1}(R) ≥ ∆ (i = I, . . . , IV). See Fig. 5.6–5.8 for plots of the consistency-rate functions under different traffic rates (i.e., different λ).
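As a small example of this numerical procedure, the sketch below estimates α*_IV(T) by generating one-bit-quantized independent Poisson traffic, applying the IC-OE slot rule of Section 5.5.4, and dividing the average chaff count per slot by 2λT. It assumes ⌈∆/T⌉ = 1 and ignores edge effects; the parameter values and names are arbitrary.

    import random, math

    def estimate_alpha_star_IV(lam: float, T: float, n_slots: int = 100_000, seed: int = 0) -> float:
        rng = random.Random(seed)
        p_nonempty = 1 - math.exp(-lam * T)
        # one-bit quantized independent Poisson traffic (H0)
        x = [1 if rng.random() < p_nonempty else 0 for _ in range(n_slots + 1)]
        y = [1 if rng.random() < p_nonempty else 0 for _ in range(n_slots + 1)]
        chaff = 0
        for j in range(1, n_slots):
            if x[j] > y[j] + y[j + 1] or y[j] > x[j - 1] + x[j]:
                chaff += 1
        return (chaff / (n_slots - 1)) / (2 * lam * T)   # chaff per slot over packets per slot

    lam, T = 0.5, 2.0
    est = estimate_alpha_star_IV(lam, T)
    bound = (1 - math.exp(-lam*T)) / (2*lam*T) * math.exp(-2*lam*T)   # closed-form bound (5.14)
    print(f"alpha*_IV estimate = {est:.4f}, closed-form lower bound alpha_IV = {bound:.4f}")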
The plots yield the following observations: i) for small λ (Fig. 5.6), the consis-
tency of δI is similar to that of δIII, and the same holds for δII and δIV; as λ increases
(Fig. 5.7, 5.8), the consistency of δI (or δII) becomes increasingly larger than that
of δIII (or δIV); at λ = 1 (Fig. 5.8), the consistency of δII exceeds that of δIII even
though δIII has full side-information; ii) at the same R, the consistency of all the
detectors decreases with the increase of λ; iii) the consistency of δI is close to the
upper bound on the maximum consistency in Case I, especially at small R.
Observation (i) clearly suggests that which quantizer to use should depend on
the traffic rate. For very light traffic, we can use the simpler one-bit quantizer to
achieve the same performance as the more complicated slotted quantizer, whereas
we should use the slotted quantizer to obtain better performance if the traffic is
not so light. The intuition behind this observation is that for very small λ, the
probability that a slot contains more than one epoch is small, and thus we will
not lose much information by further compressing the results of slotted quanti-
zation by one-bit quantization; otherwise, there is nonnegligible probability for a
nonempty slot to contain multiple epochs, and this information will be lost after
one-bit quantization, making it more difficult to distinguish the two hypotheses.
Observation (ii) says that it is more difficult to detect information flows in heavy traffic. The intuition is that if we normalize the maximum delay by the average interarrival time, the normalized maximum delay constraint λ∆ becomes relatively loose for large λ, making detection more difficult (see parallel results in Section 3.5.2). Observation (iii) implies that the detector δ_I is close to optimal in Case I; its consistency and the upper bound jointly specify the maximum consistency under slotted quantization and full side-information.

[Figures 5.6–5.8: The consistency-rate functions α*_I, . . . , α*_IV of δ_I, . . . , δ_IV and the upper bound u versus R, for various traffic rates; ∆ = 1; α*_i is computed over 10^4 slots. Figure 5.6: λ = 0.1. Figure 5.7: λ = 0.5. Figure 5.8: λ = 1.]
5.7 Summary
In this chapter, we consider distributed detection of bounded delay flows in chaff
noise. We give a theoretical characterization of the optimal performance and then
focus on the development of practical detection systems. We are especially inter-
ested in the detection performance at extremely low rates. Our results suggest that
slotted quantization coupled with the proposed detector gives satisfactory perfor-
mance in terms of the consistency-rate tradeoff. Open problems for future work include tighter converse results, better lower bounds, and ultimately the optimal quantizers and detectors.
APPENDIX 5.A
PROOFS FOR CHAPTER 5
Proof of Theorem 14
For the converse result, since for any quantizers Qi (i = 1, 2), the consistency is
upper bounded by φ(H0; Q1, Q2), the largest of such upper bounds under the given
capacity constraints gives an upper bound on the maximum consistency, i.e., the
level of detectability.
For the achievability result, consider the detection system⁸ (Q*^(n)_1, Q*^(n)_2, δ*_n), where δ*_n is a threshold detector defined as follows. Given (x^n, y^n),

    δ*_n(x^n, y^n) = 1 if CTR*(x^n, y^n) ≤ τ,  0 o.w.,

where

    CTR*(x^n, y^n) ≜ min_{P_2} CTR(nT),

and the minimization is over the set

    P_2 = { (f_i, w_i)_{i=1}^2 :  1) Q*^(n)_1(x^n | f_1 ⊕ w_1) > 0 and Q*^(n)_2(y^n | f_2 ⊕ w_2) > 0;  2) (f_1, f_2) is a realization of an information flow }.

That is, CTR*(x^n, y^n) is the minimum CTR over all realizations of information flows and chaff noise that can generate (x^n, y^n) after being quantized by (Q*^(n)_1, Q*^(n)_2). The detector δ*_n makes a detection if this minimum CTR is upper bounded by a predetermined threshold τ.

⁸Quantizers Q*^(n)_i (i = 1, 2) are the marginalizations of Q*_i on [0, nT].
Since the statistic CTR∗(xn, yn) is a lower bound on the actual CTR in the
measurements, it is easy to see that δ∗n has vanishing miss probability as long as
the CTR is upper bounded by τ a.s.
Generally, the statistic is smaller than the CTR required to mimic H_0. If, however, Q*_i (i = 1, 2) are deterministic, then CTR*(X^n, Y^n) is the minimum CTR for an information flow to mimic the distribution of (X^n, Y^n) after quantization, and the minimum of CTR*(X^n, Y^n) under H_0 is the minimum CTR to mimic the joint distribution of the quantization results of some (S_1, S_2) under H_0. By definition, this is the CTR of (F*_i ⊕ W*_i)_{i=1}^2. Note that the processes achieving CTR*(X^n, Y^n) do not necessarily mimic the marginal distributions of S_i (i = 1, 2) under H_0. By assumption, there exists a constant α*(R_1, R_2) such that the CTR of (F*_i ⊕ W*_i)_{i=1}^2 converges to α*(R_1, R_2) a.s. Thus, for any τ < α*(R_1, R_2), we have

    lim_{n→∞} P_F(δ*_n) = lim_{n→∞} Pr{CTR*(X^n, Y^n) ≤ τ} = 0.

Combining this result with the arguments on miss probability, we conclude that the consistency of δ*_n is at least α*(R_1, R_2). Therefore, α*(R_1, R_2) is a lower bound on α(R_1, R_2). ∎
Proof of Proposition 16
First, we show that the matched pairs found by IC-SF indeed form a realization
of an information flow. Let x′n be the vector of the numbers of matched epochs
in s1, and f2 = (t1, t2, . . .) be the sequence of matched epochs in s2. We construct
a sequence f1 = (sj)j≥1 as follows. As illustrated in Fig. 5.9, for an epoch t1 in
f2 matched to the same slot, we construct an epoch s1 = t1; for an epoch t2 in
f2 matched to a previous slot, we construct s2 at the end of that slot. It is easy
to see that slotted quantization of f1 yields x′n, and (f1, f2) is a realization of an
information flow.
Figure 5.9: Construction of f_1. ◦: original epochs; •: constructed epochs.
Then we show that IC-SF is optimal. Since it is known that for given real-
izations and delay bound, BGM inserts the minimum number of chaff packets, it
only remains to show that our construction of s_1 and choice of delay bound minimize the need for chaff. Given x^n, the x_j packets in slot j can be anywhere in
[(j − 1)T, jT ). By the causality and bounded delay constraints, the maximum
interval for these packets to be relayed is [(j − 1)T, jT +∆). By putting all the xj
packets at (j−1)T and allowing delays up to T +∆, we allow the matched packets
in s2 to be anywhere in the maximum interval. Thus, any other chaff-inserting
algorithm will have to insert no fewer chaff packets than IC-SF. Therefore, IC-SF
mimics (xn, s2) by inserting the minimum number of chaff packets.
∎
Proof of Theorem 15
Let C_k be the number of chaff packets inserted in the kth slot. Since T ≫ ∆, the C_k (k = 1, 2, . . .) are approximately i.i.d., and

    C_k =_d max(Y(υ, T) − X_1, 0) + max(X_1 − Y(υ, T + ∆), 0),

where υ is a random variable in [0, ∆] denoting the time used in each slot to relay packets sent in the previous slot. It is easy to see that the CTR is minimized when the processes have equal rates, because unequal rates will make υ drift towards 0 or ∆ and increase the mean of C_k. Moreover, if we prove the theorem for processes of equal rate λ, then the result also holds for smaller rates. For example, if S_i (i = 1, 2) have rate λ′ < λ, then by the result of the theorem, Pr{C_I/N < 1/√(πλ′T) − ∆/(4T)} decays exponentially, implying that Pr{C_I/N < 1/√(πλT) − ∆/(4T)} also decays exponentially. Therefore, it suffices to consider independent Poisson processes of equal rate λ.
We first show that Pr{(1/n) Σ_{k=1}^n C_k ≤ η} decays exponentially for any η < 2λT α_I(T). By Cramer's Theorem [9], this result holds if we show that E[C_k] ≥ 2λT α_I(T). Fix a value of υ in [0, ∆]. By the Gaussian approximation of Poisson random variables,

    Y(υ, T) − X_1 ∼ N(−λυ, λ(2T − υ)) ≈ N(−λυ, 2λT).

Then

    E[max(Y(υ, T) − X_1, 0)]
      ≈ ∫_0^∞ z/√(4πλT) · e^{−(z+λυ)²/(4λT)} dz
      = √(λT/π) e^{−λυ²/(4T)} − λυ Q(λυ/√(2λT))
      ≈ (√(λT/π) − (1/2)λυ) e^{−λυ²/(4T)}
      ≈ √(λT/π) − (1/2)λυ.

Similarly, E[max(X_1 − Y(υ, T + ∆), 0)] ≈ √(λT/π) − (1/2)λ(∆ − υ). Therefore,

    E[C_k] ≈ 2√(λT/π) − (1/2)λ∆ = 2λT α_I(T).
Next, let β ≜ τ_I/α_I(T) (β ∈ [0, 1)). A necessary condition for a false alarm is that (1/n) Σ_{k=1}^n C_k ≤ √β · 2λT α_I(T) or N/n ≥ (2λT)/√β. By the union bound,

    P_F(δ_I) ≤ Pr{(1/n) Σ_{k=1}^n C_k ≤ √β · 2λT α_I(T)} + Pr{N/n ≥ 2λT/√β}.

We have shown that the first term decays exponentially with n, and by Cramer's Theorem, the second term can be shown to decay exponentially as well. Therefore, the overall false alarm probability decays exponentially. This completes the proof. ∎
Proof of Proposition 17
First, we show that IC-SE indeed finds realizations of an information flow and chaff noise such that the slotted quantization results are equal to (x^n, y^n). Let (x′^n, y′^n) denote the vectors of matched numbers found by IC-SE. We will show that (x′^n, y′^n) is the result of slotted quantization of a pair of sequences (f_1, f_2) which is a realization of an information flow. As illustrated in Fig. 5.10, for T ≥ ∆, we construct f_1 as x′_j (j ≥ 1) epochs at the end of slot j, and f_2 as y′_{j,1} epochs at the beginning and y′_{j,2} epochs at the end of slot j, where y′_{j,1} is the number of epochs out of y′_j which are matched to the (j − 1)th slot, and y′_{j,2} is the number of epochs matched to the jth slot (we have y′_j = y′_{j,1} + y′_{j,2}). This construction preserves the quantization results, and (f_1, f_2) forms a realization of an information flow. For T < ∆, the construction of f_i (i = 1, 2) is the same except that for f_2, y′_{j,1} is the number of epochs matched to slots before the jth slot, and y′_{j,2} is the number of epochs matched to the jth slot.
Figure 5.10: Construction of (f_1, f_2) from (x′^n, y′^n) (T ≥ ∆). The matching found by IC-SE guarantees that x′_j = y′_{j,2} + y′_{j+1,1}.
Next, we show that IC-SE is optimal. Due to the constraints of causality
and bounded delay, a packet in slot j can only be matched to packets from slots
j, . . . , j + ⌈∆/T⌉, and IC-SE allows all such matches. Combining this argument with
the fact that BGM is optimal yields the optimality of IC-SE.
∎
Proof of Theorem 16
By arguments parallel to those in the proof of Theorem 15, we only need to consider
independent Poisson processes of equal rate λ. Following the idea of that proof,
we will prove Theorem 16 if we show that Pr{C_II/n ≤ η} decays exponentially for any η < 2λT α_II(T).

In δ_II, no matter how large T is (relative to ∆), the numbers of chaff packets in consecutive slots are still correlated. If, however, we run δ_II only on every other slot, and let C_{2i} (i = 1, 2, . . .) be the number of chaff packets inserted in the (2i)th slot, then C_2, C_4, C_6, . . . will be i.i.d. Obviously⁹, C_II ≥ Σ_{i=1}^{n/2} C_{2i}. Then we have

    Pr{C_II/n ≤ η} ≤ Pr{(2/n) Σ_{i=1}^{n/2} C_{2i} ≤ 2η}.

By Cramer's Theorem, we can prove the exponential decay if we show that E[C_2] ≥ 4λT α_II(T). Since

    E[C_2] = E[max(Y_2 − X_1 − X_2, 0) + max(X_2 − Y_2 − Y_3, 0)],
and (Y_2 − X_1 − X_2), (X_2 − Y_2 − Y_3) ∼ N(−λT, 3λT) for large T, we have

    (1/2) E[C_2] = E[max(Y_2 − X_1 − X_2, 0)]
                 ≈ ∫_0^∞ z/√(6πλT) · e^{−(z+λT)²/(6λT)} dz
                 = √(3λT/(2π)) e^{−λT/6} − λT Q(√(λT/3))
                 ≈ c_1 √(λT) e^{−λT/6} = 2λT α_II(T),    (5.15)

where c_1 = 0.0014, and (5.15) is obtained by the following approximation of Q(·) in [25]:

    Q(x) ≈ e^{−x²/2} / (1.64x + √(0.76x² + 4)).

This completes the proof. ∎

⁹Note that C_II is not exactly equal to the total number of chaff packets found by IC-SE, but the difference becomes negligible for large n.
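As a side note (my own numerical check, not from the thesis), the constant c_1 can be recovered as the large-λT limit of the coefficient of √(λT) e^{−λT/6} above when Q(·) is replaced by the approximation from [25]; the common exponential factor cancels, so no underflow occurs. The convergence is slow, but the limit is approximately 0.0014.

    import math

    def c1_ratio(lam_T: float) -> float:
        """Coefficient of sqrt(lam_T)*exp(-lam_T/6) in (5.15) when Q is replaced by
        Kingsbury's approximation; the exponential factor is cancelled analytically."""
        x = math.sqrt(lam_T / 3)
        d = 1.64 * x + math.sqrt(0.76 * x * x + 4)   # Q(x) ~ exp(-x^2/2) / d
        return math.sqrt(3 / (2 * math.pi)) - math.sqrt(lam_T) / d

    for lam_T in [1e2, 1e3, 1e4, 1e6]:
        print(f"lambda*T = {lam_T:9.0f}:  ratio = {c1_ratio(lam_T):.5f}")

    # limiting value as lambda*T -> infinity
    print("limit:", math.sqrt(3) * (1/math.sqrt(2*math.pi) - 1/(1.64 + math.sqrt(0.76))))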
Proof of Proposition 18
By the construction in the proof of Proposition 16, we can construct an information
flow based on the matching found by IC-OF. Since IC-OF is a backward bounded greedy match, its optimality can be proved by arguments parallel to those proving
the optimality of BGM; see [5].
∎
Proof of Theorem 17
By arguments similar to those in the proof of Theorem 15, we only need to consider
independent Poisson processes of equal rate λ. Let Ck be the number of chaff
packets inserted in the kth slot. Note that if chaff is inserted in S1, then only one
chaff packet is needed in each slot. For large T, C_1, . . . , C_n are approximately i.i.d.
First, we show that Pr{(1/n) Σ_{k=1}^n C_k ≤ η} decays exponentially for any η < 2λT α_III(T). By Cramer's Theorem, this reduces to showing that E[C_k] ≥ 2λT α_III(T). Note that if X_k = 0 (with probability e^{−λT}), then all the epochs in [(k − 1)T + ∆, kT) in S_2 will be chaff; if X_k = 1 and Y((k − 1)T, kT + ∆) = 0, then there will be at least one chaff packet in S_1 in slot k. Thus,

    E[C_k] ≥ e^{−λT} λ(T − ∆) + (1 − e^{−λT}) e^{−λ(T+∆)} ≈ 2λT α_III(T).
Next, since the X_k's are i.i.d., x̄ converges to the true mean 1 − e^{−λT} exponentially (by Cramer's Theorem), and thus N_1 converges to λT exponentially. Following the
arguments in the proof of Theorem 15 leads to the conclusion that PF (δIII) decays
exponentially for any τIII < αIII(T ).
∎
Proof of Proposition 19
If IC-OE inserts a chaff packet in slot j, then we must have either xj = 1 and
yk = 0 (k = j, . . . , j + ⌈∆/T ⌉) or yj = 1 and xk = 0 (k = j −⌈∆/T ⌉, . . . , j). That
is, there is either a nonempty slot in s1 such that all the slots that can relay its
epochs are empty, or a nonempty slot in s2 such that all the slots that can generate
a relay packet in that slot are empty. Thus, any other chaff-inserting algorithm
will have to insert a chaff packet in slot j as well.
To find a realization of an information flow, we use the following variation of
BGM: for every x_j = 1, we match it with the first y_k = 1 for k ∈ {j, . . . , j + ⌈∆/T⌉}. Let x′_j (or y′_j) be the number of times that x_j (or y_j) is matched. Then we can construct a realization of an information flow based on (x′^n, y′^n) as in the proof of Proposition 17. This realization plus the chaff noise inserted by IC-OE generates (x^n, y^n) after one-bit quantization. This completes the proof.
∎
Proof of Theorem 18
It suffices to consider independent Poisson processes of equal rate λ. Let Ti (i ≥ 1)
denote the number of slots after the (i−1)th chaff packet until the ith chaff packet
is inserted (including the slot with the ith chaff packet). Then we can characterize
the number of chaff packets using Ti as
CIV
n≤ τ
d=
1
nτ
nτ∑
i=1
Ti ≥1
τ
. (5.16)
We would be able to bound the probability of this event if we knew the joint distribution of the T_i's. Unfortunately, the joint distribution is difficult to characterize. If, however, we only consider even slots, and let T̄_i be the number of slots between consecutive chaff packets in even slots, then the T̄_i's are i.i.d. and T_i ≤ T̄_i. We claim that T̄_i =_d 2Z, where Z has the geometric distribution

    Pr{Z = n} = (1 − ρ)^{n−1} ρ,  n ≥ 1,

in which ρ = 2e^{−2λT}(1 − e^{−λT}). To understand this result, note that the event {∃ chaff in slot 2i} is equivalent to

    A_{2i} ≜ {X_{2i−1} + X_{2i} < Y_{2i},  or  X_{2i} > Y_{2i} + Y_{2i+1}},

which has probability ρ and is i.i.d. over i = 1, 2, . . .. Then Z is the number of slot pairs until the event A_{2i} occurs, i.e., Z =_d inf{i ≥ 1 : A_{2i} occurs}.
By (5.16), we have

    Pr{C_IV/n ≤ τ} = Pr{(1/n̄) Σ_{i=1}^{n̄} T_i ≥ 1/τ} ≤ Pr{(1/n̄) Σ_{i=1}^{n̄} T̄_i ≥ 1/τ},

where n̄ = nτ. By Cramer's Theorem, we can show that Pr{C_IV/n ≤ τ} decays exponentially for any τ < 2λT α_IV(T) if E[T̄_i] ≤ 1/(2λT α_IV(T)). Since E[T̄_i] = 2E[Z] = 2/ρ, and 2λT α_IV(T) = e^{−2λT}(1 − e^{−λT}) = ρ/2, we have E[T̄_i] = 1/(2λT α_IV(T)).
This result coupled with arguments similar to those in the proof of Theorem
17 completes the proof.
∎
Proof of Theorem 19
By arguments similar to those in the proof of Theorem 14, we can prove the results
of Theorem 19 if the minimum statistics in δi (i = I, . . . , IV) converge a.s. under
H_0. The limits define α*_i(T) (i = I, . . . , IV). Since equal-rate Poisson processes of the maximum rate λ generate the minimum statistics (see the proofs of Theorems 15, 16, 17, and 18), it suffices to consider this case.
Case I
In Case I, it suffices to show that the CTR of IC-SF converges a.s. Let the kth interarrival time in S_2 be

    V_k ≜ S_2(k) − S_2(k − 1),  k ≥ 1.

Let Z_j (j ≥ 0) be the starting time for finding matches in the jth slot of S_2 (Z_0 = 0 by definition). For each X_j (j ≥ 1), IC-SF matches the X_j reconstructed epochs with epochs in S_2. Let K_j (j ≥ 1) be the index of the last epoch in S_2 that is matched or assigned as chaff after the X_j epochs are matched, and K_0 = 0. Then Z_j (j ≥ 1) satisfies the recursion

    Z_j = min( max( Z_{j−1} + V_{K_{j−1}+1} + Σ_{k=K_{j−1}+2}^{K_{j−1}+X_j} V_k − T, 0 ), ∆ ),

where V_{K_{j−1}+1} = S_2(K_{j−1} + 1) − (j − 1)T − Z_{j−1} is a truncated interarrival time. Since the V_k's are i.i.d. exponential random variables, by the memoryless property of the exponential distribution, {Z_j}_{j=0}^∞ is a random walk with reflecting barriers at 0 and ∆, and the step distribution is equal to the distribution of Σ_{k=1}^{X_1} V_k − T, where X_1 is a Poisson variable with mean λT, and the V_k (k ≥ 1) are i.i.d. exponential with mean 1/λ, independent of X_1. It is easy to check that {Z_j}_{j=0}^∞ is an ergodic process, and thus its limiting distribution exists.

Let C_j (j ≥ 1) denote the number of chaff packets in slot j. Then

    C_j = max( Y((j − 1)T + Z_{j−1}, jT) − X_j,  X_j − Y((j − 1)T + Z_{j−1}, jT + ∆),  0 ).

It is clear that the C_j's depend on each other only through the Z_j's. Since {Z_j}_{j=0}^∞ is ergodic, (Σ_{j=1}^n C_j)/n converges a.s. Since the average traffic size per slot also converges a.s., their ratio, which is equal to the CTR, converges a.s. The limit is given by

    α*_I(T) ≜ E[max(Y(Z, T) − X_1, X_1 − Y(Z, T + ∆), 0)] / (2λT),

where Z is a random variable with the limiting distribution of {Z_j}_{j=0}^∞.
Case II
In Case II, it suffices to show that the CTR of IC-SE converges a.s. Consider the
case T ≥ ∆ for simplicity; the proof can be easily modified for T < ∆.
Let Z_j (j ≥ 0) be the number of packets in slot j of S_2 which are matched with packets before slot j in S_1 (Z_0 = 0 by definition). It can be shown that Z_j satisfies the recursion

    Z_{j+1} = min( max(Z_j + X_j − Y_j, 0), Y_{j+1} ).

Note that {Z_j}_{j=0}^∞ is not Markovian because, given Z_j, Z_{j+1} still depends on Z_{j−1} through Y_j. We can resolve this issue by including Y_j in the state. Specifically, it can be shown that {(Z_j, Y_j)}_{j=0}^∞ is a Markov chain (note that it is not a random walk) and is ergodic.

If C_j (j ≥ 1) denotes the number of chaff packets in slot j, we have

    C_j = max( Y_j − Z_j − X_j,  X_j − Y_j + Z_j − Y_{j+1},  0 ).

Since {(Z_j, Y_j)}_{j=0}^∞ is ergodic and the X_j's are i.i.d., (Σ_{j=1}^n C_j)/n converges a.s., implying that the CTR converges a.s. The limiting CTR can be computed as

    α*_II(T) ≜ E[max(Y − Z − X_1, X_1 − Y + Z − Y_1, 0)] / (2λT),

where (Z, Y) is distributed according to the limiting distribution of {(Z_j, Y_j)}_{j=0}^∞, and X_1 and Y_1 are independent Poisson variables with mean λT, also independent of (Z, Y).
Case III
In Case III, we need to show that the CTR of IC-OF converges a.s. Let U_j (j ≥ 0) be the starting time for finding matches in the (j + 1)th slot in S_2 if X_{j+1} = 0 (define U_0 = ∆); similarly, let L_j (j ≥ 0) be the starting time if X_{j+1} = 1 (define L_0 = 0). Then it can be shown that U_j and L_j satisfy the recursions

    U_j = max(U_{j−1}, jT)        if X_j = 0,
          jT + ∆                  o.w.;

    L_j = max(L_{j−1}, jT)        if X_j = 0,
          S_2(K)                  if X_j = 1 and S_2(K) ≥ jT + ∆,
          max(S_2(K + 1), jT)     o.w.,

where K ≜ inf{k : S_2(k) ≥ L_{j−1}}. It is easy to see that {(U_j, L_j)}_{j=0}^∞ is a Markov process. Moreover, it can be shown that the process {(U_j − jT, L_j − jT)}_{j=0}^∞ is ergodic.

Let C_j (j ≥ 1) be the number of chaff packets in slot j. Then

    C_j = Y(U_{j−1}, jT)              if X_j = 0,
          I_{Y(L_{j−1}, jT+∆) = 0}    o.w.

By the ergodicity of {(U_j − jT, L_j − jT)}_{j=0}^∞ and the (time) homogeneity of Poisson processes, one can show that (Σ_{j=1}^n C_j)/n converges a.s., and so does the CTR. The limit is given by

    α*_III(T) ≜ [ E[Y(U, T)] e^{−λ_1 T} + Pr{Y(L, T + ∆) = 0} (1 − e^{−λ_1 T}) ] / (2λT),

where (U, L) is distributed according to the limiting distribution of {(U_j − jT, L_j − jT)}_{j=0}^∞.
Case IV
In Case IV, it suffices to show that the CTR of IC-OE converges a.s. It is easy to check that the process {(X_{j−1}, X_j, Y_j, Y_{j+1})}_{j=1}^∞ (where X_0 ≡ 1) is an ergodic Markov chain. Since the number of chaff packets in slot j is computed by

    C_j = max(Y_j − X_j − X_{j−1},  X_j − Y_j − Y_{j+1},  0),

we see that (Σ_{j=1}^n C_j)/n converges a.s. Therefore, the CTR converges a.s. The limit is given by

    α*_IV(T) ≜ E[max(Y − X − X_1, X − Y − Y_1, 0)] / (2λT),

where (X_1, X, Y, Y_1) has the limiting distribution of {(X_{j−1}, X_j, Y_j, Y_{j+1})}_{j=1}^∞.
∎
Proof of Theorem 20
Let “id” denote the identity function. Since the quantization in Cases II–IV can be
viewed as further compression of the quantization in Case I, it suffices to prove the
bound for Case I, i.e., φ(H0; γ, id) ≤ E[|X − Y |]/(2λT ) for independent Poisson
random variables X, Y with mean λT .
Consider the following method to mimic H0 in Case I. Given (xn, s2), construct
a realization s1 of a point process (over n slots) as follows. For j = 1, . . . , n, if
xj ≤ y((j−1)T, jT ), then randomly select xj epochs from the epochs in the jth slot
of s2; otherwise, select the epochs in the jth slot of s2, and select xj−y((j−1)T, jT )
more epochs randomly (i.i.d. uniformly) from [(j − 1)T, jT ). The overall method
is the following:
195
1. generate a Poisson process S2 of rate λ, and i.i.d. Poisson random variables
Xj (j ≥ 1) of mean λT , which are independent of S2;
2. construct a process S1 as described above;
3. use BGM with delay bound ∆ to decompose (S1, S2) into an information
flow and chaff noise.
The traffic containing an information flow generated by this method is identical to traffic under H_0 (specifically, independent Poisson processes of equal rate λ) both marginally and jointly after quantization. Therefore, its CTR gives an
upper bound on φ(H0; γ, id), the minimum CTR to mimic H0.
The rest of the proof follows by observing that the construction of S1 guarantees
at least min(Xj, Yj) (j ≥ 1) pairs of epochs can be matched in slot j (BGM may
find even more matches), which implies that the average number of chaff packets
per slot is upper bounded by E[|X1 − Y1|]. Thus, the CTR of the generated traffic
is no more than E[|X1 − Y1|]/(2λT ).
∎
APPENDIX 5.B
ALGORITHMS FOR CHAPTER 5
The pseudocode implementation of δ_I is presented in Table 5.1. In this implementation, δ_I uses BGM with delay bound T + ∆ to compute the number of chaff packets in (s_1, s_2), denoted by C_I. It then makes a detection if the fraction of chaff packets is upper bounded by a threshold τ_I.
Table 5.1: Detector for Case I.

δI(x^n, s2, ∆, τI):
    i = 1; CI = 0;
    for k = 1 : n
        if s2(i + xk − 1) < kT
            CI = CI + |S2 ∩ [max(s2(i + xk), ∆), kT)|;  i = inf{j : s2(j) ≥ kT};
        else if s2(i + xk − 1) > kT + ∆
            CI = CI + xk − |S2 ∩ [s2(i), kT + ∆)|;  i = inf{j : s2(j) ≥ kT + ∆};
        else
            i = i + xk;
        end
    end
    return H1 if CI/(Σ_{k=1}^n xk + |S2|) ≤ τI, H0 o.w.;
Detector δII for T ≥ ∆ is implemented in Table 5.2. This implementation can be
generalized for arbitrary T . In this implementation, δII computes CII, the number
of chaff packets inserted by BGM in the batched processes (s1, s2) constructed
from (x^n, y^n) with delay bound T (in general, the delay bound should be ⌈∆/T⌉T).
Then it returns H1 if the fraction of chaff packets is bounded by a given threshold
τII.
Table 5.2: Detector for Case II.

δII(x^n, y^n, ∆, τII):
    i = max(0, y1 − x1); CII = 0;
    for k = 1 : n
        if xk < yk − i
            CII = CII + yk − i − xk;  i = 0;
        else if xk > yk − i + y_{k+1}
            CII = CII + xk − yk + i − y_{k+1};  i = y_{k+1};
        else
            i = i + xk − yk;
        end
    end
    return H1 if CII/(Σ_{k=1}^n (xk + yk)) ≤ τII, H0 o.w.;
The implementation of δIII is presented in Table 5.3. Detector δIII computes
the number of chaff packets CIII inserted by BGM in (s1, s2) with delay bound
T + ∆. If xk = 1, then we assume that the kth slot contains the number of epochs
that minimizes the number of chaff packets among all positive integers. Then δIII
estimates the total number of packets and returns H1 if the estimated fraction of
chaff packets is bounded by τIII.
An implementation of δIV for the case T ≥ ∆ is presented in Table 5.4. This im-
plementation can be easily amended for other values of T . In the implementation,
δIV uses a variable CIV to count the number of chaff packets inserted by IC-OE,
estimates the total traffic size, and then reports H1 if their ratio is bounded by τIV.
Table 5.3: Detector for Case III.

δIII(x^n, s2, ∆, τIII):
    v = 0; u = ∆; CIII = 0;
    for k = 1 : n
        if xk == 0
            if y(u, kT) > 0
                CIII = CIII + y(u, kT);
            end
            v = max(v, kT);  u = max(u, kT);
        else
            if y(v, kT + ∆) == 0
                CIII = CIII + 1;
            end
            j′ = inf{j : s2(j) ≥ v};
            if s2(j′) < kT + ∆
                v = max(s2(j′ + 1), kT);
            else
                v = s2(j′);
            end
            u = kT + ∆;
        end
    end
    N = |S2| − n log(1 − (1/n) Σ_{k=1}^n xk);
    return H1 if CIII/N ≤ τIII, H0 o.w.;
Table 5.4: Detector for Case IV.

δIV(x^n, y^n, ∆, τIV):
    CIV = 0;  x0 = 1;
    for k = 1 : n
        if (xk > yk + y_{k+1}) or (yk > x_{k−1} + xk)
            CIV = CIV + 1;
        end
    end
    N = −n ( log(1 − (1/n) Σ_{k=1}^n xk) + log(1 − (1/n) Σ_{k=1}^n yk) );
    return H1 if CIV/N ≤ τIV, H0 o.w.;
Chapter 6
Conclusions
In this dissertation, we investigate statistical inference in sensor networks and
general ad hoc networks when there is no, or only an incomplete, parametric description of the underlying distributions.
In Chapter 2, we considered the problem of detecting unknown changes in the
unknown distribution of alarmed sensors in a randomly deployed sensor field. We
proposed a threshold detector based on the distance between the empirical dis-
tributions in two data collections and provided an estimate of the set with the
maximum change by the set with the maximum change in the empirical distri-
butions. By applying the Vapnik-Chervonenkis Theory, we derived exponential
upper bounds on detection error probabilities and proved the consistency of the
estimator under certain regularity conditions for arbitrary distributions. We also
developed several practical algorithms to implement the detector and the estimator
efficiently. Specifically, we simplified the search in infinitely many sets to a search
in a finite number of sets defined by sample points and developed polynomial-time
algorithms for regular sets such as disks, rectangles, and stripes. Comparison of
their complexity and performance suggests that prior knowledge about the changes
allows us to design searching sets to fit the changed sets and therefore significantly
improve the performance.
In Chapter 3, we considered the problem of detecting information flows by
timing analysis when there is no chaff noise in the measurements. We modelled in-
formation flows by constraints such as causality, packet conservation, and bounded
delay or bounded memory. While the bounded delay condition is only applicable to
interactive information flows, the bounded memory condition is always satisfied in
sensor networks due to limited memory size per sensor. We proposed a matching-
based algorithm under the bounded delay model and a rank-based algorithm under
the bounded memory model. We showed that the algorithms have zero miss de-
tection and exponentially decaying false alarm probabilities if independent traffic
can be modelled as Poisson processes. A comparison of error exponents and sim-
ulations both show that the proposed algorithms outperform existing algorithms.
Comparison between the proposed algorithms suggests that it is easier to detect
information flows with bounded delay than with bounded memory if the traffic
rate is sufficiently low and vice versa. Since pairwise detection already yields suf-
ficiently good performance, we can safely decompose the detection of multi-hop flows into subproblems of detecting 2-hop flows for every pair.
In Chapter 4, we generalized the detection of information flows to allow chaff
noise in the measurements. The insertion of chaff noise makes it impossible to
detect information flows when the mixture of information flows and chaff noise
becomes statistically independent. We used the minimum fraction of chaff noise
required to mimic independent traffic to characterize the level of detectability of
information flows. Optimal chaff-inserting algorithms are developed to compute
the minimum fraction of chaff, and threshold detectors based on these algorithms
are proposed to achieve Chernoff-consistent detection in the presence of chaff noise.
Our analysis shows that pairwise detection can be easily defeated by a relatively
small amount of chaff noise. Thus, unlike the case of no chaff noise, pairwise detec-
tion alone can no longer provide satisfactory performance. To solve this problem,
we extended the scope of the detector to multiple hops. This extension significantly
improves the robustness against chaff noise. In particular, for Poisson null hy-
pothesis, the fraction of chaff noise for which Chernoff-consistent detection can be
achieved converges to one as the number of hops increases, implying that it is al-
most impossible to hide arbitrarily long paths. Although the Poisson assumption was made under the null hypothesis to facilitate analysis, we showed both theo-
retically and experimentally that independent traffic in practice is even easier to
distinguish from information flows, implying that our results for Poisson processes
provide lower bounds on the detection performance of practical information flows.
In Chapter 5, we further extended the detection of information flows to the
scenario where there are capacity constraints in data collection. There, we focused on bounded-delay information flows through a pair of nodes. Still mea-
suring performance by the maximum fraction of chaff noise for Chernoff-consistent
detection, we extended the definitions in Chapter 4 to incorporate quantization
performed at the eavesdroppers. The minimum fraction of chaff noise required
to mimic both the marginal distributions at the eavesdroppers and the joint dis-
tribution of quantized measurements at the fusion center gives an upper bound
on the level of detectability as a function of the capacity constraints. Although
the optimal performance remains unknown, we designed practical detection sys-
tems to give achievable lower bounds. The detection systems consist of simple
slot-based quantizers and threshold detectors based on the optimal chaff-inserting
algorithms for quantized measurements. Specifically, we proposed a slotted quan-
tizer which quantizes transmission epochs to numbers of epochs in each slot and
a one-bit quantizer which further compresses the results of slotted quantization
to binary indicators of empty or nonempty slots. For each quantizer, linear-time
algorithms are developed to implement the detector both with and without full
side-information. Numerical comparison of the performance of the proposed de-
tection systems for Poisson processes shows that the two types of quantization
schemes have similar performance at low traffic rates, but slotted quantization becomes increasingly advantageous as the traffic rate increases. This result combined
with previous results in [31] suggests that slotted quantization is a reasonably
good method to compress Poisson processes.
The change detection and estimation problem in Chapter 2 is purely non-
parametric. The information flow detection problem in Chapters 3–5 is partially
nonparametric because no parametric assumption is made for information flows,
but distributions under the null hypothesis are assumed to be known (indepen-
dent Poisson processes in the analysis). Moreover, in Chapter 5, the processes are
assumed to have the same marginal distributions under both hypotheses.
6.1 Publications
The following is a list of journal publications/submissions that contain parts of
this thesis.
• T. He, S. Ben-David, and L. Tong, “Nonparametric Change Detection and
Estimation in Large-Scale Sensor Networks,” IEEE Transactions on Signal
Processing, vol. 54, no. 4, pp. 1204–1217, April 2006.
• T. He and L. Tong, “Detecting Encrypted Stepping-Stone Connections,”
IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 1612–1623,
May 2007.
• T. He and L. Tong, “Detection of Information Flows,” submitted to IEEE
Transactions on Information Theory, 2007.
• T. He and L. Tong, “Distributed Detection of Information Flows,” submitted
to IEEE Transactions on Information Theory, 2007.
6.2 Future Directions
The advantage of nonparametric techniques over their parametric counterparts lies
in that they provide reasonable performance without specific parametric knowledge
about the actual distributions. Therefore, it is crucial that in partially nonpara-
metric techniques such as those for information flow detection, the parametric
assumptions are generally satisfied in applications of practical interest. Although
we have shown by analytical arguments and some experimental data that the pro-
posed detectors will probably have even better performance on real traffic, it is
desirable to verify the statement by more extensive study and experiments with
actual traces. Moreover, since most of the related experimental work has been done in the context of the Internet, it is of interest to implement the detection schemes
in wireless networks, especially wireless sensor networks, to investigate the oppor-
tunities and challenges present in these contexts.
BIBLIOGRAPHY
[1] R. Ahlswede and I. Csiszar. Hypothesis testing with communication con-straints. Information Theory, IEEE Transactions on, 32(4):533–542, 1986.
[2] S. Ben-David, J. Gehrke, and D. Kifer. Detecting Change in Data Streams.In Proc. 2004 VLDB Conference, Toronto, Canada, 2004.
[3] S. Ben-David, T. He, and L. Tong. Non-Parametric Approach to ChangeDetection and Estimation in Large Scale Sensor Networks. In Proceedings ofthe 2004 Conference on Information Sciences and Systems, Princeton, NJ,March 2004.
[4] D. P. Bertsekas and R. Gallager. Data Networks. Prentice Hall, 1992.
[5] A. Blum, D. Song, and S. Venkataraman. Detection of Interactive Stepping Stones: Algorithms and Confidence Bounds. In Conference of Recent Advances in Intrusion Detection (RAID), Sophia Antipolis, French Riviera, France, September 2004.
[6] O. Bousquet, U. V. Luxburg, and G. R atsch. Advanced Lectures on MachineLearning. Springer, Heidelberg, Germany, 2004.
[7] T. Cover and J. Thomas. Elements of Information Theory. John Wiley &Sons, Inc., 1991.
[8] D.R. Cox and H.D. Miller. The Theory of Stochastic Processes. John Wiley& Sons Inc., New York, 1965.
[9] Frank den Hollander. Large Deviations (Fields Institute Monographs, 14). American Mathematical Society, 2000.
[10] J. Deng, R. Han, and S. Mishra. Intrusion tolerance and anti-traffic analysisstrategies for wireless sensor networks. In IEEE International Conferenceon Dependable Systems and Networks (DSN), pages 594–603, Florence, Italy,June 2004.
[11] D. Donoho, A.G. Flesia, U. Shankar, V. Paxson, J. Coit, and S. Staniford.Multiscale stepping-stone detection: Detecting pairs of jittered interactivestreams by exploiting maximum tolerable delay. In 5th International Sympo-sium on Recent Advances in Intrusion Detection, Lecture Notes in ComputerScience 2516, 2002.
[12] N. Ferguson and B. Schneier. Practical Cryptography. John Wiley & Sons,Inc., Indianapolis,IN, 2003.
[13] J. D. Gibbons and S. Chakraborti. Nonparametric Statistical Inference. Mar-cel Dekker, 2003.
[14] J. Giles and B. Hajek. An Information-Theoretic and Game-Theoretic Studyof Timing Channels. IEEE Transactions on Information Theory, 48(9):2455–2477, September 2002.
[15] Piyush Gupta and P. R. Kumar. The capacity of wireless networks. IEEETrans. Inform. Theory, 46(2):388–404, March 2000.
[16] Te Sun Han and S. Amari. Statistical inference under multiterminal datacompression. IEEE Trans. Inform. Theory, 44(6):2300–2324, Oct. 1998.
[17] T. He and L. Tong. On A-distance and Relative A-distance.Technical Report ACSP-TR-08-04-02, Cornell University, August 2004.http://acsp.ece.cornell.edu/pubR.html.
[18] T. He and L. Tong. An Almost Surely Complete Subset of PlanarDisks. Technical Report ACSP-TR-04-05-01, Cornell University, April 2005.http://acsp.ece.cornell.edu/pubR.html.
[19] T. He and L. Tong. Detecting Encrypted Stepping-Stone Connections. IEEETransactions on Signal Processing, 55(5):1612–1623, May 2007.
[20] T. He, L. Tong, and A. Swami. Nonparametric Change Estimation in 2DRandom Fields. In Proc. of IEEE MILCOM’05, Atlantic City, NJ, October2005.
[21] T. Hettmansperger and M. Keenan. Tailweight, Statistical Inference, andFamilies of Distributions - A Brief Survey. Statistical Distributions in Scien-tific Work, 1:161–172, 1980.
[22] Myles Hollander and Douglas A. Wolfe. Nonparametric Statistical Methods.Wiley Interscience, 1973.
[23] X. Hong, P. Wang, J. Kong, Q. Zheng, and J. Liu. Effective ProbabilisticApproach Protecting Sensor Traffic. In Military Communications Conference,2005, pages 1–7, Atlantic City, NJ, Oct. 2005.
[24] Y. Hong and A. Scaglione. Distributed change detection in large scale sensornetworks through the synchronization of pulse-coupled oscillators. In Proc.Intl. Conf. Acoust., Speech, and Signal Processing, pages 869 – 872, Montreal,Canada, May 2004.
[25] N. Kingsbury. Approximation formulae for the Gaussian error integral Q(x). Technical Report m11067, Connexions, June 2005. http://cnx.org/content/m11067/latest/.
[26] D. Kotz and K. Essien. Analysis of a campus-wide wireless network. ACMWireless Networks Journal, 11(1-2):115–133, Jan. 2005.
[27] N. Patwari, A. O. Hero, and B. M. Sadler. Hierarchical censoring sensors forchange detection. In 2003 IEEE Workshop on Statistical Signal Processing,pages 21–24, St. Louis, MO, September 2003.
[28] V. Paxson and S. Floyd. Wide-Area Traffic: The Failure of Poisson Modeling.IEEE/ACM Transactions on Networking, 3(3):226–244, June 1995.
[29] P. Peng, P. Ning, D.S. Reeves, and X. Wang. Active Timing-Based Correla-tion of Perturbed Traffic Flows with Chaff Packets. In Proc. 25th IEEE In-ternational Conference on Distributed Computing Systems Workshops, pages107–113, Columbus, OH, June 2005.
[30] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1994.
[31] I. Rubin. Information Rates and Data-Compression Schemes for Poisson Processes. IEEE Transactions on Information Theory, 20(2):200–210, March 1974.
[32] J. Shao. Mathematical Statistics. Springer, 1999.
[33] David J. Sheskin. Handbook of Parametric and Nonparametric StatisticalProcedures. Chapman & Hall/CRC, 2004. 3rd Ed.
[34] S. Staniford-Chen and L.T. Heberlein. Holding intruders accountable on theinternet. In Proc. the 1995 IEEE Symposium on Security and Privacy, pages39–49, Oakland, CA, May 1995.
[35] D. Tang and M. Baker. Analysis of a local-area wireless network. In MOBI-COM, pages 1–10, Boston, MA, Aug. 2000.
[36] L. Tong, Q. Zhao, and S. Adireddy. Sensor Networks with Mobile Agents. InProc. 2003 Intl. Symp. Military Communications, Boston, MA, Oct. 2003.
[37] John N. Tsitsiklis. Decentralized Detection. Advances in Statistical SignalProcessing, 2:297–344, 1993.
[38] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag,New York, NY, 1995.
[39] V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York,NY, 1998.
[40] V.N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of rela-tive frequencie of events to their probabilities. Theory of Probability and itsApplications, 16:264–280, 1971.
[41] P. Venkitasubramaniam, T. He, and L. Tong. Anonymous Networking amidstEavesdroppers. Submitted to IEEE Transactions on Information Theory: Spe-cial Issue on Information-Thoeretic Security, Feb. 2007.
[42] S. Verdu. The Exponential Distribution in Information Theory. Problems ofInformation Transmission, 32(1):86–95, 1996.
[43] X. Wang. The loop fallacy and serialization in tracing intrusion connectionsthrough stepping stones. In Proc. of the 2004 ACM Symposium on AppliedComputing, pages 404–411, Nicosia, Cyprus, March 2004.
[44] X. Wang and D. Reeves. Robust correlation of encrypted attack traffic throughstepping stones by manipulation of inter-packet delays. In Proc. of the 2003ACM Conference on Computer and Communications Security, pages 20–29,2003.
[45] X. Wang, D. Reeves, and S. Wu. Inter-packet delay-based correlation for trac-ing encrypted connections through stepping stones. In 7th European Sympo-sium on Research in Computer Security, Lecture Notes in Computer Science2502, pages 244–263, 2002.
[46] X. Wang, D. Reeves, S. Wu, and J. Yuill. Sleepy watermark tracing: Anactive network-based intrusion response framework. In Proc. of the 16th In-ternational Information Security Conference, pages 369–384, 2001.
[47] R. S. Wenocur and R. M. Dudley. Some Special Vapnik-Chervonenkis Classes.Discrete Mathematics, 33:313–318, 1981.
[48] K. Yoda and H. Etoh. Finding a connection chain for tracing intruders. In6th European Symposium on Research in Computer Security, Lecture Notesin Computer Science 1895, Toulouse, France, October 2000.
[49] L. Zhang, A.G. Persaud, A. Johson, and Y. Guan. Stepping Stone AttackAttribution in Non-cooperative IP Networks. In Proc. of the 25th IEEE Inter-national Performance Computing and Communications Conference (IPCCC2006), Phoenix, AZ, April 2006.
[50] Y. Zhang, W. Lee, and Y. Huang. Intrusion detection techniques for mobilewireless networks. ACM Wireless Networks Journal, 9(5):545–556, Sept. 2003.
[51] Y. Zhang and V. Paxson. Detecting stepping stones. In Proc. the 9th USENIXSecurity Symposium, pages 171–184, August 2000.
[52] Y. Zhu, X. Fu, B. Graham, R.Bettati, and W. Zhao. On flow correlationattacks and countermeasures in mix networks. In Proceedings of Privacy En-hancing Technologies workshop, May 26-28 2004.