Sequential analysis: balancing the tradeoff between detection accuracy and detection delay
![Page 1: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/1.jpg)
Sequential analysis: balancing the tradeoff between detection accuracy and detection delay
XuanLong Nguyen
Radlab, 11/06/06
![Page 2: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/2.jpg)
Outline
• Motivation in detection problems
  – need to minimize detection delay time
• Brief intro to sequential analysis
  – sequential hypothesis testing
  – sequential change-point detection
• Applications
  – detection of anomalies in network traffic (network attacks), faulty software, etc.
![Page 3: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/3.jpg)
Three quantities of interest in detection problems
• Detection accuracy
  – False alarm rate
  – Misdetection rate
• Detection delay time
![Page 4: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/4.jpg)
Network volume anomaly detection [Huang et al, 06]
![Page 5: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/5.jpg)
So far, anomalies treated as isolated events
• Spikes seem to appear out of nowhere
• Hard to predict an early short burst
  – unless we reduce the time granularity of collected data
• To achieve early detection
  – have to look at medium to long-term trend
  – know when to stop deliberating
![Page 6: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/6.jpg)
Early detection of anomalous trends
• We want to
  – distinguish a “bad” process from a “good” process / multiple processes
  – detect a point where a “good” process turns bad
• Applicable when evidence accumulates over time (no matter how fast or slow)
  – e.g., because a router or a server fails
  – a worm propagates its effect
• Sequential analysis is well-suited
  – minimize the detection time given fixed false alarm and misdetection rates
  – balance the tradeoff between these three quantities (false alarm rate, misdetection rate, detection time) effectively
![Page 7: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/7.jpg)
Example: Port scan detection
• Detect whether a remote host is a port scanner or a benign host
• Ground truth: based on the percentage of local hosts with which a remote host has a failed connection
• We set:
  – for a scanner, the probability of hitting an inactive local host is 0.8
  – for a benign host, that probability is 0.1
• Figure (Jung et al, 2004):
  – X: percentage of inactive local hosts for a remote host
  – Y: cumulative distribution function for X
  – annotation: 80% bad hosts
![Page 8: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/8.jpg)
Hypothesis testing formulation
• A remote host R attempts to connect to a local host at time i; let $Y_i = 0$ if the connection attempt is a success, $Y_i = 1$ if it is a failed connection
• As outcomes $Y_1, Y_2, \ldots$ are observed, we wish to determine whether R is a scanner or not
• Two competing hypotheses:
  – $H_0$: R is benign, with $P(Y_i = 1 \mid H_0) = 0.1$
  – $H_1$: R is a scanner, with $P(Y_i = 1 \mid H_1) = 0.8$
![Page 9: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/9.jpg)
An off-line approach
1. Collect a sequence of data Y for one day (wait for a day)
2. Compute the likelihood ratio accumulated over a day. This is related to the proportion of inactive local hosts that R tries to connect to (resulting in failed connections)
3. Raise a flag if this statistic exceeds some threshold
![Page 10: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/10.jpg)
A sequential (on-line) solution
1. Update the accumulated likelihood ratio statistic in an online fashion
2. Raise a flag if this exceeds some threshold

[Figure: accumulated likelihood ratio vs. time (hour 0 to 24), with thresholds a and b and the stopping time marked]
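The online update can be sketched in a few lines of Python for the port scan example. This is a minimal sketch, assuming the failure probabilities from the earlier slides (0.1 for a benign host, 0.8 for a scanner); the function name and the threshold values a = -4 and b = 4 are illustrative assumptions, not values from the talk.

```python
import math

def sprt_port_scan(outcomes, p0=0.1, p1=0.8, a=-4.0, b=4.0):
    """Accumulate the log likelihood ratio over connection outcomes.

    outcomes: iterable of 0 (successful connection) or 1 (failed connection).
    p0, p1:   P(Y_i = 1 | H0) and P(Y_i = 1 | H1).
    a, b:     lower and upper thresholds (illustrative values, assumed).
    Returns (decision, number of observations used).
    """
    s = 0.0
    n = 0
    for n, y in enumerate(outcomes, start=1):
        if y == 1:
            s += math.log(p1 / p0)              # a failure favours "scanner"
        else:
            s += math.log((1 - p1) / (1 - p0))  # a success favours "benign"
        if s >= b:
            return "scanner", n
        if s <= a:
            return "benign", n
    return "undecided", n
```

With these assumed thresholds a decision is typically reached after a handful of connection attempts, rather than after a full day of data.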
![Page 11: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/11.jpg)
Comparison with other existing intrusion detection systems (Bro & Snort)
• Efficiency: 1 - #false positives / #true positives
• Effectiveness: #false negatives / #all samples
• N: # of samples used (i.e., detection delay time)

Efficiency  Effectiveness  N
0.963       0.040          4.08
1.000       0.008          4.06
![Page 12: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/12.jpg)
Two sequential decision problems
• Sequential hypothesis testing
  – differentiating a “bad” process from a “good” process
  – e.g., our previous port scan example
• Sequential change-point detection
  – detecting a point (or points) where a “good” process starts to turn bad
![Page 13: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/13.jpg)
Sequential hypothesis testing
• H = 0 (null hypothesis): normal situation
• H = 1 (alternative hypothesis): abnormal situation
• Sequence of observed data
  – $X_1, X_2, X_3, \ldots$
• Decision consists of
  – stopping time N (when to stop taking samples?)
  – make a hypothesis: H = 0 or H = 1?
![Page 14: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/14.jpg)
Quantities of interest
• False alarm rate $\alpha = P(D = 1 \mid H_0)$
• Misdetection rate $\beta = P(D = 0 \mid H_1)$
• Expected stopping time (aka number of samples, or decision delay time) $E[N]$

Frequentist formulation: fix $\alpha, \beta$; minimize $E[N]$ wrt both $f_0$ and $f_1$

Bayesian formulation: fix some weights $c_1, c_2, c_3$; minimize $c_1 \alpha + c_2 \beta + c_3 E[N]$
![Page 15: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/15.jpg)
Key statistic: Posterior probability
• Posterior probability: $p_n = P(H = 1 \mid X_1, X_2, \ldots, X_n)$
• As more data are observed, the posterior edges closer to either 0 or 1
• Optimal cost-to-go function is a function of $p_n$: $G(p_n)$ := optimal cost-to-go
• G(p) can be computed by Bellman’s update
  – G(p) = min { cost if stop now, cost of taking one more sample }
  – G(p) is concave
• Stop: when $p_n$ hits thresholds a or b

[Figure: densities N(m0,v0) and N(m1,v1); plot of G(p) for p between 0 and 1, with the posterior trajectory p1, p2, .., pn and thresholds a and b marked on the p axis]
![Page 16: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/16.jpg)
Multiple hypothesis test
• Suppose we have m hypotheses H = 1, 2, …, m
• The relevant statistic is the posterior probability vector in the (m-1)-simplex: $p_n = (P(H = 1 \mid X_1, \ldots, X_n), \ldots, P(H = m \mid X_1, \ldots, X_n))$
• Stop when $p_n$ reaches one of the corners (passing through the red boundary)

[Figure: simplex with corners H=1, H=2, H=3 and trajectory $p_0, p_1, \ldots, p_n$]
![Page 17: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/17.jpg)
Thresholding posterior probability = thresholding sequential log likelihood ratio
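Log likelihood ratio:

$$S_n := \log \frac{P(X_1, \ldots, X_n \mid H = 1)}{P(X_1, \ldots, X_n \mid H = 0)} = \sum_{i=1}^{n} \log \frac{P(X_i \mid H = 1)}{P(X_i \mid H = 0)}$$

Applying Bayes’ rule, with $X = (X_1, \ldots, X_n)$:

$$P(H = 1 \mid X_1, \ldots, X_n) = \frac{P(X \mid H=1) P(H=1)}{P(X \mid H=0) P(H=0) + P(X \mid H=1) P(H=1)} = \frac{P(X \mid H=1) / P(X \mid H=0)}{P(H=0)/P(H=1) + P(X \mid H=1) / P(X \mid H=0)} = \frac{e^{S_n}}{c + e^{S_n}}$$

where $c = P(H=0)/P(H=1)$.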
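A small Python check of this equivalence: the posterior computed from the log likelihood ratio S_n as e^{S_n} / (c + e^{S_n}), with c = P(H=0)/P(H=1), should agree with a direct application of Bayes' rule. The function names and the numbers in the usage note are illustrative assumptions.

```python
import math

def posterior_from_llr(s_n, prior_h1):
    """P(H=1 | data) from the log likelihood ratio s_n and the prior P(H=1)."""
    c = (1.0 - prior_h1) / prior_h1      # c = P(H=0) / P(H=1)
    # math.exp overflows for very large s_n; acceptable for a sketch.
    return math.exp(s_n) / (c + math.exp(s_n))

def posterior_direct(lik_h1, lik_h0, prior_h1):
    """P(H=1 | data) by Bayes' rule from the two data likelihoods."""
    num = lik_h1 * prior_h1
    return num / (num + lik_h0 * (1.0 - prior_h1))
```

For example, two failed connections followed by one success have likelihood 0.8 * 0.8 * 0.2 under "scanner" and 0.1 * 0.1 * 0.9 under "benign"; passing the log of their ratio to `posterior_from_llr` gives the same value as `posterior_direct`. Because the map from S_n to the posterior is monotone, thresholding one is equivalent to thresholding the other.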
![Page 18: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/18.jpg)
Thresholds vs. errors
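[Figure: accumulated likelihood ratio $S_n$ vs. time, with upper threshold b, lower threshold a, and stopping time N]

Wald’s approximation (with false alarm rate $\alpha$ and misdetection rate $\beta$):

$$a \approx \log \frac{\beta}{1 - \alpha}, \qquad b \approx \log \frac{1 - \beta}{\alpha}$$

So,

$$\alpha \approx \frac{1 - e^{a}}{e^{b} - e^{a}} \quad \text{and} \quad \beta \approx \frac{e^{a} (e^{b} - 1)}{e^{b} - e^{a}}$$

Exact if there’s no overshoot at hitting time!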
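Wald's approximation can be sketched in Python in both directions, assuming a is the lower threshold, b the upper threshold, alpha the false alarm rate and beta the misdetection rate: a ≈ log(beta / (1 - alpha)) and b ≈ log((1 - beta) / alpha). The function names are illustrative.

```python
import math

def wald_thresholds(alpha, beta):
    """Thresholds (a, b) on the log likelihood ratio for target error rates."""
    a = math.log(beta / (1.0 - alpha))
    b = math.log((1.0 - beta) / alpha)
    return a, b

def errors_from_thresholds(a, b):
    """Approximate (alpha, beta) implied by thresholds a < 0 < b."""
    denom = math.exp(b) - math.exp(a)
    alpha = (1.0 - math.exp(a)) / denom
    beta = math.exp(a) * (math.exp(b) - 1.0) / denom
    return alpha, beta
```

The two functions are inverses of each other; with real data the achieved error rates differ slightly from the targets because the statistic overshoots the threshold at the stopping time.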
![Page 19: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/19.jpg)
Expected stopping times vs errors
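$$S_n = Z_1 + \ldots + Z_n, \quad \text{where } Z_n = \log\big(f_1(X_n) / f_0(X_n)\big)$$

The stopping time N is the hitting time of a random walk. What is E[N]?

Wald’s equation:

$$E[S_N] = E[N] \, E[Z_i]$$

$$E[N \mid H = 1] = \frac{E_1[S_N]}{E_1[Z_i]} = \frac{\beta \, E[S_N \mid S \text{ hits threshold } a] + (1 - \beta) \, E[S_N \mid S \text{ hits threshold } b]}{E_1[\log f_1 / f_0]} \approx \frac{\beta a + (1 - \beta) b}{KL(f_1, f_0)} = \frac{\beta \log \frac{\beta}{1 - \alpha} + (1 - \beta) \log \frac{1 - \beta}{\alpha}}{KL(f_1, f_0)}$$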
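As a worked instance, the approximate expected sample size under H = 1, E[N | H=1] ≈ (beta·a + (1 - beta)·b) / KL(f1, f0) with Wald's thresholds a = log(beta / (1 - alpha)) and b = log((1 - beta) / alpha), can be evaluated for Bernoulli observations such as the port scan example. This is a sketch under those assumptions; the function names are illustrative.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def expected_samples_h1(alpha, beta, p0, p1):
    """Approximate E[N | H=1], ignoring overshoot at the thresholds."""
    a = math.log(beta / (1 - alpha))     # lower threshold
    b = math.log((1 - beta) / alpha)     # upper threshold
    return (beta * a + (1 - beta) * b) / kl_bernoulli(p1, p0)
```

The expected number of samples grows only logarithmically as the target error rates shrink, and shrinks as the two hypotheses become easier to tell apart (larger KL divergence).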
![Page 20: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/20.jpg)
Outline
• Sequential hypothesis testing
• Change-point detection
  – Off-line formulation
    • methods based on clustering / maximum likelihood
  – On-line (sequential) formulation
    • Minimax method
    • Bayesian method
  – Application in detecting network traffic anomalies
![Page 21: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/21.jpg)
Change-point detection problem
Identify where there is a change in the data sequence
– change in mean, dispersion, correlation function, spectral density, etc.
– generally change in distribution

[Figure: time series X_t with change points at t1 and t2]
![Page 22: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/22.jpg)
Off-line change-point detection
• Viewed as a clustering problem across the time axis
  – Change points being the boundary of clusters
• Partition the time series data so as to respect
  – Homogeneity within a partition
  – Heterogeneity between partitions
![Page 23: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/23.jpg)
A heuristic: clustering by minimizing intra-partition variance
• Suppose that we look at a mean changing process
• Suppose also that there is only one change point
• Define running mean $\bar{x}[i..j] := \frac{1}{j - i + 1} (x_i + \ldots + x_j)$
• Define variation within a partition $A_{sq}[i..j] := \sum_{k=i}^{j} (x_k - \bar{x}[i..j])^2$
• Seek a time point v that minimizes the sum of variations $G := A_{sq}[1..v] + A_{sq}[v+1..n]$

(Fisher, 1958)
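The heuristic can be sketched directly in Python: try every split point v, compute the within-partition sum of squared deviations on each side, and keep the split with the smallest total. This sketch assumes a single change point and treats the two partitions as disjoint (the first v points, then the rest); the function names are illustrative.

```python
def within_variation(xs):
    """Sum of squared deviations from the mean of xs."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def best_split(xs):
    """Return (v, G): the first partition is xs[:v], the second is xs[v:]."""
    best_v, best_g = None, float("inf")
    for v in range(1, len(xs)):
        g = within_variation(xs[:v]) + within_variation(xs[v:])
        if g < best_g:
            best_v, best_g = v, g
    return best_v, best_g
```

This direct version costs O(n^2) time; running sums would bring it down to O(n), at the cost of a slightly longer listing.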
![Page 24: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/24.jpg)
Statistical inference of change point
• A change point is considered as a latent variable
• Statistical inference of change point location via
  – frequentist method, e.g., maximum likelihood estimation
  – Bayesian method by inferring posterior probability
![Page 25: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/25.jpg)
Maximum-likelihood method
$X_1, X_2, \ldots, X_n$ are observed

For each v = 1, 2, …, n, consider hypothesis $H_v$ (v is uniformly distributed on {1, 2, …, n}):
– Hypothesis $H_v$: sequence has density $f_0$ before v, and $f_1$ after
– Hypothesis $H_0$: sequence is stochastically homogeneous

Likelihood function corresponding to $H_v$:

$$l_v(x) = \sum_{i=1}^{v} \log f_0(x_i) + \sum_{i=v+1}^{n} \log f_1(x_i)$$

MLE estimate: $H_v$ is accepted if $l_v(x) \ge l_j(x)$ for all $j \ne v$

Let $S_k$ be the log likelihood ratio up to k:

$$S_k = \sum_{i=1}^{k} \log \frac{f_1(x_i)}{f_0(x_i)}$$

then our estimate can be written as

$$\hat{v} := \{ v \mid S_v \le S_k \text{ for all } k < v, \; S_v \le S_k \text{ for all } k \ge v \}$$

This is the precursor for various sequential procedures (to come!)

[Figure: $S_k$ plotted against k = 1, …, n, with $f_0$ before v and $f_1$ after] [Page, 1965]
![Page 26: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/26.jpg)
Maximum-likelihood method
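Suppose that $f_i \sim N(\mu_i, \sigma^2)$

If $\mu_0, \mu_1$ are known, then

$$\hat{v} := \arg\max_{1 \le t < n} \; (\mu_1 - \mu_0) \sum_{i=t+1}^{n} \Big( x_i - \frac{\mu_0 + \mu_1}{2} \Big)$$

If both are unknown, then

$$\hat{v} := \arg\max_{1 \le t < n} \; \frac{t (n - t)}{n} (\bar{x}_t - \bar{x}^*_t)^2, \quad \text{where } \bar{x}_t = \frac{1}{t} \sum_{i=1}^{t} x_i, \quad \bar{x}^*_t = \frac{1}{n - t} \sum_{i=t+1}^{n} x_i$$

[Hinkley, 1970, 1971]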
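For the case where both means are unknown, a minimal Python sketch of the estimate follows. It assumes Gaussian observations with a common variance and a single change in mean, and scores each candidate split t by t(n - t)/n times the squared difference between the mean of the first t points and the mean of the remaining n - t points. The function name is illustrative.

```python
def mle_change_point(xs):
    """Return t maximizing t*(n-t)/n * (mean(xs[:t]) - mean(xs[t:]))**2."""
    n = len(xs)
    total = sum(xs)
    prefix = 0.0
    best_t, best_score = None, float("-inf")
    for t in range(1, n):
        prefix += xs[t - 1]                      # sum of the first t points
        mean_before = prefix / t
        mean_after = (total - prefix) / (n - t)
        score = t * (n - t) / n * (mean_before - mean_after) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Maximizing this between-segment score is equivalent to minimizing the total within-segment sum of squares, so it selects the same split as the clustering heuristic.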
![Page 27: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/27.jpg)
Sequential change-point detection
• Data are observed serially
• There is a change from distribution $f_0$ to $f_1$ at time point v
• Raise an alarm if change is detected at N

Need to
(a) Minimize the false alarm rate
(b) Minimize the average delay to detection

[Figure: time line with change point v separating f0 from f1; an alarm time N before v is a false alarm, an alarm time N after v is a delayed alarm]
![Page 28: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/28.jpg)
Minimax formulation

Among all procedures such that the time to false alarm is bounded from below by a constant T, find a procedure that minimizes the average delay to detection

Class of procedures with false alarm condition:

$$\{ N : E_\infty[N] \ge T \}$$

– $E_k$ ~ change point at v = k
– $E_\infty$ ~ change point at v = ∞ (i.e., no change point)

Average delay to detection:

– average-worst delay (Cusum, SRP tests): $WAD(N) := \max_k E_k[N - k \mid N \ge k]$
– worst-worst delay (Cusum test): $WWD(N) := \max_k \max_{X_{1 \ldots (k-1)}} E_k[(N - k + 1)^+ \mid X_{1 \ldots (k-1)}]$
![Page 29: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/29.jpg)
Bayesian formulation

Assume a prior distribution of the change point: $\pi_k = P(v = k)$

Among all procedures such that the false alarm probability is less than $\alpha$, find a procedure that minimizes the average delay to detection

False alarm condition:

$$PFA(N) = P(N < v) = \sum_{k=1}^{\infty} \pi_k P_k(N < k) \le \alpha$$

Average delay to detection:

$$ADD(N) := E[N - v \mid N \ge v] = \frac{1}{P(N \ge v)} \sum_{k=0}^{\infty} \pi_k P_k(N \ge k) \, E_k[N - k \mid N \ge k]$$

Shiryaev’s test
![Page 30: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/30.jpg)
All procedures involve running likelihood ratios
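Hypothesis $H_v$: sequence has density $f_0$ before v, and $f_1$ after
Hypothesis $H_\infty$: no change point

Likelihood ratio for v = k vs. v = infinity:

$$S^n_v(X) := \log \frac{P(X_{1 \ldots n} \mid H_v)}{P(X_{1 \ldots n} \mid H_\infty)} = \log \frac{\prod_{i < v} f_0(X_i) \prod_{v \le j \le n} f_1(X_j)}{\prod_{i \le n} f_0(X_i)} = \sum_{v \le j \le n} \log \frac{f_1(X_j)}{f_0(X_j)}$$

All procedures involve online thresholding: stop whenever the statistic exceeds a threshold b

– Cusum test: $g_n(X) = \max_{1 \le k \le n} S^n_k(X)$
– Shiryaev-Roberts-Pollak’s: $h_n(X) = \sum_{1 \le k \le n} e^{S^n_k(X)}$
– Shiryaev’s Bayesian test: $u_n(X) = P(v \le n \mid X_{1 \ldots n}) \sim \sum_{1 \le k \le n} \pi_k e^{S^n_k(X)}$, with $\pi_k$ the prior probability that v = k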
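The Shiryaev-Roberts statistic, the sum over k ≤ n of exp(S_k^n) with S_k^n the sum of per-sample log likelihood ratios z_k, ..., z_n, does not need to be recomputed from scratch at every step. It satisfies the recursion h_n = (1 + h_{n-1})·exp(z_n) with h_0 = 0, which follows by factoring exp(z_n) out of the sum. A short Python sketch comparing the two forms (function names are illustrative):

```python
import math

def sr_direct(zs):
    """Sum over k of exp(z_k + ... + z_n), computed from the definition."""
    return sum(math.exp(sum(zs[k:])) for k in range(len(zs)))

def sr_recursive(zs):
    """Same statistic via h_n = (1 + h_{n-1}) * exp(z_n), with h_0 = 0."""
    h = 0.0
    for z in zs:
        h = (1.0 + h) * math.exp(z)
    return h
```

The recursive form uses constant memory and constant work per observation, which is what makes the test practical online.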
![Page 31: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/31.jpg)
Cusum test (Page, 1966)
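$$g_n(X) = \max_{1 \le k \le n} S^n_k(X)$$

$g_n$ can be written in recurrent form:

$$g_0 = 0; \quad g_n = \max\Big( 0, \; g_{n-1} + \log \frac{f_1(x_n)}{f_0(x_n)} \Big)$$

Page proposed the following rule:

$$N = \min\{ n \ge 1 : g_n \ge b \} \quad \text{for some threshold } b$$

This test minimizes the worst-average detection delay (in an asymptotic sense):

$$WAD(N) := \max_k E_k[N - k \mid N \ge k]$$

[Figure: $g_n$ over time, crossing threshold b at stopping time N]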
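The recurrent form translates into a few lines of Python: start from g = 0, add the log likelihood ratio of each new observation, clip at zero, and stop the first time g reaches the threshold b. This sketch assumes known Gaussian pre- and post-change densities with a common variance; the helper names and parameter values are illustrative.

```python
def gaussian_llr(x, m0, m1, var):
    """log f1(x)/f0(x) for N(m1, var) vs N(m0, var)."""
    return (m1 - m0) * (x - 0.5 * (m0 + m1)) / var

def cusum_alarm(xs, llr, b):
    """Return the first n (1-based) with g_n >= b, or None if no alarm."""
    g = 0.0
    for n, x in enumerate(xs, start=1):
        g = max(0.0, g + llr(x))   # recurrent form of the Cusum statistic
        if g >= b:
            return n
    return None
```

For example, `cusum_alarm(xs, lambda x: gaussian_llr(x, 0.0, 1.0, 1.0), b=2.0)` monitors a unit-variance stream for a mean shift from 0 to 1. A larger b gives fewer false alarms but a longer detection delay.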
![Page 32: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/32.jpg)
Generalized likelihood ratio
Unfortunately, we don’t know $f_0$ and $f_1$

Assume that they follow the form $f_i \sim P(x \mid \theta_i), \; i = 0, 1$

– $f_0$ is estimated from “normal” training data
– $f_1$ is estimated on the fly (on test data): $\hat{\theta}_1 := \arg\max_{\theta_1} P(X_1, \ldots, X_n \mid \theta_1)$

Sequential generalized likelihood ratio statistic (same as CUSUM):

$$R_n = \max_{\theta_1} \sum_{j=1}^{n} \log \frac{f_1(x_j \mid \theta_1)}{f_0(x_j)}, \qquad g_n = \max_{0 \le k \le n} (R_n - R_k)$$

Our testing rule: stop and declare the change point at the first n such that $g_n$ exceeds a threshold b
![Page 33: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/33.jpg)
Change point detection in network traffic
Data features:
– number of good packets received that were directed to the broadcast address
– number of Ethernet packets with an unknown protocol type
– number of good address resolution protocol (ARP) packets on the segment
– number of incoming TCP connection requests (TCP packets with SYN flag set)

Each feature is modeled as a mixture of 3-4 Gaussians to adjust to the daily traffic patterns (night hours vs. day times, weekday vs. weekends, …)

[Figure labels: N(m,v), N(m1,v1), Changed behavior, N(m0,v0)] [Hajji, 2005]
![Page 34: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/34.jpg)
Subtle change in traffic (aggregated statistic vs individual variables)
Caused by web robots
![Page 35: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/35.jpg)
Adaptability to normal daily and weekly fluctuations
weekend
PM time
![Page 36: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/36.jpg)
Anomalies detected
Broadcast storms, DoS attacks
– injected 2 broadcast/sec: 16 mins delay

Sustained rate of TCP connection requests
– injecting 10 packets/sec: 17 mins delay
![Page 37: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/37.jpg)
Anomalies detected
ARP cache poisoning attacks
TCP SYN DoS attack, excessive traffic load
16mins delay
50 seconds delay
![Page 38: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/38.jpg)
Summary
• Sequential hypothesis test
  – distinguish “good” process from “bad”
• Sequential change-point detection
  – detecting where a process changes its behavior
• Framework for optimal reduction of detection delay
• Sequential tests are very easy to apply
  – even though the analysis might look difficult
![Page 39: Sequential analysis: balancing the tradeoff between detection accuracy and detection delay](https://reader036.fdocuments.in/reader036/viewer/2022062422/5681355f550346895d9cc6af/html5/thumbnails/39.jpg)
References
• Wald, A. Sequential analysis, John Wiley and Sons, Inc, 1947.
• Arrow, K., Blackwell, D., Girshick, Ann. Math. Stat., 1949.
• Shiryaev, A. N. Optimal stopping rules, Springer-Verlag, 1978.
• Siegmund, D. Sequential analysis, Springer-Verlag, 1985.
• Brodsky, B. E. and Darkhovsky, B. S. Nonparametric methods in change-point problems. Kluwer Academic Pub, 1993.
• Baum, C. W. & Veeravalli, V. V. A Sequential Procedure for Multihypothesis Testing. IEEE Trans on Info Thy, 40(6):1994-2007, 1994.
• Lai, T. L. Sequential analysis: Some classical problems and new challenges (with discussion), Statistica Sinica, 11:303-408, 2001.
• Mei, Y. Asymptotically optimal methods for sequential change-point detection, Caltech PhD thesis, 2003.
• Hajji, H. Statistical analysis of network traffic for adaptive faults detection, IEEE Trans Neural Networks, 2005.
• Tartakovsky, A. & Veeravalli, V. V. General asymptotic Bayesian theory of quickest change detection. Theory of Probability and Its Applications, 2005.
• Nguyen, X., Wainwright, M. & Jordan, M. I. On optimal quantization rules in sequential decision problems. Proc. ISIT, Seattle, 2006.