1
In-Network PCA and Anomaly Detection
Ling Huang* XuanLong Nguyen* Minos Garofalakis§
Michael Jordan* Anthony Joseph* Nina Taft§
*UC Berkeley §Intel Research
{hling, xuanlong, jordan, adj}@cs.berkeley.edu {minos.garofalakis, [email protected]
2
Detection of Network-wide Anomalies
A volume anomaly is a sudden change in an Origin-Destination flow (i.e., point-to-point traffic)
Given only link traffic measurements, efficiently diagnose the volume anomalies
[Figure: the backbone network connecting Regional network 1 and Regional network 2, with endpoints labeled H1 and H2]
3
An Illustration
Observed network link flow = aggregate of application-level flows
Anomalies in (unobserved) application-level flow
Finding anomalies in high-dimensional, noisy data is difficult!
4
The PCA Method
An approach to separate normal from anomalous traffic
Normal subspace: space spanned by the top k principal components
Anomalous subspace: space spanned by the remaining principal components
Then, decompose the traffic on all links by projecting onto the normal and anomalous subspaces to obtain:
$y = y_{no} + y_{ab}$
where $y$ is the traffic vector of all links at a particular point in time, $y_{no}$ is the normal traffic vector, and $y_{ab}$ is the residual traffic vector
5
A Geometric Illustration
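In general, anomalous traffic results in a large value of $\|y_{ab}\|$, where
$y_{no} = C_{no}\, y$ and $y_{ab} = C_{ab}\, y$
[Figure: the traffic vector $y$ decomposed into its projections $y_{no}$ and $y_{ab}$]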
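For reference, the two projections can be written out explicitly. This is a sketch under one assumption: the symbol $P$ (the $n \times k$ matrix whose orthonormal columns are the top k principal components) is introduced here for illustration and is not notation from the slides.

```latex
C_{no} = P P^{T}, \qquad C_{ab} = I - P P^{T}, \qquad
y = C_{no}\, y + C_{ab}\, y = y_{no} + y_{ab}, \qquad
\|y\|^{2} = \|y_{no}\|^{2} + \|y_{ab}\|^{2}
```

The last identity holds because the two subspaces are orthogonal, which is why a large residual $\|y_{ab}\|$ can be read off independently of the normal component.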
6
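Detection Illustration
[Figure: top panel, value of $\|y\|^2$ over time; bottom panel, value of $\|C_{ab}\, y\|^2$ over time (residual part), with the threshold $Q_\alpha$]
$\|C_{ab}\, y\|^2$ at anomalous time points clearly stands out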
7
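The Centralized Algorithm
[Figure: the network periodically (e.g., once a week) pushes the data matrix $Y$ to the operation center, which runs PCA on $A = Y^T Y / m$; the eigenvalues give the threshold $Q_\alpha$ and the eigenvectors give the projection $C_{ab}$]
Data matrix $Y$ ($m \times n$): $n$ links, $m$ time points (data)
Detection procedure: raise a flag if $\|C_{ab}\, y\| > Q_\alpha$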
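The centralized procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes NumPy is available, that each column of Y has already been made zero-mean, and that the threshold `q_alpha` is supplied by the caller (its computation from the eigenvalues is not shown). The function names `fit_subspaces` and `is_anomalous` are hypothetical.

```python
import numpy as np


def fit_subspaces(Y, k):
    """Return (C_no, C_ab, eigenvalues) for an m x n data matrix Y.

    Assumption: the columns of Y are already zero-mean.
    """
    m, n = Y.shape
    A = Y.T @ Y / m                           # covariance matrix A = Y^T Y / m
    eigvals, eigvecs = np.linalg.eigh(A)      # eigh returns ascending eigenvalues
    eigvals = eigvals[::-1]                   # reorder to descending
    eigvecs = eigvecs[:, ::-1]
    P = eigvecs[:, :k]                        # top-k principal components
    C_no = P @ P.T                            # projector onto the normal subspace
    C_ab = np.eye(n) - C_no                   # projector onto the anomalous subspace
    return C_no, C_ab, eigvals


def is_anomalous(y, C_ab, q_alpha):
    """Raise a flag when the residual norm exceeds the caller-supplied threshold."""
    return bool(np.linalg.norm(C_ab @ y) > q_alpha)
```

In this sketch `fit_subspaces` would be rerun at each periodic push of Y, while `is_anomalous` is applied to every new measurement vector y.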
8
Scalability Issues of the Centralized Approach
As the number of monitoring devices grows (up to hundreds or thousands of network data features):
  the central processing site becomes overloaded
  certain networks do not overprovision inter-site connectivity
When anomalies occur on smaller time scales (down to second or sub-second scales):
  the "periodic push" has to be applied on second or sub-second scales
  the volume of data transmitted through the network would explode
9
Our Distributed Approach
A communication-efficient framework that
  detects anomalies at a desired accuracy level
  minimizes data communication cost
A distributed protocol for data processing
  local monitors decide when to send data updates to the coordinator
  the coordinator makes the global decision and sends feedback to the monitors
An algorithmic framework to guide the tradeoff
  a simple algorithm for determining the filtering parameters given the desired detection accuracy
  stochastic matrix perturbation theory to quantify
    how a perturbed data matrix impacts its eigenstructure
    how the perturbed eigenstructure impacts the detection accuracy
10
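Our In-Network Detection Framework
[Figure: distributed monitors filter the original monitored time series using the slacks $\delta_1, \ldots, \delta_n$ and send the processed time series to the coordinator, which takes the user inputs and reports anomalies]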
11
The Protocol at Monitors
Each monitor $M_i$ sends an update to the coordinator if its incoming signal $Y_i(t)$ satisfies
$|Y_i(t) - R_i(t^*)| > \delta_i$
where
  $\delta_1, \ldots, \delta_n$ (filtering slacks) are adaptively computed by the coordinator
  $R_i(t^*)$ can be based on any prediction model built by $M_i$ on its data at an update time $t^*$, e.g., the average of the last 5 signal values observed locally at $M_i$
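The monitor-side rule can be sketched as follows. This is an illustrative sketch only: the class name `Monitor`, the method `observe`, and the choice to send the pair (current value, new prediction) on each update are assumptions, and the prediction model is the "average of the last 5 values" example from the slide.

```python
from collections import deque


class Monitor:
    """One local monitor M_i with filtering slack delta_i (hypothetical sketch)."""

    def __init__(self, slack, window=5):
        self.slack = slack                   # delta_i, set by the coordinator
        self.recent = deque(maxlen=window)   # last `window` locally observed values
        self.prediction = None               # R_i(t*), fixed at the last update time

    def observe(self, value):
        """Return (Y_i(t), R_i(t*)) when an update must be sent, else None."""
        self.recent.append(value)
        outside = (self.prediction is None
                   or abs(value - self.prediction) > self.slack)
        if not outside:
            return None                      # stay silent: value is within the slack
        # Rebuild the prediction at this update time t* from local data.
        self.prediction = sum(self.recent) / len(self.recent)
        return value, self.prediction
```

For example, with `slack=1.0` the readings 10.0, 10.5, 12.0, 11.0 trigger updates only at the first and third readings; the other two fall within the slack of the stored prediction.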
12
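The Protocol at the Coordinator
The coordinator makes a new row $\hat{y} = [\hat{Y}_1(t), \ldots, \hat{Y}_n(t)]$, where
$\hat{Y}_i(t) = \begin{cases} Y_i(t), & \text{if getting updates from monitor } i \\ R_i(t^*), & \text{otherwise} \end{cases}$
If any element in $\hat{y}$ is updated:
  Update $\hat{Y}$
  Compute new $\hat{C}_{ab}$ and $\hat{Q}_\alpha$
  Perform detection using $\|\hat{C}_{ab}\, \hat{y}\| > \hat{Q}_\alpha$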
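The row-assembly step at the coordinator can be sketched as below. The class name `Coordinator`, the `step` method, the message format (a mapping from monitor index to the pair of current value and new prediction), and the initial prediction of 0.0 are all assumptions made for illustration; recomputing the PCA and running detection are left as a comment.

```python
class Coordinator:
    """Assembles the approximate data rows from monitor updates (hypothetical sketch)."""

    def __init__(self, n):
        self.predictions = [0.0] * n   # last R_i(t*) received from each monitor
        self.rows = []                 # rows of the approximate data matrix

    def step(self, updates):
        """updates maps monitor index i to (Y_i(t), R_i(t*)).

        Returns the new row, or None when no monitor sent an update.
        """
        if not updates:
            return None                        # nothing changed: no new row
        row = list(self.predictions)           # silent monitors: use R_i(t*)
        for i, (value, prediction) in updates.items():
            row[i] = value                     # updated monitors: use Y_i(t)
            self.predictions[i] = prediction   # remember the new prediction
        self.rows.append(row)
        # Here the coordinator would recompute the projection and threshold
        # from self.rows and run the detection test on `row`.
        return row
```

A silent monitor therefore contributes its stored prediction to the row, so the error in each entry is bounded by that monitor's slack.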
13
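The Tradeoff
The bigger the slacks $\delta_i$, the less communication, but the larger the detection error
Need an algorithm to relate $\delta_1, \ldots, \delta_n$ to detection accuracy
[Figure: raw data $y$ in blue, data available for detection $\hat{y}$ in red; the eigenvectors and eigenvalues computed from each give $\|C_{ab}\, y\| > Q_\alpha$ versus $\|\hat{C}_{ab}\, \hat{y}\| > \hat{Q}_\alpha$. Difference?]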
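The communication side of this tradeoff is easy to see on a single time series. The helper below is a hypothetical sketch (the name `update_fraction` and the toy readings are not from the slides); it counts how often one monitor would send an update for a given slack, using the "average of the last 5 values" predictor as an assumed prediction model.

```python
def update_fraction(series, slack, window=5):
    """Fraction of time steps at which a monitor would send an update."""
    prediction = None
    sent = 0
    for t, value in enumerate(series):
        if prediction is None or abs(value - prediction) > slack:
            # Update time: rebuild the prediction from the last `window` values.
            recent = series[max(0, t - window + 1): t + 1]
            prediction = sum(recent) / len(recent)
            sent += 1
    return sent / len(series)


# Toy readings, made up for illustration only.
readings = [10.0, 10.5, 12.0, 11.0, 11.2, 15.0, 15.1]
fractions = [update_fraction(readings, s) for s in (0.0, 1.0, 100.0)]
```

On these toy readings a slack of 0 forces an update at every step, while a very large slack leaves only the initial update; the detection error incurred in exchange is what the analysis on the following slides quantifies.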
14
Parameter Design and Error Control (I)
Given an upper bound on the false alarm rate, determine the monitor slacks $\delta_i$
Perturbation analysis: from the deviation of the false alarm rate to the monitor slacks
15
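Parameter Design and Error Control (II)
Let $\lambda_i$ and $\hat{\lambda}_i$ be the eigenvalues of the covariance matrices $A$ and $\hat{A}$, computed from the original and the filtered data respectively
Define the perturbation matrix $\Delta = A - \hat{A}$
Define the eigen error as the deviation of the $\hat{\lambda}_i$ from the $\lambda_i$
From matrix perturbation theory, the eigen error is bounded in terms of the size of $\Delta$
So the key point is to estimate the size of $\Delta$ in terms of the slacks $\delta_i$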
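One standard result of this kind is the Hoffman-Wielandt inequality for symmetric matrices, shown here as an illustration rather than as the exact bound used in the talk. It assumes the eigenvalues of both matrices are sorted in the same order, and it assumes, for illustration only, that the eigen error $\epsilon_{eig}$ is measured as the root-mean-square eigenvalue deviation.

```latex
\sum_{i=1}^{n} \left( \hat{\lambda}_i - \lambda_i \right)^{2} \le \|\Delta\|_{F}^{2}
\quad \Longrightarrow \quad
\epsilon_{eig} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\lambda}_i - \lambda_i \right)^{2} }
\le \frac{\|\Delta\|_{F}}{\sqrt{n}}
```

Under that assumed definition, any bound on the Frobenius norm of $\Delta$ in terms of the slacks translates directly into a bound on the eigen error.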
16
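From Eigen Error to Monitor Slacks $\delta_i$ (I)
Let $W$ denote the filtering error matrix, i.e., the difference between the filtered data matrix and the original data matrix $Y$
Standard assumptions on the filtering error matrix $W$: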
17
From Eigen Error to Monitor Slacks $\delta_i$ (II)
where $n$ is the number of monitors and $m$ is the number of data points.
18
From Detection Error to Eigen Error (I)
Basic idea: study how the eigen error impacts the detection error
With full data, the false alarm rate is $\alpha = \Pr(\|C_{ab}\, y\| > Q_\alpha)$
With approximate data, we only have the perturbed version $\hat{\alpha} = \Pr(\|\hat{C}_{ab}\, y\| > \hat{Q}_\alpha)$
Given the eigen error, we can compute the false alarm rate (though not in closed form)
Inverse dependency: given a desired false alarm rate, we can determine the tolerable eigen error by fast binary search
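The inverse step can be sketched as a plain bisection. This is an illustration under stated assumptions: the function name `tolerable_eigen_error` is hypothetical, the false alarm rate is assumed to be non-decreasing in the eigen error, and `toy_rate` is a made-up stand-in for the real (non-closed-form) computation.

```python
def tolerable_eigen_error(false_alarm_rate, target, hi=1.0, tol=1e-6):
    """Largest eps in [0, hi] with false_alarm_rate(eps) <= target.

    Assumption: false_alarm_rate is non-decreasing in eps.
    """
    lo = 0.0
    if false_alarm_rate(lo) > target:
        raise ValueError("target is not achievable even with zero eigen error")
    if false_alarm_rate(hi) <= target:
        return hi                        # the whole search interval is tolerable
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if false_alarm_rate(mid) <= target:
            lo = mid                     # mid is tolerable: search larger errors
        else:
            hi = mid                     # mid is too large: search smaller errors
    return lo


def toy_rate(eps):
    """Made-up monotone stand-in for the false alarm rate computation."""
    return 0.01 + 0.5 * eps


eps_max = tolerable_eigen_error(toy_rate, target=0.02)
```

Each iteration halves the search interval, so the number of false alarm rate evaluations grows only logarithmically with the requested precision.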
19
From Detection Error to Eigen Error (II)
Consider the normalized random variable $X$
For approximate data, we only obtain its perturbed version $\hat{X}$
Given an upper bound on the deviation between $\hat{X}$ and $X$, the deviation of the false alarm rate can be approximated
This in turn gives an upper bound on the false alarm rate
20
Evaluation
Given a tolerable deviation of the false alarm rate, we can determine the system parameters
Using the system parameters, we can evaluate the actual detection accuracy using simulation
Experiment setup
  Abilene backbone network data
  Traffic matrices of size 1008 × 41
  Uniform slack set for all monitors
21
Results
Monitor slacks, communication cost and detection error
22
Results (II)