1

In-Network PCA and Anomaly Detection

Ling Huang* XuanLong Nguyen* Minos Garofalakis§

Michael Jordan* Anthony Joseph* Nina Taft§

*UC Berkeley §Intel Research

{hling, xuanlong, jordan, adj}@cs.berkeley.edu {minos.garofalakis, [email protected]

2

Detection of Network-wide Anomalies

- A volume anomaly is a sudden change in an Origin-Destination flow (i.e., point-to-point traffic)
- Given only link traffic measurements, efficiently diagnose the volume anomalies

[Figure: network diagram with labels H1, H2, Regional network 1, Regional network 2, and the backbone network]

3

An Illustration

Observed network link flow = aggregate of application-level flows

Anomalies in (unobserved) application-level flow

Finding anomalies in high-dimensional, noisy data is difficult!

4

The PCA Method

- An approach to separate normal from anomalous traffic
- Normal subspace: space spanned by the top k principal components
- Anomalous subspace: space spanned by the remaining principal components
- Then, decompose traffic on all links by projecting onto the normal and anomalous subspaces to obtain $y = y_{no} + y_{ab}$, where
  - $y$ is the traffic vector of all links at a particular point in time
  - $y_{no}$ is the normal traffic vector
  - $y_{ab}$ is the residual traffic vector
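The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `subspace_projectors`, the synthetic data, and the choice to center each link's time series before forming the covariance matrix are all assumptions made for the example.

```python
import numpy as np

def subspace_projectors(Y, k):
    """Return (C_no, C_ab) for an m x n data matrix Y (rows = time points)."""
    Yc = Y - Y.mean(axis=0)                # assumption: center each link's series
    A = Yc.T @ Yc / Yc.shape[0]            # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(A)   # eigh returns eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :k]            # top-k principal components as columns
    C_no = P @ P.T                         # projector onto the normal subspace
    C_ab = np.eye(A.shape[0]) - C_no       # projector onto the anomalous subspace
    return C_no, C_ab

rng = np.random.default_rng(0)
# Hypothetical data: 200 time points on 6 links, driven by 2 latent flows plus noise.
Y = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(200, 6))

C_no, C_ab = subspace_projectors(Y, k=2)
y = Y[0] - Y.mean(axis=0)                  # one centered traffic vector
y_no, y_ab = C_no @ y, C_ab @ y            # normal and residual parts
```

Because `C_no` and `C_ab` are complementary orthogonal projectors, `y_no + y_ab` reproduces `y` and the two parts are orthogonal to each other.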

5

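A Geometric Illustration

In general, anomalous traffic results in a large value of $\|y_{ab}\|$, where $y_{ab} = C_{ab}\, y$ and $y_{no} = C_{no}\, y$.

[Figure: geometric view of the traffic vector $y$ and its components $y_{no}$ and $y_{ab}$]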

6

Detection Illustration

[Figure: two time-series panels, one of them the residual part plotted over time, with the threshold $Q_\alpha$]

Values of $\|C_{ab}\, y\|^2$ at anomalous time points clearly stand out.

7

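The Centralized Algorithm

[Figure: the network sends the data matrix $Y$ to the operation center, which performs PCA on $A = Y^T Y / m$; the eigenvectors give the projection $C_{ab}$ and the eigenvalues give the threshold $Q_\alpha$]

- Data matrix $Y$ ($m \times n$): n links, m time points (data)
- PCA on $A$ is run periodically (e.g., once a week)
- Detection procedure: raise a flag if $\|C_{ab}\, y\|^2 > Q_\alpha$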
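The slides do not say how the threshold $Q_\alpha$ is obtained from the eigenvalues. One common choice for PCA residual tests is the Jackson-Mudholkar Q-statistic, and the sketch below assumes that choice. The function names `q_threshold` and `raise_flag` and the sample eigenvalues are hypothetical.

```python
from statistics import NormalDist

def q_threshold(residual_eigenvalues, alpha):
    """Jackson-Mudholkar Q-statistic threshold for false alarm rate alpha.

    residual_eigenvalues: eigenvalues of the principal components that span
    the anomalous subspace (assumption: this is the Q_alpha the slides use).
    """
    phi1 = sum(residual_eigenvalues)
    phi2 = sum(l ** 2 for l in residual_eigenvalues)
    phi3 = sum(l ** 3 for l in residual_eigenvalues)
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)   # 1 - alpha quantile of N(0, 1)
    bracket = (c_alpha * (2.0 * phi2 * h0 ** 2) ** 0.5 / phi1
               + 1.0
               + phi2 * h0 * (h0 - 1.0) / phi1 ** 2)
    return phi1 * bracket ** (1.0 / h0)

def raise_flag(residual, q_alpha):
    """Flag an anomaly when the squared norm of C_ab y exceeds the threshold."""
    return sum(v * v for v in residual) > q_alpha

eigs = [4.0, 2.0, 1.0]             # hypothetical residual eigenvalues
q_05 = q_threshold(eigs, 0.05)     # looser threshold
q_001 = q_threshold(eigs, 0.001)   # stricter false alarm rate, larger threshold
```

A smaller false alarm rate yields a larger threshold, so fewer residual vectors are flagged.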

8

Scalability Issues of the Centralized Approach

- As the number of monitoring devices grows (up to hundreds or thousands of network data features)
  - the central processing site is overloaded
  - certain networks do not overprovision inter-site connectivity
- When anomalies occur on smaller time scales (down to second or sub-second scales)
  - "periodic push" has to be applied on second or sub-second scales
  - the volume of data transmitted through the network would explode

9

Our Distributed Approach

- A communication-efficient framework that
  - detects anomalies at a desired accuracy level
  - minimizes data communication cost
- A distributed protocol for data processing
  - local monitors decide when to send data updates to the coordinator
  - the coordinator makes the global decision and sends feedback to the monitors
- An algorithmic framework to guide the tradeoff
  - a simple algorithm for determining filtering parameters given the desired detection accuracy
  - stochastic matrix perturbation theory to quantify
    - how a perturbed data matrix impacts its eigenstructure
    - how the perturbed eigenstructure impacts the detection accuracy

10

Our In-Network Detection Framework

[Figure: framework diagram with user inputs, distributed monitors (original monitored time series), slacks $\delta_1, \ldots, \delta_n$, the coordinator (processed time series), and the anomaly output]

11

The Protocol at Monitors

- Each monitor $M_i$ sends an update to the coordinator if its incoming signal $Y_i(t)$ satisfies $|Y_i(t) - R_i(t^*)| > \delta_i$, where
  - $\delta_1, \ldots, \delta_n$ (filtering slacks) are adaptively computed by the coordinator
  - $R_i(t^*)$ can be based on any prediction model built by $M_i$ on its data at an update time $t^*$, e.g., the average of the last 5 signal values observed locally at $M_i$
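The monitor-side rule can be sketched as follows, using the "average of the last 5 values" prediction model from the slide. The class name `Monitor`, the choice to send the new prediction along with the signal, and the sample readings are assumptions made for illustration.

```python
from collections import deque

class Monitor:
    """Local monitor M_i with filtering slack delta_i."""

    def __init__(self, delta, window=5):
        self.delta = delta
        self.history = deque(maxlen=window)   # last `window` locally observed values
        self.prediction = None                # R_i(t*), fixed at the last update time

    def observe(self, value):
        """Return (Y_i(t), new R_i(t*)) if an update is sent, else None."""
        self.history.append(value)
        if self.prediction is None or abs(value - self.prediction) > self.delta:
            # Update time t*: rebuild the prediction from the local window.
            self.prediction = sum(self.history) / len(self.history)
            return value, self.prediction
        return None                           # within the slack: stay silent

monitor = Monitor(delta=1.0)
sent = [monitor.observe(v) for v in [10.0, 10.2, 9.9, 13.0, 12.8]]
```

In this run the monitor stays silent for the second and third readings, which are within the slack of the stored prediction, and sends updates for the first, fourth, and fifth.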

12

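The Protocol at the Coordinator

- The coordinator makes a new row $\hat{y} = [\hat{Y}_1(t), \ldots, \hat{Y}_n(t)]$, where

$$\hat{Y}_i(t) = \begin{cases} Y_i(t), & \text{if getting updates from monitor } i \\ R_i(t^*), & \text{otherwise} \end{cases}$$

- If any element in $\hat{y}$ is updated
  - update $\hat{Y}$
  - compute new $\hat{C}_{ab}$ and $\hat{Q}_\alpha$
- Perform detection using $\|\hat{C}_{ab}\, \hat{y}\|^2 > \hat{Q}_\alpha$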
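The row-assembly step at the coordinator can be sketched as below. This covers only how the new row is built from updates and stored predictions; recomputing the projection and threshold is left out. The function name `assemble_row` and the format of `updates` (signal plus new prediction per monitor) are assumptions.

```python
def assemble_row(updates, predictions):
    """Build the coordinator's new row from monitor updates.

    updates: dict {monitor index: (Y_i(t), new R_i(t*))} for monitors that sent an update
    predictions: list of stored R_i(t*) values, modified in place
    """
    row = []
    for i in range(len(predictions)):
        if i in updates:
            value, new_prediction = updates[i]
            predictions[i] = new_prediction   # remember R_i(t*) for later silent steps
            row.append(value)                 # use the fresh signal Y_i(t)
        else:
            row.append(predictions[i])        # fall back to the prediction R_i(t*)
    return row

predictions = [10.0, 20.0, 30.0]              # hypothetical stored predictions
row_1 = assemble_row({1: (25.0, 22.0)}, predictions)   # only monitor 1 reports
row_2 = assemble_row({}, predictions)                  # nobody reports
```

When no monitor reports, the row consists entirely of stored predictions, so no element changes and the coordinator has nothing to recompute.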

13

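The Tradeoff

- The bigger the slacks $\delta_i$, the less communication, but the greater the detection error
- Need an algorithm to relate $\delta_1, \ldots, \delta_n$ to detection accuracy

[Figure: raw data $y$ in blue, data available for detection $\hat{y}$ in red; the eigenvectors and eigenvalues computed from each give the tests $\|C_{ab}\, y\|^2 > Q_\alpha$ and $\|\hat{C}_{ab}\, \hat{y}\|^2 > \hat{Q}_\alpha$. Difference?]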

14

Parameter Design and Error Control (I)

- Given an upper bound on the false alarm rate, determine the monitor slacks $\delta_i$
- Perturbation analysis: from deviation of the false alarm rate to monitor slacks

15

Parameter Design and Error Control (II)

- Let […] and […] be the eigenvalues of the covariance matrices […] and […]
- Define the perturbation matrix […]
- Define the eigen error […]
- From matrix perturbation theory, we have […]
- So the key point is to estimate […] in terms of the slacks $\delta_i$

16

Eigen Error → Monitor Slacks $\delta_i$ (I)

- Let […] and […]
- Standard assumptions on the filtering error matrix W: […]

17

Eigen Error → Monitor Slacks $\delta_i$ (II)

[…]

where […], n is the number of monitors, and m is the number of data points.

18

Detection Error → Eigen Error (I)

- Basic idea: study how eigen error impacts detection error
- With full data, the false alarm rate is […]
- With approximate data, we only have a perturbed version […]
- Given the eigen error, we can compute the false alarm rate (though not in closed form)
- Inverse dependency: given a desired false alarm rate, we can determine the tolerable eigen error by fast binary search
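The inverse step can be sketched as a bisection. This assumes the false alarm rate is a non-decreasing function of the eigen error, which the "binary search" wording suggests but the slides do not state explicitly. The function name `tolerable_eigen_error` is hypothetical, and `toy_rate` is a made-up stand-in, not the rate function from the analysis.

```python
def tolerable_eigen_error(false_alarm_rate, target, hi=1.0, tol=1e-6):
    """Largest eigen error in [0, hi] whose false alarm rate stays <= target.

    Assumes false_alarm_rate is non-decreasing in the eigen error and that
    false_alarm_rate(0.0) <= target.
    """
    lo = 0.0
    if false_alarm_rate(hi) <= target:
        return hi                         # the whole interval is tolerable
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if false_alarm_rate(mid) <= target:
            lo = mid                      # mid is tolerable: search larger errors
        else:
            hi = mid                      # mid is too large: search smaller errors
    return lo

def toy_rate(eps):
    """Hypothetical monotone curve used only to exercise the search."""
    return 0.01 + 0.2 * eps

eps = tolerable_eigen_error(toy_rate, target=0.05)   # close to 0.2 for this toy curve
```

Each iteration needs one evaluation of the rate function, so the search stays cheap even though the rate has no closed form.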

19

Detection Error → Eigen Error (II)

- Consider the normalized random variable $X$ […]
- For approximate data, we only obtain […]
- Let […] denote an upper bound on […]
- The deviation of the false alarm rate can be approximated as […]
- The upper bound of the false alarm rate is […]

20

Evaluation

- Given a tolerable deviation of the false alarm rate, we can determine the system parameters
- Using the system parameters, we can evaluate the actual detection accuracy by simulation
- Experiment setup
  - Abilene backbone network data
  - traffic matrices of size 1008 × 41
  - uniform slack set for all monitors

21

Results

Monitor slacks, communication cost and detection error

22

Results (II)