Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices...

15
Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1 , Augustin Soule 2 , Jennifer Rexford 1 , Christophe Diot 2 1 Princeton University, 2 Thomson Research

description

3 Background Promising applications of PCA to AD [Lakhina et al, SIGCOMM 04 & 05] But we weren’t nearly as successful applying technique to a new data set Same source code What were we doing wrong? Unable to tune the technique

Transcript of Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices...

Page 1: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

Sensitivity of PCA forTraffic Anomaly Detection

Evaluating the robustness of current best practices

Haakon Ringberg1, Augustin Soule2,Jennifer Rexford1, Christophe Diot2

1Princeton University, 2Thomson Research

Page 2: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

2

Outline Context

Background and motivation Bigger picture PCA (subspace method) in one slide

Challenges with current PCA methodology Conclusion & future directions

Page 3: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

3

Background

Promising applications of PCA to AD [Lakhina et al, SIGCOMM 04 & 05]

But we weren’t nearly as successful applying technique to a new data set Same source code

What were we doing wrong? Unable to tune the technique

Page 4: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

4

Bigger Picture

Many statistical techniques evaluated for AD e.g., Wavelets, PCA, Kalman filters Promising early results

But questions about performance remain What did the researchers have to do in order to

achieve presented results?

Page 5: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

5

Questions about techniques “Tunability” of technique

Number of parameters Sensitivity to parameters Interpretability of parameters

Other aspects of robustness Sensitivity to drift in underlying data Sensitivity to sampling

Assumptions about the underlying data

Page 6: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

6

Principal Components Analysis (PCA) PCA transforms data

into new coordinate system

Principal components (new bases) ordered by captured variance

The first k (topk) tend to capture periodic trends normal subspace vs. anomalous subspace

Page 7: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

7

Data used

Géant and Abilene networks IP flow traces 21/11 through 28/11 2005 Detected anomalies were

manually inspected

Page 8: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

8

Outline Context Challenges with current PCA methodology

Sensitivity to its parameters Contamination of normalcy Identifying the location of detected anomalies

Conclusion & future directions

Page 9: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

9

Sensitivity to topk

Where is the line drawn between normal and anomalous?

What is too anomalous?

topk

signal

anomalous

normalPCA

Page 10: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

10

Sensitivity to topk

Very sensitive to topk

Total detections and FP Not an issue if topk

were tunable Tried many methods

3σ deviation heuristic Cattell’s Scree Test Humphrey-Ilgen Kaiser’s Criterion

None are reliable

Page 11: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

11

Contamination of normalcy

Large anomalies may be included among topk

Invalidates assumption that top PCs are periodic

Pollutes definition of normal In our study, the outage to

the left affected 75/77 links Only detected on a handful!

signal

anomalous

normalPCA

Page 12: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

12

Conclusion & future directions PCA (subspace method) methodology issues

Sensitivity to topk parameter Contamination of normal subspace Identifying the location of detected anomalies

Generally: room for rigorous evaluation of statistical techniques applied to AD Tunability, robustness

Assumptions about underlying data Under what conditions does method excel?

Page 13: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

Thanks!Questions?

Haakon RingbergPrinceton University Computer Sciencehttp://www.cs.princeton.edu/~hlarsen/

Page 14: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

14

Identifying anomaly locations Spikes when state

vector projected on anomaly subspace But network operators

don’t care about this They want to know

where it happened! How do we find the

original location of the anomaly?

state vector

anomaly subspace

Page 15: Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

15

Identifying anomaly locations Previous work used a

simple heuristic Associate detected spike

with k flows with the largest contribution to the state vector v

No clear a priori reason for this association

state vector

anomaly subspace

Network

A

B