Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams...

17
Anomaly Detection in a Mobile Communication Network Alec Pawling, Nitesh V. Chawla, and Greg Madey University of Notre Dame Anomaly Detection in a Mobile Communication Network. – p.1

Transcript of Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams...

Page 1: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Anomaly Detection in a MobileCommunication Network

Alec Pawling, Nitesh V. Chawla, and Greg Madey

University of Notre Dame

Anomaly Detection in a Mobile Communication Network. – p.1

Page 2: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Overview

We present a technique that uses hybrid clustering in con-

junction with statistical process control to handle concept drift

in a data stream.

Anomaly Detection in a Mobile Communication Network. – p.2

Page 3: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Outline

• Motivation• Background

◦ Data streams◦ Concept drift◦ Statistical process control

• Related work• Hybrid clustering for streams• Setup• Results• Conclusion

Anomaly Detection in a Mobile Communication Network. – p.3

Page 4: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Motivation

Application• Detection and Alert System component of WIPER

Emergency Response System [Schoenharl etal., 2006], [Madey, et al., 2006]◦ Detect and report anomalies in network usage◦ Notify Simulation and Prediction System

Difficulties• Massive volume of data• Dynamic system

Anomaly Detection in a Mobile Communication Network. – p.4

Page 5: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Data Streams

• Data can only be read once (due to volume)• Order of data cannot be manipulated• Often, if the underlying process is stationary, anomaly

detection is straightforward• If the underlying process is dynamic, the problem is

difficult

Anomaly Detection in a Mobile Communication Network. – p.5

Page 6: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Concept Drift

• Change in process that generates the data streamover time

• May or may not be periodic

Anomaly Detection in a Mobile Communication Network. – p.6

Page 7: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Concept Drift

0

100

200

300

400

500

600

700

0 1440 2880 4320 5760 7200 8640 10080 11520 12960 14400 15840 17280 18720

Num

ber

of in

stan

ces

Time (min)

Figure 1: GPRS usage over 12 days

Anomaly Detection in a Mobile Communication Network. – p.7

Page 8: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Statistical Process Control

Distinguish between random and assignable variation:threshold is µ ± lω.• Random variation

◦ High probability, little effect on process output• Assignable variation

◦ Low probability, significant effect on process output◦ Change in underlying process

Anomaly Detection in a Mobile Communication Network. – p.8

Page 9: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Statistical Process Control

0

200000

400000

600000

800000

1e+06

1.2e+06

1.4e+06

1.6e+06

0 1 2 3 4 5 6 7 8 9 10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

Tel

epho

ne c

all v

olum

e

Hour of the day

Figure 2: Range of random variance

Anomaly Detection in a Mobile Communication Network. – p.9

Page 10: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Related Work

Intrusion detection (Portnoy, 2001)• Identify intrusions in an unlabeled data set using

leader clustering.• The leader algorithm (Hartigan 1975)

◦ Let d be a distance threshold.◦ Let the first instance assigned to cluster Ci be the

defining instance, ci

◦ For each instance x

• Find the closest cluster, Cj

• If dist(x, ci) < d, add c to Ci• Otherwise, create a new cluster with the defining

instance x.

Anomaly Detection in a Mobile Communication Network. – p.10

Page 11: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Related Work

Problem:• Uses z-score normalization to allow for arbitrary data

distribution:

v′i =vi − v̄i

σi

• This is not possible in one pass

Anomaly Detection in a Mobile Communication Network. – p.11

Page 12: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Related Work

Hybrid clustering algorithms, (Cheu et al., 2004)

1. Cluster to reduce the data set

2. Produce final clusters

Anomaly Detection in a Mobile Communication Network. – p.12

Page 13: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Hybrid Algorithm for Streams

1. Establish clusters with some minimum number ofinstances using a partitional or hierarchical algorithm

2. Incrementally update cluster center and standarddeviations using a variation on the leader algorithm.

Anomaly Detection in a Mobile Communication Network. – p.13

Page 14: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Setup

Data set• Feature vector consists of timestamp and number of

instances of 5 services• One example for each minute of a 12 day period

(18721 examples)

Clustering Algorithms• Expectation Maximization — Weka, cross-validation to

determine number of clusters• Leader• Hybrid for streams: (1) k-means, (2) modified leader

Anomaly Detection in a Mobile Communication Network. – p.14

Page 15: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Results

Hybrid algorithm• Small clusters compared to EM• Little consistency in detected outliers among different

thresholds or values of k

Leader algorithm• More consistency in anomaly detection

Anomaly Detection in a Mobile Communication Network. – p.15

Page 16: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Conclusion

• Algorithms using random values may be a bad idea• Algorithms requiring only threshold parameter seem

promising

Future work• Hierarchical clustering to establish clusters• Examine further how the number of clusters grows

over time

Anomaly Detection in a Mobile Communication Network. – p.16

Page 17: Anomaly Detection in a Mobile Communication Networkdddas/Papers/Pawling_slides.pdfData streams Concept drift Statistical process control • Related work • Hybrid clustering for

Acknowledgments

This work is supported, in part, by the National Sci-

ence Foundation, the DDDAS Program, under Grant No.

CNS_050348.

Anomaly Detection in a Mobile Communication Network. – p.17