Sketch-based Change Detection
Balachander Krishnamurthy (AT&T), Subhabrata Sen (AT&T), Yin Zhang (AT&T), Yan Chen (UCB/AT&T)
ACM Internet Measurement Conference 2003



Network Anomaly Detection

• Network anomalies are common
  – Flash crowds, failures, DoS, worms, …
• Want to catch them quickly and accurately
• Two basic approaches
  – Signature-based: looking for known patterns
    • E.g., backscatter [Moore et al.] uses address uniformity
    • Easy to evade (e.g., mutating worms)
  – Statistics-based: looking for abnormal behavior (this talk)
    • E.g., heavy hitters, big changes
    • Prior knowledge not required


Change Detection

• Lots of prior work
  – Simple smoothing & forecasting
    • Exponentially weighted moving average (EWMA)
  – Box-Jenkins (ARIMA) modeling
    • Tsay, Chen/Liu (in statistics and economics)
  – Wavelet-based approach
    • Barford et al. [IMW01, IMW02]


The Challenge

• Potentially tens of millions of time series!
  – Need to work at very low aggregation level (e.g., IP level)
    • Changes may be buried inside aggregated traffic
  – The Moore’s Law on traffic growth …
• Per-flow analysis is too slow / expensive
  – Want to work in near real time
• Existing approaches not directly applicable
  – Estan & Varghese focus on heavy-hitters

Need scalable change detection


Sketch-based Change Detection

• Input stream: (key, update)
• Summarize input stream using sketches
• Build forecast models on top of sketches
• Report flows with large forecast errors

[Diagram: (k,u) … → Sketch module → Sketches → Forecast module(s) → Error Sketch → Change detection module → Alarms]


Outline

• Sketch-based change detection
  – Sketch module
  – Forecast module
  – Change detection module
• Evaluation
• Conclusion & future work


Sketch

• Probabilistic summary of data streams
  – Originated in STOC 1996 [AMS96]
  – Widely used in database research to handle massive data streams

             Space           Accuracy
Hash table   Per-key state   100%
Sketch       Compact         With probabilistic guarantees (better for larger values)


K-ary Sketch

• Array of hash tables: Tj[K] (j = 1, …, H)
  – Similar to count sketch, counting bloom filter, multi-stage filter, …
• Update (k, u): Tj[hj(k)] += u (for all j)

[Diagram: H hash tables (rows 1, …, j, …, H), each with K buckets (0, 1, …, K-1); key k maps to bucket hj(k) in row j]
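The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `KarySketch` and the salted BLAKE2b hash are assumptions made here, since the slides do not say which hash family is used.

```python
import hashlib


class KarySketch:
    """H hash tables with K buckets each: Tj[K], j = 0..H-1."""

    def __init__(self, H=5, K=1024):
        self.H, self.K = H, K
        self.tables = [[0.0] * K for _ in range(H)]

    def _bucket(self, j, key):
        # Stand-in for hj(k): BLAKE2b salted with the row index j.
        # (Assumption: the talk does not specify the hash functions.)
        digest = hashlib.blake2b(str(key).encode(), digest_size=8,
                                 salt=j.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.K

    def update(self, key, u):
        # Update (k, u): Tj[hj(k)] += u, for all j.
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += u


sketch = KarySketch(H=5, K=1024)
sketch.update("10.0.0.1", 40)
sketch.update("10.0.0.1", 60)
sketch.update("10.0.0.2", 1500)
```

Each update touches exactly one bucket per row, so every row always sums to the total of all updates seen so far, whatever the hash functions are.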


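K-ary Sketch (cont’d)

• Estimate v(S, k): sum of updates for key k

$$ v^{est}(S,k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K} \right\} $$

(sum: total of all updates in S)

  – Tj[hj(k)]: v(S, k) + noise
  – sum/K: v(S, k)/K + E(noise)
  – Dividing by 1 - 1/K: compensate for signal loss
  – Each per-table term: unbiased estimator of v(S, k) with low variance
  – Median over j: boost confidence

[Diagram: same H × K array of hash tables as on the previous slide, with key k mapped to bucket hj(k) in row j]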
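A minimal Python sketch of the estimator: for each table, subtract sum/K from the bucket of key k, divide by 1 - 1/K, then take the median over the H tables. The helper names and the salted BLAKE2b hash are illustrative assumptions, not part of the talk.

```python
import hashlib
from statistics import median

H, K = 5, 1024


def bucket(j, key):
    # Stand-in for hj(k); the hash choice is an assumption of this sketch.
    d = hashlib.blake2b(str(key).encode(), digest_size=8,
                        salt=j.to_bytes(8, "little")).digest()
    return int.from_bytes(d, "little") % K


def update(tables, key, u):
    for j in range(H):
        tables[j][bucket(j, key)] += u


def estimate(tables, key):
    total = sum(tables[0])  # every row sums to the same total
    per_table = [(tables[j][bucket(j, key)] - total / K) / (1 - 1 / K)
                 for j in range(H)]
    return median(per_table)  # median over j boosts confidence


tables = [[0.0] * K for _ in range(H)]
update(tables, "heavy", 1000.0)
for i in range(200):
    update(tables, f"light-{i}", 1.0)
est = estimate(tables, "heavy")  # close to the true value 1000
```

The estimate is only approximate: keys that share a bucket with "heavy" add noise, which is why larger values are recovered with better relative accuracy than small ones.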


K-ary Sketch (cont’d)

• Estimate the second moment (F2)

$$ F_2(S) = \sum_{k} v(S,k)^2 $$

• Sketches are linear
  – Can combine sketches
  – Can aggregate data from different times, locations, and sources

[Diagram: sketch + sketch = sketch]
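Both points can be illustrated with a short Python sketch. The per-table F2 estimator used here, K/(K-1) · Σ Tj[i]² - sum²/(K-1), is unbiased if bucket collisions between distinct keys occur with probability 1/K; that assumption, the helper names, and the hash function are choices made for this example.

```python
import hashlib
from statistics import median

H, K = 5, 4096


def bucket(j, key):
    # Stand-in hash function (an assumption of this sketch).
    d = hashlib.blake2b(str(key).encode(), digest_size=8,
                        salt=j.to_bytes(8, "little")).digest()
    return int.from_bytes(d, "little") % K


def new_sketch():
    return [[0.0] * K for _ in range(H)]


def update(S, key, u):
    for j in range(H):
        S[j][bucket(j, key)] += u


def combine(a, S1, b, S2):
    # Linearity: a*S1 + b*S2 is computed entry by entry.
    return [[a * x + b * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(S1, S2)]


def estimate_f2(S):
    total = sum(S[0])
    return median(K / (K - 1) * sum(x * x for x in row)
                  - total * total / (K - 1) for row in S)


morning, evening, both = new_sketch(), new_sketch(), new_sketch()
for key, u in [("a", 300.0), ("b", 20.0)]:
    update(morning, key, u)
    update(both, key, u)
for key, u in [("a", 100.0), ("c", 5.0)]:
    update(evening, key, u)
    update(both, key, u)

merged = combine(1.0, morning, 1.0, evening)
f2 = estimate_f2(merged)  # true F2 = 400^2 + 20^2 + 5^2 = 160425
```

Adding the two sketches gives the same tables as sketching the concatenated stream, which is what makes aggregation across times, locations, and sources cheap.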


Forecast Model: EWMA

• Compute forecast error sketch: Serror

$$ S_{error}(t-1) = S_{observed}(t-1) - S_{forecast}(t-1) $$

• Update forecast sketch: Sforecast

$$ S_{forecast}(t) = \alpha \cdot S_{observed}(t-1) + (1 - \alpha) \cdot S_{forecast}(t-1) $$

(α: EWMA smoothing constant, 0 ≤ α ≤ 1)
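Because sketches are linear, both EWMA steps reduce to entrywise arithmetic on the hash tables. The following Python sketch shows one way to write this; the function names, the smoothing constant alpha = 0.5, the tiny 1 × 4 tables, and the choice to start the forecast from the first observed sketch are all assumptions made for illustration.

```python
def combine(a, S1, b, S2):
    # Entrywise linear combination a*S1 + b*S2 of two sketches.
    return [[a * x + b * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(S1, S2)]


def ewma_step(observed_prev, forecast_prev, alpha):
    # Serror(t-1)  = Sobserved(t-1) - Sforecast(t-1)
    error_prev = combine(1.0, observed_prev, -1.0, forecast_prev)
    # Sforecast(t) = alpha * Sobserved(t-1) + (1 - alpha) * Sforecast(t-1)
    forecast_next = combine(alpha, observed_prev, 1.0 - alpha, forecast_prev)
    return error_prev, forecast_next


# Three intervals of a toy sketch with H = 1 table and K = 4 buckets.
observed = [
    [[10.0, 0.0, 5.0, 1.0]],
    [[12.0, 0.0, 5.0, 1.0]],
    [[50.0, 0.0, 5.0, 1.0]],  # bucket 0 jumps in the last interval
]

forecast = observed[0]  # assumption: seed the forecast with the first interval
errors = []
for obs in observed[1:]:
    err, forecast = ewma_step(obs, forecast, alpha=0.5)
    errors.append(err)
```

After the last interval the error sketch is large only in bucket 0, the bucket whose traffic changed; the change detection module then looks for keys that map to such buckets.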


Other Forecast Models

• Simple smoothing methods
  – Moving Average (MA)
  – S-shaped Moving Average (SMA)
  – Non-Seasonal Holt-Winters (NSHW)
• ARIMA models (p, d, q)
  – ARIMA 0 (p ≤ 2, d = 0, q ≤ 2)
  – ARIMA 1 (p ≤ 2, d = 1, q ≤ 2)


Find Big Changes

• Top N
  – Find N biggest forecast errors
  – Need to maintain a heap
• Thresholding
  – Find forecast errors above a threshold

$$ v(S_{error}, k) \ge \mathit{Thresh} \cdot \sqrt{F_2(S_{error})} $$
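Both selection rules can be sketched in Python once per-key forecast error estimates are available. Here the estimates are given as a plain dictionary and F2 is computed exactly for the example; in the sketch-based setting both would come from the error sketch. Ranking by absolute error and the value Thresh = 0.3 are assumptions of this example.

```python
import heapq
import math


def top_n(errors, n):
    # N keys with the biggest forecast errors, found with a heap.
    # Assumption: "biggest" is taken in absolute value.
    return heapq.nlargest(n, errors, key=lambda k: abs(errors[k]))


def above_threshold(errors, f2_error, thresh):
    # Alarm threshold: Thresh * sqrt(F2(Serror)).
    alarm = thresh * math.sqrt(f2_error)
    return [k for k, e in errors.items() if abs(e) >= alarm]


# Hypothetical per-key forecast error estimates.
errors = {"a": 120.0, "b": -300.0, "c": 4.0, "d": 35.0}
f2_error = sum(e * e for e in errors.values())  # exact here; a sketch would estimate it

top2 = top_n(errors, 2)                        # ["b", "a"]
alarms = above_threshold(errors, f2_error, 0.3)  # ["a", "b"]
```

Scaling the threshold by the square root of F2 makes it relative to the overall size of the forecast errors, so the same Thresh value can be used across routers and intervals with very different traffic volumes.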


Evaluation Methodology

• Accuracy
  – Metric: similarity to per-flow analysis results
  – This talk focuses on
    • TopN (Thresholding is very similar)
    • Accuracy on real traces (also has data-independent probabilistic accuracy guarantees)
• Efficiency
  – Metric: time per operation
• Dataset description

Data      Duration   #routers   #records (total)   #records (range)
Netflow   4 hours    10         190M               861K – 60M


Experimental parameters

Parameter   Values
H           1, 5, 9, 25
K           8K, 32K, 64K
N           50, 100, 500, 1000
Interval    1 min., 5 min.
Model       6 forecast models
Router      10 (this talk: Large, Medium)


Accuracy

[Figure: two panels plotting similarity (0.5 to 1) against time, one with time unit = 1 min. and one with time unit = 5 min.; curves for topN = 50, 100, 500, 1000]

H = 5, K = 32768, Router = Large

Similarity = | TopN_sketch ∩ TopN_perflow | / N
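The similarity metric is the fraction of the per-flow top-N keys that the sketch-based top-N also reports. A small Python sketch, with a hypothetical function name and made-up keys:

```python
def similarity(top_sketch, top_perflow):
    # |TopN_sketch ∩ TopN_perflow| / N, with N taken from the per-flow list.
    n = len(top_perflow)
    return len(set(top_sketch) & set(top_perflow)) / n


# Hypothetical top-4 lists that agree on three of four keys.
score = similarity(["a", "b", "c", "d"], ["a", "b", "c", "e"])  # 0.75
```

The metric ignores ranking order inside the top N; only set membership counts.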


Accuracy (cont’d)

[Figure: two panels plotting average similarity (0 to 1) against K (8192, 32768, 65536), one for Interval = 1 min. and one for Interval = 5 min.; curves for topN = 50, 100, 500, 1000, all with H = 5]

Model = EWMA, Router = Large


Accuracy (cont’d)

[Figure: two panels plotting average similarity (0 to 1) against K (8192, 32768, 65536), one for Router = Large and one for Router = Medium; curves for topN = 50, 100, 500, 1000, all with H = 5]

Model = ARIMA 0, Interval = 5 min.


Accuracy Summary

• For small N (50, 100), even small K (8K) gives very high accuracy

• For large N (1000), K = 32K gives about 95% accuracy

• Router, interval, and forecast model make little difference

• H generally has little impact


Efficiency

Nanoseconds per operation (H = 5, K = 64K)

Operation          400 MHz SGI R12k   900 MHz Ultrasparc-II
Hash computation   34                 89
Update cost        81                 45
Estimate cost      269                146

Can potentially work in near real time.

1 Gbps = 320 nsec per 40-byte packet
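The 320 nsec figure is the time a 40-byte packet occupies a 1 Gbps link: 40 bytes × 8 bits / 10^9 bits per second. A short Python check of this arithmetic against the per-operation costs in the table; the function and variable names are illustrative only.

```python
def ns_per_packet(link_bps, packet_bytes):
    # Time on the wire for one packet, in nanoseconds.
    return packet_bytes * 8 * 1e9 / link_bps


budget = ns_per_packet(1e9, 40)  # 320.0 ns at 1 Gbps

# Per-operation costs in ns from the table: (400 MHz SGI R12k, 900 MHz Ultrasparc-II).
costs = {
    "hash computation": (34, 89),
    "update": (81, 45),
    "estimate": (269, 146),
}
within_budget = {op: all(c <= budget for c in pair)
                 for op, pair in costs.items()}
```

Every measured operation individually fits inside the per-packet budget on both machines, which is the basis for the "near real time" claim; this check does not model how the operations combine per packet.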


Conclusion

• Sketch-based change detection
• Scalable
  – Can handle tens of millions of time series
• Accurate
  – Provable probabilistic accuracy guarantees
  – Even more accurate on real Internet traces
• Efficient
  – Can potentially work in near real time

[Diagram: (k,u) … → Sketch module → Sketches → Forecast module(s) → Error Sketch → Change detection module → Alarms]


Ongoing and Future Work

• Refinements
  – Avoid boundary effects due to fixed interval
  – Automatically reconfigure parameters
  – Combine with sampling
• Extensions
  – Online detection of multi-dimensional hierarchical heavy hitters and changes
• Applications
  – Building block for network anomaly detection


Thank you!


Heavy Hitter vs. Change Detection

• Change detection for heavy hitters
  – Can potentially use different parameters for different flows
  – Heavy hitter ≠ big change
    • Need small threshold to avoid missing changes
  – Aggregation is difficult
• Sketch-based change detection
  – All flows share the same model parameters
  – Aggregation is very easy due to linearity