Page 1

Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen

Computer Science Department, Northwestern University

Page 2

Online Change Detection

• Network anomalies are common
  – Flash crowds, failures, DoS, worms, …

Online Detection over Data Streams

• Data stream: key/update pairs (k, u)
  – Heavy hitters (lots of prior work)
  – Heavy changes

Page 3

k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

• First to detect flow-level heavy changes in massive data streams at network traffic speeds.

[Figure: H hash tables (1, …, j, …, H), each with K buckets (0, 1, …, K-1)]

Page 4

k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]

[Figure: H hash tables (1, …, j, …, H), each with K buckets (0, 1, …, K-1); key k is hashed to bucket h1(k), …, hj(k), …, hH(k), one bucket per table]

Update (k, u): Tj[hj(k)] += u (for all j)

Estimate v(S, k): sum of updates for key k

$$v(S, k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K} \right\}$$

where sum is the total of all updates recorded in the sketch.
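The update and estimate rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `KarySketch`, the seed parameter, and the simple `(a*k + b) mod p mod K` hash functions are all hypothetical stand-ins for whatever hash family a real deployment would use.

```python
import random
from statistics import median


class KarySketch:
    """H hash tables of K buckets each (illustrative sketch only)."""

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K = H, K
        self.tables = [[0] * K for _ in range(H)]
        self.total = 0  # running sum of all updates ("sum" in the estimate)
        rng = random.Random(seed)
        # Assumption: linear hashes modulo the Mersenne prime 2^61 - 1 stand in
        # for the hash family a real deployment would use.
        self.p = (1 << 61) - 1
        self.coeffs = [
            (rng.randrange(1, self.p), rng.randrange(self.p)) for _ in range(H)
        ]

    def _h(self, j, key):
        a, b = self.coeffs[j]
        return ((a * key + b) % self.p) % self.K

    def update(self, key, u):
        # Update (k, u): Tj[hj(k)] += u for all j
        for j in range(self.H):
            self.tables[j][self._h(j, key)] += u
        self.total += u

    def estimate(self, key):
        # median over j of (Tj[hj(k)] - sum/K) / (1 - 1/K)
        K = self.K
        return median(
            (self.tables[j][self._h(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        )
```

With a single key in the sketch, `estimate` returns that key's total exactly; with many keys the median over the H tables damps the effect of collisions.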

Page 6

• Main problem
  – Cannot efficiently report keys with heavy change
• Our contribution
  – Determine the set of keys that have “large” estimates in the sketch
• Requires very little space:
  – E.g. 5 hash tables with 16 K buckets = 80 KB
  – Fits in high-speed memory

Page 7

Reverse Sketch Problem

[Figure: hash tables 1–5, each with a bucket marked “Heavy”]

Input:
  – Sketch
  – Threshold

Output: Set of keys that hash to heavy buckets in a majority (or all) of the hash tables
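To make the input and output concrete, here is a brute-force version of the problem in Python. It is a sketch under assumptions that are not on the slide: a bucket counts as "heavy" when its absolute value reaches the threshold, and the function names and the `min_tables` parameter are hypothetical. It scans the whole key space, which is exactly what becomes infeasible for 2^32 keys.

```python
def heavy_buckets(tables, threshold):
    """Indices of heavy buckets in each table.

    Assumption: "heavy" means |bucket value| >= threshold.
    """
    return [{b for b, v in enumerate(t) if abs(v) >= threshold} for t in tables]


def reverse_sketch_bruteforce(tables, hash_fns, threshold, key_space,
                              min_tables=None):
    """Return keys that hash to a heavy bucket in at least `min_tables` tables.

    By default `min_tables` is a strict majority of the H tables;
    pass len(tables) to require all of them.
    """
    H = len(tables)
    if min_tables is None:
        min_tables = H // 2 + 1
    heavy = heavy_buckets(tables, threshold)
    result = []
    for key in key_space:  # linear scan over every possible key
        hits = sum(1 for j in range(H) if hash_fns[j](key) in heavy[j])
        if hits >= min_tables:
            result.append(key)
    return result
```

The cost is proportional to the size of the key space times H, independent of how few keys are actually heavy.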

Page 8

Outline

[Diagram: streaming data recording writes each (key, value) update into a k-ary sketch; heavy change detection takes the k-ary sketch and a change threshold and outputs the heavy change keys. Diagram labels: “fast”, “slow”.]

• Modular hashing
• IP mangling
• Reverse hashing algorithms
• Improve heavy change detection

Page 9

Taking Intersections

• Intersect A1, A2, A3, A4, A5

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

E[false positives] << 1
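The bound above can be checked with a quick calculation. The assumptions here are not stated on the slide: the H hash functions are taken to be independent and uniform, and each table has a single heavy bucket. A key that did not cause the change then lands in the heavy bucket of every table with probability K^{-H}, so:

```latex
\mathbb{E}[\text{false positives}]
  \approx \#\text{keys} \cdot K^{-H}
  = 2^{32} \cdot \left(2^{12}\right)^{-5}
  = 2^{32-60}
  = 2^{-28} \ll 1
```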

Page 10

The problem with simple intersection

• Why is this difficult?
• Each set Ai can be very large!

H = 5, K = 2^12, #keys = 2^32 (IP addresses)

|A1| = 2^32 / 2^12 = 2^20

Page 11

The problem with simple intersection

• Solution: modular hashing

Page 12

Modular hashing reduces the set size

[Figure: h() maps a 32-bit key, shown as four 8-bit words 10010100 10101011 10010101 10100011, to the 12-bit value 010 110 001 101]

Page 13

Modular hashing reduces the set size

[Figure: the 32-bit key 10010100 10101011 10010101 10100011 is split into four 8-bit words; h1(), h2(), h3(), h4() hash one word each to 3 bits, giving 010 110 001 101]

Greatly reduces size of reverse-mapped sets

Page 14

Modular hashing reduces the set size

Greatly reduces size of reverse-mapped sets: each 3-bit hash value has 2^8 / 2^3 = 2^5 possible 8-bit words behind it
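A minimal Python sketch of modular hashing as drawn above: split the 32-bit key into four 8-bit words, hash each word to 3 bits, and concatenate. The per-word hash functions are modelled as random lookup tables, which is an assumption for illustration (the slides do not say how h1() to h4() are built), and all names are hypothetical.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3  # 32-bit key -> 4 words of 8 bits -> 3 bits each


def make_word_hashes(seed=0):
    """One random lookup table per word: 8-bit value -> 3-bit value.

    Assumption: each table is balanced, so every 3-bit output has exactly
    2^8 / 2^3 = 2^5 = 32 preimages.
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        t = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
        rng.shuffle(t)
        tables.append(t)
    return tables


def modular_hash(key, tables):
    """Hash each 8-bit word separately and concatenate the 3-bit results."""
    out = 0
    for i, t in enumerate(tables):
        shift = WORD_BITS * (WORDS - 1 - i)
        word = (key >> shift) & ((1 << WORD_BITS) - 1)
        out = (out << OUT_BITS) | t[word]
    return out


def preimages(bucket, tables):
    """Per-word sets of 8-bit values that map to each 3-bit chunk of `bucket`."""
    sets = []
    for i, t in enumerate(tables):
        shift = OUT_BITS * (WORDS - 1 - i)
        chunk = (bucket >> shift) & ((1 << OUT_BITS) - 1)
        sets.append({w for w in range(1 << WORD_BITS) if t[w] == chunk})
    return sets
```

The reverse-mapped set of a bucket is then stored as four sets of 32 words rather than one list of 2^20 keys.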

Page 15

Modular hashing reduces the set size

[Figure: hash tables 1–5 with heavy buckets b1–b5]

A1: 2^5 * 2^5 * 2^5 * 2^5

Intersection: only 32 elements per partition

Page 16

Modular hashing reduces the set size

[Figure: hash tables 1–5 with heavy buckets b1–b5]

A1: 2^5 * 2^5 * 2^5 * 2^5
A2: 2^5 * 2^5 * 2^5 * 2^5

Intersection: only 32 elements per partition
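Because each Ai is a product of four 32-element word sets, the intersection can be taken one word position at a time. The Python sketch below shows this for the simple case of one heavy bucket per table. It is an illustration under assumptions (random balanced lookup tables as word hashes, hypothetical function names), not the reverse hashing algorithm from the tech report.

```python
import random
from itertools import product

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3


def make_table_hashes(H, seed=0):
    """For each of H tables, four balanced lookup tables (8 bits -> 3 bits)."""
    rng = random.Random(seed)
    hashes = []
    for _ in range(H):
        per_word = []
        for _ in range(WORDS):
            t = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
            rng.shuffle(t)
            per_word.append(t)
        hashes.append(per_word)
    return hashes


def bucket_of(key, per_word):
    """Modular hash: hash each 8-bit word and concatenate the 3-bit results."""
    out = 0
    for i, t in enumerate(per_word):
        word = (key >> (WORD_BITS * (WORDS - 1 - i))) & ((1 << WORD_BITS) - 1)
        out = (out << OUT_BITS) | t[word]
    return out


def reverse_single_intersection(heavy, hashes):
    """heavy[j] is the single heavy bucket of table j.

    Intersect the 32-element word sets position by position, then combine
    the surviving words into full 32-bit keys.
    """
    candidates = []
    for i in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - i)
        common = set(range(1 << WORD_BITS))
        for b, per_word in zip(heavy, hashes):
            chunk = (b >> shift) & ((1 << OUT_BITS) - 1)
            common &= {w for w in range(1 << WORD_BITS) if per_word[i][w] == chunk}
        candidates.append(sorted(common))
    keys = []
    for words in product(*candidates):
        key = 0
        for w in words:
            key = (key << WORD_BITS) | w
        keys.append(key)
    return keys
```

Each set operation touches at most 32 words, so the work per word position is tiny compared with intersecting sets of 2^20 keys.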

Page 17

Handling Multiple Intersections…

[Figure: hash tables 1–5, each with more than one heavy bucket (labels b1–b5 appear twice)]

• 2^H different intersections
• Much more difficult: needs sophisticated reverse hashing algorithms (see tech report)

Page 18

Problem: Too many collisions

129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... → 7 . 4 . 0 . *

32 bits → 12 bits

Addresses that share their first three bytes agree on the first three hashed words, so they collide far more often than under an ordinary hash.

Page 19

Problem: Too many collisions

Solution: IP mangling

Page 20

IP-mangling

Page 21

Invertible Modular Linear Equation

f(x) ≡ a·x mod n

To be invertible: a and n must be relatively prime

• a is odd, chosen randomly (an odd a is relatively prime to n whenever n is a power of two)
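The mangling function and its inverse can be sketched in Python as follows. The choice n = 2^32 is an assumption (it matches 32-bit IP keys but is not stated on the slide), and the function names are hypothetical. The inverse uses the built-in three-argument `pow` with exponent -1, which requires Python 3.8 or later.

```python
import random

N = 1 << 32  # assumption: keys are 32-bit IPv4 addresses, so n = 2^32


def random_odd_multiplier(seed=0):
    """Pick a random odd a in [1, n); odd means gcd(a, 2^32) = 1."""
    return random.Random(seed).randrange(1, N, 2)


def mangle(x, a):
    """f(x) = a * x mod n."""
    return (a * x) % N


def unmangle(y, a):
    """Invert f by multiplying with the modular inverse of a."""
    a_inv = pow(a, -1, N)  # modular inverse, Python 3.8+
    return (a_inv * y) % N
```

Since f is a bijection on 32-bit values, mangling keys before hashing loses no information, and the original keys are recovered after reverse hashing.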

Page 22

[Figure panels: Modular Hashing; Optimal Hashing]

Page 23

[Figure panels: Modular Hashing; Modular Hashing with IP Mangling; Optimal Hashing]

Page 24

Recap

[Diagram: streaming data recording applies IP mangling and modular hashing to each key and stores the value in a reversible k-ary sketch; heavy change detection takes the reversible k-ary sketch and a change threshold, applies reverse hashing and reverse IP mangling, and outputs the heavy change keys. Diagram annotations: $(n^{1/\log\log n})$ and $(\frac{\log n}{\log\log n})$.]

Page 25

Evaluation

• Traffic traces from Northwestern University edge router
  – 5-minute intervals, with an average of 7.5 GB of traffic per interval
• Compared with ground truth
• 6 hash tables, 4K buckets each, 192 KB of memory in total
• Up to 140 true heavy change keys in 1.5 seconds
  – Over 95% TPP
  – Less than 2% FPP
• All missing changes are due to boundary effects

Page 26

Conclusions / Future Work

• Sketches: efficient summary structures
• Our contribution: Reversible Sketches
  – Efficient online detection of keys with heavy changes

Work in Progress (see tech report)

• Improved reverse hashing
• Statistical guarantee on detection accuracy
• More advanced applications:
  – Hierarchical change detection
    • E.g. 129.105.100.* shows a big change!

Page 27

See tech report for more!

http://list.cs.northwestern.edu

Thank you!