Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications
![Page 1: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/1.jpg)
Robert Schweller¹, Zhichun Li¹, Yan Chen¹, Yan Gao¹, Ashish Gupta¹, Yin Zhang², Peter Dinda¹, Ming-Yang Kao¹, Gokhan Memik¹
¹ Lab for Internet and Security Technology (LIST), Northwestern Univ.  ² University of Texas at Austin
![Page 2: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/2.jpg)
The Spread of Sapphire/Slammer Worms
![Page 3: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/3.jpg)
Motivation (online change detection)
• Online network anomaly/intrusion detection over high-speed links
  – Small memory usage
  – Small # of memory accesses per packet
  – Scalable to large key space size
• Primitives for online anomaly detection
  – Heavy hitters (lots of prior work)
  – Heavy changes: enabler for aggregate queries over multiple data streams
    • Asymmetric routing demands spatial aggregation
    • Time series analysis (TSA) needs temporal aggregation
![Page 4: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/4.jpg)
Outline
• Background on k-ary sketch
• Reversible sketch problem
• Modular hashing
• IP mangling
• Reverse hashing
• Evaluation
• Conclusion
![Page 5: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/5.jpg)
K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables (rows 1, …, j, …, H), each an array of K buckets indexed 0, 1, …, K-1]
![Page 6: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/6.jpg)
k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
[Figure: a key k is hashed by h1(k), …, hj(k), …, hH(k) into one bucket of each of the H tables]
APIs:
• UPDATE(S, k, u): Tj[hj(k)] += u (for all j)
• ESTIMATE(S, k): sum of updates for key k, computed as
  $v(S, k) = \operatorname{median}_{j} \dfrac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K}$, where sum is the total of all updates
• Linear combination: S = COMBINE(c1, S1, c2, S2), i.e. $S = c_1 S_1 + c_2 S_2$
Reverse Sketch Problem
• Main problem
  – Cannot efficiently report keys with heavy change: INFERENCE(S, t)
  – Important function for anomaly detection!
• Our contribution
  – Determine the set of keys that have "large" estimates in a sketch
![Page 8: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/8.jpg)
Reversible sketch framework
[Figure: Streaming data recording: each key goes through IP mangling and modular hashing, and its value is stored in a reversible k-ary sketch. Heavy change detection: given a change threshold, reverse hashing on the reversible k-ary sketch, followed by reverse IP mangling, yields the heavy change keys.]
![Page 10: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/10.jpg)
Taking Intersections
• Intersect A1, A2, A3, A4, A5 (Ai: the set of keys that map to the heavy bucket in hash table i)
$H = 5$, $K = 2^{12}$, #keys $= 2^{32}$ (IP addresses)
$E[\text{false positives}] \ll 1$
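Why the expected number of false positives is so small: assuming one heavy bucket per table and independent, uniform hashing, a key outside the heavy change lands in table i's heavy bucket with probability 1/K, so it survives all H intersections with probability $K^{-H}$:

$$E[\text{false positives}] \le \#\text{keys} \cdot \left(\frac{1}{K}\right)^{H} = 2^{32} \cdot \left(2^{-12}\right)^{5} = 2^{32-60} = 2^{-28} \ll 1$$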
![Page 11: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/11.jpg)
The problem with simple intersection
• Each set Ai can be very large! $H = 5$, $K = 2^{12}$, #keys $= 2^{32}$ (IP addresses)
$|A_1| = 2^{32} / 2^{12} = 2^{20}$
![Page 12: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/12.jpg)
The problem with simple intersection
• Each set Ai can be very large!
• Solution: modular hashing
![Page 13: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/13.jpg)
Modular hashing reduces the set size
[Figure: a 32-bit key, 10010100 10101011 10010101 10100011 (four 8-bit words), hashed by h() into the 12-bit bucket index 010 110 001 101]
![Page 14: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/14.jpg)
Modular hashing reduces the set size
[Figure: the same 32-bit key split into four 8-bit words; h1(), h2(), h3(), h4() map each word to 3 bits (010, 110, 001, 101), which are concatenated into the 12-bit bucket index]
Greatly reduces the size of reverse-mapped sets
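A minimal Python sketch of modular hashing with the parameters in the figure (32-bit key, four 8-bit words, 3 bits per word, 12-bit index). The per-word hash functions are assumed to be random but balanced lookup tables, so every 3-bit value has exactly $2^8 / 2^3 = 32$ preimages; the names `TABLES`, `modular_hash` and `reverse_word` are hypothetical.

```python
import random

WORD_BITS, HASH_BITS, WORDS = 8, 3, 4  # 32-bit key = 4 words; 4 x 3 bits = 12-bit index

rng = random.Random(7)


def balanced_table():
    """Random 256-entry table in which each 3-bit output appears exactly 32 times."""
    vals = [v for v in range(1 << HASH_BITS) for _ in range(1 << (WORD_BITS - HASH_BITS))]
    rng.shuffle(vals)
    return vals


# Hypothetical word hashes h1..h4: one independent lookup table per word.
TABLES = [balanced_table() for _ in range(WORDS)]


def words(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    return [(key >> (WORD_BITS * (WORDS - 1 - w))) & 0xFF for w in range(WORDS)]


def modular_hash(key):
    """Concatenate the 3-bit hash of each word into a 12-bit bucket index."""
    idx = 0
    for w, x in enumerate(words(key)):
        idx = (idx << HASH_BITS) | TABLES[w][x]
    return idx


def reverse_word(w, value):
    """Reverse mapping for one word: all 8-bit values x with h_w(x) == value."""
    return [x for x in range(1 << WORD_BITS) if TABLES[w][x] == value]


key = 0b10010100_10101011_10010101_10100011  # the key from the figure
print(f"{modular_hash(key):012b}")
```

Reverse-mapping one bucket index therefore yields four 32-element word sets, instead of the $2^{20}$-element set that a single hash over the whole key would produce.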
![Page 15: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/15.jpg)
Modular hashing reduces the set size
[Figure: H = 5 hash tables (1–5) with heavy buckets b1, …, b5]
$A_1$: $2^5 \times 2^5 \times 2^5 \times 2^5$
Intersection: only 32 elements per word set
![Page 16: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/16.jpg)
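Modular hashing reduces the set size
[Figure: H = 5 hash tables (1–5) with heavy buckets b1, …, b5]
$A_1$: $2^5 \times 2^5 \times 2^5 \times 2^5$   $A_2$: $2^5 \times 2^5 \times 2^5 \times 2^5$
Intersection: computed word by word on the 32-element word sets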
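Why word-wise intersection suffices: write each reverse-mapped set as a product of word sets, $A_i = A_i^1 \times A_i^2 \times A_i^3 \times A_i^4$ (notation assumed here, with $|A_i^w| = 2^5$). The intersection of two Cartesian products is the product of the component-wise intersections:

$$A_1 \cap A_2 = \left(A_1^1 \times \cdots \times A_1^4\right) \cap \left(A_2^1 \times \cdots \times A_2^4\right) = \prod_{w=1}^{4} \left(A_1^w \cap A_2^w\right)$$

so each step intersects sets of at most 32 elements, and the full $2^{20}$-element sets never have to be materialized.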
![Page 18: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/18.jpg)
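Problem: Too many collisions
[Figure: keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, … (32 bits) all hash to 12-bit bucket indices of the form 7 . 4 . 0 . *]
Keys that share a prefix share their first three words, so modular hashing maps them to buckets that agree in the first three 3-bit fields.
Solution: IP mangling with GF (Galois extension field)
IP mangling: a bijective mapping function that breaks the continuity of the key space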
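A minimal Python sketch of a GF-based IP mangling function, assuming the simplest bijection of this kind: multiplication by a fixed nonzero constant in GF(2^32), undone by multiplying with the constant's inverse. The field polynomial x^32 + x^22 + x^2 + x + 1 (taken from standard tables of primitive polynomials) and the multiplier `A` are assumptions; the slides do not give the exact function or parameters.

```python
# IP mangling as multiplication in GF(2^32) (illustrative assumption, see text).
# Field modulus: x^32 + x^22 + x^2 + x + 1, a primitive (hence irreducible) polynomial.
POLY = (1 << 32) | (1 << 22) | (1 << 2) | (1 << 1) | 1
A = 0x9E3779B1  # hypothetical nonzero multiplier acting as the mangling key


def gf_mul(a, b):
    """Carry-less product of two 32-bit field elements, reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 32:  # a degree-32 term appeared: subtract (XOR) the modulus
            a ^= POLY
    return result


def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^32)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


A_INV = gf_pow(A, (1 << 32) - 2)  # a^(2^32 - 2) = a^(-1) for any nonzero a


def mangle(key):
    """Bijective mapping of a 32-bit key (applied before modular hashing)."""
    return gf_mul(key, A)


def unmangle(value):
    """Reverse IP mangling: multiply by the inverse of A."""
    return gf_mul(value, A_INV)


def ip(s):
    a, b, c, d = (int(p) for p in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


for s in ["129.105.56.23", "129.105.56.28", "129.105.56.109"]:
    print(s, "->", f"{mangle(ip(s)):08x}")
```

Because multiplication by a nonzero field element is a bijection, reverse IP mangling recovers each original key exactly, while keys sharing a prefix such as 129.105.56.* typically no longer share their high-order words after mangling.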
![Page 20: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/20.jpg)
Handling Multiple Intersections…
[Figure: five hash tables (1–5), each with two heavy buckets]
$2^H$ different intersections (one of the two heavy buckets chosen in each of the H tables)
Much more difficult. Solution: reverse hashing algorithms
• Step 1: Reverse hashing for each module
• Step 2: Infer the whole key through bucket index matching among candidates from each module
![Page 21: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/21.jpg)
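Reverse Hashing for Each Module
[Figure: five hash tables (1–5), each with two heavy buckets]
$H = 5$, $r = 1$, $K = 2^{12}$; r is the tolerance level
$A_{12} = A^1_{12} \times A^2_{12} \times A^3_{12} \times A^4_{12}$, where $A^w_{ij}$ is the set of possible values of word w for the j-th heavy bucket in hash table i
Take the first word as an example:
• $G^1_i = A^1_{i1} \cup A^1_{i2}$: candidate set of the first word in hash table i
• $I^1_r$: all possible values of the first word in the sketch, i.e. values that appear in at least $H - r$ of the sets $G^1_i$
$G^1_1 = \{2, 3\} \cup \{2, 5\} = \{2, 3, 5\}$
$G^1_2 = \{2, 6\} \cup \{9, 10\} = \{2, 6, 9, 10\}$
$G^1_3 = \{0, 3\} \cup \{2, 3\} = \{0, 2, 3\}$
$G^1_4 = \{3, 10\} \cup \{2, 8\} = \{2, 3, 8, 10\}$
$G^1_5 = \{3, 7\} \cup \{6, 9\} = \{3, 6, 7, 9\}$
Values present in at least $H - r = 4$ sets: 2 (in $G^1_1$–$G^1_4$) and 3 (in $G^1_1, G^1_3, G^1_4, G^1_5$), so $I^1_1 = \{2, 3\}$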
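A minimal Python sketch of Step 1 (reverse hashing for one module) using the r-tolerant rule above: keep a word value if it appears in at least H − r of the candidate sets. The function name `reverse_module` is hypothetical; the input reproduces the slide's first-word example.

```python
from collections import Counter


def reverse_module(G, H, r):
    """Return the word values that appear in at least H - r of the candidate sets G[i].

    G[i] is the union of the word's preimages over the heavy buckets of hash table i.
    """
    counts = Counter(v for g in G for v in set(g))
    return {v for v, c in counts.items() if c >= H - r}


# First-word example from the slide: G_i = A_i1 ∪ A_i2 for tables i = 1..5.
G = [
    {2, 3} | {2, 5},
    {2, 6} | {9, 10},
    {0, 3} | {2, 3},
    {3, 10} | {2, 8},
    {3, 7} | {6, 9},
]
print(reverse_module(G, H=5, r=1))
```

With r = 1 the result is {2, 3}, as on the slide; with r = 0 (a strict intersection of all five sets) it would be empty, because neither 2 nor 3 appears in every table's candidate set.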
![Page 22: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/22.jpg)
Bucket Index Matrix of Candidates
$H = 5$, $r = 1$, $K = 2^{12}$. For each x in $I_1$, we can get $B_1(x)$, a vector of the heavy bucket sets that x hashes to.
[Figure: keys 192.168.0.1 and 192.123.47.62, like all keys 192.*.*.*, hash to the same (red) heavy buckets among b11, b12, b21, b22, b31, b32, b41, b42, b51, b52 in the five hash tables]
$B_1(192) = [\{b_{11}, b_{12}\}, \{b_{21}\}, \{b_{32}\}, \{b_{41}, b_{42}\}, \{b_{51}, b_{52}\}]$
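A minimal Python sketch of how a bucket index vector $B_w(x)$ can be computed under modular hashing, assuming 12-bit bucket indices made of four 3-bit fields as in the earlier figure. `word_hash(i, w, x)` stands for the hash of word w in table i and is a hypothetical callback; the toy tables and hash below are illustrative only.

```python
HASH_BITS, WORDS = 3, 4


def component(bucket, w):
    """3-bit field of a 12-bit bucket index produced by word w (w = 0 is most significant)."""
    return (bucket >> (HASH_BITS * (WORDS - 1 - w))) & ((1 << HASH_BITS) - 1)


def bucket_vector(x, w, heavy, word_hash):
    """B_w(x): for each table i, the heavy buckets whose word-w field equals h_i^w(x)."""
    return [
        {b for b in heavy[i] if component(b, w) == word_hash(i, w, x)}
        for i in range(len(heavy))
    ]


# Toy example: two tables, two heavy buckets each, and a hypothetical word hash.
heavy = [{0b010_110_001_101, 0b011_000_000_000},
         {0b011_000_000_001, 0b111_000_000_000}]


def toy_hash(i, w, x):
    return (x + i) % 8


print(bucket_vector(2, 0, heavy, toy_hash))
```

Only the word-w field of each heavy bucket is compared, which is why every key with the same first word (such as all of 192.*.*.*) gets the same vector $B_1$.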
![Page 23: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/23.jpg)
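Prefix Extension Algorithm
Candidates: $I_1 = \{150, 47, 236\}$ with vectors $B_1$, and $I_2 = \{72, 104\}$ with vectors $B_2$. A prefix is extended by intersecting its vector with the next word's vector table by table ("*" marks an empty intersection):
• <150.72>: $\{1,3,6\} \cap \{3,7,8\} = \{3\}$, $\{5\} \cap \{3\} = *$, $\{1,4,9\} \cap \{4,9\} = \{4,9\}$, $\{1\} \cap \{1,5\} = \{1\}$, $\{2,5\} \cap \{1,2\} = \{2\}$, giving [3, *, {4,9}, 1, 2]
• <47.72>: [*, *, *, 5, *]: more than r = 1 "*", ignore!
• <236.104>: [3, 1, 2, 2, 2]
Other combinations are ignored in the same way.
Path discovery algorithm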
![Page 24: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/24.jpg)
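Prefix Extension Algorithm
Extending with words from $I_3$ (182, 32, 49) and their vectors $B_3$:
• <150.72> [3, *, {4,9}, 1, 2] → <150.72.182> [3, *, 4, 1, 2] and <150.72.32> [3, *, 9, 1, 2]
• <236.104> [3, 1, 2, 2, 2] → <236.104.49> [3, 1, 2, 2, 2]
Extending with 75 from $I_4$ and its vector $B_4$:
• <150.72.182.75> [3, *, 4, 1, 2]
• <236.104.49.75> [3, 1, *, 2, 2]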
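A minimal Python sketch of the prefix extension idea: grow keys word by word, intersect bucket-index vectors table by table, and drop any prefix with more than r empty tables. The function names are hypothetical. How the slide's sets split between $B_1(150)$ and $B_2(72)$ is not recoverable, so the assignment below is assumed (the table-by-table intersections match the slide either way), and the vector for 47 is invented to reproduce the slide's [*, *, *, 5, *].

```python
def extend(prefix_vec, word_vec, r):
    """Intersect two bucket-index vectors table by table ("*" = empty set).

    Returns the combined vector, or None if more than r tables are empty.
    """
    combined = [p & q for p, q in zip(prefix_vec, word_vec)]
    empty = sum(1 for s in combined if not s)
    return combined if empty <= r else None


def prefix_extension(candidates, r):
    """candidates[w] maps each candidate value of word w to its vector B_w(value).

    Returns {key tuple: bucket-index vector} for every full key that survives.
    """
    prefixes = {(x,): vec for x, vec in candidates[0].items()}
    for word_cands in candidates[1:]:
        extended = {}
        for key, vec in prefixes.items():
            for y, wvec in word_cands.items():
                combined = extend(vec, wvec, r)
                if combined is not None:
                    extended[key + (y,)] = combined
        prefixes = extended
    return prefixes


# Assumed split of the slide's sets; the per-table intersections match the slide.
B1 = {150: [{1, 3, 6}, {5}, {1, 4, 9}, {1}, {2, 5}],
      47: [{9}, {8}, {7}, {5}, {6}]}  # hypothetical vector for 47
B2 = {72: [{3, 7, 8}, {3}, {4, 9}, {1, 5}, {1, 2}]}
print(prefix_extension([B1, B2], r=1))
```

Only <150.72> survives; <47.72> has four empty tables and is pruned, so the search never expands it with later words.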
![Page 25: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/25.jpg)
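Recap:
[Figure: the reversible sketch framework. Streaming data recording: key → IP mangling → modular hashing → value stored in a reversible k-ary sketch. Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys.]
Complexity annotations in the figure: $O(n^{1/\log\log n})$ and $O\left(\frac{\log n}{\log\log n}\right)$, where n is the size of the key space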
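To give a sense of scale for the two complexity terms, assume base-2 logarithms and the 32-bit IP key space, $n = 2^{32}$:

$$\log_2 n = 32,\qquad \log_2\log_2 n = 5,\qquad n^{1/\log\log n} = 2^{32/5} \approx 84.4,\qquad \frac{\log n}{\log\log n} = \frac{32}{5} = 6.4$$

Both are tiny compared with n itself.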
![Page 27: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/27.jpg)
Evaluation
• Dataset
  – A large US ISP (330M NetFlow records)
  – NU (19M NetFlow records)
• Efficient data recording (worst-case traffic: all 40-byte packets)
  – Software: 526 Mbps on a Pentium 4 3.2 GHz PC
  – Hardware: 16 Gbps on a single FPGA board
  – Only a few hundred KB to a couple of MB of memory used
  – Only 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches
• Efficient heavy change detection and key inference
  – 0.34 seconds for 100 changes, 13.33 seconds for 1000 changes
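The worst-case recording rates above translate into packet rates as follows (a 40-byte packet is 320 bits); this is a back-of-the-envelope check, not a figure from the slides:

$$\frac{526 \times 10^{6}\ \text{bit/s}}{320\ \text{bit/packet}} \approx 1.64 \times 10^{6}\ \text{packets/s}, \qquad \frac{16 \times 10^{9}\ \text{bit/s}}{320\ \text{bit/packet}} = 5 \times 10^{7}\ \text{packets/s}$$

At 15 memory accesses per packet, the hardware rate corresponds to about $7.5 \times 10^{8}$ memory accesses per second for a 48-bit reversible sketch.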
![Page 28: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/28.jpg)
Key Inference Accuracy
• True positives and false positives of 16-bit reversible sketches for 32-bit IP addresses
[Figure, two panels: true positive percentage (88–100%) and false positive percentage (0–1%) vs. number of heavy changes (50 to 2000), for H=6, r=1; H=6, r=2; H=5, r=1; and Deltoids]
[Deltoids]: S. Muthukrishnan and Graham Cormode, "What's New: Finding Significant Differences in Network Data Streams," INFOCOM 2004.
![Page 29: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/29.jpg)
More Results
• Stress test with a larger dataset: still accurate
• Scalable to larger key space sizes: similar results for 64-bit IP pairs
• Built an anomaly/intrusion detection system to detect, e.g., SYN flooding and port scans [ICDCS 2006]
![Page 30: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/30.jpg)
Conclusions
Proposed the first reversible sketches, which
• Record high-speed network streams online
• Detect heavy changes and infer the keys online
• Use little memory and a small # of memory accesses per packet
• Scale to large key space sizes
![Page 31: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/31.jpg)
Backup Slides
![Page 32: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/32.jpg)
Related work
• Compared with [Deltoids]
  – Better accuracy
  – Better scalability to large key spaces
  – Fewer memory accesses
• [PCF, IMC 2004]: not reversible
• [Q. Zhao et al., IMC 2005], [S. Venkataraman, NDSS 2005]: unique fan-out (fan-in) estimation
![Page 33: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/33.jpg)
[Figure: comparison of modular hashing and optimal hashing]
![Page 34: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/34.jpg)
Reversible sketch problem
However… not reversible: lack of an inference API, INFERENCE(S, t)
• Important function for anomaly detection!
• Decouple the recording stage of sketches from the detection stage to enable efficient combine and inference.
• Given a threshold t, report keys whose corresponding sum of updates is larger than the threshold.
Our contribution: an efficient algorithm for inference
![Page 36: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/36.jpg)
Problem: Too many collisions
[Figure: keys 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, … (32 bits) all hash to 12-bit bucket indices of the form 7 . 4 . 0 . *]
Solution: IP Mangling with
![Page 37: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/37.jpg)
IP-mangling
• Use GF (Galois Extension Field) function for attack resilience
![Page 38: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/38.jpg)
[Figure: comparison of modular hashing, modular hashing with IP mangling, and optimal hashing]
![Page 39: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/39.jpg)
Reverse Hashing for Each Module
[Figure: five hash tables (1–5) with heavy buckets b11, b12, b21, b22, b31, b32, b41, b42, b51, b52]
$H = 5$, $r = 1$, $K = 2^{12}$
Take the first word as an example:
$A_{11} = A^1_{11} \times A^2_{11} \times A^3_{11} \times A^4_{11}$
$A_{12} = A^1_{12} \times A^2_{12} \times A^3_{12} \times A^4_{12}$
$G^1_i = A^1_{i1} \cup A^1_{i2}$ for $i = 1, \ldots, 5$
$I^1_r = \{ v \mid v \in \bigcup_i G^1_i,\ v \text{ is mapped to heavy buckets in at least } H - r \text{ hash tables} \}$
• $A^1_{ij}$: all possible values of the first word for the j-th heavy bucket in hash table i
• $G^1_i$: all possible values of the first word in hash table i
• $I^1_r$: all possible values of the first word in the sketch
![Page 40: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/40.jpg)
False positive reduction by verifying against the original sketch
Candidate <150.72.182.75>: on the original k-ary sketch, ESTIMATE(<150.72.182.75>) = 180, which exceeds the threshold of 150
Final result: (<150.72.182.75>, 180)
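A minimal Python sketch of this verification step, assuming `estimate` is any callable that returns the original k-ary sketch's estimate for a key. The second candidate and its estimate are hypothetical; only the 180 vs. 150 comparison comes from the slide.

```python
def verify(candidates, estimate, threshold):
    """Keep inferred keys whose estimate on the original sketch is larger than the threshold."""
    results = []
    for key in candidates:
        est = estimate(key)
        if est > threshold:  # INFERENCE reports sums of updates larger than t
            results.append((key, est))
    return results


# 180 is from the slide; 90 for the second key is a made-up value for illustration.
estimates = {"150.72.182.75": 180, "236.104.49.75": 90}
print(verify(estimates, estimates.get, threshold=150))  # [('150.72.182.75', 180)]
```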
![Page 41: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications](https://reader033.fdocuments.in/reader033/viewer/2022051623/56815b1f550346895dc8d440/html5/thumbnails/41.jpg)
K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
• First to detect flow-level heavy changes in massive data streams at network traffic speeds
• APIs
  – UPDATE(S, k, u): Tj[hj(k)] += u (for all j)
  – ESTIMATE(S, k): sum of updates for key k
  – Linear combination: S = COMBINE(c1, S1, c2, S2)