Rafid Ahmed - db.in.tum.dedb.in.tum.de/.../seminarHauptspeicherdbs/slides/04-cuckoo-hashing.pdf ·...
Transcript of Rafid Ahmed - db.in.tum.dedb.in.tum.de/.../seminarHauptspeicherdbs/slides/04-cuckoo-hashing.pdf ·...
Fakultät für InformatikTechnische Universität München
SIMD-Accelerated Hash Tables: Cuckoo Hashing
Rafid Ahmed
Seminar: Techniques for implementing main memory database
Fakultät für InformatikTechnische Universität München
Cuckoo Hashing[1] :
4 4
72
16 19
72
2
16
16
45
68
53
T1 T2
alternate position for older key is vacant
if not the procedure recurses until an empty slot is found or enters a cycle
Two locations must be checked for
lookup
Two locations must be checked for
deletion
Inserting a new key may push an
older key to another position:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 1
--
Fakultät für InformatikTechnische Universität München
Worst-Case Time Complexity of Hash Tables:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 2
LinearProbingOperations Double
HashingRandomHashing
SeparateChaining
CuckooHashing
Search
Insertion
Deletion
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(n)
O(1)
O(1)
O(n)
Fakultät für InformatikTechnische Universität München
SIMD (Single Instruction Multiple Data):
+ +
= =
3 15 25 1 33 46 89 6
23 3 16 77 45 3 7 12
26 18 41 78 78 49 96 18
+
=
6
12
18
Scalar operation SIMD operation
Input vecotr 1
O(f(n)) O(f(n)/W)
Input vector 2
Output vector
SIMD Register
SIMD-Accelerated Hash Tables: Cuckoo Hashing 3
Fakultät für InformatikTechnische Universität München
Fundamental Vector Operations:
2 9 0 5 11 10 3 9
54 29 19 67 93 71 23 29
memory
index vector
output vector
547419 23 11 67 99 77 21 71 9329
Gather:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 4
__m256i _mm256_i32gather_epi64 (__int64 const* base_addr, __m128i vindex, const int scale)
Fakultät für InformatikTechnische Universität München
Fundamental Vector Operations:
Selective Gather:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 5
__m256i _mm256_mask_i32gather_epi64 (__m256i src, __int64 const* base_addr, __m128i vindex, __m256i mask, const int scale)
2 9 0 5 11 10 3 9
63 29 19 59 93 13 23 29
63 34 45 59 78 13 89 19
memory
index vector
output vector
source vector
mask
547419 23 11 67 99 77 21 71 9329
0 1 1 0 1 0 1 1
3 15 25 1 33 46 89 6
memory
vector
mask
15 25 33 89 6
0 1 1 0 1 0 1 1
Fakultät für InformatikTechnische Universität München
Fundamental Vector Operations:
Selective Store :
SIMD-Accelerated Hash Tables: Cuckoo Hashing 6
__m256i _mm256_permutevar8x32_epi32 (__m256i a, __m256i idx)
void _mm256_maskstore_epi32 (int* mem_addr, __m256i mask, __m256i a)
Fakultät für InformatikTechnische Universität München
Cuckoo Hash tables:
Scalar Hash tables :k
k1p1
p2k2
h1 h2
T1 T2
Branching
Branchless [2]
SIMD-Accelerated Hash Tables: Cuckoo Hashing 7
- conditionally accesssecond location
- access both locations
Fakultät für InformatikTechnische Universität München
Cuckoo Hash tables:
k
k1k3p1p3
k2 k4p2 p4
h1 h2
T1 T2
Vector Hash tables (Horizontal)[3] :
SIMD-Accelerated Hash Tables: Cuckoo Hashing 8
- Hash buckets
- Building
- Contiguous Keys
- Contiguous Payloads
- loop unrolling
- Probing techniques
- SIMD masking
Fakultät für InformatikTechnische Universität München
Cuckoo Hash tables:
k3
k2
k7
k9
p3
k1
k5
k4
k6
p1p6
p5
p4
p9
p2
p7
h1 h2
k1
k3
k4
k7
T1 T2
Vector Hash tables (Vertical)[4] :
SIMD-Accelerated Hash Tables: Cuckoo Hashing 9
Blending
Selective
-
-gather all bucketsgenerate masks fromcomparing keys
-
-
gather �rst bucketper key
selectively gather secondbucket for keys that did notmatch
j 0for i 0 to |S| - 1 step W do k Skeys[i] v Spayloads[i] h1 (k f1) >> |T| h2 (k f2) >> |T| kT Tkeys[h1] vT Tpayloads[h1] m k = kT
kT m Tkeys[h2] vT m Tpayloads[h2] m k = kT
RSkeys[j] m k RSS_payloads[j] m v RSR_payloads[j] m vT
j j + |m|end for
Fakultät für InformatikTechnische Universität München
Vertical (Selective) Cuckoo Hashing - Probing[4] :
SIMD-Accelerated Hash Tables: Cuckoo Hashing 10
load input tuples
calculate 1st hash value
calculate 2nd hash value
gather 1st hash bucket
compare hash bucket key with probe key
selectively gather 2nd hash bucket
again compare hash bucket key with probe key
selectively store matches
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (30% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 11
0
1
2
3
4
5
6
7
8
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (30% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 12
0
1
2
3
4
5
6
7
8
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless) Vector (Horizontal)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (30% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 13
0
1
2
3
4
5
6
7
8
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless) Vector (Horizontal) Vector (Blending_4)
Vector (Blending_8)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (30% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 14
0
1
2
3
4
5
6
7
8
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless) Vector (Horizontal) Vector (Blending_4)
Vector (Blending_8) Vector (Selec�ve_4) Vector (Selec�ve_8)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (90% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 15
0.
2
4.
6
8
10
12
14
16
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (90% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 16
0.
2
4.
6
8
10
12
14
16
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless) Vector (Horizontal)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (90% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 17
0.
2
4.
6
8
10
12
14
16
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless) Vector (Horizontal) Vector (Blending_4)
Vector (Blending_8)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Results:
Cuckoo Hashing - Probing (90% Selectivity) Benchmark:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 18
0.
2
4.
6
8
10
12
14
16
4 KB 16 KB 64 KB 256 KB 1 MB 4 MB 16 MB 64 MB
Thro
ughp
ut (G
B/s)
Hash table size
Scalar (Branching) Scalar (Branchless) Vector (Horizontal) Vector (Blending_4)
Vector (Blending_8) Vector (Selec�ve_4) Vector (Selec�ve_8)
Run on (4 X 3500 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Uni�ed 256K (x4) L3 Uni�ed 6144K (x1)
Fakultät für InformatikTechnische Universität München
Summary:
SIMD-Accelerated Hash Tables: Cuckoo Hashing 19
For both selectivity rate, vertically vectorized code performs about 1.8x fasterthan horizontal vector and scalar implementation
-
Fakultät für InformatikTechnische Universität München
Sources:
O. Polychroniou, A. Raghavan, K. A. Ross. Rethinking SIMD Vectorization forIn-Memory Databases
K. A. Ross. E�cient hash probes on modern processors.
M. Zukowski et al. Architecture-conscious hashing.
R. Pagh et al. Cuckoo hashing.[1]
[2]
[3]
[4]
SIMD-Accelerated Hash Tables: Cuckoo Hashing 20