Write-Optimized Dynamic Hashing for Persistent Memory...Write-Optimized Dynamic Hashing for...
Transcript of Write-Optimized Dynamic Hashing for Persistent Memory...Write-Optimized Dynamic Hashing for...
Write-Optimized Dynamic Hashing for Persistent Memory
π΄πππππππ π΅ππβΈΈ, π»ππππ’π πΆβπβΈΉ, πππ’πποΌππ πΆβππβΈΈ, πππ π».ππββΈΈ, π΅ππππ πππ πππβΈΉ
UNIST (Ulsan National Institute of Science and Technology)βΈΈ
Sungkyunkwan University (ππΎπΎπ)βΈΉ
1
Backgroundβ’ Static Hashing
β’ Extendible Hashing
β’ Persistent Memory
Cacheline-Conscious Extendible Hashingβ’ Challenges and Contributions
β’ 3-Level Structure of CCEH
β’ Failure-atomic Directory Update
Evaluation
Conclusion
2
Outline
β’ Hash key collision β Full table rehashingβ’ The most expensive operation in hash table
Background:
Static Hashing
0011
&val3
H(key) =
key % size
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
1001 &val9
1011 &val11
0100 &val4
1101 &val13
0111 &val7
3
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
0011 &val3
0100 &val4
0111 &val7
[8]
[9]
[A]
[B]
[C]
[D]
[E]
[F]
1001 &val9
1001 &val11
1101 &val13
Rehash
Hash Collision
Linear Probing
β’ To avoid full table rehashing: β’ Linear probingβ’ Chainingβ’ Double hashing such as Cuckoo hashing
Background:
Static Hashing
0011
&val3
H(key) =
key % size
4
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
1001 &val9
1011 &val11
0100 &val4
1101 &val13
0011 &val3
0111 &val7
Chaining
β’ To avoid full table rehashing: β’ Linear probingβ’ Chainingβ’ Double hashing such as Cuckoo hashing
Background:
Static Hashing
0011
&val3
H(key) =
key % size
5
0011 &val3
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
1001 &val9
1011 &val11
0100 &val4
1101 &val13
0111 &val7
β’ To avoid full table rehashing:β’ Linear probingβ’ Chainingβ’ Double hashing such as Cuckoo hashing
Background:
Static Hashing
0011
&val3
H(key) =
key % size
6
Cuckoo Displacement
[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
1001 &val9
0011 &val3
0100 &val4
1101 &val13
1011 &val11
0111 &val7
Cuckoo Hashing
7
Insertion Latency CDF
Flat Horizontal Lines
Because of the expensive full-table rehashing
0001 &val1
0111 &val7
1111 &val15
0101 &val5
β’ Dynamically splits one bucket or merges two buckets at a time
0010 &val2
0110 &val6
002 012 102 112
Background:
Disk-based Extendible Hashing
8
0000 &val0
1000 &val8
Directory
(G=2)
Buckets
L=2
Hash Function: H(key) = key % 2G
L=2 L=1
0001 &val1
0111 &val7
1111 &val15
0101 &val5
0010 &val2
0110 &val6
002 012 102 112
Background:
Extendible Hashing β Insertion
0011
&val3
H(key) =
key % 2G
9
0000 &val0
1000 &val8Key
Value
Directory
(G=2)
Buckets
Hash Collision
L=2 L=1 L=2
0001 &val1
0101 &val5
0111 &val7
1111 &val15
0001 &val1
0111 &val7
1111 &val15
0101 &val5
β’ Only overflown bucket is modified
0010 &val2
0110 &val6
002 012 102 112
Background:
Extendible Hashing β Bucket Split
0011
&val3
H(key) =
key % 2G
10
0000 &val0
1000 &val8Key
Value
Directory
(G=2)
Buckets
Bucket SplitL=2 L=1 L=2
0010 &val2
0110 &val6
0001 &val1
0101 &val5
0111 &val7
1111 &val15
002 012 102 112
Background:
Extendible Hashing β Bucket Split
0011
&val3
H(key) =
key % 2G
11
0000 &val0
1000 &val8
0011 &val3
Key
Value
Directory
(G=2)
Buckets
Update Directory
β’ Update Directoryβ’ At least two pointers need to be updated
012 112
L=2 L=2
L=2 L=2
0111 &val7
1011 &val11
1111 &val15
0011 &val3
002 012 102 112
Background:
Extendible Hashing β Directory doubling
β’ If a single pointer points to overflow bucketβ Directory Doubling
11011
&val27
H(key) =
key % 2G
12
0000 &val0
1000 &val8
0001 &val1
0101 &val5Key
Value
Buckets
Directory
(G=2)
Hash Collision
L=1 L=2 L=2
1011 &val11
0011 &val3
0111 &val7
1111 &val15
0111 &val7
1011 &val11
1111 &val15
0011 &val3
002 012 102 112
Background:
Extendible Hashing β Directory doubling
β’ If a single pointer points to overflow bucketβ Directory Doubling
11011
&val27
H(key) =
key % 2G
13
0000 &val0
1000 &val8
0001 &val1
0101 &val5Key
Value
Buckets
Directory
(G=2)
Bucket Split
L=1 L=2
L=2
β’ If a single pointer points to overflow bucketβ Directory Doubling
0111 &val7
1011 &val11
1111 &val15
0011 &val3
1011 &val11
0011 &val3
002 012 102 112002 012 102 1120002 0012 0102 0112 1002 1012 1102 1112
Background:
Extendible Hashing β Directory doubling
11011
&val27
H(key) =
key % 2G
14
0000 &val0
1000 &val8
0001 &val1
0101 &val5Key
Value
Buckets
Directory
(G=2)
Directory
(G=3)
Directory Doubling
0111 &val7
1111 &val15
L=1 L=2
L=3 L=3
β’ If a single pointer points to overflow bucketβ Directory Doubling
1011 &val11
0011 &val3
Directory
(G=3)0002 0012 0102 0112 1002 1012 1102 1112
Background:
Extendible Hashing β Directory doubling
11011
&val27
H(key) =
key % 2G
15
0000 &val0
1000 &val8
0001 &val1
0101 &val5
11011 &val27
Key
Value
Buckets
Update Directory
0111 &val7
1111 &val15
L=1 L=2
L=3 L=3
Persistent Memory
Characteristicsβ’ High performance β Comparable to DRAM
β’ Byte-addressability β As DRAM
β’ Persistence β As storage devices (HDD/SSD)
Challengesβ’ Atomic unit of writes β 8-bytes
β’ Data transfer unit between CPU cache and PM β 64 byte cacheline
β’ Order of memory writes is not guaranteed
16
Backgroundβ’ Static Hashing
β’ Extendible Hashing
β’ Persistent Memory
Cacheline-Conscious Extendible Hashingβ’ Challenges and Contributions
β’ 3-Level Structure of CCEH
β’ Failure-atomic Directory Update
Evaluation
Conclusion
17
Outline
Challenge in In-Memory Extendible Hashing
18
Problems
- Page-sized bucket β 64 cacheline accesses per bucket
Directory
(G=2)
Bucket
002 012 102 112
0000 &val0
1000 &val8
β¦ β¦
β¦ β¦
β¦ β¦
β¦ β¦
0110 &val6
0010 &val2
β¦
Cacheline 1
Cacheline 2
Cacheline 64
Cacheline 3
Cacheline 1
Cacheline 2
Cacheline 64
Cacheline 3
Cacheline 1
Cacheline 2
Cacheline 64
Cacheline 3
Directory
(G=20)
Challenge in In-Memory Extendible Hashing
19
00..00Buckets
Problems
Cacheline-sized small bucket β a large directory (8 byte pointer per cacheline)
β¦
Cacheline-Size
β¦β¦
β¦ β¦ β¦
β¦ β¦ β¦β¦ β¦
Directory
(G=2)0002 0012 0102 0112
1001 &val11
0001 &val3
0111 &val7
1111 &val15
Challenge in Extendible Hashing on PM
20
0000 &val0
1000 &val8
0010 &val1
0110 &val5
11001 &val27
Buckets
Problems
Split operation updates multiple pointers β Not Failure-Atomic
3-Level Structure β Introduces an intermediate level, Segment
β Lookup via only two cacheline accesses
Failure-atomic Directory Updatesβ Introduces the split buddy tree to manage split history
Failure-atomic Segment Split β Lazy deletion scheme to minimize dirty writes
21
Contributions
β’ A group of multiple cacheline-sized buckets = Segment
22
Segment: Intemediate Level
Directory
(G=20)
00..00Buckets β¦
Cacheline-Size
β¦β¦
β¦ β¦β¦ β¦
β¦ β¦ β¦β¦ β¦
3-Level StructureDirectory β Segment β Cacheline-sized Bucket
23
Segment: Intemediate Level
Directory
(G=2)
00..00Buckets β¦
Cacheline-Size
β¦β¦
β¦ β¦
Using intermediate level βSegmentβ,
CCEH reduces directory size while keeping bucket size small
β¦ β¦ β¦β¦ β¦
Q: With large segments, how can we minimize cacheline accesses?
24
Minimize Cacheline Accesses in Segment
002 012 102 112Directory
(G=2)
1000 &val8 Bucket 002
1001 &val9 Bucket 012
1010 &val10 Bucket 102
1011 &val11 Bucket 112
Segment 102
1100 &val12 Bucket 002
1101 &val13 Bucket 012
1110 &val14 Bucket 102
1111 &val15 Bucket 112
Segment 112
0000 &val0 Bucket 002
0001 &val1 Bucket 012
0010 &val2 Bucket 102
0011 &val3 Bucket 112
Segment 002
10012
Hash Key
Q: With large segments, how can we minimize cacheline accesses?
25
Minimize Cacheline Accesses in Segment
002 012 102 112Directory
(G=2)
1000 &val8 Bucket 002
1001 &val9 Bucket 012
1010 &val10 Bucket 102
1011 &val11 Bucket 112
Segment 102
1100 &val12 Bucket 002
1101 &val13 Bucket 012
1110 &val14 Bucket 102
1111 &val15 Bucket 112
Segment 112
0000 &val0 Bucket 002
0001 &val1 Bucket 012
0010 &val2 Bucket 102
0011 &val3 Bucket 112
Segment 002
10 012
Hash Key
26
Minimize Cacheline Accesses in Segment
002 012 102 112Directory
(G=2)
1000 &val8 Bucket 002
1001 &val9 Bucket 012
1010 &val10 Bucket 102
1011 &val11 Bucket 112
Segment 102
1100 &val12 Bucket 002
1101 &val13 Bucket 012
1110 &val14 Bucket 102
1111 &val15 Bucket 112
Segment 112
0000 &val0 Bucket 002
0001 &val1 Bucket 012
0010 &val2 Bucket 102
0011 &val3 Bucket 112
Segment 002
10 012
Hash Key
Segment Index
Bucket Index
Q: With large segments, how can we minimize cacheline accesses?
27
Minimize Cacheline Accesses in Segment
002 012 102 112Directory
(G=2)10 012
Hash Key
1000 &val8 Bucket 002
1001 &val9 Bucket 012
1010 &val10 Bucket 102
1011 &val11 Bucket 112
Segment 012
1100 &val12 Bucket 002
1101 &val13 Bucket 012
1110 &val14 Bucket 102
1111 &val15 Bucket 112
Segment 112
0000 &val0 Bucket 002
0001 &val1 Bucket 012
0010 &val2 Bucket 102
0011 &val3 Bucket 112
Segment 002Use hash key as index for both directory and segment
β No need to access irrelevant buckets
3-Level Structure β Introduces an intermediate level, Segment
β Lookup via only two cacheline accesses
Failure-atomic Directory Updatesβ Introduces the split buddy tree to manage split history
Failure-atomic Segment Split β Lazy deletion scheme to minimize dirty writes
28
Contributions
(G=3)Directory
L=1
L=1
L=1
L=1
L=2
L=2
L=2
L=2
L=2
L=2
L=3
L=3
29
S1 S1
S2
S1
S2
S3 S3
S4
Depth: 0 1 2 3
Global Depth:
S1
S1
S1
S2
S3
S4
S2
SPLITS5
(1) Update a directory entry
(2) Update a directory entry
(3) Update local depth of S1
S5
S5
S1
Using MSB segment index, split segments are pointed by adjacent directory entries
Recovery: Split History Buddy Tree in CCEH
L=1L=2
(G=3)Directory
L=1
L=1
L=1
L=2
L=2
L=3
L=3
S1S5
S1
Suppose a system crashes while S5 splits
30
Recovery: Split History Buddy Tree in CCEH
S1 S1
S2
S1
S2
S2
S3
S1
S3
S4
S1
S2
S3
S4
S5
Depth: 0 1 2 3
Global Depth:
S2
L=2
L=1
L=1
L=1
L=2
L=2
L=3
L=3
(G=3)Directory
S1S5
S1
31
S1 S1
S2
S1
S2
S2
S3
S1
S3
S4
S1
S2
S3
S4
S5
Depth: 0 1 2 3
Global Depth:
S2
πΊππππ π = ππβπ
β’ Each segment must be pointed by ππβπ directory entries
β’ If not, rollback the split
Global Depth G = 3
Local Depth L = 1
Stride = 4
Recovery: Split History Buddy Tree in CCEH
L=2L=1
L=1
L=1
L=1
L=2
L=2
L=3
L=3
(G=3)Directory
S1S5
S1
32
S1 S1
S2
S1
S2
S2
S3
S1
S3
S4
S1
S2
S3
S4
S5
Depth: 0 1 2 3
Global Depth:
S2
S1
πΊππππ π = ππβπ
Global Depth G = 3
Local Depth L = 1
Stride = 4
Recovery: Split History Buddy Tree in CCEH
β’ Each segment must be pointed by ππβπ directory entries
β’ If not, rollback the split
3-Level Structure β Introduces an intermediate level, Segment
β Lookup via only two cacheline accesses
Failure-atomic Directory Updatesβ Introduces the split buddy tree to manage split history
Failure-atomic Segment Split β Lazy deletion scheme to minimize dirty writes
33
Contributions
002 012 102 112
Segment Split: Legacy CoW
Directory
34
Copy-on-Write Split
β Lock-Free Search is enabled 002 012
Local Depth = 1
002 01000 &val8
012 00001 &val1
102 01010 &val10
112 00011 &val3Segment 02
Local Depth = 2
002
012
102
112
Segment 002
Local Depth = 2
002
012
102
112
Segment 012
01000 &val8
00001 &val1
01010 &val10
00011 &val3
Local Depth = 2
002
012
102
112
Segment 012
Local Depth = 1
002 01000 &val8
012 00001 &val1
102 01010 &val10
112 00011 &val3Segment 02
&val801000
01010 &val10
002 012 102 112
Segment Split: Lazy Deletion
Directory
35
Lazy Deletion
β Minimizes dirty writes002 012
Single
Cacheline-flush
invalidates all the
migrated data
Segment 002
Local Depth = 2
002 01000 &val8
012 00001 &val1
102 01010 &val10
112 00011 &val3
01000 &val8
01010 &val10
Backgroundβ’ Static Hashing
β’ Extendible Hashing
β’ Persistent Memory
Cacheline-Conscious Extendible Hashingβ’ Challenges and Contributions
β’ 3-Level Structure of CCEH
β’ Failure-atomic Directory Update
Evaluation
Conclusion
36
Outline
Experimental Setup
37
64GB of DDR3 DRAM
2x Intel Xeon Haswell-Ex E7-4809 v3β 8 cores, 2.0 GHz
β 20MB L3 cache
Quartz: A DRAM-based PM latency emulator
* To emulate write latency, we inject stall cycle after
each clflush instructions
CPU
Memory
PM
Workload 160 Million random number dataset
CCEH VS Legacy Extendible Hash
38
0
200
400
600
800
1000
1200
1400
256B 1KB 4KB 16KB 64KB 256KB
Th
rou
gh
pu
t (O
ps/m
sec)
Segment Size for CCEH/Bucket Size for Extendible Hash
0
0.5
1
1.5
2
2.5
3
3.5
4
256B 1KB 4KB 16KB 64KB 256KB
Segment Size for CCEH/Bucket Size for Extendible Hash
Avg
. # o
f clf
lush
CCEH Extendible Hashing
CCEH compared to legacy Extendible Hashing
Cons: Low utilization and more cacheline flushes due to hash collisions
Pros: Constant number of cacheline accesses with varying directory size
Insertion Performance Breakdown
39
0
0.5
1
1.5
2
2.5
3
3.5
4
CCEH CCEH(C) CUCK LEVL PATH
0
0.5
1
1.5
2
2.5
3
3.5
4
CCEH CCEH(C) CUCK LEVL PATH
Write Rehash Cuckoo Displacement
R:120ns/W:120ns
(DRAM)
R:240ns/W:700ns
Avg
. E
xec.
Tim
e (
usec)
CCEH is 70% faster than Level Hashing (OSDIβ18) on PM
Fewer # of cacheline accesses
Lazy deletion β Efficient segment split
0
10
20
30
40
50
60
70
80
90
100
0 100 200 300 400 500 600 700 800 900
40
Load Factor
Number of Records (k)
Lo
ad
Facto
r (%
)CCEH (K=4) CCEH (K=64) Level Hashing
Level Hashing
Suffer from full table rehashing like the other static hashing
CCEH
Dynamic allocation of small segments β Smooth curves
CCEH (Optimization)
K = Linear probing distance in Segment
Cacheline-Conscious Extendible Hashing (CCEH) β’ 3-Level Structure
β’ Introduced an intermediate level, Segment
β’ Constant Lookup: Only two cacheline accesses β Write-Optimal
β’ Failure-Atomic Write-Optimal Lazy Deletion β Minimize I/O
β’ Failure-Atomic Directory Updates β Log-less directory update
Disk-based hashing needs to be modified for PM to make effective use of cachelines.
Source Codes: http://github.com/DICL/CCEH
Conclusion
41
Question?