Write-Optimized Dynamic Hashing for Persistent Memory...Write-Optimized Dynamic Hashing for...

Write-Optimized Dynamic Hashing for Persistent Memory

𝑴𝒐𝒐𝒉𝒚𝒆𝒐𝒏 𝑵𝒂𝒎⸸, 𝐻𝑜𝑘𝑒𝑢𝑛 𝐶ℎ𝑎⸹, 𝑌𝑜𝑢𝑛𝑔－𝑟𝑖 𝐶ℎ𝑜𝑖⸸, 𝑆𝑎𝑚 𝐻.𝑁𝑜ℎ⸸, 𝐵𝑒𝑜𝑚𝑠𝑒𝑜𝑘 𝑁𝑎𝑚⸹

UNIST (Ulsan National Institute of Science and Technology)⸸

Sungkyunkwan University (𝑆𝐾𝐾𝑈)⸹

1

Background• Static Hashing

• Extendible Hashing

• Persistent Memory

Cacheline-Conscious Extendible Hashing• Challenges and Contributions

• 3-Level Structure of CCEH

• Failure-atomic Directory Update

Evaluation

Conclusion

2

Outline

• Hash key collision → Full table rehashing• The most expensive operation in hash table

Background:

Static Hashing

0011

&val3

H(key) =

key % size

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

1001 &val9

1011 &val11

0100 &val4

1101 &val13

0111 &val7

3

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

0011 &val3

0100 &val4

0111 &val7

[8]

[9]

[A]

[B]

[C]

[D]

[E]

[F]

1001 &val9

1001 &val11

1101 &val13

Rehash

Hash Collision

Linear Probing

• To avoid full table rehashing: • Linear probing• Chaining• Double hashing such as Cuckoo hashing

Background:

Static Hashing

0011

&val3

H(key) =

key % size

4

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

1001 &val9

1011 &val11

0100 &val4

1101 &val13

0011 &val3

0111 &val7

Chaining

• To avoid full table rehashing: • Linear probing• Chaining• Double hashing such as Cuckoo hashing

Background:

Static Hashing

0011

&val3

H(key) =

key % size

5

0011 &val3

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

1001 &val9

1011 &val11

0100 &val4

1101 &val13

0111 &val7

• To avoid full table rehashing:• Linear probing• Chaining• Double hashing such as Cuckoo hashing

Background:

Static Hashing

0011

&val3

H(key) =

key % size

6

Cuckoo Displacement

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

1001 &val9

0011 &val3

0100 &val4

1101 &val13

1011 &val11

0111 &val7

Cuckoo Hashing

7

Insertion Latency CDF

Flat Horizontal Lines

Because of the expensive full-table rehashing

0001 &val1

0111 &val7

1111 &val15

0101 &val5

• Dynamically splits one bucket or merges two buckets at a time

0010 &val2

0110 &val6

002 012 102 112

Background:

Disk-based Extendible Hashing

8

0000 &val0

1000 &val8

Directory

(G=2)

Buckets

L=2

Hash Function: H(key) = key % 2G

L=2 L=1

0001 &val1

0111 &val7

1111 &val15

0101 &val5

0010 &val2

0110 &val6

002 012 102 112

Background:

Extendible Hashing – Insertion

0011

&val3

H(key) =

key % 2G

9

0000 &val0

1000 &val8Key

Value

Directory

(G=2)

Buckets

Hash Collision

L=2 L=1 L=2

0001 &val1

0101 &val5

0111 &val7

1111 &val15

0001 &val1

0111 &val7

1111 &val15

0101 &val5

• Only overflown bucket is modified

0010 &val2

0110 &val6

002 012 102 112

Background:

Extendible Hashing – Bucket Split

0011

&val3

H(key) =

key % 2G

10

0000 &val0

1000 &val8Key

Value

Directory

(G=2)

Buckets

Bucket SplitL=2 L=1 L=2

0010 &val2

0110 &val6

0001 &val1

0101 &val5

0111 &val7

1111 &val15

002 012 102 112

Background:

Extendible Hashing – Bucket Split

0011

&val3

H(key) =

key % 2G

11

0000 &val0

1000 &val8

0011 &val3

Key

Value

Directory

(G=2)

Buckets

Update Directory

• Update Directory• At least two pointers need to be updated

012 112

L=2 L=2

L=2 L=2

0111 &val7

1011 &val11

1111 &val15

0011 &val3

002 012 102 112

Background:

Extendible Hashing – Directory doubling

• If a single pointer points to overflow bucket→ Directory Doubling

11011

&val27

H(key) =

key % 2G

12

0000 &val0

1000 &val8

0001 &val1

0101 &val5Key

Value

Buckets

Directory

(G=2)

Hash Collision

L=1 L=2 L=2

1011 &val11

0011 &val3

0111 &val7

1111 &val15

0111 &val7

1011 &val11

1111 &val15

0011 &val3

002 012 102 112

Background:



11011

&val27

H(key) =

key % 2G

13

0000 &val0

1000 &val8

0001 &val1

0101 &val5Key

Value

Buckets

Directory

(G=2)

Bucket Split

L=1 L=2

L=2


0111 &val7

1011 &val11

1111 &val15

0011 &val3

1011 &val11

0011 &val3

002 012 102 112002 012 102 1120002 0012 0102 0112 1002 1012 1102 1112

Background:


11011

&val27

H(key) =

key % 2G

14

0000 &val0

1000 &val8

0001 &val1

0101 &val5Key

Value

Buckets

Directory

(G=2)

Directory

(G=3)

Directory Doubling

0111 &val7

1111 &val15

L=1 L=2

L=3 L=3


1011 &val11

0011 &val3

Directory

(G=3)0002 0012 0102 0112 1002 1012 1102 1112

Background:


11011

&val27

H(key) =

key % 2G

15

0000 &val0

1000 &val8

0001 &val1

0101 &val5

11011 &val27

Key

Value

Buckets

Update Directory

0111 &val7

1111 &val15

L=1 L=2

L=3 L=3

Persistent Memory

Characteristics• High performance – Comparable to DRAM

• Byte-addressability – As DRAM

• Persistence – As storage devices (HDD/SSD)

Challenges• Atomic unit of writes → 8-bytes

• Data transfer unit between CPU cache and PM → 64 byte cacheline

• Order of memory writes is not guaranteed

16







Evaluation

Conclusion

17

Outline

Challenge in In-Memory Extendible Hashing

18

Problems

- Page-sized bucket → 64 cacheline accesses per bucket

Directory

(G=2)

Bucket

002 012 102 112

0000 &val0

1000 &val8

… …

… …

… …

… …

0110 &val6

0010 &val2

…

Cacheline 1

Cacheline 2

Cacheline 64

Cacheline 3

Cacheline 1

Cacheline 2

Cacheline 64

Cacheline 3

Cacheline 1

Cacheline 2

Cacheline 64

Cacheline 3

Directory

(G=20)

Challenge in In-Memory Extendible Hashing

19

00..00Buckets

Problems

Cacheline-sized small bucket → a large directory (8 byte pointer per cacheline)

…

Cacheline-Size

……

… … …

… … …… …

Directory

(G=2)0002 0012 0102 0112

1001 &val11

0001 &val3

0111 &val7

1111 &val15

Challenge in Extendible Hashing on PM

20

0000 &val0

1000 &val8

0010 &val1

0110 &val5

11001 &val27

Buckets

Problems

Split operation updates multiple pointers → Not Failure-Atomic

3-Level Structure → Introduces an intermediate level, Segment

→ Lookup via only two cacheline accesses

Failure-atomic Directory Updates→ Introduces the split buddy tree to manage split history

Failure-atomic Segment Split → Lazy deletion scheme to minimize dirty writes

21

Contributions

• A group of multiple cacheline-sized buckets = Segment

22

Segment: Intemediate Level

Directory

(G=20)

00..00Buckets …

Cacheline-Size

……

… …… …

… … …… …

3-Level StructureDirectory → Segment → Cacheline-sized Bucket

23

Segment: Intemediate Level

Directory

(G=2)

00..00Buckets …

Cacheline-Size

……

… …

Using intermediate level “Segment”,

CCEH reduces directory size while keeping bucket size small

… … …… …

Q: With large segments, how can we minimize cacheline accesses?

24

Minimize Cacheline Accesses in Segment

002 012 102 112Directory

(G=2)

1000 &val8 Bucket 002




Segment 102





Segment 112





Segment 002

10012

Hash Key


25


002 012 102 112Directory

(G=2)





Segment 102





Segment 112





Segment 002

10 012

Hash Key

26


002 012 102 112Directory

(G=2)





Segment 102





Segment 112





Segment 002

10 012

Hash Key

Segment Index

Bucket Index


27


002 012 102 112Directory

(G=2)10 012

Hash Key





Segment 012





Segment 112





Segment 002Use hash key as index for both directory and segment

→ No need to access irrelevant buckets





28

Contributions

(G=3)Directory

L=1

L=1

L=1

L=1

L=2

L=2

L=2

L=2

L=2

L=2

L=3

L=3

29

S1 S1

S2

S1

S2

S3 S3

S4

Depth: 0 1 2 3

Global Depth:

S1

S1

S1

S2

S3

S4

S2

SPLITS5

(1) Update a directory entry

(2) Update a directory entry

(3) Update local depth of S1

S5

S5

S1

Using MSB segment index, split segments are pointed by adjacent directory entries

Recovery: Split History Buddy Tree in CCEH

L=1L=2

(G=3)Directory

L=1

L=1

L=1

L=2

L=2

L=3

L=3

S1S5

S1

Suppose a system crashes while S5 splits

30


S1 S1

S2

S1

S2

S2

S3

S1

S3

S4

S1

S2

S3

S4

S5

Depth: 0 1 2 3

Global Depth:

S2

L=2

L=1

L=1

L=1

L=2

L=2

L=3

L=3

(G=3)Directory

S1S5

S1

31

S1 S1

S2

S1

S2

S2

S3

S1

S3

S4

S1

S2

S3

S4

S5

Depth: 0 1 2 3

Global Depth:

S2

𝑺𝒕𝒓𝒊𝒅𝒆 = 𝟐𝐆−𝐋

• Each segment must be pointed by 𝟐𝐆−𝐋 directory entries

• If not, rollback the split

Global Depth G = 3

Local Depth L = 1

Stride = 4


L=2L=1

L=1

L=1

L=1

L=2

L=2

L=3

L=3

(G=3)Directory

S1S5

S1

32

S1 S1

S2

S1

S2

S2

S3

S1

S3

S4

S1

S2

S3

S4

S5

Depth: 0 1 2 3

Global Depth:

S2

S1

𝑺𝒕𝒓𝒊𝒅𝒆 = 𝟐𝐆−𝐋

Global Depth G = 3

Local Depth L = 1

Stride = 4


• Each segment must be pointed by 𝟐𝐆−𝐋 directory entries

• If not, rollback the split





33

Contributions

002 012 102 112

Segment Split: Legacy CoW

Directory

34

Copy-on-Write Split

→ Lock-Free Search is enabled 002 012

Local Depth = 1

002 01000 &val8

012 00001 &val1

102 01010 &val10

112 00011 &val3Segment 02

Local Depth = 2

002

012

102

112

Segment 002

Local Depth = 2

002

012

102

112

Segment 012

01000 &val8

00001 &val1

01010 &val10

00011 &val3

Local Depth = 2

002

012

102

112

Segment 012

Local Depth = 1

002 01000 &val8

012 00001 &val1

102 01010 &val10

112 00011 &val3Segment 02

&val801000

01010 &val10

002 012 102 112

Segment Split: Lazy Deletion

Directory

35

Lazy Deletion

→ Minimizes dirty writes002 012

Single

Cacheline-flush

invalidates all the

migrated data

Segment 002

Local Depth = 2

002 01000 &val8

012 00001 &val1

102 01010 &val10

112 00011 &val3

01000 &val8

01010 &val10







Evaluation

Conclusion

36

Outline

Experimental Setup

37

64GB of DDR3 DRAM

2x Intel Xeon Haswell-Ex E7-4809 v3→ 8 cores, 2.0 GHz

→ 20MB L3 cache

Quartz: A DRAM-based PM latency emulator

* To emulate write latency, we inject stall cycle after

each clflush instructions

CPU

Memory

PM

Workload 160 Million random number dataset

CCEH VS Legacy Extendible Hash

38

0

200

400

600

800

1000

1200

1400

256B 1KB 4KB 16KB 64KB 256KB

Th

rou

gh

pu

t (O

ps/m

sec)

Segment Size for CCEH/Bucket Size for Extendible Hash

0

0.5

1

1.5

2

2.5

3

3.5

4

256B 1KB 4KB 16KB 64KB 256KB

Segment Size for CCEH/Bucket Size for Extendible Hash

Avg

. # o

f clf

lush

CCEH Extendible Hashing

CCEH compared to legacy Extendible Hashing

Cons: Low utilization and more cacheline flushes due to hash collisions

Pros: Constant number of cacheline accesses with varying directory size

Insertion Performance Breakdown

39

0

0.5

1

1.5

2

2.5

3

3.5

4

CCEH CCEH(C) CUCK LEVL PATH

0

0.5

1

1.5

2

2.5

3

3.5

4

CCEH CCEH(C) CUCK LEVL PATH

Write Rehash Cuckoo Displacement

R:120ns/W:120ns

(DRAM)

R:240ns/W:700ns

Avg

. E

xec.

Tim

e (

usec)

CCEH is 70% faster than Level Hashing (OSDI’18) on PM

Fewer # of cacheline accesses

Lazy deletion → Efficient segment split

0

10

20

30

40

50

60

70

80

90

100

0 100 200 300 400 500 600 700 800 900

40

Load Factor

Number of Records (k)

Lo

ad

Facto

r (%

)CCEH (K=4) CCEH (K=64) Level Hashing

Level Hashing

Suffer from full table rehashing like the other static hashing

CCEH

Dynamic allocation of small segments → Smooth curves

CCEH (Optimization)

K = Linear probing distance in Segment

Cacheline-Conscious Extendible Hashing (CCEH) • 3-Level Structure

• Introduced an intermediate level, Segment

• Constant Lookup: Only two cacheline accesses → Write-Optimal

• Failure-Atomic Write-Optimal Lazy Deletion → Minimize I/O

• Failure-Atomic Directory Updates → Log-less directory update

Disk-based hashing needs to be modified for PM to make effective use of cachelines.

Source Codes: http://github.com/DICL/CCEH

Conclusion

41

Question?

Write-Optimized Dynamic Hashing for Persistent Memory...Write-Optimized Dynamic Hashing for...

Documents

Transcript of Write-Optimized Dynamic Hashing for Persistent Memory...Write-Optimized Dynamic Hashing for...