MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing

Post on 22-Feb-2016


Transcript of MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing

MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing

Bin Fan, David G. Andersen, Michael Kaminsky

Presenter: Son Nguyen

Memcached internals

• LRU caching using a chaining hash table and a doubly linked list

Goals

• Reduce space overhead (bytes/key)
• Improve throughput (queries/sec)
• Target read-intensive workloads with small objects
• Result: 3x throughput, 30% more objects stored

Doubly-linked-list’s problems

• At least two pointers per item -> expensive in space
• Both reads and writes change the list's structure -> threads must lock the list (no concurrency)

Solution: CLOCK-based LRU

• Approximate LRU
• Allows multiple readers / single writer
• Circular queue instead of a linked list -> less space overhead
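The slide's bullets can be sketched in a few lines. The following is a hypothetical Python sketch of a CLOCK cache (the class and names are made up for illustration; MemC3 itself is written in C): a fixed circular array of slots, one recency bit per slot, and a clock hand. A read only sets a bit, so it never restructures anything; eviction sweeps the hand, clearing bits until it finds a slot with bit 0.

```python
# Hypothetical sketch of a CLOCK-based cache (not MemC3's actual C code).
class ClockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity   # (key, value) or None
        self.recency = [0] * capacity    # one recency bit per slot
        self.hand = 0                    # clock hand position
        self.index = {}                  # key -> slot, for O(1) reads

    def read(self, key):
        slot = self.index.get(key)
        if slot is None:
            return None                  # cache miss
        self.recency[slot] = 1           # mark recently used; no list surgery
        return self.slots[slot][1]

    def write(self, key, value):
        slot = self.index.get(key)
        if slot is not None:             # key already cached: update in place
            self.slots[slot] = (key, value)
            self.recency[slot] = 1
            return
        # Sweep the hand: clear 1-bits, evict the first 0-bit entry.
        while self.slots[self.hand] is not None and self.recency[self.hand]:
            self.recency[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.slots[self.hand]
        if victim is not None:
            del self.index[victim[0]]    # evict the victim
        self.slots[self.hand] = (key, value)
        self.recency[self.hand] = 1
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

Because a read touches only one bit, many readers can proceed alongside a single writer; only eviction moves the hand.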

CLOCK example

Originally:

  entry   | (ka, va) | (kb, vb) | (kc, vc) | (kd, vd) | (ke, ve)
  recency |    1     |    0     |    1     |    1     |    0

Read(kd):

  entry   | (ka, va) | (kb, vb) | (kc, vc) | (kd, vd) | (ke, ve)
  recency |    1     |    0     |    1     |    0     |    0

Write(kf, vf):

  entry   | (ka, va) | (kb, vb) | (kf, vf) | (kd, vd) | (ke, ve)
  recency |    1     |    1     |    0     |    0     |    0

Write(kg, vg):

  entry   | (kg, vg) | (kb, vb) | (kf, vf) | (kd, vd) | (ke, ve)
  recency |    0     |    1     |    0     |    1     |    1

Chaining Hashtable’s problems

• Uses linked lists -> costly space overhead for pointers

• Pointer dereferences are slow (poor CPU cache locality)

• Reads are not constant time (chains can grow long)

Solution: Cuckoo Hashing

• Uses 2 hash tables
• Each bucket has exactly 4 slots (one bucket fits in a CPU cache line)
• Each (key, value) object can therefore reside in one of 8 possible slots

Cuckoo Hashing

[Figure: key ka maps to two candidate buckets, HASH1(ka) and HASH2(ka)]

Cuckoo Hashing

• Read: at most 8 slot lookups (constant, fast)
• Write: write(ka, va)
– Find an empty slot among the 8 possible slots for ka
– If all are full, randomly kick some (kb, vb) out
– Now find an empty slot for (kb, vb)
– Repeat up to 500 times or until an empty slot is found
– If still not found, do a table expansion
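The write procedure above can be sketched like this (hypothetical Python sketch; table sizes and the salted-hash pair are illustrative, not MemC3's actual code):

```python
import hashlib
import random

ASSOC = 4        # slots per bucket
NBUCKETS = 8     # buckets per table (tiny, for illustration)
MAX_KICKS = 500  # displacement budget before declaring the table full

def _bucket(t, key):
    # Hypothetical hash pair: one digest per table, derived by salting.
    h = hashlib.blake2b(key.encode(), digest_size=8, salt=bytes([t])).digest()
    return int.from_bytes(h, 'little') % NBUCKETS

def insert(tables, key, value):
    for _ in range(MAX_KICKS):
        candidates = [(t, _bucket(t, key)) for t in (0, 1)]
        # Look for an empty slot among the 8 candidate slots.
        for t, b in candidates:
            for i in range(ASSOC):
                if tables[t][b][i] is None:
                    tables[t][b][i] = (key, value)
                    return True
        # All 8 are full: kick a random victim out, take its slot, and
        # restart the search with the victim as the item to place.
        t, b = random.choice(candidates)
        i = random.randrange(ASSOC)
        (key, value), tables[t][b][i] = tables[t][b][i], (key, value)
    return False  # no home after 500 kicks: caller would expand the table
```

Note this naive version moves items forward as it kicks, which is exactly the concurrency problem the later slides fix with the cuckoo path.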

Cuckoo Hashing

[Figure: Insert a — both candidate buckets HASH1(ka) and HASH2(ka) are full, so b is kicked out to make room for a]

Cuckoo Hashing

[Figure: Insert b — the displaced b moves to its other bucket HASH2(kb), kicking c out]

Cuckoo Hashing

[Figure: Insert c — the displaced c finds an empty slot in its other bucket HASH2(kc). Done!]

Cuckoo Hashing

• Problem: after (kb, vb) is kicked out but before it is reinserted, a reader might look up kb and get a false cache miss

• Solution: compute the kick-out path (the "cuckoo path") first, then move items backward along it

• Before: (b,c,Null) -> (a,c,Null) -> (a,b,Null) -> (a,b,c)
• Fixed: (b,c,Null) -> (b,c,c) -> (b,b,c) -> (a,b,c)
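The fix can be sketched as follows. This hypothetical Python sketch simplifies to one slot per bucket and two tables (MemC3 uses 4-way buckets and tracks the path over individual slots, and this sketch does not detect cycles): the search phase moves nothing, and the move phase fills the empty slot at the far end first, so every key remains findable in one of its two buckets at all times.

```python
import hashlib

NBUCKETS = 32  # buckets per table; ONE slot per bucket in this sketch

def _bucket(t, key):
    # Hypothetical hash pair: one digest per table, derived by salting.
    h = hashlib.blake2b(key.encode(), digest_size=8, salt=bytes([t])).digest()
    return int.from_bytes(h, 'little') % NBUCKETS

def find_path(tables, key, max_depth=500):
    # Walk the chain of displacements WITHOUT moving anything.
    path, t = [], 0
    for _ in range(max_depth):
        b = _bucket(t, key)
        path.append((t, b))
        if tables[t][b] is None:
            return path           # chain ends at an empty slot
        key = tables[t][b][0]     # this resident would be kicked...
        t = 1 - t                 # ...into its bucket in the other table
    return None                   # gave up (path too long)

def insert(tables, key, value):
    path = find_path(tables, key)
    if path is None:
        return False
    # Move items BACKWARD: fill the empty slot at the end first, then
    # vacate toward the front, so no key ever disappears mid-insert.
    for i in range(len(path) - 1, 0, -1):
        t_to, b_to = path[i]
        t_from, b_from = path[i - 1]
        tables[t_to][b_to] = tables[t_from][b_from]
    t0, b0 = path[0]
    tables[t0][b0] = (key, value)
    return True
```

Compare with the slide's example: (b,c,Null) becomes (b,c,c), then (b,b,c), then (a,b,c) — each intermediate state may duplicate a key, but never loses one.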

Cuckoo path

[Figure: Insert a — the cuckoo path a -> b -> c is computed first, ending at an empty slot]

Cuckoo path backward insert

[Figure: Insert a — items move backward along the path: c fills the empty slot, b fills c's old slot, then a is placed]

Cuckoo’s advantages

• Concurrency: multiple readers / single writer
• Read-optimized (a bucket's entries fit in a CPU cache line)
• Still O(1) amortized time per write
• 30% less space overhead
• 95% table occupancy

Evaluation

• 68% throughput improvement in the all-hit case; 235% in the all-miss case

• 3x throughput on a "real" workload

Discussion

• Writes are slower than with a chaining hash table
– Chaining hash table: 14.38 million keys/sec
– Cuckoo: 7 million keys/sec

• Idea: find the cuckoo path in parallel
– Benchmarks don't show much improvement

• Can we make writes concurrent?