Distributed caching for cloud computing
Maxime Lorrillere, Julien Sopena, Sébastien Monnet et Pierre Sens
February 11, 2013
Maxime Lorrillere (LIP6/UPMC/CNRS) February 11, 2013 1 / 16
Introduction: Context
Cloud computing
[Figure: VMs running on two hosts, backed by a shared cache and Google S3 storage]
Cloud computing:
- Computing resources as a service
- On-demand self-service
- Elastically provisioned
- QoS guarantees
Building a virtual platform means dealing with different resources:
- From different providers
- With different properties
We need a common interface.
Introduction: Caching
Distributed caching
[Figure: memory hierarchy seen from the CPU: local main memory (100 ns), remote main memory over the network (100 µs), local hard drive (10 ms), remote hard drive (100 ms)]
- Local hard drive: high capacity, high latency, a bottleneck
- Network: low latency, but how much capacity?
- Main memory: very low latency, small capacity, exploits the principle of locality
Introduction: Related works
Distributed caching: related works
Caching at the operating-system layer:
- Application level [Memcached]: existing applications have to be updated
- File-system level [xFS, PAFS, Ceph]: the guest operating system has to use a specific file system
- Block level [XHive, dm-cache]: incompatible with distributed file systems
Existing solutions are not "cloud aware".
[Figure: Linux I/O stack: applications, Virtual File System, cache, file systems, block level]
Our contribution: a generic approach to develop distributed caches for cloud computing.
Development of a distributed cache: Constraints
Implementation constraints:
- Ensure genericity ⇒ integration into the Linux kernel
- Be non-intrusive
Performance constraints:
- Limit overhead
- Minimise memory footprint
[Figure: Linux I/O stack: applications, Virtual File System, cache, memory management, file systems, block level]
Development of a distributed cache: Remote caches
Remote cache: direct client cooperation
[Figure: on a local miss, the client sends a request to the remote cache; a remote hit returns the data, while a remote miss falls back to disk I/O]
- Remote memory extends local memory
- Easy localisation of data
- No data sharing
Development of a distributed cache: Architecture
Architecture (get)
[Figure: on a local miss, the process issues a get() to the cache client, which queries the server; on a remote miss the request falls back to disk I/O]
Client:
- Basic operations: get and put
- Blocking get
- Executed by the process in kernel space
Server:
- Dedicated kernel thread
- Request-response protocol
- Pages indexed in a red-black tree
Development of a distributed cache: Architecture
Architecture (put)
[Figure: put() allocates a message through the memory manager, and the client sends it to the server asynchronously]
Client:
- Basic operations: get and put
- put() is called inside a critical section
- Executed in a dedicated thread
Server:
- Dedicated kernel thread
- Request-response protocol
- Pages indexed in a red-black tree
Development of a distributed cache: Details and optimizations
Metadata management
Problem: efficient metadata management (avoid querying the remote cache for data it does not hold)
[Figure: a bit array tested with several keys: test(key0) hit?, test(key2) miss, test(key3) hit?]
Solution: Bloom filter [Bloom 1970]
- Probabilistic data structure
- Compact
- No false negatives
- False positives possible
Development of a distributed cache: Details and optimizations
Cache access management
Problem: sequential access detection
[Figure: consecutive gets grow the read-ahead window: get(2) fetches pages 2-3, get(4) fetches pages 4-6, get(7) fetches pages 7-11]
Solution: prefetching
- Sequential read detection
- Read prediction
- Read-ahead of data
- Amortizes network latency
Development of a distributed cache: Details and optimizations
Communications management
Problem: memory footprint of network buffers
[Figure: the put() path allocates its buffers through the memory manager and hands them directly to the network stack]
Solution: zero-copy
- Avoid copying into the network stack
- Fewer memory allocations
- Avoid deadlocks
Evaluation: Experiment setup
Virtualized platform:
- Intel Core i7-2600 (4 hyper-threaded cores), 8 GB memory
- Cache server (2 cores, 4 GB)
- Client (2 cores, 512 MB)
- Reads from a local virtual hard drive
- 1 Gbit/s virtual network (∼600 µs RTT)
Micro-benchmark:
- 32 MB read in total
- Each read is split into fragments from 512 bytes to 8 MB
- Each fragment is read at a random position in a file
Evaluation: Performance evaluation
Remote miss overhead (empty remote cache)
[Figure: throughput (MB/s) versus fragment size (2^10 to 2^22 bytes) for the classic kernel and the remote cache]
- The Bloom filter avoids remote misses
- Code execution has a negligible overhead
Evaluation: Performance evaluation
Performance peak (data preloaded in the remote cache)
[Figure: throughput (MB/s) and speed-up versus fragment size (2^10 to 2^22 bytes), classic kernel versus remote cache]
- Up to 8x performance improvement with small fragments (1 KB)
- Performance drop above 128 KB fragments
Evaluation: Performance evaluation
Performance with local memory full of data (data preloaded in the remote cache, local memory full)
[Figure: throughput (MB/s) and speed-up versus fragment size (2^10 to 2^22 bytes), classic kernel versus remote cache]
- Up to 6x performance improvement with small fragments (1 KB)
- Performance drop above 64 KB fragments
Conclusion
Summary:
- Existing distributed caches are not "cloud aware"
- We propose an approach to develop distributed caches for the cloud
- Working, non-intrusive prototype
- Promising: up to 8x performance improvement on random reads
Future work:
- Realistic benchmarks: Memcached, dm-cache, bcache, ...
- Sequential read performance improvements
- Consistency guarantees