
EC-Cache: Load-balanced, Low-latency Cluster Caching with

Online Erasure Coding

Rashmi Vinayak

UC Berkeley

Joint work with

Mosharaf Chowdhury, Jack Kosaian (U Michigan), Ion Stoica, Kannan Ramchandran (UC Berkeley)

Caching for data-intensive clusters

• Data-intensive clusters rely on distributed, in-memory caching for high performance

  - Reading from memory is orders of magnitude faster than reading from disk/SSD

  - Example: Alluxio (formerly Tachyon†)

†Li et al. SOCC 2014

2

Imbalances prevalent in clusters

Sources of imbalance:

• Skew in object popularity

• Background network imbalance

• Failures/unavailabilities

3

Imbalances prevalent in clusters

• Skew in object popularity

  - Small fraction of objects highly popular

  - Zipf-like distribution

  - Top 5% of objects are 7x more popular than the bottom 75%†

  (Facebook and Microsoft production cluster traces)

†Ananthanarayanan et al. NSDI 2012

4

Imbalances prevalent in clusters

• Background network imbalance

  - Some parts of the network are more congested than others

  - Ratio of maximum to average utilization is more than 4.5x with > 50% utilization

  (Facebook data-analytics cluster)

  - Similar observations from other production clusters†

†Chowdhury et al. SIGCOMM 2013

5

Imbalances prevalent in clusters

• Failures/unavailabilities

  - Norm rather than the exception

  - Median of > 50 machine-unavailability events every day in a cluster of several thousand servers†

  (Facebook data-analytics cluster)

†Rashmi et al. HotStorage 2013

6


Imbalances prevalent in clusters

Sources of imbalance:

• Skew in object popularity

• Background network imbalance

• Failures/unavailabilities

➡ Adverse effects:

  - Load imbalance

  - High read latency

Single copy in memory is often not sufficient to get good performance

7


Popular approach: Selective Replication

• Uses some memory overhead to cache replicas of objects based on their popularity

  - More replicas for more popular objects

[Figure: objects A and B cached across servers. With a single copy, GET requests give object A's server 2x load and B's server 1x; replicating A on a second server spreads the GETs so each server sees 1x load.]

• Used in data-intensive clusters† and widely used in key-value stores for many web services, such as Facebook Tao‡

†Ananthanarayanan et al. NSDI 2011, ‡Bronson et al. ATC 2013

8


Memory overhead vs. read performance & load balance

[Figure: tradeoff space with memory overhead on one axis and read performance & load balance on the other. "Single copy in memory" sits at low memory overhead with poor performance; "Selective replication" improves performance and load balance at the cost of memory overhead; EC-Cache, built on "Erasure Coding", targets better read performance and load balance for a given memory overhead.]

9


Quick primer on erasure coding

• Takes in k data units and creates r "parity" units

• Example: k = 5, r = 4

• Any k of the (k+r) units are sufficient to decode the original k data units (see the sketch below)

[Figure: data units d1 d2 d3 d4 d5 and parity units p1 p2 p3 p4; reading any 5 of the 9 units and decoding recovers d1–d5.]

10
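As a rough, self-contained illustration of the "any k out of (k+r)" property (not the Reed-Solomon code EC-Cache actually uses), the sketch below handles the special case r = 1, where the single parity unit is the bitwise XOR of the data units and any k of the (k+1) units recover the original data:

```python
# Minimal sketch of the "any k out of (k + r)" property for the special case
# r = 1, using a single XOR parity unit. EC-Cache itself uses Reed-Solomon
# codes, which generalize this to arbitrary r; this is only an illustration.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_units: list[bytes]) -> bytes:
    """Create one parity unit as the XOR of all k data units."""
    parity = data_units[0]
    for unit in data_units[1:]:
        parity = xor_bytes(parity, unit)
    return parity

def decode(units: dict[int, bytes], k: int) -> list[bytes]:
    """Recover all k data units from any k of the (k + 1) units.

    `units` maps unit index -> contents; index k denotes the parity unit.
    """
    assert len(units) >= k, "need at least k units to decode"
    missing = [i for i in range(k) if i not in units]
    if not missing:
        return [units[i] for i in range(k)]
    # With r = 1, at most one data unit is missing: XOR of everything else.
    (lost,) = missing
    recovered = units[k]
    for i, unit in units.items():
        if i != k:
            recovered = xor_bytes(recovered, unit)
    result = dict(units)
    result[lost] = recovered
    return [result[i] for i in range(k)]

# Example with k = 2, r = 1 (the parameters used in the slides):
d1, d2 = b"AAAA", b"BBBB"
p1 = encode([d1, d2])
assert decode({0: d1, 2: p1}, k=2) == [d1, d2]  # d2 missing, recovered from the parity
```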


EC-Cache bird's-eye view: Writes

[Figure: Put(X) with k = 2, r = 1. Object X is split into d1 and d2, encoded to produce p1, and the three units are placed on distinct caching servers.]

• Object is split into k data units

• Encoded to generate r parity units

• (k+r) units cached on distinct servers chosen uniformly at random (sketch below)

11
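A hypothetical sketch of the write path just described; `encode_parities` and the per-server `put_unit` call are placeholders rather than the actual EC-Cache/Alluxio client API:

```python
import random

# Hypothetical sketch of the EC-Cache write path: split, encode, place on
# (k + r) distinct servers chosen uniformly at random.

def split(obj: bytes, k: int) -> list[bytes]:
    """Split an object into k roughly equal-sized data units."""
    size = (len(obj) + k - 1) // k
    return [obj[i * size:(i + 1) * size] for i in range(k)]

def put(obj_id: str, obj: bytes, servers: list, k: int, r: int, encode_parities):
    data_units = split(obj, k)                     # k data units
    parity_units = encode_parities(data_units, r)  # r parity units (e.g., Reed-Solomon)
    units = data_units + parity_units
    # Place the (k + r) units on distinct servers chosen uniformly at random.
    chosen = random.sample(servers, k + r)
    for idx, (unit, server) in enumerate(zip(units, chosen)):
        server.put_unit(obj_id, idx, unit)         # placeholder per-server call
```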


EC-Cache bird's-eye view: Reads

[Figure: Get(X) with k = 2, r = 1, Δ = 1, so k + Δ = 3 units are read; d2 and p1 arrive first, are decoded into d1 and d2, and combined into X.]

• Read (k + Δ) units of the object, chosen uniformly at random

  - "Additional reads"

• Use the first k units that arrive

• Decode the data units

• Combine the decoded units (sketch below)

12
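A matching hypothetical sketch of the read path, issuing (k + Δ) parallel reads and late-binding to the first k that return; again, `get_unit` and `decode_data_units` are placeholders, not the real client API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

# Hypothetical sketch of the EC-Cache read path: issue (k + delta) unit reads
# in parallel, keep the first k responses, decode, and combine.

def get(obj_id: str, unit_locations: dict, k: int, delta: int, decode_data_units) -> bytes:
    # Choose (k + delta) of the (k + r) cached units uniformly at random.
    chosen = random.sample(list(unit_locations.items()), k + delta)
    first_k = []
    with ThreadPoolExecutor(max_workers=k + delta) as pool:
        futures = {pool.submit(server.get_unit, obj_id, idx): idx
                   for idx, server in chosen}
        for fut in as_completed(futures):      # late binding: take whatever finishes first
            first_k.append((futures[fut], fut.result()))
            if len(first_k) == k:
                break                          # ignore the straggling Δ reads
    data_units = decode_data_units(dict(first_k), k)  # recover d1..dk
    return b"".join(data_units)                        # combine into the original object
```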


Erasure coding: How does it help?

1. Finer control over memory overhead

  - Selective replication allows only integer control

  - Erasure coding allows fractional control

  - E.g., k = 10 allows control in multiples of 0.1

2. Object splitting helps in load balancing

  - Smaller-granularity reads help to smoothly spread load

  - Analysis on a simplified model (F equally popular files of weight w > 0, S servers, uniform-random placement; a simulation check follows after this slide):

Theorem 1. For the setting described above,

\[
\frac{\operatorname{Var}(L_{\text{EC-Cache}})}{\operatorname{Var}(L_{\text{Selective Replication}})} = \frac{1}{k}.
\]

Proof. L_{Selective Replication} is distributed as a Binomial random variable with F trials and success probability 1/S, scaled by w. L_{EC-Cache} is distributed as a Binomial random variable with kF trials and success probability 1/S, scaled by w/k. Thus

\[
\frac{\operatorname{Var}(L_{\text{EC-Cache}})}{\operatorname{Var}(L_{\text{Selective Replication}})}
= \frac{\left(\tfrac{w}{k}\right)^{2} kF \, \tfrac{1}{S}\left(1-\tfrac{1}{S}\right)}{w^{2} F \, \tfrac{1}{S}\left(1-\tfrac{1}{S}\right)}
= \frac{1}{k}. \qquad \square
\]

Intuitively, the splitting action of EC-Cache leads to a smoother load distribution than selective replication. The theorem extends to skewed object popularity with an identical ratio of variances, and placing each split on a distinct server helps even out the load further.

13
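As a sanity check on Theorem 1, here is a small Monte Carlo sketch of the simplified model; the parameter values (F, S, w, number of trials) are arbitrary choices for illustration, not numbers from the talk:

```python
import numpy as np

# Monte Carlo check of Theorem 1 under its simplified model: F equally popular
# files of weight w, S servers, uniform-random placement. The load on a fixed
# server is w * Binomial(F, 1/S) under whole-object placement and
# (w/k) * Binomial(kF, 1/S) when each object is split into k units.
rng = np.random.default_rng(0)
F, S, w, k, trials = 1000, 25, 1.0, 10, 200_000

load_sr = w * rng.binomial(F, 1.0 / S, size=trials)            # selective replication (single copy)
load_ec = (w / k) * rng.binomial(k * F, 1.0 / S, size=trials)  # EC-Cache with k splits

print(load_ec.var() / load_sr.var())  # ≈ 1/k = 0.1, matching Theorem 1
```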


Erasure coding: How does it help?

3. Object splitting reduces median latency but hurts tail latency

  - Read parallelism helps reduce median latency

  - Straggler effect hurts tail latency if there are no additional reads (see the sketch below)

4. "Any k out of (k+r)" property helps to reduce tail latency

  - Read from (k + Δ) units and use the first k that arrive

  - Δ = 1 is often sufficient to rein in tail latency

14
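Point 3's straggler effect can be quantified under the paper's simplified model, where each server independently becomes a straggler with probability p. The first two fractions below come from that model; the Δ-read formula and the value p = 0.02 are my own illustrative extension and choice, not numbers from the talk:

```python
from math import comb

# Back-of-the-envelope straggler fractions under the simplified model:
#   - Selective replication reads 1 server -> a request hits a straggler w.p. p
#   - Naive EC-Cache reads k servers       -> w.p. 1 - (1 - p)^k
#   - With Δ additional reads and late binding (extension of the same model),
#     a request stalls only if more than Δ of the (k + Δ) reads straggle.

def frac_straggled_naive(p: float, k: int) -> float:
    return 1.0 - (1.0 - p) ** k

def frac_straggled_with_delta(p: float, k: int, delta: int) -> float:
    n = k + delta
    # P(more than delta stragglers among n independent reads)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(delta + 1, n + 1))

p, k = 0.02, 10
print(p)                                   # selective replication: 0.02
print(frac_straggled_naive(p, k))          # naive EC-Cache:        ~0.18
print(frac_straggled_with_delta(p, k, 1))  # Δ = 1:                 ~0.02, back near the single-read level
```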


Design considerations

1. Purpose of erasure codes

  Storage systems: space-efficient fault tolerance

  EC-Cache: reduce read latency; load balance

15


Design considerations

2. Choice of erasure code

  Storage systems: optimize resource usage during reconstruction operations†; some codes do not have the "any k out of (k+r)" property

  EC-Cache: no reconstruction operations in the caching layer (data is persisted in the underlying storage); the "any k out of (k+r)" property helps in load balancing and in reducing latency when reading objects

†Rashmi et al. SIGCOMM 2014, Sathiamoorthy et al. VLDB 2013, Huang et al. ATC 2012

16


Design considerations

3. How do we use erasure coding: across vs. within objects

  Storage systems: some encode across objects (e.g., HDFS-RAID), some within (e.g., Ceph); the choice does not affect fault tolerance

  EC-Cache: needs to encode within objects, to spread load across both data and parity units; encoding across objects incurs very high bandwidth overhead when reading an object using parities†

†Rashmi et al. SIGCOMM 2014, HotStorage 2013

17


Implementation

• EC-Cache on top of Alluxio (formerly Tachyon)

  - Backend caching servers: cache data, unaware of erasure coding

  - EC-Cache client library: all read/write logic handled here

• Reed-Solomon code

  - "Any k out of (k+r)" property

• Intel ISA-L hardware acceleration library

  - Fast encoding and decoding

18


Evaluation setup

• Amazon EC2

• 25 backend caching servers and 30 client servers

• Object popularity: Zipf distribution with high skew

• EC-Cache uses k = 10, Δ = 1

  - Bandwidth overhead = Δ/k = 10% (see the sketch below)

• Varying object sizes

19
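The 10% bandwidth overhead follows directly from Δ/k; similarly, one parity unit per object (r = 1, as in the evaluation) adds r/k = 10% memory overhead. A trivial sketch of this arithmetic:

```python
# Overhead arithmetic for EC-Cache's parameters: reading (k + Δ) units instead
# of k costs Δ/k extra bandwidth, and storing r parity units per object costs
# r/k extra memory. The values below are the ones used in the evaluation.

def bandwidth_overhead(k: int, delta: int) -> float:
    return delta / k

def memory_overhead(k: int, r: int) -> float:
    return r / k

k, r, delta = 10, 1, 1
print(f"bandwidth overhead: {bandwidth_overhead(k, delta):.0%}")    # 10%
print(f"memory overhead of parities: {memory_overhead(k, r):.0%}")  # 10%
```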


Load balancing

[Figure: data read (GB) per server, servers sorted by load, for Selective Replication vs. EC-Cache; EC-Cache spreads the load far more evenly across servers.]

• Percent imbalance metric (load = total data read from a server):

\[
\lambda = \frac{L_{\max} - L_{\text{avg}}^{*}}{L_{\text{avg}}^{*}} \times 100,
\]

where L_max is the load on the maximally loaded server and L*_avg is the per-server load under an oracle scheme that spreads the total load evenly with no overhead. Lower is better; λ accounts for the extra load EC-Cache introduces through additional reads, and both schemes use the same 15% memory overhead.

λ_SR = 43.45%    λ_EC = 13.14%

> 3x reduction in the load imbalance metric

20
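A small sketch of how λ is computed from per-server loads; the load vectors below are hypothetical, and only the two λ values quoted above come from the actual experiments:

```python
# Computing the percent imbalance metric λ from per-server loads (total data
# read from each server, in GB). The measured values in the talk are
# λ_SR = 43.45% and λ_EC = 13.14%.

def percent_imbalance(loads: list[float]) -> float:
    l_max = max(loads)
    # Oracle average; in the paper this excludes EC-Cache's additional-read
    # overhead — approximated here by the measured mean.
    l_avg_star = sum(loads) / len(loads)
    return (l_max - l_avg_star) / l_avg_star * 100

loads_sr = [400, 310, 260, 230, 200]  # hypothetical, highly skewed
loads_ec = [260, 250, 245, 240, 235]  # hypothetical, nearly even
print(percent_imbalance(loads_sr))    # large λ
print(percent_imbalance(loads_ec))    # small λ
```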


Read latency

• Median: 2.64x improvement

• 99th and 99.9th percentile: ~1.75x improvement (ratios computed below)

[Figure: read latency (ms) for Selective Replication vs. EC-Cache. Mean: 242 vs. 96; median: 238 vs. 90; 95th percentile: 283 vs. 134; 99th: 340 vs. 193; 99.9th: 881 vs. 492.]

21
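The improvement factors quoted above follow from the paper's latency-improvement metric (latency of the compared scheme divided by EC-Cache's latency), applied to the numbers in the figure:

```python
# Latency improvement = latency of the compared scheme / latency of EC-Cache,
# computed from the figure above (milliseconds).

sr = {"mean": 242, "median": 238, "p95": 283, "p99": 340, "p99.9": 881}
ec = {"mean": 96,  "median": 90,  "p95": 134, "p99": 193, "p99.9": 492}

for stat in sr:
    print(f"{stat}: {sr[stat] / ec[stat]:.2f}x")
# mean 2.52x, median 2.64x, p95 2.11x, p99 1.76x, p99.9 1.79x
```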

Varying object sizes

More improvement for larger object sizes

[Figure: read latency (ms) vs. object size (10–100 MB) for EC-Cache and Selective Replication. Median latency: 5.5x improvement for 100 MB objects. Tail (99th percentile) latency: 3.85x improvement for 100 MB objects.]

22


Role of additional reads (Δ)

[Figure: CDF of read latency (ms) for EC-Cache with Δ = 0, EC-Cache with Δ = 1, and Selective Replication.]

Significant degradation in tail latency without additional reads (i.e., Δ = 0)

23

Additional evaluations in the paper

• With background network imbalance

• With server failures

• Write performance

• Sensitivity analysis for all parameters

24


Summary

• EC-Cache

  - Cluster cache employing erasure coding for load balancing and reducing read latencies

  - Demonstrates a new application and new goals for which erasure coding is highly effective

• Implementation on Alluxio

• Evaluation

  - Load balancing: > 3x improvement

  - Median latency: > 5x improvement

  - Tail latency: > 3x improvement

Thanks!