EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced,...

EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury, Jack Kosaian (U Michigan) Ion Stoica, Kannan Ramchandran (UC Berkeley) Rashmi Vinayak UC Berkeley

Transcript

Page 1:

EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding

Joint work with

Mosharaf Chowdhury, Jack Kosaian (U Michigan), Ion Stoica, Kannan Ramchandran (UC Berkeley)

Rashmi Vinayak

UC Berkeley

Page 2:

Caching for data-intensive clusters

• Data-intensive clusters rely on distributed, in-memory caching for high performance

- Reading from memory is orders of magnitude faster than reading from disk/SSD

- Example: Alluxio (formerly Tachyon†)

†Li et al. SoCC 2014


Page 4:

Imbalances prevalent in clusters

Sources of imbalance:

• Skew in object popularity

• Background network imbalance

• Failures/unavailabilities

Small fraction of objects highly popular

- Zipf-like distribution

- Top 5% of objects 7x more popular than bottom 75%†

(Facebook and Microsoft production cluster traces)

†Ananthanarayanan et al. NSDI 2012


Page 6:

Imbalances prevalent in clusters

Sources of imbalance:

• Skew in object popularity

• Background network imbalance

• Failures/unavailabilities

Some parts of the network more congested than others

- Ratio of maximum to average utilization more than 4.5x with > 50% utilization

(Facebook data-analytics cluster)

- Similar observations from other production clusters†

†Chowdhury et al. SIGCOMM 2013

Page 7:

Imbalances prevalent in clusters

Sources of imbalance:

• Skew in object popularity

• Background load imbalance

• Failures/unavailabilities

Norm rather than the exception

- median > 50 machine unavailability events every day in a cluster of several thousand servers†

(Facebook data analytics cluster)

†Rashmi et al. HotStorage 2013


Page 9:

Imbalances prevalent in clusters

Sources of imbalance:

• Skew in object popularity

• Background network imbalance

• Failures/unavailabilities

➡ Adverse effects:

- load imbalance

- high read latency

Single copy in memory often not sufficient to get good performance


Page 11:

Popular approach: Selective Replication

• Uses some memory overhead to cache replicas of objects based on their popularity

- more replicas for more popular objects

[Diagram: Server 1, Server 2, Server 3, ...; cached objects A and B; GET A carries 2x load, GET B carries 1x load]


Page 13:

Popular approach: Selective Replication

• Uses some memory overhead to cache replicas of objects based on their popularity

- more replicas for more popular objects

[Diagram: Server 1, Server 2, Server 3, ...; cached objects A, B, A; requests GET A, GET A, GET B, each carrying 1x load]

• Used in data-intensive clusters† as well as widely used in key-value stores for many web services such as Facebook Tao‡

†Ananthanarayanan et al. NSDI 2011, ‡Bronson et al. ATC 2013
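The effect shown in the two diagrams can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the request rates (A at 2 units, B at 1 unit), the server names, and the helper `per_server_load` are hypothetical, and every GET is assumed to pick one replica uniformly at random.

```python
from collections import defaultdict

def per_server_load(popularity, placement):
    """Expected load per server when each GET picks a replica uniformly at random.

    popularity: object -> request rate (arbitrary units, assumed values)
    placement:  object -> list of servers that hold a replica of the object
    """
    load = defaultdict(float)
    for obj, rate in popularity.items():
        replicas = placement[obj]
        for server in replicas:
            load[server] += rate / len(replicas)
    return dict(load)

popularity = {"A": 2.0, "B": 1.0}  # assumed: A is requested twice as often as B

single_copy = {"A": ["server1"], "B": ["server2"]}
replicated = {"A": ["server1", "server3"], "B": ["server2"]}  # one extra replica of A

print(per_server_load(popularity, single_copy))
print(per_server_load(popularity, replicated))
```

With a single copy, the server holding A carries twice the load of the server holding B; adding one replica of A evens the load out at the cost of one extra object's worth of memory.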


Page 18:

[Chart with axes "Memory Overhead" and "Read performance & Load balance"; schemes plotted: Single copy in memory, Selective replication, EC-Cache ("Erasure Coding")]


Page 26:

Quick primer on erasure coding

• Takes in k data units and creates r "parity" units

• Any k of the (k+r) units are sufficient to decode the original k data units

Example: k = 5, r = 4

[Diagram: data units d1 d2 d3 d4 d5 and parity units p1 p2 p3 p4; Read a subset of the units, then Decode to recover d1 d2 d3 d4 d5]
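The "any k of (k+r)" property can be illustrated with a toy code built on polynomial interpolation, the idea underlying Reed-Solomon codes. This is a sketch for intuition only, not the code used by EC-Cache: it works over the prime field GF(257) for readability (production codes typically use GF(2^8)), and the names `encode`, `decode`, and `_interpolate` are hypothetical.

```python
P = 257  # prime modulus; every byte value 0..255 is a valid field element

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse, Python 3.8+
    return total

def encode(data, r):
    """Return k + r units: the k data symbols followed by r parity symbols.

    Unit i holds the value at x = i of the polynomial defined by the data.
    Parity symbols may equal 256, which is why this toy is not byte-exact.
    """
    k = len(data)
    points = list(enumerate(data))
    parity = [_interpolate(points, x) for x in range(k, k + r)]
    return list(data) + parity

def decode(available, k):
    """Recover the k data symbols from any k distinct (index, value) pairs."""
    points = available[:k]
    return [_interpolate(points, x) for x in range(k)]

data = [10, 20, 30, 40, 50]      # k = 5 data units, as on the slide
units = encode(data, r=4)        # k + r = 9 units in total
survivors = [(i, units[i]) for i in (1, 3, 5, 6, 8)]  # any 5 of the 9 units
print(decode(survivors, k=5))
```

Because k points determine a polynomial of degree below k, any 5 of the 9 units pin down the same polynomial, and evaluating it at x = 0..4 returns the data units.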


Page 31:

EC-Cache bird's eye view: Writes

• Object split into k data units

• Encoded to generate r parity units

• (k+r) units cached on distinct servers chosen uniformly at random

[Diagram, k = 2, r = 1: Put X; Split X into d1, d2; Encode to generate p1; d1, d2, p1 placed on the caching servers]
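The three write steps can be sketched as follows for the k = 2, r = 1 case in the diagram. This is an illustration under assumptions, not the EC-Cache client: a single XOR parity stands in for the Reed-Solomon encoder, caching servers are modelled as plain dictionaries, and `split`, `xor_parity`, and `put` are hypothetical names.

```python
import random

def split(obj, k):
    """Split obj into k equal-length data units, zero-padding the tail if needed."""
    unit_len = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(unit_len * k, b"\0")
    return [padded[i * unit_len:(i + 1) * unit_len] for i in range(k)]

def xor_parity(units):
    """Single parity unit (r = 1): byte-wise XOR of all data units."""
    parity = bytearray(len(units[0]))
    for unit in units:
        for i, b in enumerate(unit):
            parity[i] ^= b
    return bytes(parity)

def put(key, obj, servers, k=2, rng=random):
    """Split, encode, and cache the k + 1 units on distinct servers chosen uniformly at random."""
    data_units = split(obj, k)
    units = data_units + [xor_parity(data_units)]
    chosen = rng.sample(range(len(servers)), len(units))  # sample() never repeats a server
    for unit_index, server_index in enumerate(chosen):
        servers[server_index][(key, unit_index)] = units[unit_index]
    return chosen

servers = [{} for _ in range(5)]   # five in-memory "caching servers" (assumed)
placement = put("X", b"hello!", servers)
print(placement)                   # three distinct server indices, one per unit
```

Choosing servers with `random.sample` gives distinct servers per object, which matches the slide's requirement that no two units of an object share a server.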


Page 40:

EC-Cache bird's eye view: Reads

• Read from (k + Δ) units of the object chosen uniformly at random

- "Additional reads"

• Use the first k units that arrive

• Decode the data units

• Combine the decoded units

[Diagram, k = 2, r = 1, Δ = 1, k + Δ = 3: Get X; read units d1, d2, p1 from the caching servers; d2 and p1 shown as the units used; Decode to d1, d2; Combine into X]
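The read steps can be sketched for the same k = 2, r = 1 setting. This is a simulation under assumptions rather than the real client: server response times are supplied as a list of hypothetical latencies instead of being measured, the decoder is the XOR special case, and `get` and `xor_bytes` are illustrative names.

```python
import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def get(units, latency_ms, delta=1, rng=random):
    """Late-binding read for a (k = 2, r = 1) XOR code; delta must be 0 or 1.

    units:      [d1, d2, p1] with p1 = d1 XOR d2
    latency_ms: simulated response time of the server holding each unit
    """
    k = 2
    requested = rng.sample(range(len(units)), k + delta)   # k + delta units at random
    arrival_order = sorted(requested, key=lambda i: latency_ms[i])
    first_k = sorted(arrival_order[:k])                     # use the first k that arrive
    if first_k == [0, 1]:                                   # both data units arrived
        d1, d2 = units[0], units[1]
    elif first_k == [0, 2]:                                 # rebuild d2 from d1 and p1
        d1, d2 = units[0], xor_bytes(units[0], units[2])
    else:                                                   # rebuild d1 from d2 and p1
        d1, d2 = xor_bytes(units[1], units[2]), units[1]
    return d1 + d2, first_k                                 # combine the decoded units

d1, d2 = b"hel", b"lo!"
units = [d1, d2, xor_bytes(d1, d2)]
obj, used = get(units, latency_ms=[50, 5, 7])  # the server holding d1 is a straggler
print(obj, used)
```

With Δ = 1 the slow server holding d1 is simply ignored: the object is rebuilt from d2 and p1, as in the diagram.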


Page 43:

Erasure coding: How does it help?

1. Finer control over memory overhead

- Selective replication allows only integer control

- Erasure coding allows fractional control

- E.g., k = 10 allows control in multiples of 0.1

2. Object splitting helps in load balancing

- Smaller granularity reads help to smoothly spread load

- Analysis on a certain simplified model:

Theorem 1 For the setting described above:

$$\frac{\mathrm{Var}(L_{\text{EC-Cache}})}{\mathrm{Var}(L_{\text{Selective Replication}})} = \frac{1}{k}.$$

Proof: Let $w > 0$ denote the popularity of each of the files. The random variable $L_{\text{Selective Replication}}$ is distributed as a Binomial random variable with $F$ trials and success probability $\frac{1}{S}$, scaled by $w$. On the other hand, $L_{\text{EC-Cache}}$ is distributed as a Binomial random variable with $kF$ trials and success probability $\frac{1}{S}$, scaled by $\frac{w}{k}$. Thus we have

$$\frac{\mathrm{Var}(L_{\text{EC-Cache}})}{\mathrm{Var}(L_{\text{Selective Replication}})} = \frac{\left(\frac{w}{k}\right)^{2} (kF)\, \frac{1}{S}\left(1 - \frac{1}{S}\right)}{w^{2} F\, \frac{1}{S}\left(1 - \frac{1}{S}\right)} = \frac{1}{k},$$

thereby proving our claim. □

Intuitively, the splitting action of EC-Cache leads to a smoother load distribution in comparison to selective replication. One can further extend Theorem 1 to accommodate a skew in the popularity of the objects. Such an extension leads to an identical result on the ratio of the variances. Additionally, the fact that each split of an object in EC-Cache is placed on a unique server further helps in evenly distributing the load, leading to even better load balancing.
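The 1/k variance ratio of Theorem 1 can be checked numerically. The sketch below assumes the simplified model used in the proof (F equally popular files, S servers, every unit placed independently and uniformly at random) and tracks the load on one fixed server; the parameter values and function names are hypothetical.

```python
import random
from statistics import pvariance

def server_load(num_units, num_servers, load_per_unit, rng):
    """Load on server 0 when each unit lands on a uniformly random server."""
    hits = sum(1 for _ in range(num_units) if rng.randrange(num_servers) == 0)
    return hits * load_per_unit

def simulated_variance_ratio(F=40, S=10, w=1.0, k=4, trials=5000, seed=0):
    """Monte Carlo estimate of Var(L_EC-Cache) / Var(L_Selective Replication)."""
    rng = random.Random(seed)
    sr = [server_load(F, S, w, rng) for _ in range(trials)]          # F units of load w
    ec = [server_load(k * F, S, w / k, rng) for _ in range(trials)]  # kF units of load w/k
    return pvariance(ec) / pvariance(sr)

def exact_variance_ratio(F, S, w, k):
    """Closed form from the proof: scaled Binomial variances."""
    p = 1 / S
    var_sr = w**2 * F * p * (1 - p)
    var_ec = (w / k) ** 2 * (k * F) * p * (1 - p)
    return var_ec / var_sr

print(exact_variance_ratio(F=40, S=10, w=1.0, k=4))
print(simulated_variance_ratio())   # should land close to 1/k = 0.25
```

The simulated ratio fluctuates with the seed and the number of trials, but it stays near 1/k, which is the sense in which splitting smooths the load.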

5.2 Impact on Latency

Next, we focus on how object splitting impacts read latencies. Under selective replication, a read request for an object is served by reading the object from a server. We first consider naive EC-Cache without any additional reads. Under naive EC-Cache, a read request for an object is served by reading k of its splits in parallel from k servers and performing a decoding operation. Let us also assume that the time taken for decoding is negligible compared to the time taken to read the splits.

Intuitively, one may expect that reading splits in parallel from different servers will reduce read latencies due to the parallelism. While this reduction indeed occurs for the average/median latencies, the tail latencies behave in an opposite manner due to the presence of stragglers: one slow split read delays the completion of the entire read request.

In order to obtain a better understanding of the aforementioned phenomenon, let us consider the following simplified model. Consider a parameter $p \in [0, 1]$ and assume that for any request, a server becomes a straggler with probability p, independent of all else. There are two primary contributing factors to the distributions of the latencies under selective replication and EC-Cache:

(a) Proportion of stragglers: Under selective replication, the fraction of requests that hit stragglers is p. On the other hand, under EC-Cache, a read request for an object will face a straggler if any of the k servers from where splits are being read becomes a straggler. Hence, a higher fraction $1 - (1 - p)^{k}$ of read requests can hit stragglers under naive EC-Cache.

(b) Latency conditioned on absence/presence of stragglers: If a read request does not face stragglers, the time taken for serving a read request is significantly smaller under EC-Cache as compared to selective replication because splits can be read in parallel. On the other hand, in the presence of a straggler in the two scenarios, the time taken for reading under EC-Cache is about as large as that under selective replication.

Putting the aforementioned two factors together we get that the relatively higher likelihood of a straggler under EC-Cache increases the number of read requests incurring a higher latency. The read requests that do not encounter any straggler incur a lower latency as compared to selective replication. These two factors explain the decrease in the median and mean latencies, and the increase in the tail latencies.

In order to alleviate the impact on tail latencies, we use additional reads and late binding in EC-Cache. Reed-Solomon codes have the property that any k of the collection of all splits of an object suffice to decode the object. We exploit this property by reading more than k splits in parallel, and using the k splits that are read first. It is well known that such additional reads help in mitigating the straggler problem and alleviate the effect on tail latencies [36, 82].
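The straggler argument can be made concrete with a short calculation. The function below follows the simplified model above for Δ = 0 and, as an assumption of this sketch rather than a result stated in the text, extends it to additional reads by treating a read as delayed only when more than Δ of the k + Δ contacted servers straggle. The value p = 0.05 is purely illustrative.

```python
from math import comb

def p_request_delayed(p, k, delta=0):
    """Probability that a read has to wait for at least one straggler.

    The read contacts k + delta servers, each a straggler independently with
    probability p, and needs k responses, so it tolerates up to delta stragglers.
    """
    n = k + delta
    p_ok = sum(comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(delta + 1))
    return 1 - p_ok

p = 0.05  # assumed per-server straggler probability
print("selective replication:", p_request_delayed(p, k=1))
print("naive EC-Cache, k=10: ", p_request_delayed(p, k=10))
print("EC-Cache, k=10, Δ=1:  ", p_request_delayed(p, k=10, delta=1))
```

For Δ = 0 the function reduces to 1 - (1 - p)^k, the fraction quoted in factor (a); one additional read lowers the probability sharply because a single straggler no longer blocks the request.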

6 Evaluation

We evaluated EC-Cache through a series of experiments on Amazon EC2 [1] clusters using synthetic workloads and traces from Facebook production clusters. The highlights of the evaluation results are:

• For skewed popularity distributions, EC-Cache improves load balancing over selective replication by 3.3× while using the same amount of memory. EC-Cache also decreases the median latency by 2.64× and the 99.9th percentile latency by 1.79× (§6.2).

• For skewed popularity distributions and in the presence of background load imbalance, EC-Cache decreases the 99.9th percentile latency w.r.t. selective replication by 2.56× while maintaining the same benefits in median latency and load balancing as in the case without background load imbalance (§6.3).

• For skewed popularity distributions and in the presence of server failures, EC-Cache provides a graceful degradation as opposed to the significant degradation in tail latency faced by selective replication. Specifically, EC-Cache decreases the 99.9th percentile latency w.r.t. selective replication by 2.8× (§6.4).

• EC-Cache's improvements over selective replication increase as object sizes increase in production traces;


Page 45:

Erasure coding: How does it help?

3. Object splitting reduces median latency but hurts tail latency

- Read parallelism helps reduce median latency

- Straggler effect hurts tail latency (if no additional reads)

4. "Any k out of (k+r)" property helps to reduce tail latency

- Read from (k + Δ) and use the first k that arrive

- Δ = 1 often sufficient to rein in tail latency


Page 47:

Design considerations

1. Purpose of erasure codes

Storage systems:

• Space-efficient fault tolerance

EC-Cache:

• Reduce read latency

• Load balance


Page 50:

Design considerations

2. Choice of erasure code

Storage systems:

• Optimize resource usage during reconstruction operations†

• Some codes do not have "any k out of (k+r)" property

EC-Cache:

• No reconstruction operations in caching layer; data persisted in underlying storage

• "Any k out of (k+r)" property helps in load balancing and reducing latency when reading objects

†Rashmi et al. SIGCOMM 2014, Sathiamoorthy et al. VLDB 2013, Huang et al. ATC 2012


Page 53:

Design considerations

3. How do we use erasure coding: across vs. within objects

Storage systems:

• Some systems encode across objects (e.g., HDFS-RAID); some within (e.g., Ceph)

• Does not affect fault tolerance

EC-Cache:

• Need to encode within objects

- To spread load across both data & parity

• Encoding across: Very high BW overhead for reading object using parities†

†Rashmi et al. SIGCOMM 2014, HotStorage 2013


Page 56:

Implementation

• EC-Cache on top of Alluxio (formerly Tachyon)

- Backend caching servers: cache data, unaware of erasure coding

- EC-Cache client library: all read/write logic handled

• Reed-Solomon code

- Any k out of (k+r) property

• Intel ISA-L hardware acceleration library

- Fast encoding and decoding


Page 61:

Evaluation set-up

• Amazon EC2

• 25 backend caching servers and 30 client servers

• Object popularity: Zipf distribution with high skew

• EC-Cache uses k = 10, Δ = 1

- BW overhead = 10%

• Varying object sizes
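The 10% bandwidth figure follows from simple arithmetic: each unit is 1/k of the object, so Δ additional reads cost Δ/k extra bandwidth, and r parity units cost r/k extra memory. A small sketch, with hypothetical helper names:

```python
from fractions import Fraction

def memory_overhead(k, r):
    """Extra memory, relative to a single copy, for r parity units over k data units."""
    return Fraction(r, k)

def bandwidth_overhead(k, delta):
    """Extra data read, relative to reading exactly k units, for delta additional reads."""
    return Fraction(delta, k)

print(bandwidth_overhead(k=10, delta=1))              # 1/10, i.e. 10%
print([memory_overhead(10, r) for r in range(1, 4)])  # steps of 1/10
```

The second line of output shows the "multiples of 0.1" granularity mentioned earlier for k = 10, in contrast to the whole-object steps of replication.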


Page 63:

Load balancing

[Two charts: Data Read (GB), axis 0 to 400, versus Servers Sorted by Load; one panel for Selective Replication and one for EC-Cache]

• Percent imbalance metric:

e.g., 5.5× at median for 100 MB objects with an upward trend (§6.5).

• EC-Cache outperforms selective replication across a wide range of values of k, r, and Δ (§6.6).

6.1 Methodology

Cluster. Unless otherwise specified, our experiments use 55 c4.8xlarge EC2 instances. 25 of these machines act as the backend servers for EC-Cache, each with 8 GB cache space, and 30 machines generate thousands of read requests to EC-Cache. All machines were in the same Amazon Virtual Private Cloud (VPC) with 10 Gbps enhanced networking enabled; we observed around 4-5 Gbps bandwidth between machines in the VPC using iperf.

As mentioned earlier, we implemented EC-Cache on Alluxio [56], which, in turn, used Amazon S3 [2] as its persistence layer and runs on the 25 backend servers. We used DFS-Perf [5] to generate the workload on the 30 client machines.

Metrics. Our primary metrics for comparison are latency in reading objects and load imbalance across the backend servers.

Given a workload, we consider mean, median, and high-percentile latencies. We measure improvements in latency as:

$$\text{Latency Improvement} = \frac{\text{Latency w/ Compared Scheme}}{\text{Latency w/ EC-Cache}}$$

If the value of this "latency improvement" is greater (or smaller) than one, EC-Cache is better (or worse).
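As a worked example of this ratio, the sketch below plugs in bar labels from the read-latency chart that appears later in this transcript (238 ms and 90 ms at the median, 881 ms and 492 ms at the 99.9th percentile). Treating the larger value in each pair as selective replication is an inference from the chart legend, not something the transcript states outright.

```python
def latency_improvement(compared_ms, ec_cache_ms):
    """Ratio defined in the text: values above 1 mean EC-Cache is faster."""
    return compared_ms / ec_cache_ms

median = latency_improvement(238, 90)
tail = latency_improvement(881, 492)
print(f"median: {median:.2f}x, 99.9th percentile: {tail:.2f}x")
# prints "median: 2.64x, 99.9th percentile: 1.79x"
```

These values agree with the 2.64× and 1.79× figures quoted in the evaluation highlights for skewed popularity.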

We measure load imbalance using the percent imbalance metric λ defined as follows:

$$\lambda = \left(\frac{L_{\max} - L_{\mathrm{avg}^\star}}{L_{\mathrm{avg}^\star}}\right) \times 100, \tag{1}$$

where $L_{\max}$ is the load on the server which is maximally loaded and $L_{\mathrm{avg}^\star}$ is the load on any server under an oracle scheme, where the total load is equally distributed among all the servers without any overhead. λ measures the percentage of additional load on the maximally loaded server as compared to the ideal average load. Because EC-Cache operates in the bandwidth-limited regime, the load on a server translates to the total amount of data read from that server. Lower values of λ are better. Note that the percent imbalance metric takes into account the additional load introduced by EC-Cache due to additional reads.
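The metric in Equation (1) can be sketched in a few lines. This is a minimal illustration, not the authors' measurement code: the function name, the `ideal_total` parameter, and the sample loads are assumptions. `ideal_total` stands for the total load without any overhead (the oracle's total); when it is omitted, the sketch falls back to the sum of the observed loads.

```python
def percent_imbalance(loads, ideal_total=None):
    """Percent imbalance: extra load on the most loaded server relative to
    the oracle's per-server average, expressed as a percentage."""
    if ideal_total is None:
        ideal_total = sum(loads)          # assumption: no overhead in the observed loads
    l_avg_star = ideal_total / len(loads)  # load per server under the oracle scheme
    l_max = max(loads)                     # load on the maximally loaded server
    return (l_max - l_avg_star) / l_avg_star * 100

# Hypothetical per-server data read, in GB.
example_loads = [130, 100, 90, 80]
print(percent_imbalance(example_loads))
```

With these made-up loads the oracle average is 100 GB and the busiest server carries 130 GB, so the metric is about 30%. Passing a smaller `ideal_total` than the observed sum models a scheme whose additional reads add overhead, which raises λ.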

Setup We consider a Zipf distribution for the popularity of objects, which is common in many real-world object popularity distributions [20, 23, 56]. Specifically, we consider the Zipf parameter to be 0.9 (that is, high skew).
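To make the skew concrete, the sketch below builds a Zipf popularity vector with parameter 0.9, where the object of rank i is accessed with probability proportional to 1 / i^0.9. The object count of 1000 and all names are assumptions chosen for illustration. It uses plain Python rather than `numpy.random.zipf`, which only accepts parameters greater than 1.

```python
def zipf_popularity(num_objects, s=0.9):
    """Return normalized access probabilities p_i proportional to 1 / i**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical catalog of 1000 objects.
probs = zipf_popularity(1000)

# Share of all accesses that go to the top 5% of objects.
top_share = sum(probs[:50])
print(f"top 5% of objects receive {top_share:.1%} of accesses")
```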

[Figure 8: Read latencies under skewed popularity of objects. Bar charts of read latency (ms) at Mean, Median, 95th, 99th, and 99.9th for Selective Replication and EC-Cache. First plot: Selective Replication 245, 238, 286, 435, 1226; EC-Cache 99, 93, 141, 229, 478. Second plot: Selective Replication 242, 238, 283, 340, 881; EC-Cache 96, 90, 134, 193, 492.]

Unless otherwise specified, we allow both selective replication and EC-Cache to use 15% memory overhead to handle the skew in the popularity of objects. Selective replication uses all the allowed memory overhead for handling popularity skew. Unless otherwise specified, EC-Cache uses k = 10 and Δ = 1. Thus, 10% of the allowed memory overhead is used to provide one parity to each object. The remaining 5% is used for handling popularity skew. Both schemes make use of the skew information to decide how to allocate the allowed memory among different objects in an identical manner: the number of replicas for an object under selective replication and the number of additional parities for an object under EC-Cache are calculated so as to flatten out the popularity skew to the extent possible starting from the most popular object, until the memory budget is exhausted.
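One way to read the allocation rule above is as a greedy loop, sketched here under stated assumptions. The text does not give the exact procedure, so this is only one plausible interpretation: the budget is counted in abstract "extra copies" (replicas for selective replication, additional parities for EC-Cache), each copy is assumed to absorb an equal share of an object's accesses, and all names are hypothetical.

```python
import heapq

def allocate_extra_copies(popularity, budget):
    """Greedy sketch: hand out `budget` extra copies one at a time, always to
    the object whose per-copy load (popularity / copies) is currently highest."""
    copies = [1] * len(popularity)
    # heapq is a min-heap, so store negated per-copy load to pop the largest first.
    heap = [(-p, i) for i, p in enumerate(popularity)]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        copies[i] += 1
        heapq.heappush(heap, (-popularity[i] / copies[i], i))
    return copies

# Hypothetical popularity scores, most popular first, with three extra copies to spend.
print(allocate_extra_copies([8, 4, 2, 1], budget=3))
```

In this toy run the most popular object receives two extra copies and the second most popular receives one, which flattens the per-copy load from 8, 4, 2, 1 to roughly 2.7, 2, 2, 1. A real budget would be expressed in bytes, where a parity costs 1/k of an object and a replica costs a full object.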

Moreover, both schemes use uniform random placement policy to evenly distribute objects (splits in case of EC-Cache) across memory servers.

Unless otherwise specified, the size of each object considered in these experiments is 40 MB. We present results for varying object sizes observed in the Facebook trace in Section 6.5. In Section 6.6, we perform a sensitivity analysis with respect to all the above parameters.

Furthermore, we note that while the evaluations presented here are for the setting of high skew in object popularity, EC-Cache outperforms selective replication in scenarios with low skew in object popularity as well. Under high skew, EC-Cache offers significant benefits in terms of load balancing and read latency. Under low skew, while there is not much to improve in load balancing, EC-Cache will still provide latency benefits.

6.2 Skew Resilience

We begin by evaluating the performance of EC-Cache in the presence of skew in object popularity.

Latency Characteristics Figure 8 compares the mean, median, and tail latencies of EC-Cache and selective replication. We observe that EC-Cache improves median and mean latencies by 2.64× and 2.52×, respectively. EC-Cache outperforms selective replication at high per-

Page 64: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Load balancing

[Figure: Data Read (GB) per server, with servers sorted by load. Panels: Selective Replication (λSR = 43.45%), EC-Cache (λEC = 13.14%).]

• Percent imbalance metric:


$$\lambda = \left(\frac{L_{\max} - L_{\mathrm{avg}^\star}}{L_{\mathrm{avg}^\star}}\right) \times 100 \tag{1}$$


> 3x reduction in load imbalance metric
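The reduction follows directly from the two values reported on this slide, λSR = 43.45% for selective replication and λEC = 13.14% for EC-Cache:

$$\frac{\lambda_{SR}}{\lambda_{EC}} = \frac{43.45}{13.14} \approx 3.31$$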

Page 65: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Read latency

[Figure: Read latency (ms) at Mean, Median, 95th, 99th, and 99.9th. Selective Replication: 242, 238, 283, 340, 881. EC-Cache: 96, 90, 134, 193, 492.]

Page 66: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Read latency

• Median: 2.64x improvement

• 99th and 99.9th: ~1.75x improvement

[Figure: Read latency (ms) at Mean, Median, 95th, 99th, and 99.9th. Selective Replication: 242, 238, 283, 340, 881. EC-Cache: 96, 90, 134, 193, 492.]

Page 67: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Varying object sizes

More improvement for larger object sizes

[Figure: Median latency. Read latency (ms) vs. object size (MB) for EC-Cache (Median) and Selective Replication (Median). 5.5x improvement for 100 MB.]

[Figure: Tail latency. Read latency (ms) vs. object size (MB) for EC-Cache (99th) and Selective Replication (99th). 3.85x improvement for 100 MB.]

Page 68: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Role of additional reads (Δ)

[Figure: CDF of read latency (ms) for EC-Cache with Δ=0, EC-Cache with Δ=1, and Selective Replication.]

Page 69: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Role of additional reads (Δ)

[Figure: CDF of read latency (ms) for EC-Cache with Δ=0, EC-Cache with Δ=1, and Selective Replication.]

Significant degradation in tail latency without additional reads (i.e., Δ = 0)
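A small simulation can illustrate why the Δ = 0 curve has a heavier tail. It is a sketch under stated assumptions: a read issues k + Δ split requests and completes once the fastest k have arrived, the per-split latency model (exponential with a 5 ms mean and occasional 10x stragglers) is invented for illustration, and the extra server load caused by additional reads is ignored. All names are hypothetical.

```python
import random

def read_latency(split_latencies, k):
    """Latency of a read that completes once the k fastest splits have arrived."""
    return sorted(split_latencies)[k - 1]

def percentile(values, q):
    """Simple nearest-rank style percentile for q in [0, 1)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]

def simulate(k=10, delta=1, trials=10000, seed=0):
    rng = random.Random(seed)
    without_extra, with_extra = [], []
    for _ in range(trials):
        # Hypothetical model: exponential latency, 5% of splits are 10x stragglers.
        lat = [rng.expovariate(1 / 5.0) * (10 if rng.random() < 0.05 else 1)
               for _ in range(k + delta)]
        without_extra.append(read_latency(lat[:k], k))  # delta = 0: wait for all k splits
        with_extra.append(read_latency(lat, k))         # wait for fastest k of k + delta
    return without_extra, with_extra

without_extra, with_extra = simulate()
print("p99 without additional reads:", round(percentile(without_extra, 0.99), 1))
print("p99 with one additional read:", round(percentile(with_extra, 0.99), 1))
```

Within each trial the read with an additional split can never be slower than the read without it, because a single straggler among the k + Δ requests can simply be ignored. That is the effect the CDF on this slide attributes to Δ = 1.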

Page 70: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Additional evaluations in the paper

• With background network imbalance

• With server failures

• Write performance

• Sensitivity analysis for all parameters

Page 71: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Summary

• EC-Cache

- Cluster cache employing erasure coding for load balancing and reducing read latencies

- Demonstrates new application and new goals for which erasure coding is highly effective

Page 72: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Summary

• EC-Cache

- Cluster cache employing erasure coding for load balancing and reducing read latencies

- Demonstrates new application and new goals for which erasure coding is highly effective

• Implementation on Alluxio

• Evaluation

- Load balancing: > 3x improvement

- Median latency: > 5x improvement

- Tail latency: > 3x improvement

Page 73: EC-Cache Load-balanced, Low-latency Cluster Caching with ... · EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding Joint work with Mosharaf Chowdhury,

Summary

• EC-Cache

- Cluster cache employing erasure coding for load balancing and reducing read latencies

- Demonstrates new application and new goals for which erasure coding is highly effective

• Implementation on Alluxio

• Evaluation

- Load balancing: > 3x improvement

- Median latency: > 5x improvement

- Tail latency: > 3x improvement

Thanks!