AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System...

20
AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN USENIX NSDI. Boston, March 28, 2017. Daniel S. Berger Mor Harchol-Balter Ramesh K. Sitaraman

Transcript of AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System...

Page 1: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN

USENIX NSDI. Boston, March 28, 2017.

Daniel S.Berger

MorHarchol-Balter

Ramesh K. Sitaraman

Page 2: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

CDN Caching Architecture

1

Content providers

Users

CDN

100% 100% 100% 100%

1% 1% 1% 1%

DC

HOC

Page 3: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

HOC performance metric

object hit ratio = OHR =

Optimizing CDN CachesTwo caching levels:

❏ Disk Cache (DC)

❏ Hot Object Cache (HOC)

2

# reqsserved

by HOC

total# reqs

Goal: maximize OHR100%

40%

DC

HOC

1%

Page 4: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Frequent decisions required

What to admit What to evict

Prior Approaches to Cache Management

LRU

mixtures of LRU/LFU

concurrentLRU

historicallyToday in practicee.g., Nginx, Varnish

2000s in academiae.g., Modha, Zhang, Kumar

2010s in academiae.g., Kaminsky, Lim, Andersen

everything

everything

everything

500 GBper hour

3

DC

HOC

a few GBs capacity

Page 5: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

We Are Missing a Key Issue

Not all objects are the same

4

9 orders of magnitude

❏ Should we admit every object?(no, we should favor small objects)

❏ A few key companies know this(but don’t know how to it well)

❏ Academia has not been helpful(almost all theoretical work assumes equal-sized objects)

Page 6: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

What’s Hard About Size-Aware AdmissionFixed Size Threshold:

admit if size < Threshold c

5

The best threshold changes with traffic mix

How to pick c:pick c to maximize OHR

Threshold c

2pm 9pm

best c

at 8 a

m

Page 7: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Probabilistic admission:

Can we avoid picking a threshold c

6

Which curve makes big difference

Unfortunately, many curvesexample: exp(c) family

We need to adapt c

high admission probability

low admission probability

Page 8: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

What to admit What to evict

The AdaptSize Caching System

concurrent LRUAdaptSize adaptive size-aware

First system that continuously adaptsthe parameter of size-aware admission

Incorporated into high-throughputproduction caching system (Varnish)

7

adaptwithtraffic

adaptwithtime

Take traffic measurements

Calculate the best c

Enforce admission

control

Calculate the best c

Page 9: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

How to Find Best c Within Each Δ Interval

8

Local optima onOHR-vs-c curve

Traditional approach

Hill climbing

…time

Enables speedy global optimization

AdaptSize approach

Markov model

Δ interval Δ interval Δ interval

Page 10: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

How AdaptSize Gets the OHR-vs-c curve

Markov chain

9

Why hasn’t this been done? Too slow: exponential state space

➢ track IN/OUT for each objectIN OUT

request request

hit

Algorithm For every Δ interval and for every value of c

❏ use Markov chain to solve for OHR(c)

❏ find c to maximize OHR

miss

New technique: approximation with linear state space

Page 11: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

DC

HOC

Implementing AdaptSize

10

Incorporated into Varnishhighly concurrent HOC system, 40+ Gbit/s

Take traffic measurements

Calculate the best c

Enforce admission

control

AdaptSize

Goal: low overheadon request path

Page 12: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

DC

HOC

Implementing AdaptSize

11

Take traffic measurements

Calculate the best c

Enforce admission

control

AdaptSize

1) Concurrent write conflicts

2) Locks too slow [NSDI’13 & 14]

producer/consumer + ring buffer

Challenges

AdaptSize:

40% 1%requests objects

Incorporated into Varnishhighly concurrent HOC system, 40+ Gbit/s

Lock-free implementation

Page 13: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

DC

HOC

Implementing AdaptSize

12

Take traffic measurements

Calculate the best c

Enforce admission

control

AdaptSize

Incorporated into Varnishhighly concurrent HOC system, 40+ Gbit/s

admission is really simpleAdaptSize:

Enables lock free & low overhead implementation

❏ given c, and the object size

❏ admit with P(c, size)

Page 14: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

40 GBit /100ms RTT

AdaptSize Evaluation Testbed

13

Clients: replay Akamai requests trace 440 million / 152 TB total requests

Origin: emulates 100s of web servers 55 million / 8.9 TB unique objects

HOC systems: 1.2 GB 16 threads

❏ unmodified Varnish

❏ NGINX cache

❏ AdaptSize40 GBit /30ms RTT

DC

HOCAdapt

Size

Origin

DC: unmodified Varnish 4x 1TB/ 7200 Rpm

Page 15: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Comparison to Production Systems

14

+92%

what to admit what to evict

Varnish

Nginx

AdaptSize

frequency filter LRU

adaptive size-aware concurrent LRU

everything concurrent LRU

+48%

Page 16: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Comparison to Research-Based Systems

15

manually tuned parameters

manually tuned parameters

manually tuned parameters+67%

recency and frequency

combinations

Page 17: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Robustness of AdaptSizeSize-Aware OPT: offline parameter tuning

16

AdaptSize: our Markovian tuning model

HillClimb: local-search using shadow queues

Page 18: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Approach: size-based admission control

ConclusionGoal: maximize OHR of the Hot Object Cache

17

OHR=

# reqsserved

by HOC

total# reqs

Page 19: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

AdaptSize: adapts c via a Markov chain

Approach: size-based admission control

ConclusionGoal: maximize OHR of the Hot Object Cache

18

OHR=

# reqsserved

by HOC

total# reqs

Key insight: need to adapt parameter c

Result: 48-92% higher OHRs

Page 20: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN · The AdaptSize Caching System concurrent LRU AdaptSize adaptive size-aware First system that continuously adapts the

Key insight: need to adapt parameter c

Approach: size-based admission control

ConclusionGoal: maximize OHR of the Hot Object Cache

19

OHR=

# reqsserved

by HOC

total# reqs

Result: 48-92% higher OHRs

In our paper ❏ Throughput❏ Disk utilization❏ Byte hit ratio❏ Request latency

/dasebe/AdaptSize

AdaptSize: adapts c via a Markov chain