Serving Photos at Scaaale: Caching and Storage

An Analysis of Facebook Photo Caching. Huang et al.
Finding a Needle in a Haystack. Beaver et al.

Vlad Niculae, for CS6410
Most slides from Qi Huang (SOSP 2013) and Peter Vajgel (OSDI 2010)
Facebook serves two kinds of content:
- Dynamic (hard to cache; handled by TAO)
- Static (photos, normally easy to cache; served through a CDN)
An Analysis of Facebook Photo Caching
Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook)
- Dynamic (hard to cache; TAO)
- Static (photos, normally easy to cache; served through a CDN)
- A “normal” site's CDN hit rate is ~99%
- For Facebook, the CDN hit rate is only ~80%
Cache Layers and Storage Backend

[Diagram, repeated across several slides: Client → Browser Cache → Facebook Edge Cache (at the PoPs) → Origin Cache (in the datacenter) → Backend. Early builds also show an Akamai Origin Cache, annotated “(no access)”: the study instruments only Facebook's own stack.]

- Edge caches (points of presence): independent, FIFO eviction. Main goal: reduce bandwidth.
- Origin cache: coordinated, FIFO eviction; requests are spread over the Origin hosts by a hash of the photo id. Main goal: traffic sheltering for the backend.
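To make the coordination concrete, here is a minimal sketch of hash-based routing (my illustration, not Facebook's code; the host names are hypothetical). Because every Edge miss for a given photo id lands on the same Origin host, the hosts jointly behave as one logical cache with no duplicated entries:

    import hashlib

    # Hypothetical Origin host names; any stable identifiers work.
    ORIGIN_HOSTS = ["origin-01", "origin-02", "origin-03", "origin-04"]

    def origin_host_for(photo_id: str) -> str:
        """Route every request for `photo_id` to the same Origin host."""
        digest = hashlib.sha1(photo_id.encode()).hexdigest()
        return ORIGIN_HOSTS[int(digest, 16) % len(ORIGIN_HOSTS)]

    # All Edge misses for the same photo hit the same Origin host:
    assert origin_host_for("1234567") == origin_host_for("1234567")

A production system would more likely use consistent hashing, so that adding or removing a host remaps only a small fraction of photo ids.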
Analyze traffic in production!
- Instrument the client JS
- Log successful requests
- Correlate requests across layers
Sampling on a Power Law
[Figure: requests vs. object rank, illustrating the long tail]
- Object-based sampling: fair coverage of unpopular content
- Sample: 1.4M photos, 2.6M photo objects
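As an illustration of object-based sampling (a sketch under my own assumptions, not the paper's instrumentation code): deciding membership by a deterministic hash of the photo id means every request for a sampled photo is logged, at every layer and on every machine, which is what gives unpopular objects fair coverage. The sample rate below is hypothetical.

    import hashlib

    SAMPLE_PERMILLE = 10  # hypothetical rate: keep ~1% of photo ids

    def in_sample(photo_id: str) -> bool:
        """Object-based sampling: a photo is in or out of the sample for
        its whole lifetime, independent of how often it is requested.
        Hashing the id (rather than flipping a coin per request) makes
        the decision deterministic and consistent across layers."""
        h = int(hashlib.sha1(photo_id.encode()).hexdigest(), 16)
        return h % 1000 < SAMPLE_PERMILLE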
Data Analysis
The request funnel (traffic share by layer):

Layer                       Requests arriving   Hit ratio   Share of all traffic served
Browser Cache (client)      77.2M               65.5%       65.5%
Edge Cache (PoP)            26.6M               58.0%       20.0%
Origin Cache (data center)  11.2M               31.8%       4.6%
Backend (Haystack)          7.6M                --          9.9%
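The columns are mutually consistent; a quick sanity check of the funnel arithmetic (mine, not from the slides):

    # Each layer serves (requests arriving) x (hit ratio) of the 77.2M
    # total; its misses become the arrivals of the next layer down.
    total = 77.2e6
    edge_arrivals   = total * (1 - 0.655)            # ~26.6M browser misses
    origin_arrivals = edge_arrivals * (1 - 0.580)    # ~11.2M edge misses
    backend_reads   = origin_arrivals * (1 - 0.318)  # ~7.6M origin misses

    print(f"Edge share:    {edge_arrivals * 0.580 / total:.1%}")    # ~20.0%
    print(f"Origin share:  {origin_arrivals * 0.318 / total:.1%}")  # ~4.6%
    print(f"Backend share: {backend_reads / total:.1%}")            # ~9.9%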
Popularity Distribution
[Figure: request count vs. object rank at each layer]
- Backend traffic resembles a stretched exponential distribution
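For reference (a standard definition, not from the slides): a stretched exponential distribution has complementary CDF

$$P(X > x) = e^{-(x/x_0)^{\beta}}, \qquad 0 < \beta < 1,$$

whose tail decays faster than any power law. The point is that once the caching layers absorb the popular head, the traffic reaching the backend is no longer Zipf-like.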
Popularity Impact on Caches
- Backend serves the tail: ~70% of Haystack traffic is low-popularity content
[Figure 4c: hit rates at the Browser, Edge, and Origin layers for popularity groups A (most popular) through G (least popular)]
What if?
Edge Cache with Different Sizes
- Picked the San Jose edge (high traffic, median hit ratio)
- Matching an “infinite” cache needs 45x the current capacity
[Figure: hit ratio vs. cache size, annotated at 59% (current), 65%, and 68% (infinite cache)]
Edge Cache with Different Algos
- Both LRU and LFU outperform FIFO slightly
[Figure: hit ratio vs. cache size for each algorithm, with the infinite-cache bound]
S4LRU
[Diagram: the cache space is divided into four LRU segments, L0 through L3, with L3 holding the most recently promoted items. An item enters L0 when first cached, moves up one segment on each hit, and overflow from each segment is demoted to the segment below.]
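A minimal Python sketch of S4LRU (my reconstruction from the paper's description, not Facebook's code), assuming four equal-sized segments:

    from collections import OrderedDict

    class S4LRU:
        """Quadruply segmented LRU: segments L0 (bottom) .. L3 (top).
        New items enter L0; a hit promotes an item one segment up;
        when a segment overflows, its LRU tail is demoted one segment
        down, and items demoted out of L0 leave the cache."""

        def __init__(self, capacity: int, segments: int = 4):
            self.seg_cap = max(1, capacity // segments)
            # OrderedDict keeps LRU order: most recently used at the end.
            self.segs = [OrderedDict() for _ in range(segments)]

        def _insert(self, level: int, key, value) -> None:
            # Insert at `level`, cascading any overflow toward L0.
            while level >= 0:
                seg = self.segs[level]
                seg[key] = value                      # lands at the MRU end
                if len(seg) <= self.seg_cap:
                    return
                key, value = seg.popitem(last=False)  # demote this segment's LRU item
                level -= 1
            # Fell off L0: evicted from the cache entirely.

        def get(self, key):
            for i, seg in enumerate(self.segs):
                if key in seg:
                    value = seg.pop(key)
                    # Promote one segment up (capped at the top segment).
                    self._insert(min(i + 1, len(self.segs) - 1), key, value)
                    return value
            return None  # miss: caller fetches from the next layer and put()s

        def put(self, key, value) -> None:
            """Add a key that just missed; new keys always enter at L0."""
            for seg in self.segs:
                seg.pop(key, None)  # drop any stale copy first
            self._insert(0, key, value)

Items that are touched only once never climb above L0, so one-hit wonders cannot displace the multi-hit working set held in the upper segments; this is the intuition behind S4LRU's gains over plain LRU.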
Edge Cache with Different Algos
- S4LRU improves the most
[Figure: S4LRU reaches the current 59% hit ratio with roughly 1/3 of the current capacity; the infinite-cache bound is 68%]
Edge Cache with Different Algos
- A clairvoyant (Bélády-optimal) policy does better still => room for algorithmic improvement
[Figure: hit ratio vs. cache size, with the infinite-cache bound]
Origin Cache
- S4LRU improves the Origin more than the Edge: +14% hit ratio
[Figure: Origin hit ratio vs. cache size, with the infinite-cache bound]
Geographic Coverage of Edge
- Small working set
Geographic Coverage of Edge
Where requests from Atlanta clients get served:
  35% D.C., 20% local (Atlanta), 20% Miami, 10% Chicago, 5% Dallas, 5% NYC, 5% California
- Atlanta has 80% of its requests served by remote Edges. Not uncommon!
Geographic Coverage of Edge
- Amplified working set: because requests fan out across PoPs, each Edge ends up caching its own copy of a much larger set of photos
Collaborative Edge
- A “collaborative” Edge (all PoPs acting as one logical cache) increases the hit ratio by 18%
[Figure: independent vs. collaborative Edge hit ratio; +18%]
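One way such a collaborative Edge could be realized (a hypothetical sketch; the paper only simulates the PoPs as a single logical cache): hash each photo id to an “owner” PoP, so each photo is cached once across the whole Edge instead of once per PoP. The PoP names are taken from the geographic-coverage slide.

    import hashlib

    POPS = ["atlanta", "dc", "miami", "chicago", "dallas", "nyc", "california"]

    def owner_pop(photo_id: str) -> str:
        """Map each photo to one 'owner' PoP. A PoP that misses locally
        forwards to the owner instead of the Origin, so the PoPs jointly
        behave as one large cache with no duplicated entries."""
        h = int(hashlib.sha1(photo_id.encode()).hexdigest(), 16)
        return POPS[h % len(POPS)]

The obvious cost is an extra inter-PoP hop on local misses: this scheme trades latency for hit ratio.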
What Facebook Could Do
- Improve the cache algorithm (+ invest in cache-algorithm research)
- Coordinate Edge caches
- Let some phones resize their own photos
- Use more machine learning at this layer!
Backend Storage for Blobs
- Some requests are bound to miss the caches
- Reads >> writes >> deletes
- Writes often come in batches (photo albums)
- In this regime, Facebook found that default solutions did not work
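As a preview of the Haystack design that the second paper covers, here is a minimal sketch of its core idea (my illustration, not Facebook's code): append every photo, a “needle”, to one large log file and keep the per-photo metadata in an in-memory index, so a read costs a single seek instead of several filesystem-metadata seeks.

    import os
    from typing import Dict, Optional, Tuple

    class HaystackStore:
        """Minimal sketch of the Haystack idea: one append-only file plus
        an in-memory index of photo_id -> (offset, size)."""

        def __init__(self, path: str):
            self.f = open(path, "a+b")
            self.index: Dict[int, Tuple[int, int]] = {}

        def write(self, photo_id: int, data: bytes) -> None:
            self.f.seek(0, os.SEEK_END)
            offset = self.f.tell()
            self.f.write(data)   # appends are sequential, hence cheap
            self.f.flush()
            self.index[photo_id] = (offset, len(data))

        def read(self, photo_id: int) -> Optional[bytes]:
            loc = self.index.get(photo_id)  # memory lookup: no metadata I/O
            if loc is None:
                return None                 # deleted or never stored
            offset, size = loc
            self.f.seek(offset)
            return self.f.read(size)

        def delete(self, photo_id: int) -> None:
            # Soft delete: drop the index entry; space is reclaimed later
            # by compacting the log file.
            self.index.pop(photo_id, None)

This layout fits the workload above: reads dominate and take one seek, batched writes become sequential appends, and rare deletes are deferred to compaction.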