An Analysis of Facebook Photo Caching
Transcript of An Analysis of Facebook Photo Caching
by Huang et al., SOSP 2013
An Analysis of Facebook Photo Caching
Presented by Phuong Nguyen
Some animations and figures are borrowed from the original paper and presentation
Photos on Facebook: Overview
Profile
Feed
Album
250 billion photos, as of Sep 2013
Photos on Facebook: Overview
Storage Backend
FB Cache Layers (Full-stack Study)
Akamai CDN
FACEBOOK PHOTO CACHING: HOW DOES IT WORK?
Client-based Browser Cache
Client: Browser Cache
Local Fetch
Geo-distributed Edge Cache (FIFO)
Client (Millions): Browser Cache
PoP (Tens): Edge Cache
Single Global Origin Cache (FIFO)
Client (Millions): Browser Cache
PoP (Tens): Edge Cache
Data Center (Four): Origin Cache
Hash(url)
Haystack Backend
Client (Millions): Browser Cache
PoP (Tens): Edge Cache
Data Center (Four): Origin Cache, Backend (Haystack)
FULL-STACK CACHE STUDY: DATA COLLECTION
Trace Collection
• Objective: collect a representative sample that permits correlating events related to the same request
Instrumentation Scope
Client: Browser Cache
PoP: Edge Cache
Data Center: Origin Cache, Backend (Haystack)
Sampling Strategies
• Request-based: sample requests randomly
  • Biased toward popular content
• Object-based: focus on a subset of photos selected by a deterministic test on photoId
  • Fair coverage of unpopular photos
  • Enables cross-stack analysis
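The object-based strategy can be illustrated with a minimal sketch. It assumes the deterministic test is a hash of the photoId reduced to a bucket number; the slides do not say which test was actually used, so `is_sampled`, `sample_trace`, the MD5 hash, and the `sample_percent` parameter are all hypothetical choices made for illustration.

```python
import hashlib


def is_sampled(photo_id: str, sample_percent: int = 10) -> bool:
    """Deterministic test on photoId (assumed form: hash bucket below a threshold).

    The same photo is always either inside or outside the sample, regardless
    of which layer logs the request or how often the photo is requested.
    """
    digest = hashlib.md5(photo_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < sample_percent


def sample_trace(requests, sample_percent: int = 10):
    """Keep every request whose photo falls in the sampled subset."""
    return [r for r in requests if is_sampled(r["photo_id"], sample_percent)]
```

Because the decision depends only on the photoId, every layer that applies the same test logs the same photos, which is what makes cross-stack correlation possible, and an unpopular photo is as likely to be selected as a popular one.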
WORKLOAD ANALYSIS
Analysis Objectives
• Traffic sheltering effects of caches
• Photo popularity distribution
• Geographic traffic distribution & collaborative caching
• Can we make the cache better?
• Impact of sizes & algorithm
• Could we know which photos to cache?
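ANALYSIS: TRAFFIC SHELTERING
Traffic Sheltering
Browser Cache (Client): 77.2M requests, 65.5% hit ratio, 65.5% traffic share
Edge Cache (PoP): 26.6M requests, 58.0% hit ratio, 20.0% traffic share
Origin Cache (Data Center): 11.2M requests, 31.8% hit ratio, 4.6% traffic share
Backend (Haystack): 7.6M requests, 9.9% traffic share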
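The relationship between the request counts, the per-layer hit ratios, and the traffic shares can be checked with a short calculation. This is a sketch that assumes the four counts on the slide (77.2M, 26.6M, 11.2M, 7.6M) are the requests arriving at the browser cache, Edge, Origin, and Backend respectively; the function name `sheltering` is hypothetical. Because the counts are rounded, the results match the slide's percentages only to within a few tenths of a point.

```python
# Requests arriving at each layer, in millions (assumed reading of the slide).
arrivals = [
    ("Browser Cache", 77.2),
    ("Edge Cache", 26.6),
    ("Origin Cache", 11.2),
    ("Backend (Haystack)", 7.6),
]


def sheltering(arrivals):
    """Derive per-layer hit ratio and share of total traffic served."""
    total = arrivals[0][1]
    rows = []
    for i, (layer, arriving) in enumerate(arrivals):
        # Requests passed to the next layer; the backend serves all it receives.
        passed_on = arrivals[i + 1][1] if i + 1 < len(arrivals) else 0.0
        served = arriving - passed_on
        rows.append({
            "layer": layer,
            "hit_ratio": 100.0 * served / arriving,   # served / requests seen by this layer
            "traffic_share": 100.0 * served / total,  # served / all client requests
        })
    return rows


for row in sheltering(arrivals):
    print(f"{row['layer']:<20} hit ratio {row['hit_ratio']:5.1f}%  "
          f"traffic share {row['traffic_share']:5.1f}%")
```

The hit ratio is measured against the requests that reach a layer, while the traffic share is measured against all client requests, which is why the Edge has a 58.0% hit ratio but serves only about 20% of total traffic.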
ANALYSIS: PHOTO POPULARITY IMPACT
Popularity Distribution
Skewness is reduced after each layer of cache
Popularity Impact on Caches
ANALYSIS: GEOGRAPHIC TRAFFIC DISTRIBUTION & COLLABORATIVE CACHING
Substantial Remote Traffic at Edge
Atlanta 20% local
Miami 35% local
Dallas 50% local
Chicago 60% local
LA 18% local
NYC 35% local
Where requests from Atlanta are served:
Atlanta 20% local
5% Dallas
35% D.C.
5% NYC
20% Miami
5% California
10% Chicago
• 80% of requests from Atlanta are served by remote Edges
Collaborative Edge
Impact of Using Collaborative Edge
Collaborative Edge increases hit ratio by 18%
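Why combining Edges can raise the hit ratio can be seen in a toy simulation. The sketch below assumes that a collaborative Edge behaves like one FIFO cache whose capacity is the sum of the individual Edge capacities; the class `FIFOCache`, both functions, and the example trace are hypothetical, and the numbers it produces are not the study's results.

```python
from collections import OrderedDict


class FIFOCache:
    """Fixed-capacity cache that evicts the oldest inserted key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert the key, evicting if full."""
        if key in self.items:
            return True  # FIFO does not reorder entries on a hit
        if self.capacity > 0:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)
            self.items[key] = None
        return False


def independent_hit_ratio(trace, num_edges, capacity_per_edge):
    """Each Edge caches only the requests routed to it."""
    edges = [FIFOCache(capacity_per_edge) for _ in range(num_edges)]
    hits = sum(edges[edge].access(photo) for edge, photo in trace)
    return hits / len(trace)


def collaborative_hit_ratio(trace, num_edges, capacity_per_edge):
    """All Edges share one logical cache of the combined capacity."""
    shared = FIFOCache(num_edges * capacity_per_edge)
    hits = sum(shared.access(photo) for _, photo in trace)
    return hits / len(trace)


# (edge_id, photo_id) pairs: the same two photos requested via three Edges.
trace = [(0, "a"), (1, "a"), (2, "a"), (0, "b"), (1, "b"), (2, "b")]
print(independent_hit_ratio(trace, 3, 2), collaborative_hit_ratio(trace, 3, 2))
```

In this trace every independent Edge takes its own miss for each photo, while the shared cache misses once per photo and serves the remaining requests as hits. The sketch ignores the extra latency of reaching a remote Edge.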
ANALYSIS: IMPACTS OF CACHE SIZE & ALGORITHM
Potential Improvement Study
• Methodology: cache simulation
  • Replay the trace (25% warm-up)
  • Evaluate using the remaining 75%
• Improvement factors:
  • Cache size
  • Caching algorithm
• Evaluation metric: hit ratio
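The simulation methodology can be sketched as a trace replay in which the first 25% of requests only warm the cache and the hit ratio is measured over the rest. The function `simulate` and its parameters are hypothetical; FIFO is the policy named on the earlier slides, and LRU is included here only as an illustrative alternative.

```python
from collections import OrderedDict


def simulate(trace, capacity, policy="fifo", warmup_fraction=0.25):
    """Replay a list of keys and return the hit ratio after the warm-up part."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()
    warmup_end = int(len(trace) * warmup_fraction)
    hits = evaluated = 0
    for i, key in enumerate(trace):
        hit = key in cache
        if hit:
            if policy == "lru":
                cache.move_to_end(key)  # LRU refreshes recency; FIFO leaves order alone
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # oldest inserted (FIFO) or least recent (LRU)
            cache[key] = None
        if i >= warmup_end:  # warm-up requests fill the cache but are not scored
            evaluated += 1
            hits += hit
    return hits / evaluated if evaluated else 0.0


trace = ["a", "b", "a", "c", "a", "d", "a", "e"]
for policy in ("fifo", "lru"):
    print(policy, simulate(trace, capacity=2, policy=policy))
```

Running the same trace with different `capacity` and `policy` values corresponds to the two improvement factors listed above, with hit ratio as the evaluation metric.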
Edge Cache with Different Sizes & Algorithms
Infinite Cache
• The same hit ratio can be achieved with a smaller cache and a higher-performing algorithm
• A more sophisticated algorithm can achieve a better hit ratio with the same cache size
ANALYSIS: WHICH PHOTOS TO CACHE?
Intuitions
• Properties intuitively associated with photo traffic:
  • The age of the photo
  • The number of Facebook followers associated with the owner
Content Age Effect
• Age-based cache replacement algorithm could be effective
• Fresh content is popular and tends to be effectively cached throughout the hierarchy
Social Effect
• The more popular the photo owner is, the more likely the photo is to be accessed
• Browser caches tend to have lower hit ratios for popular users (“viral” effect)
DISCUSSIONS
Discussions
• Evaluation method:
  • Only considers desktop clients, excluding mobile clients
  • Trends by mobility of users
• Sampling: object-based sampling might not represent a realistic workload
• Impact of caching done by Akamai CDN
  • The method for correlating requests is not perfect
• Latency issue
  • Evaluation mainly focuses on hit ratio & traffic sheltering, not latency
  • Latency of collaborative caching is not evaluated
Discussions (cont.)
• Other potential improvements:
  • Improved caching algorithm that takes photo metadata into account
  • Optimal placement of resizing functionality along the stack
  • Clairvoyant caching might be possible based on predicting future accesses
    • E.g., photos from the same album, photos that appear on the news feed, etc.
  • Address geographical diversity by improving the routing policy (e.g., put more weight on locality)
THANK YOU!