Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev...

63
Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo Caching

Transcript of Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev...

Page 1: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook),

Sanjeev Kumar, Harry C. Li (Facebook)

An Analysis ofFacebook Photo Caching

Page 2: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

2

250 Billion* Photos on Facebook

Profile

Feed

Album

* Internet.org, Sept., 2013

StorageBackend

CacheLayers

Full-stackStudy

Page 3: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

3

Preview of Results

Current Stack Performance

Opportunities for Improvement

• Browser cache is important (reduces 65+% request )

• Photo popularity distribution shifts across layers

• Smarter algorithms can do much better (S4LRU)

• Collaborative geo-distributed cache worth trying

Page 4: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

4

Client

Facebook Photo-Serving Stack

Page 5: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

5

Client-based Browser CacheClient

Browser

Cache

Client

LocalFetch

Page 6: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

6

Client-based Browser CacheClient

Browser

Cache(Millions)

Client

Page 7: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

7

Stack Choice

Browser

Cache

Client

CachesStorage

Facebook Stack

Akamai

Content Distribution Network(CDN)

• Focus: Facebook stack

(Millions)

Page 8: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

8

Geo-distributed Edge Cache (FIFO)

Edge Cache

(Tens)

Browser

Cache

Client

PoP

(Millions)

Page 9: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

9

Geo-distributed Edge Cache (FIFO)

Edge Cache

(Tens)

Browser

Cache

Client

PoP

(Millions)

Purpose

1. Reduce cross-country latency

2. Reduce Data Center bandwidth

Page 10: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

10

Geo-distributed Edge Cache (FIFO)

Edge Cache

(Tens)

Browser

Cache

Client

PoP

(Millions)

Page 11: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

11

Geo-distributed Edge Cache (FIFO)

Edge Cache

(Tens)

Browser

Cache

Client

PoP

(Millions)

Page 12: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

12

Single Global Origin Cache (FIFO)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

(Tens)(Millions) (Four)

Page 13: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

13

Single Global Origin Cache (FIFO)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

(Tens)(Millions) (Four)

Purpose

1. Minimize I/O-bound operations

Page 14: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

14

Single Global Origin Cache (FIFO)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

(Tens)(Millions) (Four)

Hash(url)

Page 15: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

15

Single Global Origin Cache (FIFO)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

(Tens)(Millions) (Four)

Page 16: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

16

Haystack Backend

Backend (Haystack

)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

(Tens)(Millions) (Four)

Page 17: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

17

How did we collect the trace?

Page 18: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

18

Trace Collection

Instrumentation Scope

Backend (Haystack

)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

(Object-based sampling)

• Request-based: collect X% of requests• Object-based: collect reqs for X% objects

Page 19: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

19

Sampling on Power-law

Object rank

Page 20: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

20

Sampling on Power-law

• Req-based: bias on popular content, inflate cache perf

Object rank

Req-based

Page 21: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

21

Sampling on Power-law

Object-based

Object rank

• Object-based: fair coverage of unpopular content

Page 22: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

22

Sampling on Power-law

• Object-based: fair coverage of unpopular content

Object rank

Object-based

Page 23: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

23

Trace Collection

77.2M reqs

(Desktop)

12.3MBrowser

s

Instrumentation Scope1.4M photos, all reqs for each

Backend (Haystack

)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

Resizer

R

2.6M photo objects, all reqs for each

12.3KServers

Page 24: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

24

Analysis

• Traffic sheltering effects of caches

• Photo popularity distribution

• Size, algorithm, collaborative Edge

• In paper– Stack performance as a function of photo age– Stack performance as a function of social

connectivity– Geographical traffic flow

Page 25: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

25

Traffic Sheltering

77.2M

26.6M11.2M

7.6M

Backend (Haystack

)

Browser

Cache

Edge Cache

OriginCache

PoPClient

Data Center

65.5%58.0%

31.8%

R

Traffic Share

65.5% 20.0% 4.6% 9.9%

Page 26: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

26

Photo popularity and its cache impact

Page 27: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

27

Popularity Distribution

• Browser resembles a power-law distribution

2%

Page 28: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

28

Popularity Distribution

• “Viral” photos becomes the head for Edge

Page 29: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

29

Popularity Distribution

• Skewness is reduced after layers of cache

Page 30: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

30

Popularity Distribution

• Backend resembles a stretched exponential dist.

Page 31: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

31

Popularity with Absolute Traffic

• Storage/cache designers: pick a layer

Page 32: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

32

Popularity Impact on Caches

High Low M

Lowest

Each has 25% requests

Page 33: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

33

Popularity Impact on Caches

• Browser traffic share decreases gradually

Page 34: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

34

Popularity Impact on Caches

• Edge serves consistent share except for the tail

7.8%

22~23%

Page 35: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

35

Popularity Impact on Caches

• Origin contributes most for “low” group

9.3%

Page 36: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

36

Popularity Impact on Caches

• Backend serves the tail

70% Haystack

Page 37: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

37

Can we make the cache better?

Page 38: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

38

Simulation

• Replay the trace (25% warm up)

• Estimate the base cache size

• Evaluate two hit-ratios (object-wise, byte-wise)

Page 39: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

39

Edge Cache with Different Sizes

• Picked San Jose edge (high traffic, median hit ratio)

59%

Page 40: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

40

Edge Cache with Different Sizes

• “x” estimates current deployment size (59% hit ratio)

65%68%

59%

Page 41: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

41

Edge Cache with Different Sizes

• “Infinite” size ratio needs 45x of current capacity

Infinite Cache

65%68%

59%

Page 42: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

42

Edge Cache with Different Algos

• Both LRU and LFU outperforms FIFO slightly

Infinite Cache

Page 43: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

43

S4LRU

Cache Space

More Recent

L3

L2

L1

L0

Page 44: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

44

S4LRU

Cache Space

L3

L2

L1

L0Missed Object

More Recent

Page 45: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

45

S4LRU

Cache Space

L3

L2

L1

L0

HitMore Recent

Page 46: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

46

S4LRU

Cache Space

L3

L2

L1

L0

Evict

More Recent

Page 47: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

47

Edge Cache with Different Algos

• S4LRU improves the most

68%

1/3x

Infinite Cache

59%

Page 48: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

48

Edge Cache with Different Algos

• Clairvoyant (Bélády) shows much improvement space

Infinite Cache

Page 49: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

49

Origin Cache

• S4LRU improves Origin more than Edge

14%

Infinite Cache

Page 50: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

50

Which Photo to Cache

• Recency & frequency leads S4LRU

• Does age, social factors also play a role?

Page 51: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

51

Collaborative cache on the Edge

Page 52: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

52

Geographic Coverage of Edge

Small working set

Page 53: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

53

Geographic Coverage of Edge

9 Edges with high-volume traffic

Page 54: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

54

Geographic Coverage of Edge

Do clients stay with local Edge?

Page 55: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

55

Geographic Coverage of Edge

Atlanta

Page 56: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

56

Geographic Coverage of Edge

Atlanta

20% local

5% Dallas

35% D.C.

5% NYC

20% Miami

5% California

10% Chicago

• Atlanta has 80% requests served by remote Edges

Page 57: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

57

Geographic Coverage of Edge

Atlanta

20% local

Miami

35% localDall

as50% local

Chicago

60% local

LA 18% local

NYC 35% local

• Substantial remote traffic is normal

Page 58: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

58

Geographic Coverage of Edge

Amplified working set

Page 59: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

59

Collaborative Edge

Page 60: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

60

Collaborative Edge

• “Independent” aggregates all high-volume Edges

Page 61: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

61

Collaborative Edge

• “Collaborative” Edge increases hit ratio by 18%

18%

Collaborative

Page 62: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

62

Related WorkStorage Analysis

Content Distribution Analysis

Web Analysis

BSD file system (SOSP ’85), Sprite ( SOSP ’91), NT (SOSP ’99),NetApp (SOSP ’11), iBench (SOSP ’11)

Cooperative caching (SOSP ’99), CDN vs. P2P (OSDI ’02),P2P (SOSP ’03), CoralCDN (NSDI ’10), Flash crowds (IMC ’11)

Zipfian (INFOCOM ’00), Flash crowds (WWW ’02),Modern web traffic (IMC ’11)

Page 63: Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook) An Analysis of Facebook Photo.

63

Conclusion

• Quantify caching performance

• Quantify popularity changes across layers of caches

• Recency, frequency, age, social factors impact cache

• Outline potential gain of collaborative caching