1 Probabilistic Models for Web Caching David Starobinski, David Tse UC Berkeley Conference and Workshop on Stochastic Networks Madison, Wisconsin, June 2000

Probabilistic Models for Web Caching

David Starobinski, David Tse

UC Berkeley

Conference and Workshop on Stochastic Networks
Madison, Wisconsin, June 2000


Overview

• Web caching goals
• Caching levels
• Classical caching algorithms and the Independent Reference (IR) model
• Web caching issues
• New algorithms and analysis for Web caches
• Discussion


Web Caching Goals

• Reduce response latency
• Reduce bandwidth consumption
• Reduce server load

Exploit the locality of reference


Web Caching Levels

[Figure: caching levels between the clients and the server across the Internet: browser cache, proxy cache, reverse proxy]


Caching: Performance

• Cache buffers have finite capacity

• Goal: Maximize the proportion of requests served by the cache (hit ratio)

• Need to devise algorithms that keep the “hot” documents in the cache


Caching Algorithms

• LRU

• FIFO

• CLIMB (Transpose)


LRU (Least Recently Used)

The buffer is arranged as a stack

[Figure: buffer stack holding pages 1, 2, 3, 4; page 5 shown separately]


LRU (ii)

[Figure: pages 1, 2, 3 in the stack; pages 4 and 5 shown separately]


LRU (iii)

[Figure: pages 1, 2, 3, 4 and 5; page 3 shown separately]


LRU (iv)

[Figure: pages 1, 2, 4 in the stack; pages 5 and 3 shown separately]


CLIMB (Transpose)

[Figure: buffer holding pages 1, 2, 3, 4; page 5 shown separately]


CLIMB (ii)

[Figure: buffer reordered as 1, 3, 2, 4; page 5 shown separately]
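The LRU, FIFO and CLIMB policies from the preceding slides can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names (`lru_step`, `fifo_step`, `climb_step`) and the list-based buffer, where index 0 is the top of the stack, are assumptions made for this example. On a miss with a full buffer, every policy here evicts the page at the bottom of the list.

```python
def lru_step(cache, page, capacity):
    """Serve one request under LRU; return True on a hit."""
    hit = page in cache
    if hit:
        cache.remove(page)          # pull the page out of its current slot
    elif len(cache) >= capacity:
        cache.pop()                 # miss on a full buffer: evict the bottom page
    cache.insert(0, page)           # the requested page always ends up on top
    return hit


def fifo_step(cache, page, capacity):
    """Serve one request under FIFO; return True on a hit."""
    hit = page in cache
    if not hit:                     # hits leave the ordering untouched
        if len(cache) >= capacity:
            cache.pop()             # evict the oldest page (bottom)
        cache.insert(0, page)
    return hit


def climb_step(cache, page, capacity):
    """Serve one request under CLIMB (transpose); return True on a hit."""
    hit = page in cache
    if hit:
        pos = cache.index(page)
        if pos > 0:                 # swap with the page directly above it
            cache[pos - 1], cache[pos] = cache[pos], cache[pos - 1]
    else:
        if len(cache) >= capacity:
            cache.pop()             # evict the bottom page
        cache.append(page)          # the new page enters at the bottom
    return hit
```

With this sketch, starting from `[1, 2, 3, 4]`, a CLIMB request for page 3 yields `[1, 3, 2, 4]`, while an LRU request for page 5 yields `[5, 1, 2, 3]`.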


Analysis: The IR model

• N: total number of pages
• p_i: the probability that page i (i = 1, 2, …, N) is requested
• Requests are independent of previous requests
• Remarks:
  – Model mostly justified for proxy caches
  – Studies show that Web page popularity follows a Zipf law
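A request stream under the IR model can be generated by drawing each request independently from a fixed popularity vector. The sketch below is illustrative only: the helper names `zipf_popularity` and `ir_requests`, the default exponent `alpha=1.0`, and the fixed seed are assumptions for this example, not parameters taken from the talk.

```python
import random


def zipf_popularity(n, alpha=1.0):
    """Return p_1..p_n with p_i proportional to 1 / i**alpha (Zipf-like law)."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ir_requests(p, count, seed=0):
    """Draw `count` requests; each one is independent of all previous ones."""
    rng = random.Random(seed)
    pages = range(1, len(p) + 1)            # pages are numbered 1..N
    return rng.choices(pages, weights=p, k=count)
```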


Cache algorithms

• K: storage capacity of the cache (in pages)

• Ideally, place the K pages with the greatest values of p_i into the cache

• Problem: the values p_i are unknown a priori
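If the popularities were known, the hit ratio of this ideal static placement would simply be the sum of the K largest request probabilities. A one-function sketch (the name `ideal_hit_ratio` is an assumption for this example):

```python
def ideal_hit_ratio(p, k):
    """Hit ratio when the cache permanently holds the k most popular pages."""
    return sum(sorted(p, reverse=True)[:k])
```

This value is a useful yardstick when comparing on-line policies, which must learn the ordering from the request stream.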


LRU, FIFO, CLIMB analysis

• Under the IR model, the cache dynamics can be described by a Markov chain

• Each state (I_1, I_2, …, I_K) represents the identity (URL) and ordering of the pages within the cache


LRU – Stationary Probabilities

$$\pi(1, 2, \ldots, K) = \prod_{i=1}^{K} \frac{p_i}{\sum_{j=i}^{N} p_j}$$

• Allows the hit ratio to be computed
• Similar results hold for FIFO and CLIMB
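For a small number of pages, the LRU hit ratio can be computed by brute force from the product-form stationary distribution. The sketch below uses the general form for an arbitrary ordered state, in which each factor is the page's probability divided by one minus the probabilities of the pages above it; for the state (1, 2, …, K) this is the same as dividing p_i by the sum of p_j over j ≥ i. The function names and the 0-based page indices are assumptions for this example, and the enumeration is only practical for small N and K.

```python
from itertools import permutations


def lru_state_probability(state, p):
    """Stationary probability of an ordered LRU state (top of stack first)."""
    prob = 1.0
    remaining = 1.0                      # 1 minus the mass of pages already placed
    for page in state:
        prob *= p[page] / remaining
        remaining -= p[page]
    return prob


def lru_hit_ratio(p, k):
    """Hit ratio of an LRU cache of k pages under the IR model."""
    total = 0.0
    for state in permutations(range(len(p)), k):
        # A request hits if it falls on any page currently in the state.
        total += lru_state_probability(state, p) * sum(p[i] for i in state)
    return total
```

A quick sanity check is that the state probabilities sum to one and that uniform popularities give a hit ratio of K/N.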


Analysis - Summary

• Best hit ratio for CLIMB followed by LRU followed by FIFO

• Convergence rate much faster for LRU and FIFO than CLIMB

• Some mathematical issues still open


New Issues

• Non-uniform page size

• Non-uniform access costs
  – Nearby vs. distant servers
  – Underloaded vs. overloaded servers

• Page updates


The Extended IR model (Size)

• Same assumptions as in the IR model, plus:

• The size of page i is s_i

• The cache size is K


Off-Line Problem

Find a subset $I \subseteq \{1, 2, \ldots, N\}$ such that

1) $\sum_{i \in I} p_i$ is as large as possible

2) $\sum_{i \in I} s_i \le K$

Knapsack Problem!
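When the page sizes are integers, the off-line problem can be solved exactly with the textbook 0/1 knapsack dynamic program. The sketch below is one standard way to do this, not a method taken from the talk; the function name `best_subset`, the integer-size restriction, and the 0-based page indices are assumptions for this example.

```python
def best_subset(p, s, capacity):
    """Maximize the sum of p[i] subject to the sum of s[i] <= capacity.

    Sizes and capacity are assumed to be non-negative integers.
    Returns (total probability, list of chosen page indices).
    """
    # best[c] = best (value, chosen pages) using total size at most c
    best = [(0.0, [])] * (capacity + 1)
    for i in range(len(p)):
        new_best = list(best)
        for c in range(s[i], capacity + 1):
            candidate = best[c - s[i]][0] + p[i]
            if candidate > new_best[c][0]:
                # Build a fresh list so earlier entries are never mutated.
                new_best[c] = (candidate, best[c - s[i]][1] + [i])
        best = new_best
    return best[capacity]
```

The running time grows with the product of the number of pages and the capacity, which is one reason cheaper heuristics are attractive.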


Heuristics

• Place the documents with the greatest p_i/s_i values in the cache

• Performs at most a factor of two worse than the optimal solution (except in extreme cases)

• Goal: devise new on-line algorithms that learn to order documents according to their p_i/s_i values
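The density heuristic can be sketched as a greedy pass over the documents sorted by p_i/s_i. This is an illustrative variant, with the assumption that a document that does not fit is skipped and the scan continues; the name `greedy_by_density` and the 0-based indices are also assumptions for this example.

```python
def greedy_by_density(p, s, capacity):
    """Pick documents in decreasing order of p[i] / s[i] while they fit."""
    order = sorted(range(len(p)), key=lambda i: p[i] / s[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + s[i] <= capacity:      # skip documents that no longer fit
            chosen.append(i)
            used += s[i]
    return chosen
```

An on-line algorithm cannot run this directly because the p_i are unknown; it has to learn the same ordering from the requests it observes.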


Size-LRU algorithm

• Set s_min = min{s_1, s_2, …, s_N}

• A randomized algorithm

• When page i is requested:
  – Act like LRU with probability s_min/s_i
  – Otherwise, do not change the cache ordering
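The Size-LRU rule can be simulated with a short Python sketch. Two modelling choices here are assumptions for illustration and are not stated on the slide: the algorithm keeps an ordering of all pages, and the cache holds the longest prefix of that ordering whose total size fits in the capacity. The function names and the dictionary `sizes` (page to size) are likewise hypothetical.

```python
import random


def size_lru_step(order, page, sizes, s_min, rng):
    """With probability s_min / s_i move the page to the top; else do nothing."""
    if rng.random() < s_min / sizes[page]:
        order.remove(page)
        order.insert(0, page)


def cached_pages(order, sizes, capacity):
    """Assumed cache content: longest prefix of the ordering that fits."""
    cached, used = [], 0
    for page in order:
        if used + sizes[page] > capacity:
            break
        cached.append(page)
        used += sizes[page]
    return cached


def simulate_size_lru(requests, sizes, capacity, seed=0):
    """Return the fraction of requests found in the cache."""
    rng = random.Random(seed)
    s_min = min(sizes.values())
    order = list(sizes)                  # initial ordering: dictionary order
    hits = 0
    for page in requests:
        if page in cached_pages(order, sizes, capacity):
            hits += 1
        size_lru_step(order, page, sizes, s_min, rng)
    return hits / len(requests)
```

When all sizes are equal, the move-to-top probability is 1 and the sketch behaves exactly like plain LRU.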


Result

IR model          Extended IR model
• LRU             • Size-LRU
• p_i             • p_i/s_i

Size-LRU is dual to LRU


Example: Size-LRU Stationary Probabilities

$$\pi(1, 2, \ldots, N) = \prod_{i=1}^{N} \frac{p_i/s_i}{\sum_{j=i}^{N} p_j/s_j}$$
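As a consistency check on the duality, consider the special case where every page has the same size s. Assuming the Size-LRU product form has factors (p_i/s_i) divided by the sum of p_j/s_j over j ≥ i, the common factor 1/s cancels in each term, and the expression reduces to the LRU product form over the full ordering:

```latex
\prod_{i=1}^{N} \frac{p_i/s}{\sum_{j=i}^{N} p_j/s}
  \;=\; \prod_{i=1}^{N} \frac{p_i}{\sum_{j=i}^{N} p_j}
```

In other words, Size-LRU with equal sizes orders pages exactly as LRU does, and with unequal sizes the ratio p_i/s_i takes over the role that p_i plays for LRU.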


Numerical Example

• N = 100 documents

• Page popularity: $p_i \propto 1/i^{0.8}$

• Heavy-tailed document size: $\Pr(s > x) = \frac{1}{1+x}$
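A workload of this kind can be generated as follows. This is a sketch under stated assumptions: popularity is taken to be proportional to 1/i^0.8, and the size tail is taken to be Pr(s > x) = 1/(1+x), which is sampled by inverse transform. The function name `example_workload` and the seed are hypothetical, and no results from the talk are reproduced here.

```python
import random


def example_workload(n=100, alpha=0.8, seed=0):
    """Return (popularities, sizes) for n documents."""
    rng = random.Random(seed)
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    p = [w / total for w in weights]     # p_i proportional to 1 / i**alpha
    # Assumed tail Pr(s > x) = 1 / (1 + x): if U is uniform on (0, 1],
    # then s = 1/U - 1 has exactly that tail.
    sizes = [1.0 / (1.0 - rng.random()) - 1.0 for _ in range(n)]
    return p, sizes
```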


Numerical Example


Summary

• New issues in Web caching

• Size-LRU algorithm

• Dual to LRU

• Extensions to the access-cost issue

• Ongoing research

The End