1 Probabilistic Models for Web Caching David Starobinski, David Tse UC Berkeley Conference and...

1

Probabilistic Models for Web Caching

David Starobinski, David Tse

UC Berkeley

Conference and Workshop on Stochastic Networks, Madison, Wisconsin, June 2000

2

Overview

• Web Caching Goals

• Caching Levels

• Classical caching algorithms and the Independent Reference (IR) model

• Web caching issues

• New algorithms and analysis for Web caches

• Discussion

3

Web Caching Goals

• Reduce response latency

• Reduce bandwidth consumption

• Reduce server load

Exploit the locality of reference

4

Web Caching Levels

[Diagram labels: Clients, Browser cache, Proxy cache, Internet, Reverse proxy, Server]

5

Caching: Performance

• Cache buffers have finite capacity

• Goal: Maximize the proportion of requests served by the cache (hit ratio)

• Need to devise algorithms that keep the “hot” documents in the cache

6

Caching Algorithms

• LRU

• FIFO

• CLIMB (Transpose)

7

LRU (Least Recently Used)

The buffer is arranged as a stack

[Figure: stack of pages 1, 2, 3, 4, with page 5]

8

LRU (ii)

[Figure: stack of pages 1, 2, 3, with pages 4 and 5]

9

LRU (iii)

[Figure: stack of pages 1, 2, 3, 4, with pages 5 and 3]

10

LRU (iv)

[Figure: stack of pages 1, 2, 4, with pages 5 and 3]
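The LRU stack behaviour can be sketched in a few lines of Python. This is a minimal illustration assuming unit-size pages; the class name `LRUCache` and its `request` method are hypothetical names chosen for this sketch, not taken from the talk.

```python
class LRUCache:
    """LRU cache for unit-size pages, kept as a stack (index 0 is the top)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stack = []

    def request(self, page):
        """Serve one request and return True on a hit, False on a miss."""
        hit = page in self.stack
        if hit:
            # A hit: take the page out so it can be moved to the top.
            self.stack.remove(page)
        elif len(self.stack) >= self.capacity:
            # A miss on a full cache: evict the least recently used page.
            self.stack.pop()
        self.stack.insert(0, page)
        return hit


cache = LRUCache(capacity=4)
for page in [1, 2, 3, 4]:
    cache.request(page)
miss = cache.request(5)   # page 5 is not cached, page 1 is evicted
hit = cache.request(3)    # page 3 is cached and moves to the top
```

The list-based stack makes every request cost O(K); it is used here only because it mirrors the stack picture directly.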

11

CLIMB (Transpose)

[Figure: stack of pages 1, 2, 3, 4, with page 5]

12

CLIMB (ii)

[Figure: stack of pages 1, 3, 2, 4, with page 5]
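A minimal sketch of CLIMB (transpose), assuming unit-size pages: on a hit the page swaps places with the page directly above it, and on a miss the new page replaces the bottom page. The miss rule and the names `ClimbCache` and `request` are assumptions of this sketch.

```python
class ClimbCache:
    """CLIMB (transpose) cache for unit-size pages (index 0 is the top)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stack = []

    def request(self, page):
        """Serve one request and return True on a hit, False on a miss."""
        if page in self.stack:
            pos = self.stack.index(page)
            if pos > 0:
                # A hit: climb one position by swapping with the page above.
                self.stack[pos - 1], self.stack[pos] = (
                    self.stack[pos], self.stack[pos - 1])
            return True
        if len(self.stack) < self.capacity:
            self.stack.append(page)      # cache not yet full
        else:
            self.stack[-1] = page        # replace the bottom page
        return False


climb = ClimbCache(capacity=4)
for page in [1, 2, 3, 4]:
    climb.request(page)
climb.request(3)                 # 1 2 3 4 becomes 1 3 2 4
after_hit = list(climb.stack)
climb.request(5)                 # page 5 replaces the bottom page
```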

13

Analysis: The IR model

• N: total number of pages

• p_i: the probability that page i (i = 1, 2, …, N) is requested

• Requests are independent of previous requests

• Remarks:

– Model mostly justified for proxy caches

– Studies show that web page popularity follows a Zipf law
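To make the IR model concrete, the sketch below draws an independent request stream from a Zipf-like popularity vector. The exponent `alpha`, the seed, and the function names are assumptions chosen for illustration.

```python
import random


def zipf_popularity(n, alpha=1.0):
    """Return p_1..p_n with p_i proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ir_requests(p, length, seed=0):
    """Draw `length` requests independently, page i with probability p[i-1]."""
    rng = random.Random(seed)
    pages = range(1, len(p) + 1)
    return rng.choices(pages, weights=p, k=length)


p = zipf_popularity(100)
requests = ir_requests(p, length=1000)
```

Each request is drawn without reference to earlier ones, which is exactly the independence assumption of the model.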

14

Cache algorithms

• K: Storage capacity of the cache (in pages)

• Ideally, place the K pages with the greatest values of p_i into the cache

• Problem: the values p_i are unknown a priori
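If the probabilities were known, the ideal hit ratio would simply be the sum of the K largest values. The helper below is a small sketch of that benchmark; its name is hypothetical.

```python
def ideal_hit_ratio(p, k):
    """Hit ratio when the cache permanently holds the k most popular pages."""
    return sum(sorted(p, reverse=True)[:k])


example_ratio = ideal_hit_ratio([0.2, 0.5, 0.3], k=2)   # 0.5 + 0.3
```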

15

LRU, FIFO, CLIMB analysis

• Under the IR model, the cache dynamics can be described by a Markov chain

• Each state {I_1, I_2, …, I_K} represents the identity (URL) and ordering of the pages within the cache

16

LRU – Stationary Probabilities

$$\pi(1, 2, \ldots, K) = \prod_{i=1}^{K} \frac{p_i}{\sum_{j=i}^{N} p_j}$$

• Allows the hit ratio to be computed

• Similar results for FIFO and CLIMB
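For small N, the LRU hit ratio can be obtained by enumerating all ordered cache states. The sketch below assumes the classical product form for LRU under the IR model, in which each page's probability is divided by the probability mass of the pages not yet placed above it; pages are indexed from 0 and the function names are hypothetical.

```python
from itertools import permutations


def lru_state_probability(state, p):
    """Stationary probability of the ordered cache state (top page first)."""
    prob = 1.0
    remaining = 1.0                 # probability mass of pages not yet placed
    for page in state:
        prob *= p[page] / remaining
        remaining -= p[page]
    return prob


def lru_hit_ratio(p, k):
    """Sum over all ordered states of P(state) times P(request hits state)."""
    pages = range(len(p))
    return sum(
        lru_state_probability(state, p) * sum(p[i] for i in state)
        for state in permutations(pages, k)
    )


uniform = [0.25, 0.25, 0.25, 0.25]
uniform_ratio = lru_hit_ratio(uniform, k=2)   # half of the pages are cached
```

The enumeration grows factorially with K, so this is only a check for toy instances.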

17

Analysis - Summary

• Best hit ratio for CLIMB followed by LRU followed by FIFO

• Convergence rate much faster for LRU and FIFO than CLIMB

• Some mathematical issues still open

18

New Issues

• Non-uniform page size

• Non-uniform access costs

– Nearby vs. distant servers

– Underloaded vs. overloaded servers

• Page updates

19

The Extended IR model (Size)

• Same assumptions as in the IR model +

• The size of page i is s_i

• The cache size is K

20

Off-Line Problem

Find a subset $I \subseteq \{1, 2, \ldots, N\}$ such that

1) $\sum_{i \in I} p_i$ is as large as possible

2) $\sum_{i \in I} s_i \le K$

Knapsack Problem!
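For a toy instance, the off-line problem can be solved exactly with the standard 0/1 knapsack dynamic program. The sketch assumes integer page sizes and an integer capacity; the function name and the example numbers are made up for illustration.

```python
def best_cache_content(p, s, capacity):
    """Return (total probability, chosen pages) for the best feasible subset.

    Assumes integer sizes s[i] and an integer capacity.
    """
    # best[c] holds the best (value, pages) found with total size at most c.
    best = [(0.0, [])] * (capacity + 1)
    for i in range(len(p)):
        new = best[:]
        for c in range(s[i], capacity + 1):
            candidate = best[c - s[i]][0] + p[i]
            if candidate > new[c][0]:
                new[c] = (candidate, best[c - s[i]][1] + [i])
        best = new
    return best[capacity]


# One large popular page versus two small, less popular pages.
value, chosen = best_cache_content(p=[0.4, 0.35, 0.25], s=[4, 2, 2], capacity=4)
```

In this example the two small pages together beat the single most popular page, which is why popularity alone is not the right ranking once sizes differ.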

21

Heuristics

• Place the documents with the greatest p_i/s_i values in the cache

• Performs at most a factor of two worse than the optimal solution (except in extreme cases)

• Goal: Devise new on-line algorithms that learn to order documents according to their p_i/s_i values
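The density heuristic can be sketched as a greedy pass over the pages in decreasing order of p_i/s_i. Skipping a page that does not fit and continuing with the next one is an assumption of this sketch, as is the function name.

```python
def greedy_by_density(p, s, capacity):
    """Pick pages in decreasing order of p[i] / s[i] while they still fit."""
    order = sorted(range(len(p)), key=lambda i: p[i] / s[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + s[i] <= capacity:
            chosen.append(i)
            used += s[i]
    return chosen


# Densities are 0.1, 0.175 and 0.125, so pages 1 and 2 are picked first.
greedy_choice = greedy_by_density(p=[0.4, 0.35, 0.25], s=[4, 2, 2], capacity=4)
```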

22

Size-LRU algorithm

• Set s_min = min{s_1, s_2, …, s_N}

• A randomized algorithm

• When page i is requested:

– Act like LRU with probability s_min/s_i

– Otherwise, do not change the cache ordering
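A minimal sketch of Size-LRU follows. It assumes the cache holds the longest prefix of the ordering whose total size fits within the capacity; that rule, the class name `SizeLRU`, and the seeded random generator are assumptions of this illustration rather than details given on the slide.

```python
import random


class SizeLRU:
    """Size-LRU: move a requested page to the top with probability s_min/s_i."""

    def __init__(self, sizes, capacity, seed=0):
        self.sizes = sizes                    # dict: page -> size
        self.capacity = capacity
        self.s_min = min(sizes.values())
        self.order = []                       # index 0 is the top
        self.rng = random.Random(seed)

    def cached(self):
        """Pages held: the longest prefix of the ordering that fits (assumed)."""
        held, used = [], 0
        for page in self.order:
            if used + self.sizes[page] > self.capacity:
                break
            held.append(page)
            used += self.sizes[page]
        return held

    def request(self, page):
        """Serve one request and return True on a hit, False on a miss."""
        hit = page in self.cached()
        if self.rng.random() < self.s_min / self.sizes[page]:
            # Act like LRU: move (or insert) the page at the top.
            if page in self.order:
                self.order.remove(page)
            self.order.insert(0, page)
        # Otherwise the ordering is left unchanged.
        return hit


# With equal sizes the probability is 1, so the behaviour reduces to LRU.
size_lru = SizeLRU({1: 1, 2: 1, 3: 1}, capacity=2)
hits = [size_lru.request(page) for page in [1, 2, 3, 1, 3]]
```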

23

Result

• IR model: LRU, p_i

• Extended IR model: Size-LRU, p_i/s_i

Size-LRU is dual to LRU

24

Example: Size-LRU Stationary Probabilities

$$\pi(1, 2, \ldots, N) = \prod_{i=1}^{N} \frac{p_i / s_i}{\sum_{j=i}^{N} p_j / s_j}$$
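The duality can be checked numerically on a toy instance: the Size-LRU ordering probability takes the LRU product form with each p_i replaced by the weight p_i/s_i. The sketch below assumes that form, indexes pages from 0, and uses a hypothetical function name.

```python
from itertools import permutations


def size_lru_ordering_probability(order, p, s):
    """Stationary probability of a full ordering of the pages (top first)."""
    w = [pi / si for pi, si in zip(p, s)]    # weights p_i / s_i
    remaining = sum(w)                       # weight of pages not yet placed
    prob = 1.0
    for page in order:
        prob *= w[page] / remaining
        remaining -= w[page]
    return prob


p = [0.5, 0.3, 0.2]
s = [2, 1, 1]
# Page 1 has the greatest p_i / s_i, so orderings starting with it are favoured.
prob_102 = size_lru_ordering_probability((1, 0, 2), p, s)
total = sum(size_lru_ordering_probability(o, p, s)
            for o in permutations(range(3)))
```

Setting all sizes equal makes the weights proportional to p_i, and the function then returns the plain LRU probabilities.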

25

Numerical Example

• N=100 documents

• Page popularity: $p_i \propto 1 / i^{0.8}$

• Heavy-tailed document size: $\Pr(s > x) = \frac{1}{1 + x}$

26

Numerical Example

27

Summary

• New issues in Web caching

• Size-LRU algorithm

• Dual to LRU

• Extensions for cost issue

• On-going research

The End
