1
Probabilistic Models for Web Caching
David Starobinski, David Tse
UC Berkeley
Conference and Workshop on Stochastic Networks, Madison, Wisconsin, June 2000
2
Overview
• Web Caching Goals
• Caching Levels
• Classical caching algorithms and the Independent Reference (IR) model
• Web caching issues
• New algorithms and analysis for Web caches
• Discussion
3
Web Caching Goals
• Reduce response latency
• Reduce bandwidth consumption
• Reduce server load
Exploit the locality of reference
4
Web Caching Levels
[Figure: caching levels between clients and server across the Internet: browser cache, proxy cache, reverse proxy]
5
Caching: Performance
• Cache buffers have finite capacity
• Goal: Maximize the proportion of requests served by the cache (hit ratio)
• Need to devise algorithms that keep the “hot” documents in the cache
6
Caching Algorithms
• LRU
• FIFO
• CLIMB (Transpose)
7
LRU (Least Recently Used)
The buffer is arranged as a stack
[Figure: stack diagram; page labels 1 2 3 4, 5]
8
LRU (ii)
[Figure: stack diagram; page labels 1 2 3, 4, 5]
9
LRU (iii)
[Figure: stack diagram; page labels 1 2 3 4, 5, 3]
10
LRU (iv)
[Figure: stack diagram; page labels 1 2 4, 5, 3]
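A minimal Python sketch of the LRU stack behaviour, for illustration only. The class name `LRUCache`, its methods, and the request sequence in the usage note are assumptions, not taken from the slides: a requested page moves to the top of the stack, and on a miss with a full buffer the bottom page is evicted.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU stack: the most recently requested page is on top."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stack = OrderedDict()  # last key = top of the stack

    def request(self, page):
        """Return True on a hit, False on a miss; update the stack either way."""
        hit = page in self.stack
        if hit:
            self.stack.move_to_end(page)        # move the page to the top
        else:
            if len(self.stack) >= self.capacity:
                self.stack.popitem(last=False)  # evict the bottom page
            self.stack[page] = True             # insert the new page on top
        return hit

    def order(self):
        """Pages listed from top (most recent) to bottom (least recent)."""
        return list(reversed(self.stack))
```

With a capacity of 4 and pages 1 to 4 cached (1 on top), requesting page 5 evicts page 4 and puts 5 on top; a later request for page 3 is a hit that moves 3 to the top.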
11
CLIMB (Transpose)
[Figure: buffer diagram; page labels 1 2 3 4, 5]
12
CLIMB (ii)
[Figure: buffer diagram; page labels 1 3 2 4, 5]
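A minimal Python sketch of the transposition rule, for illustration only. On a hit, the requested page swaps places with the page immediately above it. The miss rule used here (the new page replaces the bottom page) is the usual textbook convention and is an assumption; the slides do not state it. The class name `ClimbCache` is hypothetical.

```python
class ClimbCache:
    """Illustrative CLIMB (transpose) buffer; index 0 is the top position."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []

    def request(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.slots:
            i = self.slots.index(page)
            if i > 0:
                # Hit: climb one position by swapping with the page above.
                self.slots[i - 1], self.slots[i] = self.slots[i], self.slots[i - 1]
            return True
        if len(self.slots) < self.capacity:
            self.slots.append(page)   # buffer not yet full: fill the bottom
        else:
            self.slots[-1] = page     # assumed miss rule: replace the bottom page
        return False
```

Starting from the ordering 1 2 3 4, a request for page 3 yields 1 3 2 4.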
13
Analysis: The IR model
• N: total number of pages
• p_i: the probability that page i (i = 1, 2, …, N) is requested
• Each request is independent of previous requests
• Remarks:
– The model is mostly justified for proxy caches
– Studies show that Web page popularity follows a Zipf law
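A minimal Python sketch of an IR request stream with Zipf-like popularity, for illustration only. The function names, the exponent parameter `alpha`, and the fixed seed are assumptions: page i gets weight 1/i^alpha, the weights are normalised into probabilities, and each request is drawn independently.

```python
import random


def zipf_probabilities(n, alpha=1.0):
    """Request probabilities p_1..p_n proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ir_requests(probs, count, seed=0):
    """Draw `count` independent page requests; pages are numbered from 1."""
    rng = random.Random(seed)
    pages = range(1, len(probs) + 1)
    return rng.choices(pages, weights=probs, k=count)
```

Feeding such a stream into a cache simulator gives an empirical hit ratio that can be compared with the analytical results.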
14
Cache algorithms
• K: storage capacity of the cache (in pages)
• Ideally, place the K pages with the greatest values of p_i into the cache
• Problem: the values p_i are unknown a priori
15
LRU, FIFO, CLIMB analysis
• Under the IR model, the cache dynamics can be described by a Markov chain
• Each state {I_1, I_2, …, I_K} represents the identity (URL) and ordering of the pages within the cache
16
LRU – Stationary Probabilities
$$\pi(1, 2, \ldots, K) = \prod_{i=1}^{K} \frac{p_i}{\sum_{j=i}^{N} p_j}$$
• This allows the hit ratio to be computed
• Similar results hold for FIFO and CLIMB
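A minimal Python sketch of how the hit ratio follows from the stationary probabilities, for illustration only. It assumes the standard product form for LRU under the IR model, where the state with pages i_1, …, i_K from top to bottom has probability equal to the product over k of p_{i_k} / (1 − p_{i_1} − … − p_{i_{k−1}}). The function names are hypothetical, pages are indexed from 0, and the brute-force enumeration is only practical for small N.

```python
from itertools import permutations


def lru_state_probability(state, p):
    """Stationary probability of an LRU state (page indices, top first)."""
    prob = 1.0
    remaining = 1.0  # 1 minus the request probability of pages higher in the stack
    for page in state:
        prob *= p[page] / remaining
        remaining -= p[page]
    return prob


def lru_hit_ratio(p, k):
    """Hit ratio: sum over states of P(state) * P(request hits a cached page)."""
    total = 0.0
    for state in permutations(range(len(p)), k):
        total += lru_state_probability(state, p) * sum(p[i] for i in state)
    return total
```

As a sanity check, uniform popularity with N = 4 and K = 2 gives a hit ratio of K/N = 0.5.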
17
Analysis - Summary
• CLIMB achieves the best hit ratio, followed by LRU, then FIFO
• LRU and FIFO converge much faster than CLIMB
• Some mathematical issues still open
18
New Issues
• Non-uniform page size
• Non-uniform access costs
– Nearby vs. distant servers
– Underloaded vs. overloaded servers
• Page updates
19
The Extended IR model (Size)
• Same assumptions as in the IR model, plus:
• The size of page i is s_i
• The cache size is K
20
Off-Line Problem
Find a subset $I \subseteq \{1, 2, \ldots, N\}$ such that
1) $\sum_{i \in I} p_i$ is as large as possible
2) $\sum_{i \in I} s_i \le K$
Knapsack Problem!
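A minimal Python sketch of the off-line problem solved as a 0/1 knapsack by dynamic programming, for illustration only. It assumes integer page sizes and an integer cache size, which the slides do not require; the function name and the example values are hypothetical, and pages are indexed from 0.

```python
def best_offline_set(p, s, capacity):
    """Return (total request probability, chosen pages) for the best subset.

    p[i] is the request probability and s[i] the integer size of page i;
    the chosen pages have total size at most `capacity`.
    """
    # best[c] = best (probability, pages) found so far with total size <= c
    best = [(0.0, ())] * (capacity + 1)
    for i in range(len(p)):
        # Iterate capacities downwards so each page is used at most once.
        for c in range(capacity, s[i] - 1, -1):
            candidate = best[c - s[i]][0] + p[i]
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - s[i]][1] + (i,))
    return best[capacity]
```

For example, with p = [0.45, 0.3, 0.25], s = [4, 2, 2] and K = 4, the two small pages together (probability 0.55) beat the single most popular page (0.45).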
21
Heuristics
• Place the documents with the greatest p_i/s_i values in the cache
• Performs at most a factor of two worse than the optimal solution (except in extreme cases)
• Goal: Devise new on-line algorithms that learn to order documents according to their p_i/s_i values
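A minimal Python sketch of the density heuristic, for illustration only. Documents are considered in decreasing order of p_i/s_i and added while they fit. Skipping a document that does not fit and continuing with smaller ones is an assumption about the exact rule; the function name is hypothetical and pages are indexed from 0.

```python
def greedy_by_density(p, s, capacity):
    """Fill the cache in decreasing order of p[i] / s[i]; return chosen pages."""
    order = sorted(range(len(p)), key=lambda i: p[i] / s[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + s[i] <= capacity:  # skip documents that no longer fit
            chosen.append(i)
            used += s[i]
    return chosen
```

An extreme case shows how the heuristic can fail: with p = [0.1, 0.9], s = [1, 10] and K = 10, the small document has the higher density and is placed first, after which the popular large document no longer fits.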
22
Size-LRU algorithm
• Set s_min = min{s_1, s_2, …, s_N}
• A randomized algorithm
• When page i is requested:
– Act like LRU with probability s_min/s_i
– Otherwise, do not change the cache ordering
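A minimal Python sketch of the Size-LRU rule, for illustration only. Two details are assumptions not stated in the slides: the eviction rule (drop pages from the bottom of the stack until the total size fits in the cache) and the treatment of a miss on which the coin flip fails (the page is served but not admitted). The class name and the `rng` parameter are hypothetical.

```python
import random


class SizeLRUCache:
    """Illustrative Size-LRU: apply the LRU move with probability s_min / s_i."""

    def __init__(self, capacity, sizes, rng=None):
        self.capacity = capacity          # cache size K
        self.sizes = sizes                # dict: page -> size s_i
        self.s_min = min(sizes.values())
        self.rng = rng or random.Random()
        self.stack = []                   # index 0 = top of the stack

    def request(self, page):
        """Return True on a hit, False on a miss."""
        hit = page in self.stack
        if self.rng.random() < self.s_min / self.sizes[page]:
            # Act like LRU: move (or insert) the page at the top.
            if hit:
                self.stack.remove(page)
            self.stack.insert(0, page)
            # Assumed eviction rule: drop bottom pages until everything fits.
            while sum(self.sizes[q] for q in self.stack) > self.capacity:
                self.stack.pop()
        # Otherwise the cache ordering is left unchanged.
        return hit
```

When all pages have the same size, the probability s_min/s_i equals 1 and the sketch reduces to plain LRU.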
23
Result
• IR model: LRU, p_i
• Extended IR model: Size-LRU, p_i/s_i
Size-LRU is dual to LRU
24
Example: Size-LRU Stationary Probabilities
$$\pi(1, 2, \ldots, N) = \prod_{i=1}^{N} \frac{p_i / s_i}{\sum_{j=i}^{N} p_j / s_j}$$
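A minimal Python sketch that evaluates this kind of stationary probability numerically, for illustration only. It assumes that, by the duality with LRU, the probability of a full ordering of the N documents has the LRU product form with p_i replaced by p_i/s_i. The function name is hypothetical and documents are indexed from 0.

```python
def size_lru_order_probability(order, p, s):
    """Stationary probability of a document ordering (top first) under Size-LRU.

    Assumes the LRU product form with weights p[i] / s[i] in place of p[i].
    """
    weights = [p[i] / s[i] for i in range(len(p))]
    remaining = sum(weights)  # total weight of documents not yet placed
    prob = 1.0
    for doc in order:
        prob *= weights[doc] / remaining
        remaining -= weights[doc]
    return prob
```

With equal sizes the weights are proportional to the p_i, so the value coincides with the LRU stationary probability of the same ordering.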
25
Numerical Example
• N=100 documents
• Page popularity: $p_i \propto 1 / i^{0.8}$
• Heavy-tailed document size: $\Pr(s > x) = \frac{1}{1 + x}$
26
Numerical Example
27
Summary
• New issues in Web caching
• Size-LRU algorithm
• Dual to LRU
• Extensions to non-uniform access costs
• On-going research
The End