On the Scale and Performance of Cooperative Web Proxy Caching University of Washington Alec Wolman,...
-
Upload
rosa-bridges -
Category
Documents
-
view
213 -
download
0
Transcript of On the Scale and Performance of Cooperative Web Proxy Caching University of Washington Alec Wolman,...
On the Scale and Performance of Cooperative Web Proxy Caching
University of Washington
Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, Henry Levy
2
Caching for a Better WebCaching for a Better Web
Performance is a major concern in the Web Proxy caching is the most commonly used method to
improve Web performance Duplicate requests to the same document served from cache Hits reduce latency, network utilization, server load Misses increase latency (extra hops)
Clients Proxy Cache Servers
Hits
Misses Misses
Internet
3
Cache EffectivenessCache Effectiveness
Previous work has shown that hit rate increases with population size [ Duska et al. 97, Breslau et al. 98 ]
A single proxy cache has practical limits Load, network topology, organizational constraints
One technique to scale the client population is to have proxy caches cooperate
4
Cooperative Web Proxy CachingCooperative Web Proxy Caching
Sharing and/or coordination of cache state among multiple Web proxy cache nodes
Effectiveness of proxy cooperation depends on: Inter-proxy communication distance Size of client population served
Proxy utilization and load balance
Clients
ClientsProxy
Clients
Internet
5
Cooperative Web CachingCooperative Web Caching
How much benefit does cooperative caching provide in the Web environment?
6
OutlineOutline
Introduction & related work Trace methodology Cooperative caching for small & medium scale populations Scaling to larger client populations Latency Conclusions
7
Previous ResearchPrevious Research
Cooperative proxy caching is a popular research topic:[e.g. Chankhunthod et al. 96, Zhang et al. 97, Fan et al. 98, Krishnan et al. 98,
Menaud et al. 98, Tewari et al. 98, Touch 98, Karger et al. 99 ...]
Focus is on highly scalable algorithms Some seek to scale to the entire Web
8
ChallengesChallenges
No real understanding of document sharing across diverse organizations
Little analytic or empirical evaluation of these algorithms using realistic workloads for large-scale client populations
Problem: Evaluating cooperative proxy caching requires multiple
simultaneous traces of Web proxies, across a diverse set of organizations
9
Our ContributionOur Contribution
We use multi-organization traces to evaluate cooperative proxy caching at small and medium scales
We use analytic modelling to evaluate cooperative caching at scales beyond those available in our traces
10
A Multi-Organization TraceA Multi-Organization Trace
University of Washington (UW) is a large and diverse client population Approximately 50K people
UW client population contains 200 independent campus organizations
Museums of Art and Natural History Schools of Medicine, Dentistry, Nursing Departments of Computer Science, History, and Music
A trace of UW is effectively a simultaneous trace of 200 diverse client organizations
11
Cooperation Across OrganizationsCooperation Across Organizations
By considering each UW organization as an independent “company” with its own clients and its own proxy, we can empirically evaluate cooperative caching across diverse client populations
How much Web document reuse is there between these organizations? Place a proxy cache in front of each organization. What is the
benefit of cooperative caching among these 200 proxies?
12
UW Trace CharacteristicsUW Trace Characteristics
Trace collected at UW network border (May 1999) Filtered: requests from UW clients, responses from external Web servers Most of requests come directly from clients (0.5% come from proxies)
Trace UW
Duration 7 daysHTTP objects 18.4 millionHTTP requests 82.8 millionAvg. requests/sec 137Total Bytes 677 GBServers 244,211Clients 22,984
13
QuestionQuestion
What is the benefit of cooperative caching among the 200 UW organizational proxies?
14
Ideal Hit Rates for UW proxiesIdeal Hit Rates for UW proxies
Ideal hit rate - infinite storage, ignore cacheability, expirations
Average ideal localhit rate: 43%
15
Ideal Hit Rates for UW proxiesIdeal Hit Rates for UW proxies
Ideal hit rate - infinite storage, ignore cacheability, expirations
Average ideal localhit rate: 43%
Explore benefits of perfect cooperation rather than a particular algorithm
Average ideal hit rate increases from 43% to 69% with cooperative caching
16
Cacheable Hit Rates forUW proxies
Cacheable Hit Rates forUW proxies
Cacheable hit rate - same as ideal, but doesn’t ignore cacheability
Cacheable hit rates are much lower than ideal (average is 20%)
Average cacheable hit rate increases from 20% to 41% with (perfect) cooperative caching
17
Scaling Cooperative CachingScaling Cooperative Caching
Organizations of this size can benefit significantly from cooperative caching
We don’t need cooperative caching to handle the entire UW population size A single proxy (or small cluster) can handle this entire population! No technical reason to use cooperative caching for this
environment In the real world, decisions of proxy placement are often political or
geographical
How effective is cooperative caching at scales where a single cache will not work?
18
Hit Rate vs. Client PopulationHit Rate vs. Client Population
Curves similar to other studies [e.g., Duska97, Breslau98]
Small organizations Significant increase in hit rate
as client population increases The reason why cooperative
caching is effective for UW Large organizations
Marginal increase in hit rate as client population increases
19
Extrapolation to Larger Client Populations
Extrapolation to Larger Client Populations
Use least squares fit to create a linear extrapolation of hit rates
Hit rate increases logarithmically with client population, e.g., to increase hit rate by 10%:
Need 8 UWs (ideal) Need 11 UWs (cacheable)
“Low ceiling”, though: 100% at 11.3M clients (UW ideal) 61% at 2.1M clients (UW cacheable)
A city-wide cooperative cache would get all the benefit
21
UW & Microsoft CooperationUW & Microsoft Cooperation
What if we ran a wire across Lake Washington, to connect UW & Microsoft?
We collected a Microsoft proxy trace during same time period as the UW trace Combined population is ~80K clients Increases the UW population by a factor of 3.6 Increases the MS population by a factor of 1.4
22
UW & Microsoft TracesUW & Microsoft Traces
Trace UW MS
Duration 7 days 6.25 days
HTTP objects 18.4 million 15.3 million
HTTP requests 82.8 million 107.7 million
Avg. requests/sec 137 199
Total Bytes 677 GB N/A
Servers 244,211 360,586
Clients 22,984 60,233
Population ~50,000 ~40,000
24
What about Latency?What about Latency?
From the client’s perspective, latency matters far more than hitrate
How does latency change with population?
Median latencies improve only a few 100 ms with ideal caching compared to no caching.
On average, a web page consists of 4.5 HTTP objects
25
ConclusionsConclusions
A negative result: without significant workload changes, designing highly-scalable cooperative proxy-cache schemes is unnecessary Largest benefit is achieved with small populations (up to 2K-5K clients) Limited benefit of cooperation when we combined the UW & Microsoft
populations Document cacheability is a severe limitation with current workloads
Analytic model results: Confirm that most of benefit is obtained once you reach populations the
size of a large city Future workloads: large-scale cooperative caching could become more
relevant with different rate-of-change characteristics