Web Caching Schemes 1
A Survey of Web Caching Schemes for the Internet
Jia Wang
Web Caching Schemes 2
Agenda
The World Wide Web
Problem and solution (caching)
Proxy servers
Advantages of web caching
Disadvantages of web caching
Elements of a WWW caching system
Desirable properties of a WWW caching system
Problems in designing caching systems for the WWW
Caching architecture
Web Caching Schemes 3
The World Wide Web
The WWW can be considered as a large distributed information system.
Exponential growth in size.
In May 1999 it included 600 million static web pages.
It grows by 15% per month.
Very popular.
Web Caching Schemes 4
[Chart: size of distinct static web pages at Jun-97, Nov-97, Mar-98 and May-99; vertical axis from 0 to 600]
Web Caching Schemes 5
The World Wide Web
Usage is relatively inexpensive
Accessing information is very fast
Documents appeal to a wide range of interests
But...
Web Caching Schemes 7
Problem
Internet backbone capacity increases 60% per year.
Bandwidth is not growing fast enough.
Without a solution, the WWW will become too congested and lose its entire appeal.
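The gap between the two growth figures is easier to see once both are expressed per year. The sketch below takes the 15% per month and 60% per year figures from the slides at face value; it compares growth in the number of pages with growth in capacity, which is only a rough proxy for traffic versus bandwidth. The function and constant names are illustrative.

```python
# Figures quoted on the slides, taken at face value (not re-measured).
WEB_GROWTH_PER_MONTH = 0.15      # Web size: +15% per month
BACKBONE_GROWTH_PER_YEAR = 0.60  # backbone capacity: +60% per year


def annual_factor_from_monthly(rate_per_month: float) -> float:
    """Compound a monthly growth rate over 12 months."""
    return (1.0 + rate_per_month) ** 12


web_factor = annual_factor_from_monthly(WEB_GROWTH_PER_MONTH)
backbone_factor = 1.0 + BACKBONE_GROWTH_PER_YEAR

print(f"Web size grows about {web_factor:.2f}x per year")
print(f"Backbone capacity grows {backbone_factor:.2f}x per year")
```

Under these assumptions the Web grows more than fivefold per year while backbone capacity grows 1.6-fold.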
Web Caching Schemes 9
Proxy servers
HTTP servers operated by companies for security reasons.
A bottleneck on the connection between the clients and the Internet.
Shared by all clients inside the firewall.
Web Caching Schemes 10
[Diagram: clients within the firewall connect through a proxy on the firewall machine to several HTTP servers]
Web Caching Schemes 11
Proxy servers
Belonging to the same organization, clients share common interests.
They probably access the same set of documents.
Web Caching Schemes 12
Thus
On the proxy server, previously requested and cached documents are likely to result in future hits.
Web Caching Schemes 13
Proxy servers
Caching most popular web pages on
the proxy server can:
Save network bandwidth
Lower access latency for the client
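A minimal sketch of the idea, assuming a proxy that keeps every fetched document in memory and never expires it. `CachingProxy` and `fake_origin` are hypothetical names used only for this illustration; a real proxy would speak HTTP and bound its cache size.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy shared proxy cache: a dict keyed by URL (no expiry, no size limit)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:          # hit: no traffic to the origin server
            self.hits += 1
            return self._store[url]
        self.misses += 1                # miss: fetch once, then keep a copy
        body = self._fetch(url)
        self._store[url] = body
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for a remote HTTP server."""
    return f"content of {url}".encode()


proxy = CachingProxy(fake_origin)
# Requests from several clients behind the same firewall.
for url in ["/news", "/docs", "/news", "/news", "/docs"]:
    proxy.get(url)
print(proxy.hits, proxy.misses)
```

Because the clients share interests, only the first request for each document reaches the origin server.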
Web Caching Schemes 14
Advantages of web caching
Reduces bandwidth consumption
Decreases network traffic
Lessens network congestion
Reduces access latency:
frequently used documents are cached nearby
less traffic means a shorter delay even for documents that are not cached
Web Caching Schemes 15
Advantages of web caching (cont.)
Reduces workload of remote server
Data can be accessed when remote server is down (enhanced robustness).
Allows analysis of organization usage patterns
Cooperation between caches increases efficiency.
Web Caching Schemes 16
Disadvantages of web caching
Cached data is not updated automatically, so it can become stale.
A cache miss can increase latency (extra proxy processing).
Bottleneck effect: the number of clients per proxy must be limited.
A single proxy is a single point of failure.
Information providers cannot monitor the number of visits per site.
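The miss penalty can be made concrete with a simple expected-latency model. This is a sketch under assumptions not stated on the slides: every request is either a hit served in `t_hit` or a miss that pays the origin fetch time plus a fixed proxy overhead, and all timings below are made up.

```python
def expected_latency(hit_ratio: float, t_hit: float,
                     t_origin: float, t_proxy_overhead: float) -> float:
    """Mean latency through a proxy; a miss pays origin time plus proxy overhead."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t_hit + (1.0 - hit_ratio) * (t_origin + t_proxy_overhead)


def break_even_hit_ratio(t_hit: float, t_origin: float,
                         t_proxy_overhead: float) -> float:
    """Hit ratio at which the proxy is exactly as fast as going direct.

    Assumes t_origin + t_proxy_overhead > t_hit.
    """
    return t_proxy_overhead / (t_origin + t_proxy_overhead - t_hit)


# Made-up timings in milliseconds, for illustration only.
T_HIT, T_ORIGIN, T_OVERHEAD = 20.0, 200.0, 30.0
for h in (0.1, 0.4):
    print(h, expected_latency(h, T_HIT, T_ORIGIN, T_OVERHEAD))
print(break_even_hit_ratio(T_HIT, T_ORIGIN, T_OVERHEAD))
```

With these numbers, a 10% hit ratio makes the proxy slower than direct access, while a 40% hit ratio makes it faster.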
Web Caching Schemes 17
Elements of a WWW caching system
Documents can be cached at the clients, the proxies and the servers.
[Diagram: groups of clients connect to proxy servers, which cooperate with each other and fetch documents from the web servers]
Web Caching Schemes 18
Elements of a WWW caching system
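[Flowchart: handling a client request]
The client sends a request.
Does the proxy have the requested page? If yes, the page is returned from the proxy.
If no: do the cooperating proxies have the page? If yes, the page is returned from one of them.
If no: the page is found on the web server.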
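The decision flow above can be sketched as a three-step lookup. `Proxy`, `resolve` and the `origin` callable are hypothetical names for this illustration; keeping a local copy after a peer or origin fetch is an assumption, since the flowchart does not say what is stored where.

```python
from typing import Callable, Dict, List, Tuple


class Proxy:
    """A proxy with its own cache and a list of cooperating peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.peers: List["Proxy"] = []


def resolve(proxy: Proxy, url: str,
            origin: Callable[[str], str]) -> Tuple[str, str]:
    """Return (page, source): local cache, then peers, then the web server."""
    if url in proxy.cache:                      # step 1: the proxy itself
        return proxy.cache[url], proxy.name
    for peer in proxy.peers:                    # step 2: cooperating proxies
        if url in peer.cache:
            page = peer.cache[url]
            proxy.cache[url] = page             # assumption: keep a local copy
            return page, peer.name
    page = origin(url)                          # step 3: the web server
    proxy.cache[url] = page
    return page, "origin"


a, b = Proxy("a"), Proxy("b")
a.peers = [b]
b.cache["/x"] = "X"
origin_server = lambda url: f"page {url}"

first = resolve(a, "/x", origin_server)    # served by the peer
second = resolve(a, "/x", origin_server)   # now served locally
third = resolve(a, "/y", origin_server)    # nobody has it
print(first, second, third)
```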
Web Caching Schemes 19
Desirable properties of WWW caching system
fast access
robustness
transparency
scalability
efficiency
adaptivity
stability
load balance
ability to deal with heterogeneity
simplicity
Web Caching Schemes 20
Fast access
Reduce web access latency to a minimum, especially compared with servers that do not use caching techniques.
Web Caching Schemes 21
Robustness
Robustness means availability to the user
Eliminate single points of failure
Fail gracefully when a failure occurs
Recover easily from failure
Web Caching Schemes 22
Transparency
Transparent to the user
The user should only notice:
Faster response
Higher availability
Web Caching Schemes 23
Scalability
Scale well with the increasing size and density of the network.
All protocols should be as lightweight as possible.
Web Caching Schemes 24
Efficiency
Impose minimal additional burden on the network (in control and data packets)
Do not adopt any scheme that leads to under-utilization of the network
Web Caching Schemes 25
Adaptivity
Adapt to dynamic changes in user demand and the network environment
Achieve optimal performance
Web Caching Schemes 27
Load balancing
Distribute load evenly through the entire network
No bottlenecks or hot spots
Web Caching Schemes 28
Ability to deal with heterogeneity
Adapt to a range of network architectures (hardware and software)
Web Caching Schemes 29
Simplicity
The mechanism should be simple to deploy
Simpler schemes are easier to implement and more likely to be accepted as international standards
Web Caching Schemes 31
Problems in designing caching systems for the WWW
Caching system architecture
How cache proxies are organized: hierarchical, distributed, or hybrid.
Web Caching Schemes 32
Problems in designing caching systems for the WWW
Proxy placement
Where to place a cache proxy in order to optimize performance.
Web Caching Schemes 33
Problems in designing caching systems for the WWW
Caching contents
What can be cached in the caching system.
Web Caching Schemes 34
Problems in designing caching systems for the WWW
Proxy cooperation
How proxies cooperate with each other.
Web Caching Schemes 35
Problems in designing caching systems for the WWW
Data sharing
What kind of data/information can be shared among cooperating proxies.
Web Caching Schemes 36
Problems in designing caching systems for the WWW
Cache resolution/routing
How a proxy decides where to fetch a page requested by a client.
Web Caching Schemes 37
Problems in designing caching systems for the WWW
Prefetching
How a proxy decides what and when to prefetch from web servers or other proxies to reduce access latency.
Web Caching Schemes 38
Problems in designing caching systems for the WWW
Cache placement/replacement
How a proxy decides which pages to store in its cache and which pages to remove from it.
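The slide does not name a replacement policy. As one common example, the sketch below uses least-recently-used (LRU) replacement: every fetched page is stored, and when the cache is full the page that has gone unused for the longest time is removed. `LRUCache` is a hypothetical name, and capacity is counted in pages rather than bytes for simplicity.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity page cache with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._pages: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)         # mark as most recently used
        return self._pages[url]

    def put(self, url: str, body: bytes) -> None:
        self._pages[url] = body
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used page


cache = LRUCache(capacity=2)
cache.put("/a", b"A")
cache.put("/b", b"B")
cache.get("/a")          # "/a" is now more recent than "/b"
cache.put("/c", b"C")    # cache is full, so "/b" is evicted
```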
Web Caching Schemes 39
Problems in designing caching systems for the WWW
Cache coherency
How a proxy maintains data consistency.
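The slide leaves the consistency mechanism open. One simple option, used here purely as an assumption, is a time-to-live (TTL): a cached copy is trusted for a fixed period and refetched afterwards. `TTLCache` is a hypothetical name, and the clock is injected so the example runs deterministically.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache whose entries are considered fresh for ttl_seconds after storage."""

    def __init__(self, ttl_seconds: float,
                 fetch: Callable[[str], str],
                 clock: Callable[[], float]) -> None:
        self._ttl = ttl_seconds
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None:
            body, stored_at = entry
            if now - stored_at < self._ttl:   # still fresh: serve the copy
                return body
        body = self._fetch(url)               # missing or expired: refetch
        self._entries[url] = (body, now)
        return body


origin_pages = {"/index": "v1"}
fake_now = [0.0]                              # fake clock, in seconds
ttl_cache = TTLCache(10.0, lambda url: origin_pages[url], lambda: fake_now[0])

first = ttl_cache.get("/index")               # fetched from the origin
origin_pages["/index"] = "v2"                 # the origin changes the page
fake_now[0] = 5.0
second = ttl_cache.get("/index")              # within the TTL: stale copy served
fake_now[0] = 11.0
third = ttl_cache.get("/index")               # TTL expired: new version fetched
print(first, second, third)
```

The second request shows the trade-off: a longer TTL saves more traffic but serves stale data for longer.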
Web Caching Schemes 40
Problems in designing caching systems for the WWW
Control information distribution
How the control information (e.g., URLs) is distributed among proxies.
Web Caching Schemes 41
Problems in designing caching systems for the WWW
Dynamic data caching
How to deal with data that is not cacheable.
Web Caching Schemes 42
Caching architecture
Hierarchical
Caches are placed at multiple levels of the network.
[Diagram: cache levels from top to bottom: national, regional, institutional, bottom]
Web Caching Schemes 43
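Hierarchical architecture
Bottom level: client/browser caches.
[Diagram: a request travels up from the bottom level to the institutional, regional and national levels; each level that does not have the page reports "web page not found" and passes the request up]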
Web Caching Schemes 44
Hierarchical architecture
After the web page is found:
[Diagram: the page travels back down through the national, regional and institutional levels to the bottom; each level forwards the page and leaves a copy]
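The lookup on the way up and the copy left on the way down can be sketched as a recursive call through parent caches. `CacheLevel` and the `origin` callable are hypothetical names for this illustration, and the caches are unbounded for simplicity.

```python
from typing import Callable, Dict, Optional, Tuple


class CacheLevel:
    """One level of the hierarchy; parent is the next level up (None at the top)."""

    def __init__(self, name: str, parent: Optional["CacheLevel"] = None) -> None:
        self.name = name
        self.parent = parent
        self.store: Dict[str, str] = {}

    def get(self, url: str, origin: Callable[[str], str]) -> Tuple[str, str]:
        """Return (page, source), where source names the level that had the page."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            page, source = self.parent.get(url, origin)   # page not found: go up
        else:
            page, source = origin(url), "origin"          # top level asks the server
        self.store[url] = page                            # forward page, leave copy
        return page, source


national = CacheLevel("national")
regional = CacheLevel("regional", national)
institutional = CacheLevel("institutional", regional)
browser_a = CacheLevel("browser-a", institutional)
browser_b = CacheLevel("browser-b", institutional)
origin_server = lambda url: f"page {url}"

page_a, source_a = browser_a.get("/a", origin_server)   # misses at every level
page_b, source_b = browser_b.get("/a", origin_server)   # found one level up
print(source_a, source_b)
```

The second client benefits from the copy that the first request left at the institutional level.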
Web Caching Schemes 45
Hierarchical architecture
Advantages:
Bandwidth efficient, especially when some cache servers have slow network connections.
Popular web pages diffuse efficiently towards the demand.
Web Caching Schemes 46
Hierarchical architecture
Disadvantages:
Cache servers need to be placed at key access points of the network, which requires coordination among caches.
Each level adds a delay.
High levels are bottlenecks.
Multiple copies of a document are stored at different cache levels.
Web Caching Schemes 47
Distributed architecture
Caches at the bottom level only.
No other intermediate caching levels.
Each cache server contains meta-data on the data stored on other servers.
Hierarchy used only for distributing information about location of the copy.
No copying of actual documents.
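A minimal sketch of the metadata idea, assuming each cache periodically advertises the set of URLs it holds and peers record only who holds what. `DirectoryCache`, `advertise`, `learn` and `locate` are hypothetical names; how the advertisements are actually distributed is not modeled here.

```python
from typing import Dict, Optional, Set


class DirectoryCache:
    """Bottom-level cache that knows where documents live, not their contents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}        # documents held locally
        self.directory: Dict[str, str] = {}    # URL -> name of a peer holding it

    def advertise(self) -> Set[str]:
        """Metadata sent to peers: the URLs held here, without the documents."""
        return set(self.store)

    def learn(self, peer_name: str, urls: Set[str]) -> None:
        """Record a peer's advertisement."""
        for url in urls:
            self.directory[url] = peer_name

    def locate(self, url: str) -> Optional[str]:
        """Name of the cache to ask for url, or None if no copy is known."""
        if url in self.store:
            return self.name
        return self.directory.get(url)


cache_a, cache_b = DirectoryCache("a"), DirectoryCache("b")
cache_b.store["/x"] = "X"
cache_a.learn(cache_b.name, cache_b.advertise())

print(cache_a.locate("/x"), cache_a.locate("/y"))
```

Only location information moves between caches; the document itself stays on the cache that fetched it until another cache asks for it directly.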
Web Caching Schemes 48
Distributed architecture
Advantages:
Traffic flows through low network levels, which are less congested.
No additional disk space is required at intermediate network levels.
Better load sharing.
More fault tolerant.
Web Caching Schemes 49
Distributed architecture
Disadvantages:
High connection times.
Higher bandwidth usage.
Administrative issues.
Web Caching Schemes 50
Distributed architecture
Examples
ICP: Internet Cache Protocol (Harvest group)
Retrieves data from neighboring caches and parent caches.
CARP: Cache Array Routing Protocol
The URL space is divided among an array of caches.
Each cache stores only the documents whose URLs hash to it.
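The idea of dividing the URL space by hashing can be sketched as follows. This is an illustration only: it uses highest-score hashing over SHA-256 as a stand-in and does not reproduce CARP's own hash function or load factors. The cache names and the `score` and `owner` helpers are hypothetical.

```python
import hashlib
from typing import List


def score(url: str, cache_name: str) -> int:
    """Deterministic score for a (cache, URL) pair."""
    digest = hashlib.sha256(f"{cache_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(url: str, caches: List[str]) -> str:
    """The single cache responsible for url: the one with the highest score."""
    if not caches:
        raise ValueError("at least one cache is required")
    return max(caches, key=lambda name: score(url, name))


array = ["cache-1", "cache-2", "cache-3"]
urls = [f"http://example.com/page{i}" for i in range(20)]
assignment = {url: owner(url, array) for url in urls}
```

Every client that applies the same function to the same array sends a given URL to the same cache, so no document is stored twice. With this scoring scheme, removing a cache only reassigns the URLs that cache owned.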
Web Caching Schemes 51
Hybrid architecture
Caches may cooperate with other caches at the same level or at a higher level using distributed caching.
ICP is an example: the document is fetched from the parent/neighbor cache that has the lowest round-trip time (RTT).
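The selection rule can be sketched as picking, among the caches that report a hit, the one with the lowest measured RTT. This is a simplified model, not the ICP message format: `Reply` and `pick_source` are hypothetical names, and the replies are given as plain data rather than collected over the network.

```python
from typing import List, NamedTuple, Optional


class Reply(NamedTuple):
    """Answer from one parent or neighbor cache to a query for a URL."""
    cache: str
    hit: bool
    rtt_ms: float


def pick_source(replies: List[Reply]) -> Optional[str]:
    """Name of the lowest-RTT cache that has the document, or None if all missed."""
    hits = [reply for reply in replies if reply.hit]
    if not hits:
        return None
    return min(hits, key=lambda reply: reply.rtt_ms).cache


replies = [
    Reply("parent", hit=True, rtt_ms=40.0),
    Reply("neighbor-1", hit=False, rtt_ms=5.0),   # fastest, but has no copy
    Reply("neighbor-2", hit=True, rtt_ms=12.0),
]
chosen = pick_source(replies)
print(chosen)
```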
Web Caching Schemes 52
Performance of architectures
Hierarchical caching has shorter connection times than distributed caching.
Additional copies at intermediate levels reduce retrieval latency for small documents.
Distributed caching has shorter transmission times and higher bandwidth usage.
A well-configured hybrid scheme can reduce both connection time and transmission time.