A Survey of Web Caching Schemes for the Internet

Jia Wang

Posted: 19-Dec-2015

Agenda

The World Wide Web
Problem and solution (caching)
Proxy servers
Advantages of web caching
Disadvantages of web caching
Elements of a WWW caching system
Desirable properties of a WWW caching system
Problems in designing caching systems for the WWW
Caching architecture

The World Wide Web

The WWW can be considered a large distributed information system.

Its size is growing exponentially.

In May 1999 it included 600 million static web pages.

It grows by 15% per month.

It is very popular.

[Figure: size of distinct static web pages at Jun-97, Nov-97, Mar-98 and May-99; vertical axis from 0 to 600]

The World Wide Web

Usage is relatively inexpensive.

Accessing information is very fast.

Documents appeal to a wide range of interests.

But this popularity comes at a cost:

Network congestion

Server overloading

Problem

Internet backbone capacity increases by 60% per year.

Bandwidth is not growing fast enough to keep up with demand.

Without a solution, the WWW will become too congested and lose its appeal.
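
To make the mismatch concrete, the two growth rates quoted in the slides (15% per month for the number of web pages, 60% per year for backbone capacity) can be put on the same yearly scale. This is only a rough sketch: it assumes the monthly rate compounds, and it treats page count as a stand-in for traffic demand, which the slides do not state.

```python
# Compare the two growth rates quoted in the slides on a yearly basis.
monthly_web_growth = 0.15      # "15% per month" (from the slides)
yearly_backbone_growth = 0.60  # "60% per year" (from the slides)

# Assumption: the monthly rate compounds over 12 months.
yearly_web_factor = (1 + monthly_web_growth) ** 12
yearly_backbone_factor = 1 + yearly_backbone_growth

print(f"Web size grows about {yearly_web_factor:.2f}x per year")          # about 5.35x
print(f"Backbone capacity grows {yearly_backbone_factor:.2f}x per year")  # 1.60x
```

Under these assumptions the web grows more than three times as fast as the backbone each year, which is the gap that caching is meant to narrow.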

Solution

Caching: placing popular objects at locations close to the clients.

Proxy servers

HTTP servers operated by companies for security reasons.

They are the bottleneck of the connection between the clients and the Internet.

They are shared by all clients inside the firewall.

[Figure: clients within the firewall connect through a proxy on the firewall machine to several HTTP servers]

Proxy servers

Because they belong to the same organization, clients share common interests.

They probably access the same set of documents.

Thus

On the proxy server, a previously requested and cached document is likely to result in future hits.

Proxy servers

Caching the most popular web pages on the proxy server can:

Save network bandwidth

Lower access latency for the client

Advantages of web caching

Reduces bandwidth consumption

Decreases network traffic

Lessens network congestion

Reduces access latency:

  frequently used documents are cached nearby

  less traffic means a shorter delay even for documents that are not cached

Advantages of web caching (cont.)

Reduces the workload of the remote server

Cached data can still be accessed when the remote server is down (enhanced robustness)

Allows analysis of an organization's usage patterns

Cooperation between caches increases efficiency

Disadvantages of web caching

Cached data is not updated automatically, so clients may see stale content.

A cache miss can increase latency (extra proxy processing).

Bottleneck effect: the number of clients per proxy must be limited.

A single proxy is a single point of failure.

Information providers cannot monitor the number of visits per site.

Elements of a WWW caching system

Documents can be cached at the clients, the proxies and the servers.

[Figure: groups of clients connect to cooperating proxy servers, which connect to the web servers]

Elements of a WWW caching system

[Flowchart: a client sends a request. Does the proxy have the requested page? If yes, the proxy serves it. If no: do the cooperating proxies have the page? If yes, one of them serves it. If no, the page is fetched from the web server.]
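
The request flow on this slide (local proxy, then cooperating proxies, then the web server) can be sketched roughly as follows. The `Proxy` class, its method names and the `origin` function are hypothetical and only illustrate the decision order; keeping a local copy after a peer hit is also an assumption, not something the slide specifies.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Proxy:
    """Hypothetical caching proxy (illustrative names, not from the slides)."""

    def __init__(self, name: str, fetch_from_server: Callable[[str], str]) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.peers: List["Proxy"] = []
        self.fetch_from_server = fetch_from_server

    def lookup_local(self, url: str) -> Optional[str]:
        return self.cache.get(url)

    def get(self, url: str) -> Tuple[str, str]:
        """Return (page, source) following the flowchart's decision order."""
        # 1. Does this proxy have the requested page?
        page = self.lookup_local(url)
        if page is not None:
            return page, "local"
        # 2. Does a cooperating proxy have it?
        for peer in self.peers:
            page = peer.lookup_local(url)
            if page is not None:
                self.cache[url] = page  # assumption: keep a local copy
                return page, f"peer:{peer.name}"
        # 3. Otherwise fetch the page from the web server.
        page = self.fetch_from_server(url)
        self.cache[url] = page
        return page, "server"


def origin(url: str) -> str:
    """Stand-in for the real web server."""
    return f"<html>{url}</html>"


a = Proxy("a", origin)
b = Proxy("b", origin)
a.peers.append(b)

url = "http://example.com/index.html"
print(b.get(url)[1])  # server
print(a.get(url)[1])  # peer:b
print(a.get(url)[1])  # local
```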

Desirable properties of a WWW caching system

Fast access
Robustness
Transparency
Scalability
Efficiency
Adaptivity
Stability
Load balancing
Ability to deal with heterogeneity
Simplicity

Fast access

Reduce web access latency to a minimum, especially compared with servers that do not use caching techniques.

Robustness

Robustness means availability to the user:

Eliminate single points of failure

In case of failure, degrade gracefully

Make it easy to recover from a failure

Transparency

The caching system should be transparent to the user.

The user should only notice:

Faster response

Higher availability

Scalability

Scale well with the increasing size and density of the network.

All protocols should be as lightweight as possible.

Efficiency

Impose minimal additional burden on the network (in control and data packets).

Do not adopt any scheme that leads to under-utilization of the network.

Adaptivity

Adapt to dynamic changes in user demand and the network environment.

Achieve optimal performance.

Stability

Do not introduce instabilities into the network.

Load balancing

Distribute load evenly through the entire network.

No bottlenecks or hot spots.

Ability to deal with heterogeneity

Adapt to a range of network architectures (hardware and software).

Simplicity

The mechanism should be simple to deploy.

Simpler schemes are easier to implement and more likely to be accepted as international standards.

What problems do we face in designing caching systems for the WWW?

Problems in designing caching systems for the WWW

Caching system architecture: how cache proxies are organized (hierarchically, distributed or hybrid).

Problems in designing caching systems for the WWW

Proxy placement: where to place a cache proxy in order to optimize performance.

Problems in designing caching systems for the WWW

Caching contents: what can be cached in the caching system.

Problems in designing caching systems for the WWW

Proxy cooperation: how proxies cooperate with each other.

Problems in designing caching systems for the WWW

Data sharing: what kind of data or information can be shared among cooperating proxies.

Problems in designing caching systems for the WWW

Cache resolution/routing: how a proxy decides where to fetch a page requested by a client.

Problems in designing caching systems for the WWW

Prefetching: how a proxy decides what and when to prefetch from web servers or other proxies to reduce access latency.

Problems in designing caching systems for the WWW

Cache placement/replacement: how the proxy decides which pages to store in its cache and which pages to remove from it.
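
The slide does not name a particular replacement policy. As one possible illustration, the sketch below uses least-recently-used (LRU) eviction, a common choice; the class `LRUCache` and its capacity parameter are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Hypothetical page cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.pages: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self.pages:
            return None
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url: str, page: str) -> None:
        self.pages[url] = page
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page


cache = LRUCache(capacity=2)
cache.put("/a", "page A")
cache.put("/b", "page B")
cache.get("/a")            # "/a" is now the most recently used page
cache.put("/c", "page C")  # evicts "/b"
print(list(cache.pages))   # ['/a', '/c']
```

Other policies weigh object size, retrieval cost or access frequency instead of recency; LRU is shown only because it is short.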

Problems in designing caching systems for the WWW

Cache coherency: how a proxy maintains data consistency.
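
The slide leaves open how consistency is maintained. One simple approach, assumed here purely for illustration, is time-to-live (TTL) expiry: a cached page is served until it reaches a fixed age and is then fetched again. The class `TTLCache` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical cache that refetches a page once its copy is too old."""

    def __init__(
        self,
        ttl_seconds: float,
        fetch: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.fetch = fetch
        self.clock = clock  # injectable so the behaviour can be checked without waiting
        self.entries: Dict[str, Tuple[str, float]] = {}  # url -> (page, time fetched)

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]  # copy is still considered fresh
        page = self.fetch(url)  # copy is missing or expired: go back to the source
        self.entries[url] = (page, now)
        return page
```

A TTL gives only weak consistency: a page that changes on the server before its copy expires is served stale until the TTL runs out.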

Problems in designing caching systems for the WWW

Control information distribution: how control information (e.g. URLs) is distributed among proxies.

Problems in designing caching systems for the WWW

Dynamic data caching: how to deal with data that is not cacheable.

Caching architecture

Hierarchical: caches are placed at multiple levels of the network.

[Figure: cache levels from top to bottom: national, regional, institutional, bottom]

Hierarchical architecture

The bottom level consists of the client/browser caches.

[Figure: a request that misses at the bottom level is passed up to the institutional, regional and national levels in turn, each time the web page is not found]

Hierarchical architecture

After the web page is found:

[Figure: the page travels back down from the national level through the regional and institutional levels to the bottom; each level forwards the page and leaves a copy]
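
The two figures can be read as a single procedure: walk up the levels until the page is found, then leave a copy at every level that missed on the way back down. A minimal sketch, assuming each level is modelled as a plain dictionary and that the function name `hierarchical_get` and the `fetch_origin` callback are illustrative only:

```python
from typing import Callable, Dict, List


def hierarchical_get(
    url: str,
    levels: List[Dict[str, str]],
    fetch_origin: Callable[[str], str],
) -> str:
    """Look up `url` in caches ordered bottom-up (bottom, institutional, regional, national)."""
    missed: List[Dict[str, str]] = []
    page = None
    for cache in levels:
        if url in cache:
            page = cache[url]
            break
        missed.append(cache)  # "web page not found" at this level
    if page is None:
        # No level has the page, so it comes from the origin server.
        page = fetch_origin(url)
    for cache in missed:
        cache[url] = page  # "forward page, leave copy"
    return page
```

After one miss that reaches the origin server, every level holds a copy, so a second request for the same URL is answered at the bottom level.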

Hierarchical architecture

Advantages:

Bandwidth efficient, especially when cache servers are slow.

Popular web pages are diffused efficiently towards the demand.

Hierarchical architecture

Disadvantages:

Cache servers need to be placed at key access points of the network, which requires coordination among caches.

Each level adds a delay.

High levels are bottlenecks.

Multiple copies of a document are kept at different cache levels.

Distributed architecture

Caches exist at the bottom level only.

There are no other intermediate caching levels.

Each cache server holds metadata about the documents stored on other servers.

A hierarchy is used only for distributing information about the location of copies.

The actual documents are not copied to intermediate levels.
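
The key idea on this slide is that caches exchange location metadata rather than documents. A rough sketch of that idea, in which the class `DirectoryCache` and the direct peer-to-peer advertisement are assumptions (the slide says a hierarchy distributes this information but gives no mechanism):

```python
from typing import Dict, List, Optional


class DirectoryCache:
    """Hypothetical bottom-level cache that tracks where peers hold documents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}      # url -> document held locally
        self.directory: Dict[str, str] = {}  # url -> name of a peer holding it (metadata only)

    def put(self, url: str, document: str, peers: List["DirectoryCache"]) -> None:
        self.store[url] = document
        for peer in peers:
            # Advertise the location only; the document itself is not copied.
            peer.directory[url] = self.name

    def locate(self, url: str) -> Optional[str]:
        if url in self.store:
            return self.name
        return self.directory.get(url)


north = DirectoryCache("north")
south = DirectoryCache("south")
north.put("/news", "today's news", peers=[south])
print(south.locate("/news"))    # north
print("/news" in south.store)   # False
```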

Distributed architecture

Advantages:

Traffic flows through the low network levels, which are less congested.

No additional disk space is required at intermediate network levels.

Better load sharing.

More fault tolerant.

Distributed architecture

Disadvantages:

High connection times

Higher bandwidth usage

Administrative issues

Distributed architecture

Examples:

ICP (Internet Cache Protocol, Harvest group): retrieves data from neighboring caches and parent caches.

CARP (Cache Array Routing Protocol): the URL space is divided among an array of caches. Each cache stores only the documents whose URLs hash to it.
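
The CARP idea of assigning each URL to exactly one cache by hashing can be illustrated as follows. This is not the actual CARP hash function: the sketch simply scores every (cache, URL) pair with SHA-256 and picks the highest score, and the cache names are made up.

```python
import hashlib
from typing import List


def score(url: str, cache_name: str) -> int:
    """Deterministic pseudo-random score for a (cache, URL) pair."""
    digest = hashlib.sha256(f"{cache_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(url: str, caches: List[str]) -> str:
    """Return the one cache responsible for storing `url`."""
    return max(caches, key=lambda name: score(url, name))


caches = ["cache-1", "cache-2", "cache-3"]
owner = pick_cache("http://example.com/index.html", caches)
print(owner)  # always the same cache for the same URL and cache list
```

Because every client computes the same owner for a given URL, no document is stored twice in the array. Scoring each pair, rather than taking a hash modulo the array size, also means that removing one cache only reassigns the URLs that cache owned.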

Hybrid architecture

Caches may cooperate with other caches at the same level or at a higher level using distributed caching.

ICP is an example: the document is fetched from the parent or neighbor cache that has the lowest round-trip time (RTT).
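
The selection rule described here (among the parent and neighbor caches that hold the document, use the one with the lowest RTT) can be sketched as below. The function `choose_source`, the tuple layout and the RTT values are hypothetical; real ICP exchanges query and reply messages over the network, which this sketch leaves out.

```python
from typing import List, Optional, Set, Tuple

# Each candidate is (cache name, measured RTT in milliseconds, URLs it holds).
Candidate = Tuple[str, float, Set[str]]


def choose_source(url: str, candidates: List[Candidate]) -> Optional[str]:
    """Return the name of the lowest-RTT cache holding `url`, or None if none has it."""
    hits = [(rtt, name) for name, rtt, urls in candidates if url in urls]
    if not hits:
        return None  # the caller would fall back to the origin server
    return min(hits)[1]


candidates = [
    ("parent", 40.0, {"/a", "/b"}),
    ("neighbor-1", 12.0, {"/b"}),
    ("neighbor-2", 5.0, {"/c"}),
]
print(choose_source("/b", candidates))  # neighbor-1
print(choose_source("/a", candidates))  # parent
print(choose_source("/z", candidates))  # None
```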

Performance of architectures

Hierarchical caching has shorter connection times than distributed caching.

Additional copies at intermediate levels reduce retrieval latency for small documents.

Distributed caching has shorter transmission times and higher bandwidth usage.

A “well configured” hybrid scheme can reduce both connection time and transmission time.