The Trickle-Down Effect: Web Caching and Server Request Distribution

Ronald P. Doyle*
Application Integration and Middleware
IBM Research Triangle Park
[email protected]

Jeffrey S. Chase, Syam Gadde, Amin M. Vahdat†
Department of Computer Science
Duke University
{chase, gadde, vahdat}@cs.duke.edu

ABSTRACT
Web proxies and Content Delivery Networks (CDNs) are widely used to accelerate Web content delivery and to conserve Internet bandwidth. These caching agents are highly effective for static content, which is an important component of all Web-based services.

This paper explores the effect of ubiquitous Web caching on the request patterns seen by other components of an end-to-end content delivery architecture, including Web server clusters and interior caches. In particular, object popularity distributions in the Web tend to be Zipf-like, but caches disproportionately absorb requests for the most popular objects, changing the reference properties of the filtered request stream in fundamental ways. We call this the trickle-down effect.

This paper uses trace-driven simulation and synthetic traffic patterns to illustrate the trickle-down effect and to investigate its impact on other components of a content delivery architecture, focusing on the implications for request distribution strategies in server clusters.

Keywords
Zipf, Web Proxy Cache, URL Hashing, Load Balancing, Request Distribution

1. INTRODUCTION
As traffic on the Internet continues to grow, proxy caches and Content Delivery Networks (CDNs) are increasingly used to make Web service more efficient. These caching agents serve requests from content replication points closer to the clients, reducing origin server load, consumed network bandwidth, and client response latency.

Several factors contribute to increasing use of Web caching. A few years ago, Web caches were used only by clients who explicitly configured their Web browsers to use proxies. Today, access ISPs commonly reduce their bandwidth costs by automatically redirecting outgoing Web request traffic through interception proxies. At the same time, hosting providers often deploy surrogate caches to filter incoming request traffic for data center customers. Many Web sites also contract with third-party CDNs to serve their request traffic from caching sites at various points in the network.

* R. Doyle is also a PhD student in the Computer Science department at Duke University.
† This work is supported by the U.S. National Science Foundation through grants CCR-00-82912 and EIA-9972879. Gadde is supported by a U.S. Department of Education GAANN fellowship. Vahdat is supported by NSF CAREER award CCR-9984328.

Ubiquitous caching has the potential to reshape the Web traffic patterns observed by the other components of a content delivery system, including Web server sites and interior caches that receive requests "downstream" from forward caches. Upstream caches filter the request stream and shift the downstream loads from popular static objects to less popular objects and dynamic (uncacheable) content. We refer to this as the trickle-down effect. It is important to understand this effect because assumptions about traffic patterns drive many of the local design choices in Web delivery architectures.

This paper explores the effects of Web caches on the properties of the request traffic behind them. We use trace-driven simulation and synthetic traffic patterns to illustrate the trickle-down effect and to investigate its implications for downstream components of an end-to-end content delivery architecture. We focus on Web requests that follow a Zipf-like object popularity distribution [18], which many studies have shown to closely model Web request traffic [11, 8, 5, 17]. We then illustrate the importance of this effect by examining its impact on load distribution strategies and content cache effectiveness for downstream servers. The contribution of this paper is to demonstrate and quantify the key implications of the trickle-down effect on downstream servers at a given server request load:

- Forward caching disproportionately absorbs requests for the most popular objects. The miss stream consists of evenly distributed references to moderately popular objects and recently updated or expired objects, together with the heavy tail of the popularity distribution.

- By absorbing much of the locality in the request stream, upstream caches reduce the effectiveness of downstream content caches (e.g., server memory), increasing pressure on server storage systems. This phenomenon is well-known in the file system context [13].

- The reduced locality in the miss stream tends to increase the importance of content-aware request distribution (such as URL hashing) for effective caching in downstream server clusters.

- At the same time, forward caching suppresses the primary source of load imbalances in Web server clusters, reducing the need for dynamic load balancing when content-aware request distribution is used.

Figure 1: End-to-end architecture for Web content delivery. (The request stream flows from clients through proxy caches, CDN servers, and surrogate caches across the Internet to the hosting network, where a request distributor routes requests to a server cluster.)

Our results emphasize the need to consider how deployment of caches and CDNs affects other functional components of the Internet infrastructure. More generally, in evaluating solutions and policies for managing Web content it is important to consider their role as components of a complete end-to-end content delivery architecture.

This paper is organized as follows. Section 2 discusses the relevance of our work to end-to-end content delivery, and outlines our experimental methodology. Section 3 presents analytical and experimental results illustrating the trickle-down effect, and examines its implications for server request distribution, load balancing, and cache management. Section 4 considers the question of how the trickle-down effect manifests in today's Web. Section 5 summarizes our conclusions.

2. BACKGROUND
Figure 1 depicts the key components involved in delivering content from a large Web site to a community of clients. The request stream originates at clients on the left-hand side, and filters through one or more caches; requests not absorbed by the caches pass through to the origin servers on the right-hand side. Caches filtering the request stream may include client browser caches, demand-side proxy caches acting on behalf of a community of clients or an access ISP, and supply-side surrogate caches (also called reverse proxies) acting on behalf of the Web site's hosting provider or a third-party CDN service. A variety of mechanisms route the request stream through caches: clients may be configured to use a cache, or the system may redirect requests by interposing on DNS domain name translation or by intercepting the request at the IP level. In addition, each cache may control the routing of its miss stream to the other components.

The last two years have seen an explosion of growth in Web caching and content delivery infrastructure. Key developments include CDN service offerings from Akamai, Digital Island, and others, increasing use of surrogate caching among hosting providers such as Exodus, and aggregation of content consumers into large ISPs employing transparent interception proxies based on L7 switch hardware from Cisco/Arrowpoint, Nortel/Alteon, Foundry, and others. Each of these developments has fed the growth in demand for Web caching systems from vendors including Inktomi, Network Appliance, and CacheFlow.

One goal of this paper is to explore the effect of upstream caching on the effectiveness of downstream request distribution policies, e.g., at the origin site. The right-hand side of Figure 1 depicts a typical origin site for a large-scale Web service. Many of today's Web sites are hosted on server farms, where a group of servers are clustered together to act as a unified server to external clients. Any given request could be handled by any of several servers, improving scalability and fault-tolerance. The switching infrastructure connecting the servers to the hosting network includes one or more redirecting server switches to route incoming request traffic to the servers. We refer to these switches as request distributors because they select the server to handle each incoming request. The server selection policy plays a key role in managing cluster resources to maximize throughput and meet quality-of-service goals.

Commercial server switches use a variety of server selection policies to distribute requests to maximize throughput and minimize response latency. Server load balancing policies (slb) monitor server status and direct requests to lightly loaded servers. slb switches are often called L4 switches because they make server selection decisions at connection setup time, and examine only the transport (layer 4) headers of the incoming packet stream. Content-aware server selection policies prefer servers that can handle a given request most efficiently, for example, a server likely to have the requested data in cache. URL hashing (urlhash) is a content-based policy that applies a simple deterministic hash function on the request URL to select a server. URL hashing is sometimes called an L7 policy because the switch must parse HTTP (layer 7) headers to extract the URL.
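To make the contrast concrete, the sketch below shows one way a urlhash-style distributor could select a server, next to a load-based (slb-style) choice. It is only an illustration under assumed inputs; the hash function and the load metric are not taken from any particular switch.

    import hashlib

    def urlhash_select(url: str, num_servers: int) -> int:
        """Content-aware selection: a deterministic hash of the URL always
        maps a given object to the same back-end server."""
        digest = hashlib.md5(url.encode()).digest()
        return int.from_bytes(digest[:4], "big") % num_servers

    def slb_select(server_loads: list) -> int:
        """Load-based (L4-style) selection: ignore the URL and pick the
        currently least-loaded server."""
        return min(range(len(server_loads)), key=lambda i: server_loads[i])

    # The same URL always lands on the same server under urlhash,
    # concentrating repeat requests (and their cache hits) on that server.
    print(urlhash_select("/products/index.html", 8))
    print(urlhash_select("/products/index.html", 8))
    print(slb_select([12, 3, 7, 9]))   # server 1 is currently least loaded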

2.1 Zipf-Like Request Traffic
Observations of Web request patterns drive the design choices and policies for all of these components of a Web delivery architecture. In particular, a number of studies indicate that requests to static Web objects follow a Zipf-like popularity distribution [11, 8, 5, 17]. The probability p_i of a request to the ith most popular document is proportional to 1/i^α, for some parameter α. A large number of requests target the most popular sites and the most popular objects, but the distribution has a long, heavy tail of less popular objects with poor reference locality. Higher α values increase the concentration of requests on the most popular objects.
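As a concrete illustration of this model, the following sketch computes normalized Zipf probabilities and samples a synthetic request stream. It is a minimal stand-in under the stated Zipf assumption, not the Surge generator used later in the paper; the object count and α value here are arbitrary examples.

    import random

    def zipf_probs(num_objects: int, alpha: float) -> list:
        """p_i proportional to 1/i^alpha, normalized over ranks 1..num_objects."""
        weights = [1.0 / (i ** alpha) for i in range(1, num_objects + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    def sample_requests(probs: list, n: int) -> list:
        """Draw n object ranks (1-based) according to the popularity distribution."""
        ranks = list(range(1, len(probs) + 1))
        return random.choices(ranks, weights=probs, k=n)

    probs = zipf_probs(num_objects=20000, alpha=0.7)
    trace = sample_requests(probs, n=100000)
    print(sum(probs[:200]))  # expected share of requests for the 200 most popular objects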

One implication of the Web's Zipf-like behavior is that caching is highly effective for the most popular static (cacheable) objects, assuming that popularity dominates rate of change [17]. Unfortunately, caching is less effective on the heavy tail, which comprises a significant fraction of requests. This is why Web cache effectiveness typically improves only logarithmically with size, measured either by capacity or by user population.
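The heavy tail is what limits cache effectiveness. As a rough illustration, consider an idealized cache that always holds the k most popular objects; under the Zipf assumption above its hit ratio is simply the sum of the top-k probabilities. The sketch below (with an assumed α = 0.7 and 20,000 objects) shows how slowly that figure grows with k:

    # Hit ratio of an idealized cache that always holds the k most popular objects.
    weights = [1.0 / (i ** 0.7) for i in range(1, 20001)]   # Zipf, alpha = 0.7, 20000 objects
    total = sum(weights)
    probs = [w / total for w in weights]

    for k in (100, 1000, 10000):
        print(k, round(sum(probs[:k]), 2))
    # Even a cache holding half of all objects leaves a substantial miss stream
    # drawn from the heavy tail.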

Zipf-like behavior also has implications for request distribution strategies in server clusters. For example, it creates a tension between the competing goals of load balancing and locality. On the one hand, content-aware policies such as urlhash effectively take advantage of the locality present in the request stream by preferring the same server for repeat requests, maximizing server memory hits for popular objects. However, urlhash is vulnerable to load imbalances because the most popular objects receive the largest number of requests, and a single server handles all requests for any given object. slb policies balance load, but they tend to scatter requests for each object across the servers, reducing server memory hits for moderately popular objects.

Recent research [15, 2, 3] has studied this tradeoff in depth, and developed Locality Aware Request Distribution (lard) and related policies to balance these competing goals, combining the benefits of each approach. The lard approach is commercialized by Zeus Technology [1]. Other commercial request distributors use less sophisticated strategies. Server switches from Foundry, Nortel/Alteon and Cisco/Arrowpoint can assign multiple servers to each urlhash bucket, and select from the target set using load information. The IBM Websphere Edge Server includes intelligent load balancing software for routing requests among servers based on content, server load, connection statistics, and customizable monitors. This product also contains a Web proxy cache.

2.2 Goals and Methodology
This paper examines the impact of caching on downstream request traffic patterns, using the simulation pipeline depicted in Figure 2. We limit our experiments to static, cacheable content, and we do not consider the effect of object rate of change or expiration. We use a simple Web cache simulator [9] to filter representative request traces through caches of various sizes. Our model assumes a simple LRU replacement policy with no cache cooperation among the front-end cache and the servers. Our results are conservative in that we use small cache capacities relative to the aggregate content size. In these experiments we assume that a single cache filters requests from all clients; this assumption is reasonable for CDNs and proxies serving 20,000 or more users [17], but it exaggerates the trickle-down effect from proxies serving small populations.

Figure 2: Simulation pipeline for study of caching and request distribution. (Surge or trace input produces a URL request stream, which passes through an LRU cache simulation and a request distributor, URLHASH or RANDOM, to N simulated servers.)
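The filtering stage of this pipeline can be sketched as a byte-capacity LRU cache that emits its miss stream. This is only an illustrative stand-in for the Proxycizer-based simulator [9]; the trace format, object sizes, and capacity are assumed inputs.

    from collections import OrderedDict

    def lru_filter(trace, sizes, capacity_bytes):
        """Pass requests (URLs) through an LRU cache of the given byte capacity
        and return the requests that miss, i.e., the downstream miss stream."""
        cache = OrderedDict()          # url -> object size, in LRU order
        used = 0
        misses = []
        for url in trace:
            if url in cache:
                cache.move_to_end(url)  # hit: refresh recency
                continue
            misses.append(url)          # miss: passes downstream
            size = sizes[url]
            if size > capacity_bytes:
                continue                # too large to cache at all
            cache[url] = size
            used += size
            while used > capacity_bytes:
                _, evicted = cache.popitem(last=False)  # evict least recently used
                used -= evicted
        return misses

    # Toy example: filter a short trace through a 32 MB front-end cache.
    sizes = {"/a": 10_000, "/b": 2_000_000, "/c": 500}
    print(lru_filter(["/a", "/b", "/a", "/c", "/a", "/b"], sizes, 32 * 1024 * 1024))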

We use simulations to explore the properties of the cache miss stream and their effects on downstream components, focusing on the effectiveness of server content caching and server request distribution policies. We study two server selection policies: urlhash and random. These policies are sufficient to quantify the impact of the trickle-down effect on the performance potential of a range of policies for server request distribution. We substitute random for slb because it is easier to simulate, and its locality properties are no worse than slb.

Table 1 shows some characteristics of the server traces and synthetic request streams used in the study. The server traces are from IBM Corporation's main Web server (www.ibm.com). Because these traces are collected at the entry point(s) of a large, popular Web server, they capture the characteristics of request streams likely to be seen at the kind of server clusters we target in this study. The first trace contains 15.5M requests over a period of 3.5 days in June 1998, showing a best-fit Zipf α value of 0.764. We select this trace partly to facilitate direct comparison with the lard study, which uses the same trace. The second IBM trace was collected on February 5-11, 2001. As shown in Table 1 the IBM traces deviate somewhat from expected Zipf-like behavior, e.g., they contain a small number of popular objects that account for an unusually large percentage of references.

We also use the Surge Web server load generator [4] to synthesize Zipf-like request traces using α values between 0.6 and 0.9, to allow experimentation with a range of values observed in previous traffic studies. The synthetic traces were generated with default Surge settings for temporal locality and object size distributions.


Trace           Objects   Footprint   References   Refs from top 1% of objects   Refs from top 5% of objects
1998 IBM        38527     1 GB        15.5M        77%                           94%
2001 IBM        52763     1.6 GB      43M          93%                           97%
Zipf Alpha 0.9  40000     900 MB      4.2M         42%                           57%
Zipf Alpha 0.8  20000     440 MB      7.5M         27%                           42%
Zipf Alpha 0.7  20000     440 MB      16M          17%                           30%

Table 1: Trace Characteristics

Figure 3: Number of references to each object by rank (log-log), before and after caching. The left-hand graph is from a synthetic trace with α = 0.6; the right-hand graph is from the 1998 IBM trace. (Each graph plots references against object rank for no front-end cache and for 32MB, 64MB, and 128MB front-end caches.)

3. EXPERIMENTAL RESULTS
This section presents experimental results in three subsections. Section 3.1 illustrates the traffic properties of the "trickle-down" cache miss stream. Sections 3.2 and 3.3 examine the impact of these properties on load balancing and cache locality respectively.

3.1 Properties of the Miss Stream
Figure 3 illustrates object reference frequency by rank before and after caching, on a log-log scale. As expected for Zipf-like distributions, the original traces roughly follow a straight line; the straight line breaks down for less popular objects because the traces are of bounded size. The figure shows that even a small forward cache "skims the cream" off the request stream, absorbing requests for the most popular objects. The effect is to "flatten" the distribution so that the most popular objects in the miss stream receive roughly the same number of requests. In fact, there are 1035 references to the most popular object in the α = 0.6 miss stream from a 32MB cache, while the object with rank 1000 is referenced 816 times. The effect is even more pronounced in the right-hand graph from the 1998 IBM trace, which has a heavier concentration on the most popular objects in the original trace. Larger forward caches extend the flattening effect, but are qualitatively similar.

To illustrate the disproportionate effect of caches on popular objects, Figure 4 shows per-object hit rates for popular objects in the same input traces, as a function of popularity rank. This graph shows that cache effectiveness falls off rapidly with object popularity. The effect is more pronounced in the IBM trace; its most popular objects miss in the cache even less often than for the α = 0.6 trace, explaining the smaller number of references to these objects in the miss stream in Figure 3.

The intuition behind the trickle-down effect is as follows. Suppose that requests to each object are evenly distributed in the input trace, and that it takes R requests of the trace to fill a cache of k distinct objects. The expected value of R is a function of the input distributions and the cache size, but we leave aside the question of how to compute it. (R is on the order of 10^5 for the 1998 IBM trace with a 32MB cache.) The probability that none of the R requests references the object with popularity rank i is (1 - p_i)^R; this gives the probability that a reference to object i misses in the cache after any R references. Thus the probability of a reference to object i in the miss stream is q_i = p_i (1 - p_i)^R; this gives the probability that any given element of an infinite request stream is a reference to object i and that the reference misses in the cache.

Figure 5 shows a scaled plot of q_i = p_i (1 - p_i)^R for a Zipf α = 0.7 and R = 10^4. The graph shows that it is the moderately popular objects that are most likely to appear in the miss stream; these become the most popular objects in the filtered distributions of Figure 3. This is because the probability of a forward cache miss, (1 - p_i)^R, is vanishingly small for popular objects, thus few requests to popular objects pass through the cache. While this probability approaches unity for unpopular objects, requests for unpopular objects are rare to begin with. For comparison, Figure 5 also shows the number of references to each object after passing the 1998 IBM trace through a 32MB cache, as a function of object rank in the original input trace. This curve confirms our intuition.
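The shape described above can be reproduced directly from this expression. The sketch below evaluates q_i = p_i (1 - p_i)^R under the same assumptions as the text (Zipf α = 0.7, R = 10^4; the 20,000-object population is an assumed example) and reports where the miss stream peaks:

    # Zipf popularity, alpha = 0.7, over an assumed population of 20000 objects.
    weights = [1.0 / (i ** 0.7) for i in range(1, 20001)]
    total = sum(weights)
    p = [w / total for w in weights]

    R = 10_000
    q = [pi * (1.0 - pi) ** R for pi in p]   # q_i = p_i * (1 - p_i)^R

    peak_rank = max(range(len(q)), key=lambda i: q[i]) + 1
    print(peak_rank)   # a moderately popular rank, not rank 1 (the most popular object)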

Figure 4: Object hit rate in front-end caches, by object rank in the input trace. The left-hand graph is from a synthetic trace with α = 0.6; the right-hand graph is from the 1998 IBM trace. (Both graphs plot per-object hit ratio for 32MB, 64MB, and 128MB front-end caches.)

Figure 5: Number of references to each object in the 32MB cache miss stream, ordered by rank in the 1998 IBM trace. The superimposed curve is a scaled plot of q_i = p_i (1 - p_i)^R for a Zipf α = 0.7 and R = 10^4.

Operation                  CPU time in µs
Connection setup           145
Net transmit (512 bytes)   40
Disk transfer (4K bytes)   410

Table 2: CPU cost parameters for simulation.

3.2 Load Balancing
These effects of forward caching on the reference distributions in the miss stream have important implications for downstream components. To illustrate, we now consider the impact on server request distribution policies. One implication is that the trickle-down effect suppresses the load skew caused by popular objects in Zipf-like request streams. This means that simple content-aware request routing policies such as urlhash become less vulnerable to load imbalances than they have been in the past.

We used a simple server simulator to quantify this effect. The simulator assigns a fixed CPU cost for each connection (we assumed a single request per connection) and for each block transferred. A server cache miss also imposes a per-block CPU transfer cost from disk. Table 2 gives the simulation parameters for these costs, which are identical to those used in the lard study [15].
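A sketch of this style of per-request cost accounting, using the Table 2 parameters, is shown below. The per-request structure (whole-block rounding, a single miss flag per request) is an assumption for illustration, not the lard simulator itself.

    import math

    # CPU cost parameters from Table 2, in microseconds.
    CONNECTION_SETUP_US = 145
    NET_TRANSMIT_512B_US = 40
    DISK_TRANSFER_4KB_US = 410

    def request_cpu_us(object_bytes: int, server_cache_hit: bool) -> int:
        """CPU time charged to a server for one request (one connection)."""
        cost = CONNECTION_SETUP_US
        cost += NET_TRANSMIT_512B_US * math.ceil(object_bytes / 512)       # network transmit
        if not server_cache_hit:
            cost += DISK_TRANSFER_4KB_US * math.ceil(object_bytes / 4096)  # disk read on a miss
        return cost

    print(request_cpu_us(8192, server_cache_hit=False))   # 145 + 40*16 + 410*2 = 1605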

The simulation results compare load distributions from urlhash and random request distribution of filtered traces across varying numbers of servers. The results exaggerate load imbalances relative to the lard study in the following respect. We assume that all client requests are available at the start of the experiment, to eliminate server wait time between requests; this "squeezes out" all idle time of underutilized servers to the end of the run. We then quantify load skew as the average percentage CPU time difference between the slowest server and all other servers:

    ( Σ_{i=1}^{N} (T_slow - Time_i) ) / (N × T_slow)

where T_slow is the CPU time for the slowest of the N servers. This conservative load skew metric gives the average idle time percentage of servers left underutilized due to load imbalances. A lower load skew figure indicates a more balanced load across the servers. This simulation assumes adequate I/O system bandwidth, and considers only the CPU cost to manage the I/O.

Figure 6 shows the load skew results for random and urlhash on the cache miss streams, as a function of the server cluster size N. These experiments use a server memory size of 32MB, as in the lard study. random balances the load as expected. With no caching, the load skew for urlhash on the α = 0.7 Zipf trace with 8 servers is almost 25% and jumps to over 35% when using 16 servers, confirming the lard result that load imbalances under urlhash can significantly reduce cluster throughput, and that load imbalances grow with cluster size. The imbalance is worse for α = 0.8, which concentrates references to popular objects even more. However, these load imbalances are less severe after filtering the request streams through forward caches; for example, a small cache drops the load skew to roughly 7% with 8 servers, independent of the α or the front-end cache size.

Figure 6: Server load skew (synthetic traces with α = 0.7 and α = 0.8). (Load skew vs. number of back-end servers, 4 to 16, for url hashing on the α = 0.7 trace with no front-end cache and with 32 MB, 64 MB, and 128 MB front-end caches; for random switching on the α = 0.7 trace with a 64 MB front-end cache; and for url hashing on the α = 0.8 trace with no front-end cache and with a 64 MB front-end cache.)

Figure 7: Server load skew (2001 IBM trace). (Load skew vs. number of back-end servers for no front-end cache and for 32MB, 64MB, 128MB, 256MB, and infinite front-end caches.)

This effect is also visible with the IBM server traces, although a small number of very large objects introduce some skew (see below). For example, Figure 7 shows the load skew results for urlhash on the 2001 IBM trace with the nine largest objects removed (all over 8MB). While forward caching is less effective at eliminating load imbalances than for the synthetic traces, the 8-server load imbalance is never more than 20% with a forward cache of 64 MB or larger. To further validate our simulation, we ran the 2001 IBM trace against a cluster of 8 machines running FreeBSD and Apache Web servers, gathering load information from the cluster while using urlhash to distribute the requests among the cluster machines. The results of this experiment are qualitatively similar to Figure 7, but the load skews are significantly lower for larger clusters and larger front-end cache sizes. One difference is that the test platforms have larger server memories (512 MB) than the simulated systems.

Figure 8: Request distribution by size × references (1998 IBM trace). (Log-log plot of bytes transferred per object by object rank, with no front-end cache and with 32MB, 64MB, and 128MB front-end caches.)

We ran additional experiments to investigate the impact of large objects on our results. We determined that much of the remaining load imbalance occurs because the caches in our simulations are ineffective at removing load skew due to variations in object size. In particular, large objects do not live in a small cache long enough to yield hits. For example, the objects in the IBM traces range in size up to 61 MB. Some of the largest objects are moderately popular, but their size makes them effectively uncacheable in our experiments.

To illustrate, Figure 8 shows the distribution of bytes transferred (size × requests) for objects in the 1998 IBM trace after filtering through caches of 32MB, 64MB, and 128MB. This metric approximates the amount of server work to serve requests for each object. The graph shows that although the caches suppress popularity skew, they do not eliminate the skew in bytes transferred from the server for each object. In practice, effectiveness of forward caching is typically limited by population rather than size, and size-related imbalances may be resolved at the origin sites by segregating continuous media files and other large objects.

These results indicate that explicit load balancing for content-aware request distribution in Web server clusters is less important in the presence of upstream caching. That is, a simple urlhash strategy at the back end is unlikely to develop significant load imbalances when forward caching is effective.

256MB Aggregate Server Cache
                No Front-End Cache     64MB Front-End Cache    128MB Front-End Cache
Trace           URL       Random       URL       Random        URL       Random
1998 IBM        99%       92%          65%       3.5%          34%       1.6%
2001 IBM        98%       94%          46%       3.6%          28%       1.6%
Zipf α 0.9      81%       43.7%        45%       3.0%          28%       1.6%

512MB Aggregate Server Cache
1998 IBM        99.4%     94.4%        85%       7.5%          74%       3.3%
2001 IBM        99%       95.3%        70%       7.8%          55%       4.2%
Zipf α 0.9      93%       50%          76%       6.5%          65%       4.1%

Table 3: Server cache hit rates for a 16-server cluster with various traces and front-end cache sizes, 256MB and 512MB aggregate server cache.

3.3 Effects on Server Locality
A key advantage of content-aware request distribution is that it maximizes server cache effectiveness. However, the heavy-tailed nature of Web popularity distributions implies a low marginal benefit from multiple levels of caching [10]. Thus, the trickle-down effect reduces the ability of a server cluster to serve requests from memory for a given server request rate, independent of the request distribution policy. As a result, servers tend to be increasingly I/O-bound in the presence of upstream caching, although there may be significant locality remaining in requests from caches to reload recently modified or expired objects.

Figure 9 illustrates this effect for ideal 256 MB and 512 MB aggregate server caches, as a function of front-end cache size. For the 1998 IBM trace, a 256 MB back-end cache is sufficient to serve more than 99% of references in the original trace from server memory. Even a 32 MB front-end cache absorbs 95% of the references in this trace, as predicted by the data in Table 1. With a 32 MB front-end cache the server caches still yield hit ratios above 80% for the miss stream. As the front-end cache grows, the back-end caches are left with just the tail of the original distribution, so back-end hit rates drop. When front-end and back-end caches can each store 25% of the content, the back-end hit rate is only 15%.

This shows that the potential of content-aware request distribution to reduce server I/O load is diminished in the presence of upstream caching. Simulations with urlhash show that back-end cache hit ratios closely follow Figure 9, independent of the number of servers and their memory sizes. That is, urlhash delivers an equivalent back-end hit ratio to an ideal unified cache of the same aggregate size for the traces we tested. For example, with a 128 MB front-end cache and 8 servers with 32 MB of memory each, urlhash delivers the ideal back-end hit ratio of 41%.

Even so, upstream caching increases the marginal benefit from content-aware request distribution in many cases. Figure 10 shows the effectiveness of random request distribution to quantify the server cache hits that are sacrificed by a locality-blind policy such as slb. The graph gives hit ratios in a 256 MB aggregate back-end cache for the 1998 IBM trace filtered through front-end caches of varying sizes. The lines show cache hit ratios as a function of the number of servers in the cluster; larger clusters exacerbate redundant caching as described in Section 2, reducing overall cache effectiveness. With no front-end cache, random yields back-end hit ratios above 90% for this trace even with 16 servers. With a front-end cache absorbing requests for the most popular objects, random never delivers a back-end hit ratio above 38%, while Figure 9 shows that a content-aware policy such as urlhash could deliver hit ratios as high as 80% with a 32 MB front-end.

Figure 9: Cache hit rates for front-end and back-end caches of varying sizes in the 1998 IBM Trace. (The graph plots object hit ratio against front-end cache size, from none to infinite, for the front-end cache itself and for 256 MB and 512 MB aggregate back-end caches.)

As the size of the front-end cache grows and the locality present in the miss stream drops, random becomes progressively less sensitive to the number of servers, and the performance gap between random and the ideal back-end narrows. Yet urlhash significantly outperforms random on the filtered request streams across a range of configurations. Table 3 gives results from the 2001 IBM trace and a synthetic Zipf-like trace with similar locality but more typical request concentrations for the most popular objects. In each case, even random delivers reasonable or competitive hit ratios for the original trace, but is not effective on the filtered traces. In contrast, urlhash is effective at extracting the remaining locality from the filtered traces, yielding close-to-ideal hit ratios.

Figure 10: Server cache hit rates (256 MB aggregate) with random request distribution as a function of cluster size, for various front-end cache sizes (1998 IBM trace). (The bar graph shows object hit ratio for 4, 8, 12, and 16 servers, with no front-end cache and with 32 MB, 64 MB, 128 MB, and 256 MB front-end caches.)

4. IS THE TRICKLE-DOWN EFFECT VISIBLE TODAY?
The simulation data presented in earlier sections clearly show the potential impact of widespread caching on popularity distributions. Can we see preliminary evidence of such a trend in current Web traces, and if not, why not?

A first step is to isolate the effects of other, non-caching-related trends, such as observed changes in Zipf distributions. Padmanabhan and Qiu [14] show that Zipf α values for the MSNBC site during 1998 and 1999 were higher than in previous proxy and server traces. They note that these traces were taken at a time when CDNs and surrogate caches were not employed at this site, and observe that α values for Web traffic are increasing, i.e., a relatively small number of documents are becoming more popular. This increases the potential benefit from caching and replication. The study also notes that caching and CDNs will reduce the load on the servers. Our results explore how this caching might affect the properties of the request "trickle" reaching the MSNBC server after CDNs or surrogates are deployed.

We looked for evidence of upstream proxy caches in the client request stream of the 2001 IBM trace. The IBM site uses the Network Dispatcher component of the Websphere Edge Server to balance load across servers, but, like the MSNBC site, IBM does not yet use surrogate caches or a CDN for this site. A few dozen clients generate more than 10,000 requests per day. While these are likely to be caches, there are no very large caches: no client generates more than 1% of requests. The top three clients account for 2% of the requests. This confirms that the potential of proxy caching in the Web is not yet fully realized, and that existing demand-side caches do not yet serve large enough user communities to have a significant impact on the request stream for sites that do not use CDNs. One factor may be that the IBM site is accessed primarily by businesses rather than consumers.

Figure 11: Popularity distributions for the full trace and the miss stream from NLANR caches on May 30, 2001, plotted logarithmically on both axes. We ignore objects whose requests hit in the NLANR caches less than 10% of the time. (The graph plots references against object rank for the full stream, the miss stream, and the misses ordered by rank in the full stream.)

We also examine one day of May 2001 traces from the top-level NLANR caches [16], which cooperate to act as a unified root proxy cache. Intuition suggests that the popularity distributions for the incoming NLANR cache request stream would be flat for the most popular objects, because the request stream itself originates from caches in lower levels of the hierarchy. However, the effect is not visible in Figure 11, which shows reference frequency distributions for "cacheable" URLs in the request stream (upper line) and the root cache miss stream (lower line). We consider a URL "cacheable" if it yields at least a 10% object hit ratio in the NLANR root cache. Even with this restriction, neither request stream shows the flattening characteristic of the trickle-down effect to the degree expected.

Further analysis reveals that the NLANR root cache effectiveness is highly variable, even for popular objects that appear to be cacheable. The fuzzy cloud below the upper line in Figure 11 is a plot of object request frequency in the miss stream, ordered by object rank in the original trace. This shows an unexpectedly low correlation between object popularity and cache effectiveness, even for "cacheable" objects (compare to Figure 4).

The NLANR traces do not contain enough HTTP header information to determine why any given request was not served from the cache. One possibility is that expiration headers in responses caused the caches to discard many objects that would have yielded hits. Another possibility is that cookies or other headers rendered the responses uncacheable for other reasons. The cache software in this system used a complex procedure for identifying cacheable responses.

In summary, there is little empirical evidence to indicate that Web caches significantly impact the request traffic of recent server studies, even in popular Web sites such as IBM and MSNBC. However, this paper shows that Web caches will reshape the request stream, to the degree that they are effective at all. A range of factors compromise the effectiveness of Web caching today, including incomplete deployment of CDNs, small population sizes on demand-side caches, and confusion about what content is cacheable during the transition to HTTP 1.1.

5. CONCLUSIONS
This paper shows that key assumptions about Web server request traffic may lead us astray as proxy caches and Content Delivery Networks (CDNs) become increasingly ubiquitous and effective. To be sure, perfect caching would absorb all static object requests, leaving origin sites to handle dynamic content and propagate object updates to the caching networks. But even imperfect caches can absorb the majority of requests to the most popular objects, shifting downstream traffic loads toward the heavy tail of Zipf-like Web request distributions. This "trickle-down effect" impacts the design of downstream components.

Our results illustrate the importance of considering the impact of caching on other components of an end-to-end content delivery architecture. This paper explores key implications of the trickle-down effect for request distribution alternatives in back-end server clusters. The trickle-down effect suppresses the primary source of load imbalances in Web server farms, reducing the potential benefit from sophisticated load balancing strategies for static content. It also absorbs much of the locality in the request stream, reducing the benefit from downstream content caches (e.g., server memory). In the limit, upstream caches weaken the differences between request distribution strategies downstream. The various strategies are motivated largely by the locality and load skew inherent in the "head" of Zipf-like request distributions, and behave similarly on the heavy tail. Even so, the partial reduction in locality from moderately effective forward caches makes content-aware request distribution (e.g., URL hashing) even more important in downstream server clusters to benefit from the locality that remains; content-blind policies such as server load balancing (slb) severely compromise server cache effectiveness, even with small front-end caches.

This study focuses on static content. Dynamic Web requests are traditionally not cacheable. However, with techniques such as dynamic caching [12] and fragment caching [6, 7] it is possible to cache part of the result of dynamic requests (which may apply to many different requests), gaining some benefit from locality. These effects must also be considered when studying the impacts of increased dynamic content in Web requests.

6. ACKNOWLEDGEMENTS
We would like to thank Erich Nahum of IBM Research for providing us the 1998 IBM Web server traces used in the LARD study. We would also like to thank Alister Lewis-Bowen and Andrew Frank-Loron for their help in obtaining the 2001 IBM server logs.

7. REFERENCES
[1] Improving Web Server Performance with Zeus Load Balancer. http://www.zeus.co.uk/products/lb1/.

[2] M. Aron, P. Druschel, and W. Zwaenepoel. Efficient support for P-HTTP in cluster-based Web servers. In Proceedings of the USENIX '99 Technical Conference, 1999.

[3] Mohit Aron, Darren Sanders, Peter Druschel, and Willy Zwaenepoel. Scalable content-aware request distribution in cluster-based network servers. In Proceedings of the USENIX 2000 Technical Conference, 2000.

[4] Paul Barford and Mark E. Crovella. Generating representative Web workloads for network and server performance evaluation. In Proceedings of Performance '98/ACM SIGMETRICS '98, pages 151-160, June 1998.

[5] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of IEEE Infocom '99, March 1999.

[6] J. Challenger, A. Iyengar, and P. Dantzig. A scalable system for consistently caching dynamic data. In Proceedings of the IEEE Infocom 1999 Conference, March 1999.

[7] J. Challenger, A. Iyengar, K. Witting, C. Ferstat, and P. Reed. A publishing system for efficiently creating dynamic Web content. In Proceedings of the IEEE Infocom 2000 Conference, March 2000.

[8] Carlos Cunha, Azer Bestavros, and Mark Crovella. Characteristics of WWW client-based traces. Technical Report 1995-010, 1995.

[9] Syam Gadde. The Proxycizer Web proxy tool suite. http://www.cs.duke.edu/ari/Proxycizer/.

[10] Syam Gadde, Jeffrey S. Chase, and Michael Rabinovich. Web caching and content distribution: A view from the interior. Computer Networks and ISDN Systems (Selected Papers from the Fifth International WWW Caching and Content Delivery Workshop), 24(2):222-231, February 2001.

[11] Steve Glassman. A Caching Relay for the World Wide Web. In First International World Wide Web Conference, 1994.

[12] A. Iyengar and J. Challenger. Improving Web server performance by caching dynamic data. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, December 1997.

[13] D. Muntz and P. Honeyman. Multi-level caching in distributed file systems, or your cache ain't nuthin' but trash. In Proceedings of the Winter USENIX Technical Conference, January 1992.

[14] V. Padmanabhan and L. Qiu. The content and access dynamics of a busy Web site: Findings and implications. In Proceedings of ACM SIGCOMM, Stockholm, Sweden, Aug. 28 - Sept. 1, 2000.

[15] Vivek S. Pai, Mohit Aron, Gaurav Banga, Michael Svendsen, Peter Druschel, Willy Zwaenepoel, and Erich Nahum. Locality-aware request distribution in cluster-based network servers. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998.

[16] Duane Wessels, Tracie Monk, k claffy, and Hans-Werner Braun. A distributed testbed for national information provisioning. http://ircache.nlanr.net/Cache/.

[17] Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, and Henry Levy. On the scale and performance of cooperative Web proxy caching. In Proceedings of the 17th ACM Symposium on Operating Systems Principles, December 1999.

[18] G. Zipf. Human Behavior and the Principle of Least Effort. Addison Wesley, 1949.