
Engineering Web Cache Consistency

JIAN YIN, LORENZO ALVISI, and MIKE DAHLIN
University of Texas at Austin
and
ARUN IYENGAR
IBM T. J. Watson Research Center

Server-driven consistency protocols can reduce read latency and improve data freshness for a given network and server overhead, compared to the traditional consistency protocols that rely on client polling. Server-driven consistency protocols appear particularly attractive for large-scale dynamic Web workloads because dynamically generated data can change rapidly and unpredictably. However, there have been few reports on engineering server-driven consistency for such workloads. This article reports our experience in engineering server-driven consistency for a sporting and event Web site hosted by IBM, one of the most popular sites on the Internet for the duration of the event. We also examine an e-commerce site for a national retail store. Our study focuses on scalability and cachability of dynamic content. To assess scalability, we measure both the amount of state that a server needs to maintain to ensure consistency and the bursts of load in sending out invalidation messages when a popular object is modified. We find that server-driven protocols can cap the size of the server's state to a given amount without significant performance costs, and can smooth the bursts of load with minimal impact on the consistency guarantees. To improve performance, we systematically investigate several design issues for which prior research has suggested widely different solutions, including whether servers should send invalidations to idle clients. Finally, we quantify the performance impact of caching dynamic data with server-driven consistency protocols and the benefits of server-driven consistency protocols for large-scale dynamic Web services. We find that (i) caching dynamically generated data can increase cache hit rates by up to 10%, compared to the systems that do not cache dynamically generated data; and (ii) server-driven consistency protocols can increase cache hit rates by a factor of 1.5-3 for large-scale dynamic Web services, compared to client polling protocols.
We have implemented a prototype of a server-driven consistency protocol based on our findings by augmenting the popular Squid cache.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services—Web-based services; Data sharing; H.3.4 [Information Storage and Retrieval]: Systems and Software—Distributed systems; Performance evaluation (efficiency and effectiveness); C.4 [Computer Systems Organization]: Performance of Systems—Design studies; Fault tolerance; Performance attributes; Reliability, availability, and serviceability; C.2.4 [Computer-Communication Networks]: Distributed Systems—Client/server; D.4.3 [Operating Systems]: File Systems Management—Distributed file systems; D.4.8 [Operating Systems]: Performance—Simulation

Authors' addresses: J. Yin, L. Alvisi, M. Dahlin, Dept. of Computer Science, Univ. of Texas at Austin, Taylor Hall-2.124, Austin, TX 78712-1188; email: yin,lorenzo,[email protected]; A. Iyengar, IBM T. J. Watson Research Center, POB 704, Yorktown Heights, NY 10598; email: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected].

© 2002 ACM 1533-5399/02/0800-0224 $5.00

ACM Transactions on Internet Technology, Vol. 2, No. 3, August 2002, Pages 224-259.

General Terms: Design, Performance, Reliability

Additional Key Words and Phrases: Cache coherence, cache consistency, scalability, lease, dynamic content, volume

1. INTRODUCTION

Although Web caching and prefetching have the potential to reduce read latency significantly, the inefficiency of the cache consistency protocols in the current version of HTTP prevents this potential from being fully realized. HTTP uses client polling, in which clients query servers to determine if cached objects are up to date. Thus, clients may need to poll servers before returning cached objects to users even when these objects are valid. For example, in the workloads that we examine, more than 20% of requests to the servers can be client polls to revalidate unmodified cached objects. Thus, client polling not only increases server and network load, but also significantly increases read latency.

Many of the drawbacks of the traditional client polling protocols can be addressed with server-driven consistency protocols. In server-driven protocols, servers inform clients of updates [Howard et al. 1988]. Thus, clients can return cached objects without contacting the server if these objects have not been invalidated. A range of server-driven consistency protocols have been proposed and evaluated in both unicast and multicast environments using client Web traces [Yin et al. 1998], synthetic workloads [Yin et al. 1999a], single Web pages [Yu et al. 1999], and proxy workloads [Li and Cheriton 1999].

Server-driven consistency appears particularly attractive for large-scale Web sites containing significant quantities of dynamically generated and frequently changing data. There are two reasons for this. First, in these workloads, data changes often and at unpredictable times. Therefore, client polling is likely to result in obsolete data unless polling is done frequently—in which case the overhead becomes prohibitive. Second, the ability to cache dynamically generated data is critical for improving server performance. Requests for dynamic data can require orders of magnitude more time than requests for static data [Iyengar and Challenger 1997] and can consume most of the CPU cycles at a Web site, even if they only make up a small percentage of the total requests.

However, to deploy server-driven consistency protocols for large-scale dynamic Web services, several design issues critical to scalability and performance must be examined. This article provides one of the first studies of server-driven consistency for Web sites serving large amounts of dynamically generated data. We examine two workloads. Our first workload is generated by a major sporting and event Web site hosted by IBM,¹ which in 1998 served 56.8 million requests on the peak day, 12% of which were for dynamically generated data [Iyengar et al. 1999]. Our second workload is taken from the e-commerce site of a national retail store. This site generated more than one million hits a day, 6.4% of which were to dynamically generated data.

¹The 1998 Olympic Games Web site.


The first issue we address is scalability. In server-driven consistency, scalability can be limited by a number of factors:

—As the number of clients increases, the amount of memory required to keep track of the content of clients' caches may become large.

—Servers may experience bursts of load when they need to send invalidation messages to a large number of clients as a result of a write.

Previous efforts to improve the scalability of server-driven consistency have primarily focused on using multicast and hierarchies to flood invalidation messages [Li and Cheriton 1999; Yin et al. 1999a; Yu et al. 1999]. Although these approaches are effective, relying on them would pose a barrier to deployment. Our primary focus is on engineering techniques to improve scalability that are independent of network layers. These techniques make it feasible to deploy server-driven consistency for services as large as the IBM Sporting and Event Web site on today's infrastructure, and will continue to improve scalability in the future as multicast and hierarchies become widespread.

We show that the maximum amount of state kept by the server to enforce consistency can be limited without incurring a significant performance cost. Furthermore, we show that although server-driven consistency can significantly increase peak server load, it is possible to smooth out this burstiness without significantly increasing the time during which clients may access stale data from their caches.

The second issue we address is assessing the performance implications of the different design decisions made by previous studies in server-driven consistency.

Different studies have made widely different decisions in terms of the length of time during which clients and servers should stay synchronized, i.e., the length of time during which servers are required to notify clients whenever an object in the clients' cache becomes stale. Some studies argue that servers should stop notifying idle clients to reduce network, client, and server load [Yin et al. 1998], while others suggest that clients should stay synchronized with servers for days at a time to reduce latency and to amortize the cost of joining a multicast channel when multicast-based systems are used [Li and Cheriton 1999; Yu et al. 1999].

Using a framework that is applicable in both unicast and multicast environments, we quantify the trade-off between the low network, server, and client overhead of maintaining synchronization for short periods of time on the one hand, and the low read latency of maintaining synchronization for long periods of time on the other hand. We find that for both of our workloads, there is little performance cost in guaranteeing that clients will be notified of stale data within a few hundred seconds. We also find that there is little benefit to hit rate in keeping servers and clients synchronized for more than a few thousand seconds.

Previous studies also propose significantly different resynchronization protocols to resynchronize servers' and clients' consistency state after recovering from disconnections, which may be caused by choice, by a machine crash, or by a temporary network partition. Proposals include invalidating all objects in clients' caches [Liu and Cao 1997; Yu et al. 1999], replaying "delayed invalidations" upon resynchronization [Yin et al. 1998], bulk revalidation of cache contents [Baker 1994], and combinations of these techniques. This study systematically compares these alternatives in the context of large-scale services. We find that for desynchronizations that last less than 1000 seconds, delayed invalidations result in significant performance advantages compared to the other alternatives.

The final issue that we address is quantifying the performance implications of caching dynamically generated data, data generated by server programs executed in response to HTTP requests, as opposed to static data, data contained in static documents in a Web server. Although the high frequency and unpredictability of updates make dynamically generated data virtually uncachable with traditional client-polling consistency, server-driven consistency may allow clients to cache dynamically generated data effectively. Through a simulation study that uses both server-side access traces and update logs, we demonstrate that server-driven consistency allows clients to cache dynamic content with nearly the same effectiveness as static content.

We have implemented the lessons learned from the simulations in a prototype that runs on top of the popular Squid cache architecture (http://www.squid-cache.org). Our implementation addresses the consistency requirements of large-scale dynamic sites by extending basic server-driven consistency to provide consistent updates of multiple related objects and to guarantee fast resynchronization and atomic updates. Preliminary evaluation of the prototype shows that it introduces only a modest overhead.

The rest of the article is organized as follows. Section 2 reviews previous work on WAN consistency, on which this study is built. Section 3 evaluates various scalability and performance issues of server-driven consistency for large-scale dynamic services. Section 4 presents an implementation of server-driven consistency based on the lessons that we learned from our simulation study. Section 5 and Section 6 discuss related work and summarize the contributions of this study.

2. BACKGROUND

The guarantees provided by a consistency protocol can be characterized using two parameters: worst-case staleness and average staleness. We use Δ(t) consistency to bound worst-case staleness. Δ(t) consistency ensures that data returned by a read is never stale by more than t units of time. Specifically, suppose the most recent update to an object O happened at time T. To satisfy Δ(t) consistency, any read after T + t must return the new version of object O. Average staleness is instead expressed in terms of two factors: the fraction of reads that return stale data and the average amount of time for which the returned data has been obsolete. For example, a live news site may want to guarantee that in the worst case it will not supply its clients with any content that has been obsolete for more than five minutes, while attempting to provide good average staleness by delivering most updates within a few seconds.
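The Δ(t) bound above can be made concrete with a small checker. The sketch below is illustrative, not from the paper: `satisfies_delta_t` and its argument shapes are assumed names, and versions are modeled as monotonically increasing integers.

```python
# Sketch (assumed names): check whether a trace of reads to one object
# satisfies Delta(t) consistency, given the object's update history.
import bisect

def satisfies_delta_t(reads, update_times, t):
    """reads: list of (read_time, version_read).
    update_times: list of (time, version), sorted by time, versions increasing.
    A read at time r must return at least the newest version whose update
    time T satisfies T + t <= r; fresher is allowed, staler is not."""
    times = [u[0] for u in update_times]
    for r_time, r_version in reads:
        # Latest update that is already older than the staleness bound t:
        i = bisect.bisect_right(times, r_time - t) - 1
        if i >= 0 and r_version < update_times[i][1]:
            return False  # this read returned data stale by more than t
    return True
```

For instance, with updates at times 0 and 100 and t = 50, a read of the old version at time 140 is within the bound, while the same read at time 160 violates it.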


The optimal Web consistency protocol is precise expiration. In precise expiration, servers set the expiration time of each object to be the next modification time. When a cache loads a Web object, the corresponding expiration time is also passed to the cache via the HTTP header "Expires." Because in precise expiration Web objects expire when and only when they are updated, caches only contact servers to load new versions of Web objects. Unfortunately, servers generally cannot predict future modification times, and thus precise expiration is unrealistic. Nevertheless, the performance of precise expiration provides a benchmark for evaluating other consistency protocols.
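As a minimal sketch of why precise expiration is the benchmark: if the server could set Expires to the exact next modification time, a cache would serve every repeat read as a hit until the object actually changes, with no revalidation traffic at all. The function and oracle below are hypothetical, not the paper's simulator.

```python
# Sketch (illustrative): cache behavior for one object under precise
# expiration. `next_mod_after(t)` is an assumed oracle returning the time
# of the next modification after time t -- exactly what real servers lack.
def precise_expiration_hits(read_times, next_mod_after):
    """Returns (hits, misses) for a single object in a single cache."""
    hits = misses = 0
    expires = None  # nothing cached yet
    for r in read_times:
        if expires is not None and r < expires:
            hits += 1       # cached copy is provably fresh; no server contact
        else:
            misses += 1     # fetch the new version; server supplies Expires
            expires = next_mod_after(r)
    return hits, misses
```

With modifications at times 50 and 200, reads at 10, 20, 60, 70 yield exactly one miss per version and hits otherwise.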

Consistency protocols other than precise expiration use two mechanisms to meet consistency guarantees. First, worst-case guarantees are provided using some variation of leases [Gray and Cheriton 1989], which place an upper bound on how long a client can operate on cached data without communicating with the server. Second, some systems decouple average staleness from the leases' worst-case guarantees by also providing callbacks [Howard et al. 1988; Nelson et al. 1988], which allow servers to send invalidation messages to clients when data is modified.

For example, HTTP's traditional client polling associates a time to live (TTL) or an expiration time with each cached object [Mogul 1996]. This TTL can be regarded as a per-object lease to read the object; in particular, it places an upper bound on the time that each object may be cached before the client revalidates the cached version. To revalidate an object whose expiration time has passed, a client sends a Get-if-modified-since request to the server, and the server replies with "304 Not Modified" if the cached version is still valid or with "200 OK" and the new version if the object has changed.
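The TTL-as-lease behavior described above can be sketched as follows. This is an illustrative model, not Squid's or any real cache's code: `TTLCacheEntry` and the `revalidate` callback are assumed names, and `revalidate` stands in for the Get-if-modified-since exchange.

```python
# Sketch (assumed names): client-polling revalidation for one cached object.
# While the TTL is fresh the client answers locally; once it expires, a
# conditional GET either renews the TTL ("304") or replaces the body ("200").
class TTLCacheEntry:
    def __init__(self, body, ttl, fetched_at):
        self.body, self.ttl, self.fetched_at = body, ttl, fetched_at

    def fresh(self, now):
        return now - self.fetched_at < self.ttl

def read(entry, now, revalidate):
    """revalidate() models the Get-if-modified-since round trip and returns
    either ('304', None) or ('200', new_body)."""
    if entry.fresh(now):
        return entry.body                 # served locally, no server contact
    status, body = revalidate()           # TTL expired: must poll the server
    if status == '304':
        entry.fetched_at = now            # object unchanged: just renew TTL
    else:
        entry.body, entry.fetched_at = body, now
    return entry.body
```

Note that the poll happens whenever the TTL expires, even if the object never changed, which is exactly the revalidation traffic the article measures.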

The HTTP polling protocol has several limitations. First, because TTL determines both worst-case staleness and average staleness, they cannot be decoupled. Second, each object is associated with an individual TTL. After a set of TTLs expire, each object has to be revalidated individually with the server to renew its TTL, thereby increasing server load and read latency.

As a result, several researchers [Li and Cheriton 1999; Liu and Cao 1997; Yin et al. 1998; Yin et al. 1999a; Yu et al. 1999] have proposed server-driven consistency protocols. These protocols can be understood within the general framework of volume leases [Yin et al. 1999b]. Volume leases decouple average staleness from worst-case staleness by maintaining leases on objects as well as volumes, which are collections of related objects. Whenever a client caches an object, it requests a lease on the object. The server registers callbacks on object leases and revokes them when the corresponding objects are updated. Typically, the length of an object lease is chosen so that the lease will be valid for as long as the client is interested in the corresponding object (unless, of course, the object is updated by the server). Thus, object leases allow servers to inform clients of updates as soon as possible. Worst-case staleness is enforced through volume leases, which abstract synchronization between clients and servers. A server grants a client a volume lease only if all previous invalidation messages have been received by the client, and a client is allowed to read an object only if it holds both the object lease and the corresponding volume lease. Thus, a volume lease protocol enforces a staleness bound that is equal to the volume lease length. In addition to decoupling average staleness from worst-case staleness, volume lease protocols can reduce server load and read latency by amortizing volume lease renewal overheads over many objects in a volume. Moreover, understanding the relation between consistency semantics and volume lease mechanisms reveals an opportunity for reducing the number of messages. Since a client with an expired volume lease cannot read any objects in the volume, we can delay sending invalidation messages for these objects until the client renews its volume lease without compromising consistency. This optimization allows the invalidation messages to piggyback on other messages and thus reduces the number of messages. We call this optimization delayed invalidations.
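The read rule and the delayed-invalidations optimization can be sketched for a single client as below. This is a simplified illustration under stated assumptions, not the protocol implementation: object leases are infinite, there is one volume, and `VolumeLeaseClient` and its methods are invented names.

```python
# Sketch (simplified, single client, infinite object leases): a read is valid
# only while BOTH the object lease and the volume lease hold, so invalidations
# for a client whose volume lease has expired can be queued and applied
# ("delayed invalidations") when the client next renews its volume lease.
class VolumeLeaseClient:
    def __init__(self, volume_lease_len):
        self.vl_len = volume_lease_len
        self.vl_expiry = -1          # no volume lease held yet
        self.objects = set()         # objects with valid (infinite) leases
        self.delayed = set()         # invalidations queued while desynchronized

    def renew_volume(self, now):
        # The server applies pending invalidations before granting the lease,
        # so the client can never read data stale beyond the lease length.
        self.objects -= self.delayed
        self.delayed.clear()
        self.vl_expiry = now + self.vl_len

    def invalidate(self, obj, now):
        if now < self.vl_expiry:
            self.objects.discard(obj)   # client reachable: invalidate eagerly
        else:
            self.delayed.add(obj)       # lease expired: defer the message

    def can_read(self, obj, now):
        return now < self.vl_expiry and obj in self.objects
```

The deferral in `invalidate` is safe precisely because `can_read` already fails once the volume lease lapses; the queued invalidations then piggyback on the renewal.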

Within this general framework, a wide range of implementations can be related. Volume leases are either explicitly renewed [Li and Cheriton 1999; Yin et al. 1998; Yin et al. 1999a] or implicitly renewed via heartbeats [Yu et al. 1999]. Moreover, the implementation details of these protocols differ considerably. Yin et al. [1998] assume a unicast network infrastructure with an optional hierarchy of consistency servers [Yin et al. 1999b] and specify explicit volume lease renewal messages by clients. Li and Cheriton [1999] assume a per-server reliable multicast channel for both invalidation and heartbeat messages. Yu et al. [1999] assume an unreliable multicast channel, but bundle invalidation messages with heartbeat messages and thus tie average staleness to the system's worst-case guarantees. The implications of these design choices are evaluated in the next section.

A key problem in caching dynamically generated data is determining how changes to underlying data affect cached objects. For example, dynamic Web pages are often constructed from databases, and the correspondence between the databases and Web pages is not straightforward. Data update propagation (DUP) [Challenger et al. 1999] uses object dependence graphs to maintain precise correspondences between cached objects and underlying data. DUP thereby allows servers to identify the exact set of dynamically generated objects to be invalidated in response to an update of underlying data. Use of data update propagation at IBM sporting and event Web sites has resulted in server-side hit rates of close to 100%, compared to about 80% for an earlier version that did not use data update propagation.

3. EVALUATION

In this section we use trace-based simulation to evaluate the benefit and feasibility of server-driven cache consistency for large-scale Web sites. Section 3.1 describes our workloads and simulator. Section 3.2 compares read latency, overhead, and consistency resulting from server-driven protocols, traditional client-polling protocols, and a theoretically optimal consistency protocol. Section 3.3 considers two optimizations over traditional client-polling protocols and compares them against server-driven consistency protocols. Section 3.4 examines how to use prefetching to further reduce read latency in server-driven consistency protocols. Section 3.5 examines the scalability of server-driven consistency protocols. Section 3.6 examines the performance of the various mechanisms that server-driven consistency protocols use to recover from server, client, and network failures.


3.1 Methodology

In this section, we describe the workloads and simulator used in our simulation study.

3.1.1 Workload. We use two workloads with different characteristics to conduct our study. The first workload is taken from a major IBM sporting and event Web site. This site contained about 60,000 objects, and over 60% of the objects were dynamically generated [Iyengar et al. 1999]. The peak request rate for the IBM sporting and event site exceeded 56.8 million requests per day. The sporting and event Web service was hosted on four geographically distributed Web clusters; each of them served about one-fourth of all requests. We took the server access log from one of these clusters on February 19th, 1998, and a modification log of dynamically generated objects for our simulation study. The server access log is in the Common Log Format (http://httpd.apache.org/docs/logs.html), recording the IP address, timestamp, request line, status code, and object size for each access. This log contains about 9 million entries. The modification log for dynamically generated objects contains 20,549 entries. Our second workload is taken from an e-commerce Web site for a national retail store. This trace contains the Web access logs from March 3, 2000 to March 9, 2000. It indicates that 177,978 clients accessed 69,608 URLs during the 7-day period. Overall, 9,504,953 requests are logged.

We distinguish two classes of data: dynamic data, data generated by server programs upon users' requests, and static data. We classify URIs in our workloads as static or dynamic, based on whether a URI contains the name of a server program. We can further divide static objects into image objects and nonimage objects by examining the suffix of the URIs. In our sporting and event workload, 12% of the requests were made to pages dynamically generated by server programs. In our e-commerce workload, 6.4% of the requests were to dynamically generated objects.
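A classification in this spirit can be sketched as below. The particular program-name markers and image suffixes are our assumptions for illustration; the paper does not specify which patterns identify server programs in its traces.

```python
# Sketch (assumed patterns): classify a URI as dynamic, static image, or
# static nonimage, mirroring the three classes used in the workload analysis.
DYNAMIC_MARKERS = ('/cgi-bin/', '/servlet/')      # assumed server-program names
IMAGE_SUFFIXES = ('.gif', '.jpg', '.jpeg', '.png')  # assumed image suffixes

def classify(uri):
    path = uri.split('?', 1)[0].lower()
    if '?' in uri or any(m in path for m in DYNAMIC_MARKERS):
        return 'dynamic'
    if path.endswith(IMAGE_SUFFIXES):
        return 'static-image'
    return 'static-nonimage'
```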

We detect updates in two ways. First, the IBM sporting and event server logged the updates of the dynamically generated data, and the modification log is available for our simulation study. Second, for the static data, the dynamically generated data in our e-commerce workload, and the subset of dynamic data in the IBM workload, we use changes of object sizes to infer updates from our traces of read requests. This method appears reasonably accurate because two versions of a document are unlikely to have the same size; we can partially confirm this assumption in our workload—we have a modification log of a subset of objects in the first trace from DUP [Challenger et al. 1999], and we find that the object size always changes when there is a write for this subset of the workload. We set the time of such inferred writes to immediately precede the read that detects the change; this approximation of the timing of writes does not affect our results for read latency, number of messages, or average staleness. It may slightly overstate server state, since it maximizes delay on reclaiming server state for callbacks. Finally, it may somewhat affect our measurements of bursts of load, though we do not anticipate any systematic biases.
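The size-change heuristic amounts to a single pass over the access log. The following is a minimal sketch with assumed names, not the paper's tooling; the small epsilon encodes "immediately precede the detecting read."

```python
# Sketch (assumed names): infer writes from object-size changes in a read
# trace. An inferred write is timestamped just before the read that sees
# the new size, matching the approximation described in the text.
def infer_writes(access_log):
    """access_log: iterable of (timestamp, uri, size), sorted by timestamp.
    Returns a list of (inferred_write_time, uri)."""
    last_size = {}
    writes = []
    eps = 1e-6  # place the write immediately before the detecting read
    for ts, uri, size in access_log:
        if uri in last_size and last_size[uri] != size:
            writes.append((ts - eps, uri))
        last_size[uri] = size
    return writes
```

Note the heuristic's known blind spot, implicit in the code: an update that leaves the size unchanged, or that is never followed by another read, produces no inferred write.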


In our study, all the objects from a Web server form a volume. We expect this method of volume construction to be used in practice because it is simple to implement. Moreover, the bigger the volume, the lower the server load and the read latency, since volume lease renewals are amortized over more objects. However, it is generally not beneficial to group objects from several servers into one volume, since these objects can be disconnected from clients independently. Thus, the ideal volume typically comprises all the objects from a server.

As noted in Section 2, object leases should be long enough to span the period of interest of a client in an object. For simplicity, we use infinite-length object leases in our simulations. By allowing servers to discard some callbacks, more carefully tuned object lease lengths can further reduce message load [Yin et al. 1999b] or server callback state [Duvvuri et al. 1999], while only slightly increasing client response time due to the need to renew expired object leases.

We treat all the reads in our server access logs the same, and we do not care whether these reads are from a browser or a proxy cache. In our simulation study, we assume that consistency proxies, as described by Yin et al. [1999a], are deployed at these proxies. Consistency proxies act as servers to clients, act as clients to origin servers, and are essential for invalidation messages to traverse firewalls. Thus, consistency proxies that support volume leases interact with the server in the same way as browsers.

We use the client requests in server access logs to drive our simulation because these logs are the only workloads that contain requests from all clients to large-scale popular servers. All client reads to dynamically generated objects are included in server access logs, since dynamically generated objects are not cached in the current Web cache scheme. However, some reads to static objects are not recorded in server access logs because Web access logs are taken while the clients are running some client-polling protocols. There are two potential inaccuracies. First, a request is filtered out by the local cache when the TTL is valid and the cached object is also valid. This read would turn into a cache hit for both server-driven protocols with long volume leases and client-polling protocols with long TTLs.² Thus, it affects both server-driven consistency protocols and client-polling protocols. Second, a client request is filtered out by the local cache when the TTL is valid and the cached object is invalidated. This request could result in a cache miss for server-driven protocols, but a cache hit for long TTLs. Since the role of consistency protocols is to fetch the new data when data changes, it is preferable to suffer a cache miss to load valid data than to avoid a load by reading stale data. Moreover, a cache miss would eventually occur in client-polling protocols as well if the client continues to read the object after the TTL expires. Thus, our results are conservative estimates of the benefits of server-driven protocols over client-polling protocols.

3.1.2 Simulator. To study the impact of different design decisions on the performance of server-driven consistency, we built a simulator that reads a Web

2 When the TTL and the volume lease are short, this hidden read is more likely to trigger a TTL refresh than a volume lease renewal, because a volume lease renewal is amortized over many objects. Thus, we underestimate the cache miss rates more for client polling than for volume leases in this case.

ACM Transactions on Internet Technology, Vol. 2, No. 3, August 2002.


232 • J. Yin et al.

access log and a modification log and outputs read latency, server load, and network bandwidth consumption. Our simulator simulates both cache state (whether an object is cached) and consistency state (whether TTLs are valid or expired in client-polling protocols, and whether volume leases and object leases are valid in server-driven protocols). Given a workload, our simulator processes the Web access log and the modification log in timestamp order, updating cache and consistency state and recording consistency, read load, and server load information. To estimate server and network load, we track only the number of messages and the total number of bytes in all messages; we do not simulate network queuing and round-trip delays. Counting the number of messages and bytes provides a reasonable approximation of network load, independent of many detailed network parameters such as packet sizes and congestion conditions.
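The simulator's core loop can be sketched as follows. This is a minimal, hypothetical reconstruction: the event format, per-client state layout, and message accounting are our simplifications, not the actual simulator.

```python
from collections import namedtuple
from dataclasses import dataclass, field

# Hypothetical merged trace record: reads come from the access log,
# modifications from the modification log.
Event = namedtuple("Event", "time kind client obj")

@dataclass
class ClientState:
    cached: set = field(default_factory=set)          # object ids in cache
    object_lease: dict = field(default_factory=dict)  # object id -> expiry
    volume_lease: float = 0.0                         # volume lease expiry

def simulate(events, volume_lease_len, object_lease_len):
    """Replay read/modify events in timestamp order, counting local hits,
    misses, and consistency-related messages (a rough load estimate)."""
    clients, stats = {}, {"hits": 0, "misses": 0, "messages": 0}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "modify":
            # Server invalidates the object at every client that caches it
            # and still holds a valid volume lease.
            for c in clients.values():
                if ev.obj in c.cached and c.volume_lease > ev.time:
                    c.cached.discard(ev.obj)
                    stats["messages"] += 1   # one invalidation message
            continue
        c = clients.setdefault(ev.client, ClientState())
        fresh = (ev.obj in c.cached
                 and c.volume_lease > ev.time
                 and c.object_lease.get(ev.obj, 0.0) > ev.time)
        if fresh:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            stats["messages"] += 2           # request + reply to the server
            c.cached.add(ev.obj)
            c.object_lease[ev.obj] = ev.time + object_lease_len
            c.volume_lease = ev.time + volume_lease_len  # piggy-backed renewal
    return stats
```

A usage example: replaying one client's reads around a modification with 1000-second leases yields one hit, two misses, and five messages.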

3.2 Consistency for Large-Scale Dynamic Workloads

Previous studies have examined the performance characteristics of server-driven consistency for client cache workloads [Yin et al. 1998], synthetic workloads [Yin et al. 1999b], single Web pages [Yu et al. 1999], and proxy workloads [Li and Cheriton 1999]. In this section, we examine the performance characteristics of server-driven consistency protocols for large-scale dynamic server workloads. Our goals are (i) to understand the interaction of server-driven consistency with this important class of workloads; and (ii) to provide a baseline for the more detailed evaluations that we provide later in this article.

Our performance evaluation stresses read latency. To put the read latency results in perspective, we also examine the network costs of different protocols in terms of messages transmitted. Read latency is primarily determined by the local miss rate, the fraction of reads that a client cannot serve locally. There are two conditions under which the system has to contact the server to satisfy a read. First, the requested object is not cached. We call this a data miss. Second, even if the requested object is cached locally, the consistency protocol may need to contact the server to determine whether the cached copy is valid. We call this a consistency miss. Read latency for consistency misses may be orders of magnitude higher than that for local hits, especially when the network is congested or the server is busy. Thus, to a first approximation, a reduction in miss rate yields a proportional reduction in average read latency.

As described in Section 2, the volume lease algorithm has two advantages over traditional client-polling algorithms. First, it reduces the cost of providing a given worst-case staleness by amortizing lease renewals across the multiple objects in a volume. In particular, under a standard TTL algorithm, if a client references a set of objects whose TTLs have expired, each reference must go to the server to validate an object. In contrast, under a volume leases algorithm, the first object reference will trigger a volume lease renewal message to the server, which will suffice to re-validate all of the cached objects. Second, volume leases provide the freedom to separate average-case staleness from worst-case staleness by allowing servers to notify clients when objects change.
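The amortization argument can be stated as a toy cost model. This is a deliberate simplification (we count only validation round trips, and the protocol names are labels of our own choosing):

```python
def messages_to_revalidate(num_cached_objects, protocol):
    """Messages a returning client needs to re-validate its cache after its
    consistency state expires, under a simplified model."""
    if protocol == "ttl":
        # Each object whose TTL expired requires its own validation request.
        return num_cached_objects
    elif protocol == "volume_lease":
        # One volume lease renewal re-validates every cached object whose
        # object lease is still held; invalidations arrived eagerly earlier.
        return 1
    raise ValueError(f"unknown protocol: {protocol}")
```

For a client caching 20 objects from one volume, TTL validation costs 20 round trips where a volume lease renewal costs one.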

Figures 1 through 6 illustrate the impact of different consistency parameters for the IBM sporting and event workload and the e-commerce workload.


Engineering Web Cache Consistency • 233

Fig. 1. Local hit rates vs. worst-case staleness bound of volume lease and TTL for the IBM workload. Volume Lease(x) represents the volume lease protocol with the volume lease length equal to x seconds; Poll with TTL(x) represents a client-polling protocol with the TTL equal to x seconds. Note that in the common case, volume lease caches are invalidated within a few seconds of an update, independent of worst-case staleness bounds.

Fig. 2. Local hit rates vs. worst-case staleness bound of volume lease and TTL for the e-commerce workload.


Fig. 3. Number of messages vs. staleness bound of Volume Lease and TTL for the IBM workload.

Fig. 4. Number of messages vs. staleness bound of Volume Lease and TTL for the e-commerce workload.


Fig. 5. Stale rate vs. staleness bound of TTL for the IBM workload.

Fig. 6. Average staleness vs. staleness bound of TTL for the IBM workload.


In these figures, the x axis represents the worst-case staleness bound; this bound corresponds to the volume lease length for volume lease algorithms and to the TTL for TTL algorithms. The y axes in these figures show the fraction of local hits, network traffic, stale rate, and average staleness. The access patterns, such as read frequencies and numbers of repeated accesses of Web objects, are quite different for the IBM Web site and the e-commerce Web site. Thus, the hit rates achieved by a given consistency protocol differ across these workloads. We can see that volume leases consistently provide substantial advantages for both workloads, due to the impact of amortizing lease renewal overheads across volumes. In particular, for short worst-case staleness bounds, volume lease algorithms achieve significantly higher hit rates and incur lower server overheads than TTL algorithms.

As indicated in Figures 1 and 3 (and 2 and 4), providing a worst-case staleness bound of 100 seconds with volume leases is cheaper than providing a 10,000-second worst-case staleness bound with polling, in terms of both higher local hit rates and fewer network messages. And, as Figures 5 and 6 indicate, this comparison actually understates the advantages of volume leases because, for polling algorithms, the number of stale reads and their average staleness increase rapidly as the worst-case bound increases. In contrast, volume lease schemes allow servers to notify active clients of updates quickly, regardless of the worst-case staleness guarantees.

Also, notice in Figures 3 and 4 that the server load decreases as volume lease lengths increase from several seconds to several thousand seconds. In the IBM workload, load then increases slightly. This increase arises because as volume lease lengths increase, the number of volume lease renewals decreases, but the number of invalidation messages increases: if a client holds a volume lease, the server sends invalidation messages to the client immediately, instead of piggy-backing invalidation messages on later volume renewals. When the volume lease length exceeds several thousand seconds, the effect of increasing invalidation messages overtakes the effect of reducing volume lease renewals in the IBM workload. However, server overhead with very long volume leases is not much higher than with a volume lease length of several thousand seconds.

Overall, volume leases increase hit rates by a factor of 1.5-3 compared to client-polling protocols when the staleness bound is between 10 and 100 seconds. Furthermore, when the volume lease length exceeds 1000 seconds, volume leases achieve local hit rates that are within 5% of Precise Expiration, the theoretical optimal protocol for maintaining cache consistency.

In Figures 7 through 9, we examine two key subsets of the requests in the workloads. We examine the response time and average staleness for the dynamically generated pages and the nonimage objects fetched in the workload. Table I shows that for the IBM workload, nonimage objects account for 67.6% of all objects, and requests to nonimage objects account for 29.3% of all requests, while dynamic objects account for 60.8% of all objects, and requests to dynamic objects account for 12% of all requests. The fraction of requests to dynamic data rises to 40.9% when we exclude requests to image


Fig. 7. Local hit rates vs. staleness bound for TTL for the IBM workload. Note that in the common case, volume lease caches are invalidated within a few seconds of an update, independent of worst-case staleness bounds.

Fig. 8. Stale rate vs. staleness bound for TTL for the IBM workload.


Fig. 9. Average staleness vs. staleness bound for TTL for the IBM workload.

Table I. Classifying Objects and Requests in the IBM Trace According to URL Types

                         Object                Request
                     Number   Percent     Number    Percent
  image                9027     32.4      6165803     70.7
  non-image           18857     67.6      2553543     29.3
  dynamic             16960     60.8      1044712     12.0
  other non-image      1897      6.8      1508831     17.3
  total               27884    100        8719346    100

Table II. Classifying Objects and Requests in the E-Commerce Trace According to URL Types

                         Object                Request
                     Number   Percent     Number    Percent
  image                9255     13.3      8004528     84.2
  non-image           60353     86.7      1500425     15.8
  dynamic             59083     84.9       606073      6.4
  other non-image      1270      1.8       894352      9.4
  total               69608    100        9504953    100

objects. Table II shows that there are also a significant number of requests to dynamically generated objects in our e-commerce workload.

The dynamic and other nonimage data are of interest for two reasons. First, few current systems allow dynamically generated content to be cached. Our system provides a framework for doing so, and no studies to date have examined the impact of server-driven consistency on the cachability of dynamic


data. Several studies have suggested that uncachable data significantly limits achievable cache performance [Wolman et al. 1999a; 1999b], so reducing uncachable data is a key problem. Second, the cache behavior of these subsets of data may disproportionately affect end-user response time. This is because dynamically generated pages and nonimage objects may form the bottleneck in response time, since they must often be fetched before images can be fetched or rendered. In other words, the overall hit rate data shown in Figure 1 may not directly bear on end-user response time if a high hit rate to static images masks a poor hit rate to the HTML pages.

In current systems, a number of factors limit the cachability of dynamically generated data, including (1) the need to determine which objects must be invalidated or updated when underlying data (e.g., databases) change [Challenger et al. 1999]; (2) the need for an efficient cache consistency protocol; (3) the inherent limits to caching that arise when data changes rapidly; and (4) privacy requirements that prevent some dynamic data from being cached at proxy caches. As a result, most current systems use cache-control metadata to disable caching for dynamically generated data. Two mechanisms allow our system to cache dynamically generated data effectively. First, as detailed in Section 2, our system provides an efficient method for identifying Web pages that must be invalidated when underlying data changes. Second, as Figures 7 through 9 indicate, volume lease strategies can significantly increase the hit rate both for dynamic pages and for the "bottleneck" nonimage pages. Simulations with our e-commerce workload yield similar results.

Finally, the figures quantify the third limitation. One concern about caching dynamic objects is that they may change so quickly that caching them would be ineffective. This concern appears justified, at least for client-polling protocols. To reduce stale read rates for dynamic objects under client polling, clients have to resynchronize frequently with servers by using short TTLs, which leads to low cache hit rates. As shown in Figures 8 and 9, client polling leads to high stale hit rates and high average staleness when the TTL exceeds 1000 seconds, yet the cache hit rate is low when the TTL is small. Fortunately, volume leases allow us to eliminate stale reads in the common case and to achieve high cache hit rates. Hit rates for dynamic objects are lower than for all objects. However, as many as 25% of reads to dynamically generated data can be served locally with long leases, which increases the local hit rate for nonimage data by up to 10%. Since the local hit rate of nonimage data may determine the actual response time experienced by users, caching dynamic data with server-driven consistency can improve cache performance by as much as 10%. Further performance improvements can be made by prefetching up-to-date versions of dynamically generated objects after the cached versions have been invalidated.

Notice that Figure 7 shows that dynamic pages and nonimage pages are significantly more sensitive to short volume lease lengths than average pages. This sensitivity supports the hypothesis that these pages represent "bottlenecks" to displaying other images; dynamic pages and nonimage pages are particularly likely to cause a miss due to volume lease renewal because they are often the


Fig. 10. Local hit rates of Adaptive TTL, Average TTL, and Volume Lease (900) for the IBM workload.

first elements fetched when a burst of associated objects is fetched in a group. In Section 3.4, we examine techniques for reducing the hit rate impact of short worst-case guarantees.

3.3 Adaptive TTL and Average TTL

In this section, we evaluate the set of client-polling protocols that intend only to provide low stale read rates. Since these protocols do not provide worst-case staleness bounds, they were not compared against volume leases in the previous sections. One such TTL algorithm is adaptive TTL [Cate 1992]. Adaptive TTL is designed to reduce stale reads by adjusting client-polling frequencies to the modification frequencies of Web objects. In adaptive TTL, the TTL is set proportional to an object's age, the amount of time since its last modification. We call this proportion the update threshold of the adaptive TTL. The second algorithm is Average TTL. In Average TTL, the TTL of an object is set to the product of its average lifetime over the entire trace and the update threshold. Here an object's lifetime is defined as the amount of time between two adjacent updates. Average TTL can be a reasonable approach because researchers have observed that the update patterns of some Web objects can be closely approximated by a Poisson process with fixed modification frequencies [Brewington and Cybenko 2000; Cho and Garcia-Molina 2000].
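Under our reading of these definitions, the two TTL assignments might be sketched as follows; the function names and signatures are illustrative, not taken from the paper:

```python
def adaptive_ttl(now, last_modified, update_threshold):
    """Adaptive TTL [Cate 1992]: the TTL is proportional to the object's
    age, i.e., the time elapsed since its last modification."""
    age = now - last_modified
    return update_threshold * age

def average_ttl(modification_times, update_threshold):
    """Average TTL: the TTL is the object's mean lifetime over the trace
    (time between adjacent updates) scaled by the update threshold."""
    lifetimes = [later - earlier
                 for earlier, later in zip(modification_times,
                                           modification_times[1:])]
    mean_lifetime = sum(lifetimes) / len(lifetimes)
    return update_threshold * mean_lifetime
```

For example, with an update threshold of 0.5, an object last modified 400 seconds ago gets an adaptive TTL of 200 seconds, while an object with lifetimes of 100 and 200 seconds in the trace gets an average TTL of 75 seconds.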

As shown in Figures 10 through 12, volume leases with reasonable volume lease lengths can achieve higher local hit rates than Average TTL, and than adaptive TTL with update thresholds above 100%, while introducing only an insignificant amount of additional network overhead. While adaptive TTL with higher than


Fig. 11. Number of network messages generated by Adaptive TTL, Average TTL, and Volume Lease (900) for the IBM workload.

Fig. 12. Stale hit rates of all the objects, the nonimage objects, and the dynamic objects running Adaptive TTL and Average TTL for the IBM workload.


100% threshold and Average TTL introduce significant stale read rates, volume lease always returns fresh data in the absence of failures. The results from simulations with our e-commerce workload are similar. Thus, volume lease is attractive even for applications that do not need worst-case staleness bound guarantees.

3.4 Prefetching/Pushing Lease Renewals

The above experiments assume that when a volume lease expires, the next request must go to the server to renew it. A potential optimization is to prefetch or push volume lease renewals to clients before their leases expire. For example, a client whose volume lease is about to expire might piggy-back a volume lease renewal request on its next message to the server [Yin et al. 1998], or it might send an additional volume lease renewal prefetch request even if no requests for the server are pending. Alternately, servers might periodically push volume lease renewals to clients via unicast or multicast heartbeats [Yu et al. 1999].

Regardless of whether renewals are prefetched or pushed and whether they are unicast or multicast, the same fundamental trade-offs apply. More aggressive prefetching keeps clients and servers synchronized for longer periods of time, increasing cache hit rates, but also increasing network costs, server load, and client load.

Previous studies assumed extreme positions regarding prefetching volume lease renewals. Yin et al. [1999b] assumed that volume lease renewals are piggy-backed on each demand request, but that no additional prefetching is done; soon after a client becomes idle with respect to a server, its volume lease expires, and the client has to renew the volume lease on its next request for the server's data. Conversely, Li and Cheriton [1999] suggest that, to amortize the cost of joining multicast hierarchies, clients should stay connected to the multicast heartbeat and invalidation channel from a server for hours or days at a time.

In Figure 13 and Figure 14, we examine the relationship between pushing or prefetching renewals, read latency, and network overhead. In interpreting these graphs, consider that in order to improve read latency by a given amount, we could increase the volume lease length by a factor of K. Alternatively, we could get the same improvement in read latency by prefetching the lease K times as it expires. We expect that most services would choose the worst-case staleness guarantee they desire and then add volume lease prefetching if the improvement in read latency justifies the increase in network overhead.

As illustrated in Figure 13, volume lease pull or push can achieve higher local hit rates than basic volume leases for the same freshness bound. In a push-K algorithm, if a client is idle when a demand-fetched volume lease expires, the client prefetches, or the server pushes to the client, up to K − 1 successive volume lease renewals. Thus, if each volume renewal is for length V, the volume lease remains valid for K · V units of time after a client becomes idle. If a client's accesses to the server resume during that period, they are not delayed by the need for an initial volume lease renewal request.
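A minimal model of the push-K policy just described; the function names are ours, and we assume the K − 1 renewals are issued back-to-back with no gaps:

```python
def lease_valid_until(last_demand_request, volume_lease_len, k):
    """Push-K: after a demand-fetched volume lease of length V is granted,
    up to k-1 successive renewals are pushed (or prefetched), so the volume
    stays valid for k * V time units after the client goes idle."""
    return last_demand_request + k * volume_lease_len

def is_renewal_miss(read_time, last_demand_request, volume_lease_len, k):
    """A read after the pushed leases lapse must first wait for a volume
    lease renewal round trip (a consistency miss)."""
    return read_time >= lease_valid_until(last_demand_request,
                                          volume_lease_len, k)
```

With a 900-second lease and push-2, a client idle since time 0 can still read locally at time 1500, but a read at time 1800 pays the renewal round trip.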

Both push-2 and push-10 shift the basic volume lease curve upward for short volume leases, and larger values of K increase these shifts. Also note that the


Fig. 13. Prefetching/pushing volume leases to reduce read latency for the IBM workload. Push y volume leases (x) represents a volume lease protocol with volume lease length equal to x that pushes or prefetches volume leases y − 1 times after a volume lease expires.

Fig. 14. Cost of pushing volume leases for the IBM workload.


benefits are larger for the dynamic elements in the workload, suggesting that prefetching may improve access to the bottleneck elements of a page.

However, pulling or pushing extra volume lease renewals does increase client load, server load, and network overhead. This overhead increases with the number of renewals prefetched after a client's accesses to a volume cease. For a given number of renewals, this overhead is lower for long volume leases than for short ones.

Systems may use multicast or consistency hierarchies to reduce the overhead of pushing or prefetching renewals. Note that although these architectures may effectively eliminate the volume renewal load on the server and may significantly reduce volume lease renewal overhead in the server areas of the network, they do not affect the volume renewal overhead at clients. Although client renewal overhead should not generally be an issue, widespread aggressive volume lease prefetching or pushing could impose significant client overheads in some cases. For example, in the traces of the Squid regional proxies taken during July 2000, these busy caches access tens of thousands of different servers per day [Chandra et al. 2001].

In general, we conclude that although previous studies have examined extreme assumptions for prefetching [Yu et al. 1999; Li and Cheriton 1999], for this workload modest amounts of prefetching are desirable for minimizing response time when short volume leases are used, and little prefetching is needed at all for long volume leases. This is because after a few hundred seconds of a client not accessing a service, maintaining valid volume leases at that client has little impact on latency.

3.5 Scalability

Large-scale Web services present several potential challenges to scalability. First, callback-based systems typically store state that is proportional to the total number of objects cached by clients. In the worst case, this state could grow to be proportional to the total number of clients multiplied by the number of objects. Second, when a set of popular objects is modified, servers in callback-based systems send callbacks to the clients caching those objects. In the worst case, such a burst of load could enqueue a number of messages equal to the number of clients using the service multiplied by the number of objects simultaneously modified. For both memory consumption and bursts of load, if uncontrolled, this worst-case behavior could prevent deployment for the IBM Sporting and Event service or the e-commerce service we examined.
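These two worst-case bounds are simple products; as a sketch (function names are ours):

```python
def worst_case_callback_state(num_clients, num_objects):
    """Upper bound on callback records: every client caches every object."""
    return num_clients * num_objects

def worst_case_invalidation_burst(num_clients, num_modified):
    """Upper bound on messages enqueued at once: every client caches each
    of the simultaneously modified objects."""
    return num_clients * num_modified
```

Even modest inputs make the point: a million clients and the 27,884 objects in the IBM trace bound callback state at roughly 2.8 × 10^10 records.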

A wide range of techniques for reducing memory capacity demands or bursts of load are possible. Some have been evaluated in isolation, while others have not been explored. There has been no previous direct comparison of these techniques to one another.

— Hierarchy or precise multicast. Using a hierarchy of consistency servers to flood invalidation messages to caches [Yin et al. 1999b] can reduce bursts of load at the server. Precise multicast, which distributes invalidations to clients via multicast and ensures that clients receive invalidations only for objects they are caching, can accomplish the same thing. Precise multicast


can be implemented by having separate multicast channels per object, by filtering multicast distribution on a per-object basis in the network, or by using a hierarchy of consistency servers [Yin et al. 1999a]. Note that although a hierarchy or precise multicast reduces the amount of state at the central server, the total amount of callback state, and thus the global system cost, is not directly reduced by a hierarchy.

— Imprecise multicast invalidates. Imprecise multicast invalidation [Li and Cheriton 1999] combines two ideas: it uses a multicast hierarchy to flood invalidation messages, and imprecise invalidations to reduce state. The imprecision of invalidations stems from the use of a single unfiltered multicast channel to transmit invalidations for all objects in a volume. The advantage of imprecise invalidations is reduced state; state at the server and multicast hierarchy is proportional to the number of clients subscribed to the volume, rather than to the number of objects cached across all clients. The disadvantage of imprecise invalidations is increased invalidation message load at the clients and in the network near the clients.

— Delayed invalidation messages. Rather than sending invalidation messages immediately, systems may delay when invalidation messages are sent in order to reduce bursts of load. There are two variations. Delayed invalidations [Yin et al. 1998] enqueue invalidation messages to clients whose volume leases have expired and send the enqueued messages in a group when a client renews its volume lease. Background invalidations place invalidation messages in a separate send queue from replies to client requests and send invalidations only when spare capacity is available. Note that background invalidations may increase the average staleness of data observed by clients, while delayed invalidations have no impact on the average staleness of data reads. At the same time, while both techniques reduce bursts of load, background invalidations also have the ability to impose a hard upper bound on the maximum load from invalidations.

— Forget idle clients. Two techniques allow servers to drop callbacks on objects cached by idle clients, and thereby reduce server memory requirements. First, by issuing short object leases, servers can discard callback state when a client's lease on an object expires. Duvvuri et al. [2000] examine techniques for optimizing the lease lengths of individual objects. Second, servers can mark clients whose volume leases expired some amount of time in the past as "unreachable," and drop all callback state for unreachable clients [Yin et al. 1998]. When an unreachable client renews its volume lease, the client and server must execute a reconnection protocol to synchronize server callback state with client cache contents. The next section discusses reconnection protocols. The fundamental trade-offs for both approaches are the same: shorter leases reduce memory consumption but increase consistency misses and synchronization overhead. Either algorithm can enforce a hard limit on memory capacity consumed by adaptively shortening leases as space consumption increases [Duvvuri et al. 1999].
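The delayed-invalidations and forget-idle-clients techniques lend themselves to a compact sketch. This is an illustrative model, not the paper's prototype; the class name, the grace-period policy, and all method signatures are our assumptions:

```python
class CallbackServer:
    """Sketch of a server that defers invalidations for clients whose volume
    leases have expired and reaps callback state for long-idle clients."""

    def __init__(self, grace_period):
        self.grace_period = grace_period
        self.lease_expiry = {}   # client id -> volume lease expiry time
        self.callbacks = {}      # client id -> set of cached object ids
        self.deferred = {}       # client id -> invalidations awaiting renewal

    def on_modify(self, obj, now):
        """Return (client, [obj]) pairs to invalidate now; defer the rest."""
        to_send = []
        for client, cached in self.callbacks.items():
            if obj not in cached:
                continue
            if self.lease_expiry.get(client, 0) > now:
                to_send.append((client, [obj]))   # lease valid: send now
            else:
                # Lease expired: queue and piggy-back on the next renewal.
                self.deferred.setdefault(client, []).append(obj)
        return to_send

    def on_renew(self, client, now, lease_len):
        """Renew a volume lease and return the batched deferred invalidations."""
        self.lease_expiry[client] = now + lease_len
        return self.deferred.pop(client, [])

    def reap_idle(self, now):
        """Forget idle clients: drop all callback state for clients whose
        volume lease expired more than grace_period ago; such clients must
        reconnect and resynchronize on their next renewal."""
        for client, expiry in list(self.lease_expiry.items()):
            if now - expiry > self.grace_period:
                self.callbacks.pop(client, None)
                self.deferred.pop(client, None)
                del self.lease_expiry[client]
```

In this model, a modification while the client's lease is valid produces an immediate invalidation; after the lease expires, invalidations accumulate and arrive in one batch with the renewal reply.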

In evaluating this range of options, two factors must be considered. First, given the potential worst-case memory and load behavior of callback


Fig. 15. Callback state increases with elapsed time for the IBM workload. In this graph, Hierarchy (m) + Volume Lease (l) represents the hierarchical volume lease algorithm with the volume lease length equal to l units of time running over a hierarchy in which each node can have up to m children. Note that the Server State in Hierarchy lines overlap the x-axis.

consistency, a system should enforce a hard worst-case limit on state consumption and bursts of load regardless of the workload. Second, systems should select techniques that minimize damage to hit rates and overhead for typical loads.

3.5.1 Server Callback State. Figure 15 shows the number of object leases stored as a function of elapsed time in the trace for a nonhierarchical system and for consistency hierarchies (or precise multicast systems) with fan-outs of 10 or 100 children per invalidation server. Note that whether the server holds an object lease on an object for a client is determined only by whether the client caches the object, and is independent of volume lease length. For the time period covered in our trace, server memory consumption increases linearly. Although for a longer trace, higher hit rates might reduce the rate of growth, for the Zipf workload distributions common on the Web [Breslau et al. 1998; Wolman et al. 1999a; 1999b], hit rates improve only slowly with increasing trace length, and a nearly constant fraction of requests will be compulsory cache misses. Nearly linear growth in state may therefore be expected even over long time scales for many systems.

Although the nearly linear growth in state illustrates the need to bound worst-case consumption, the rate of increase for this workload is modest. After 24 hours, fewer than 5 million leases exist in one of the four Sporting and Event server clusters, even with infinite object leases. Our prototype consumes 62 bytes per object lease, so this workload consumes 310 million bytes per day under the baseline algorithm for the whole system. This corresponds to 0.4%
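The arithmetic behind this estimate, using the per-lease footprint reported for the prototype:

```python
def callback_state_bytes(num_leases, bytes_per_lease=62):
    """Callback state consumed by a given number of object leases, using the
    prototype's reported 62-byte per-lease footprint."""
    return num_leases * bytes_per_lease

# Fewer than 5 million leases accumulate in 24 hours of the trace:
daily_state = callback_state_bytes(5_000_000)  # 310,000,000 bytes per day
```

At roughly 310 MB of callback state per day, ten days of complete callback history stays within a few percent of the production cluster's total memory, consistent with the figures quoted in the text.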


Fig. 16. Distribution of invalidation burstiness for the IBM workload. A point (x, y) means that the server generates at least y messages per second during x percent of the time.

of the memory capacity of the 143-processor system that actually served this workload in a production environment. In other words, this busy server could keep complete callback information for 10 days and increase its memory requirements by less than 4%.

These results suggest that either of the "forget idle clients" approaches can limit maximum memory state without significantly hurting hit rates or increasing lease renewal overhead, and that performance will be relatively insensitive to the detailed parameters of these algorithms. Because systems can keep several days of callback state at little cost, further evaluation of these detailed parameters will require longer traces than are available to us.

The same conclusion applies to consistency hierarchies or precise multicast. As Figure 15 indicates, although a consistency hierarchy can reduce server callback state by orders of magnitude, the total amount of callback state maintained by all the machines in a hierarchy increases compared to that of a nonhierarchical central server.

3.5.2 Bursts of Load. Figure 16 shows the cumulative distribution of server load, approximated by the number of messages sent and received by a server with no hierarchy. As the right edge of this graph shows, volume leases with callbacks reduce average server load compared to TTL. However, as can be seen from the left side of the graph, peak server load increases by a factor of 100 for volume leases without delayed invalidations.

This figure shows that delayed invalidations can reduce peak load by a factor of 76 for short volume lease periods and 15 for long volume lease


periods, but even with delayed invalidations, peak load increases by a factor of 6 for 15-minute volume leases. This increase is smaller for short volume leases and larger for long volume leases, since delayed invalidations' advantage stems from delaying messages to clients whose volume leases have expired.

Fig. 17. Distribution of delay of invalidation messages under background invalidations for Volume Lease (900).

Further improvements can be gained by also using background invalidations. In Figure 17, we limit the server message rate to 200 messages per second, which is approximately the average load of TTL, and we send invalidation messages as soon as possible, but only using the spare capacity. Well over 99.9% of invalidation messages are transmitted during the same second in which they are created, and no message is delayed more than 11 seconds. Thus, background invalidation allows the server to place a hard bound on load burstiness without significantly hurting average staleness.
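The background-invalidation policy just described can be sketched as a rate-limited FIFO queue. This is an illustrative sketch, not the paper's implementation; the class and method names, and the 200-messages-per-second default, are our assumptions based on the text:

```python
from collections import deque

class BackgroundInvalidator:
    """Send invalidations as soon as possible, but never exceed a
    per-second message budget; excess messages wait in a FIFO queue."""

    def __init__(self, max_msgs_per_sec=200):
        self.budget = max_msgs_per_sec
        self.queue = deque()               # pending (client, obj) invalidations

    def enqueue(self, client, obj):
        self.queue.append((client, obj))

    def tick(self, send):
        """Called once per second: drain up to `budget` queued messages."""
        sent = 0
        while self.queue and sent < self.budget:
            client, obj = self.queue.popleft()
            send(client, obj)              # actual network transmission
            sent += 1
        return sent
```

A burst of 1000 invalidations to a popular object drains in 5 one-second ticks at a 200/s cap, which is the "hard bound on load burstiness" described above.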

Figure 5 shows the average staleness for the traditional TTL polling protocol. The data in Figure 17 allow us to understand the average staleness that invalidation with volume leases can deliver. Clients may observe stale data if they read objects between the time the objects are updated at the server and the time the updates appear at the client. There are two primary cases to consider. First, the network connection between the client and server fails. In that case, the client may not see the invalidation message, and data staleness will be determined by the worst-case staleness bound from leases. Fortunately, failures are relatively uncommon, and this case will have little effect on average staleness. Second, server queuing and message propagation time will leave a window during which clients can observe stale data. The data in Figure 17 suggests


that this window will likely be at most a few seconds, plus whatever propagation delays are introduced by the network.

We conclude that delaying invalidation messages makes unicast invalidation feasible with respect to server load. This is encouraging because it simplifies deployment: systems do not need to rely on hierarchies or multicast to limit server load. In the long run, hierarchies or multicast remain attractive strategies for further reducing latency and average load [Yin et al. 1999b; Yu et al. 1999].

3.5.3 Client Issues. Server-driven consistency protocols appear to introduce several challenges for clients. First, a client may interact with a large number of servers over a long period of time, and thus may need to process invalidation messages from many servers. However, our analysis suggests that volume leases can manage client overhead effectively. Volume leases cap both the total number of invalidation messages that a client can receive and the burstiness of those messages. With volume leases, servers maintain the callback state of client caches, so invalidation messages are sent only for cached objects that have not already been invalidated. Thus, the number of invalidation messages can be no larger than the number of reads by clients. Moreover, our previous simulation results suggest that volume leases generally produce fewer messages between servers and clients than client polling does.

Second, in naive server-driven consistency protocols, a client may suffer bursts of invalidation messages if cached objects from many servers are updated at the same time. Delayed invalidation, discussed in the background section, effectively addresses this issue. With delayed invalidation, a server sends invalidation messages to a client only if the client holds a volume lease from that server. A volume lease remains valid for one volume lease length after the time the client last read an object from the volume. Thus, at any point in time, a client receives messages only from the small number of servers that it has accessed within the last volume lease length.

Third, a server may not always be able to contact a client when it needs to send invalidation messages. For example, a mobile client may occasionally be disconnected from the network, or it may choose to power down to save battery power. We call these scenarios disconnections. Volume leases provide efficient mechanisms to recover after a disconnection; these mechanisms are discussed in the next section.

3.6 Resynchronization

In server-driven consistency, servers' consistency state must be synchronized with client cache contents to bound staleness for client reads. This synchronization can be lost due to failures, including server crashes, client crashes, and network partitions. Additionally, to achieve scalability, servers may drop callbacks of idle clients to limit server state, as discussed in Section 3.5.

These disconnections can be roughly divided into two groups based on whether the consistency state held before a disconnection survives it. State-preserving disconnections, caused by network partitions, preserve


the consistency state prior to the disconnection. State-losing disconnections, caused by server crashes, client crashes, and deliberate protocol disconnections, result in loss of consistency state. In this section, we systematically study the design space of resynchronization mechanisms that recover from all of these disconnections.

Fig. 18. Hit rates after recovery from a 1-second disconnection for the IBM workload.

There are three potential policies controlling how aggressively clients resynchronize with servers. At one extreme, demand revalidation marks all cached objects as potentially stale after reconnection and revalidates each object individually as it is referenced. At the other extreme, immediate revalidation revalidates all cached objects immediately after reconnection, reducing the read latency associated with revalidating each object individually. When the overhead of revalidating all cached objects is high, however, immediate revalidation may delay clients' access to servers immediately after reconnection. To address this problem, background revalidation allows bulk revalidation to proceed in the background. Some previous studies have assumed that demand revalidation is sufficient [Liu and Cao 1997], while others have assumed that immediate revalidation is justified [Yin et al. 1999b]. In this study, we quantitatively evaluate these two options and the middle ground, background revalidation.
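The three policies differ only in when suspect cached objects are revalidated. A schematic client-side sketch (all names and the dictionary-based cache representation are ours, for illustration only):

```python
def on_reconnect(cache, revalidate, policy, background_queue=None):
    """After a disconnection, every cached object becomes suspect.

    cache      : dict url -> {"valid": bool}
    revalidate : function(url) -> bool, contacts the server
    policy     : "demand" | "immediate" | "background"
    """
    for entry in cache.values():
        entry["valid"] = False                 # mark everything suspect
    if policy == "immediate":
        for url in cache:                      # revalidate everything now,
            cache[url]["valid"] = revalidate(url)  # delaying the first reads
    elif policy == "background":
        background_queue.extend(cache)         # drained later, spare capacity
    # policy == "demand": nothing more; objects revalidate on first read

def read(cache, revalidate, url):
    entry = cache[url]
    if not entry["valid"]:                     # demand-path revalidation
        entry["valid"] = revalidate(url)       # (a real cache refetches on failure)
    return entry
```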

Figures 18 and 19 show that immediate revalidation achieves higher average local hit rates than demand revalidation. The performance disparity between immediate and demand revalidation is largest immediately after a failure and decreases over time as more cached objects are accessed and validated under demand revalidation. The local hit rates of background revalidation would fall between these two lines in the graph: they equal those of demand revalidation immediately after reconnection, and would increase to those of immediate revalidation as


background revalidation completes. The benefit of immediate and background revalidation is also affected by the disconnection duration. When the disconnection is short, few cached objects are invalidated during it; moreover, because of read locality, the chance of reading those cached objects after recovery is high. Hence, as Figures 18 and 19 show, the benefit of immediate and background revalidation is significant for short disconnections. Conversely, when a disconnection is long, demand revalidation may be sufficient.

Fig. 19. Hit rates after recovery from a 1000-second disconnection for the IBM workload.

To implement demand revalidation, systems need only detect reconnections and mark all cached objects as potentially stale by dropping all object leases; revalidating a cached object under demand revalidation is then the same as validating it after its object lease expires. Two additional mechanisms can be added to support immediate or background revalidation. First, in bulk revalidation, a client simply sends a revalidation message containing requests to revalidate a collection of objects. The server processes each included request as it would have had the request been sent separately and on demand, except that it replies with a bulk revalidation message containing object leases for all unchanged objects and invalidations for the objects that have changed. Second, in delayed invalidation, the server buffers the invalidations that should be sent to a client when a network partition makes the client unreachable or when the server decides to delay sending invalidation messages to an idle client to reduce server load. When the server receives a volume lease request from the client, it piggy-backs the buffered invalidations on the reply message granting the volume lease. The client applies these buffered invalidations to resynchronize with the


server. Note that delayed invalidation may only be used after state-preserving disconnections.

Fig. 20. Resynchronization cost for bulk revalidation and delayed invalidation for the IBM workload.
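The server side of bulk revalidation described above might look like the following sketch. The message format, function name, and version-number comparison are assumptions drawn from the description, not the paper's actual code:

```python
def handle_bulk_revalidation(store, request):
    """Process a bulk revalidation request.

    store   : dict url -> current server-side version number
    request : list of (url, client_version) pairs to revalidate

    Each pair is handled exactly as if it had been sent separately and on
    demand; the reply groups fresh object leases for unchanged objects with
    invalidations for objects that have changed."""
    leases, invalidations = [], []
    for url, client_version in request:
        if store[url] == client_version:
            leases.append(url)             # unchanged: renew the object lease
        else:
            invalidations.append(url)      # changed: invalidate the cached copy
    return {"leases": leases, "invalidations": invalidations}
```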

The overhead of bulk revalidation and delayed invalidation depends primarily on the number of cached objects and on the number of objects invalidated during the disconnection. In bulk revalidation, server load and network bandwidth are proportional to the number of cached objects; in delayed invalidation, they are instead determined by the number of invalidated objects. As Figure 20 shows, bulk revalidation must examine an average of more than 100 objects, and for some recovered clients several thousand objects must be compared. Delayed invalidation can be used to reduce the cost of immediate revalidation for state-preserving disconnections, since the number of invalidated objects is two orders of magnitude smaller than the number of cached objects for disconnections shorter than 1000 seconds. Unfortunately, for state-losing disconnections, delayed invalidation is not an option. Because bulk revalidation may have to revalidate hundreds or thousands of objects, the system should support background revalidation rather than relying solely on immediate revalidation. These conclusions also apply to the results from simulations with our e-commerce workload.

In conclusion, server-driven consistency protocols must implement some resynchronization mechanism for fault tolerance and scalability. Demand resynchronization is a good default choice, since it handles all disconnections and is simple to implement. Background bulk revalidation may be needed to reduce read latency when recovering from short disconnections, and delayed invalidation may be desirable to reduce resynchronization overheads for short state-preserving disconnections.


4. PROTOTYPE

We have implemented server-driven consistency based on volume leases with Squid cache version 2.2.5. The consistency module includes a server part, which sends invalidation messages in response to writes and issues volume and object leases, and a client part, which manages consistency information on locally cached objects to satisfy reads. Servers maintain a per-object version number as well as per-volume and per-object share lists. These lists contain the set of clients that may hold valid leases on the volumes and objects, respectively, along with the expiration time of each lease. Clients maintain a volume expiration time as well as per-object version numbers and expiration times. Objects and volumes are identified by URLs, and object version numbers are implemented by HTTP ETags or modification times [Fielding et al. 1999].
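The consistency state just described might be organized roughly as follows. The field and class names are ours, not the prototype's; this is only a sketch of the bookkeeping implied by the text:

```python
from dataclasses import dataclass, field

@dataclass
class ObjectState:
    """Server-side state for one cached object."""
    version: int = 0                                 # HTTP ETag or mod time
    share_list: dict = field(default_factory=dict)   # client -> lease expiry

@dataclass
class VolumeState:
    """Server-side state for one volume of objects."""
    share_list: dict = field(default_factory=dict)   # client -> volume lease expiry
    objects: dict = field(default_factory=dict)      # url -> ObjectState

@dataclass
class ClientState:
    """Client-side state for one volume's cached objects."""
    volume_expiry: float = 0.0
    objects: dict = field(default_factory=dict)      # url -> (version, expiry)
```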

On a client request for data, a callback-enabled client includes a VLease-Request field in the header of its request. This field indicates a network port at the client that may be used to deliver callbacks. A callback-enabled server includes volume and object leases as Volume-Lease-For and Object-Lease-For headers in its replies to client requests. Invalidation and Invalidation-Ack headers are used to send invalidation messages to clients and to acknowledge their receipt.
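An exchange using these headers might look like the following. The header values and exact syntax are illustrative guesses based on the description above, not the prototype's wire format:

```
GET /event/scores.html HTTP/1.1
Host: www.example.org
VLease-Request: port=4040          (client's callback delivery port)

HTTP/1.1 200 OK
ETag: "v17"
Volume-Lease-For: 900              (volume lease, seconds)
Object-Lease-For: 86400            (object lease, seconds)
...body...
```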

The mechanisms provided by the protocol support either client-pull or server-push volume lease renewal. At present, we implement the simple policy of client-pull volume lease renewal. Volume lease requests and replies can use the same channels used to transfer data or can be exchanged along dedicated channels.

4.1 Hierarchy

We construct the system to support a hierarchy in which each level grants leases to the level below and requests leases from the level above [Yin et al. 1999b]. The top-level cache is a reverse proxy that intercepts all requests to the origin server and caches all replies, including dynamically generated data. The top-level cache is configured to hold infinite object and volume leases on all objects and to pass shorter leases to its children. Invalidations are sent to the top-level cache using the standard invalidation interface used for communication between parent and child caches. These invalidations can be generated by systems such as the trigger monitor used at the major Sporting and Event Web sites hosted by IBM [Challenger et al. 1999]. The trigger monitor maintains correspondences between the underlying data (e.g., databases) affecting Web page content and the Web pages themselves. In response to changes to the underlying data, the trigger monitor determines which cached pages are affected and propagates invalidation or update messages to the appropriate caches.

The hierarchy provides three benefits. First, it simplifies our prototype by allowing us to use a single implementation for servers, proxies, and clients. Second, hierarchies can considerably improve the scalability of lease systems by forming a distribution tree for invalidations and by serving renewal requests from lower-level caches [Yin et al. 1999b]. Third, reverse-proxy caching of dynamically generated data at the server can achieve high hit rates and can


dramatically reduce server load [Challenger et al. 1998]. By implementing our system as a hierarchy, we make it easy to gain these advantages. Further, if a multilevel hierarchy is used (such as the Squid regional proxies (http://www.squid-cache.org/) or a cache mesh [Tewari et al. 1999]), we speculate that nodes higher in the hierarchy will achieve hit rates between the per-client cache hit rates and the at-server cache hit rates illustrated in Section 3.

4.2 Reliable Delivery of Invalidations

In order to maintain an upper bound on worst-case staleness, volume lease systems must maintain the following invariant: a client may not receive a volume lease renewal unless all of its cached objects that were modified before the transmission of the volume renewal have been invalidated. If this invariant is violated, an object may be modified at time T1, and the client may then receive a volume lease renewal valid until a time T2 > T1 + Tv. If a network partition then occurs, the client could access stale cached data longer than Tv seconds after it was modified.

The system must therefore reliably deliver invalidations to clients. It does so in two ways. First, it uses a delayed invalidation buffer to maintain reliable invalidation delivery across different transport-level connections. Second, it maintains epoch numbers and an unreachable list to allow servers to resynchronize after they discard or lose client state [Yin et al. 1999b].

We use TCP as our transport layer for transmitting invalidations, but, unfortunately, TCP does not provide the reliability guarantees we require. In particular, although TCP provides reliable delivery within a connection, it cannot provide guarantees across connections: if an invalidation is sent on one connection and a volume renewal on another, the volume renewal may be received while the invalidation is lost if the first connection breaks. Unfortunately, a pair of HTTP nodes will use multiple connections to communicate in at least three circumstances. First, HTTP 1.1 allows a client to open as many as two simultaneous persistent connections to a given server [Mogul 1996]. Second, HTTP 1.1 allows a server or client to close a persistent connection after any message; many modern implementations close connections after short periods of idleness to save server resources. Third, a network, client, or server failure may cause a connection to close and a new one to be opened. In addition to these fundamental limitations of TCP, most implementations of persistent-connection HTTP are designed as performance optimizations, and they do not provide APIs that make it easy for applications to determine which messages were sent on which channels.

We therefore implement reliable invalidation delivery independently of transport-layer guarantees. Clients send explicit acknowledgments of invalidation messages, and servers maintain a list of unacknowledged invalidation messages for each client. When a server transmits a volume lease renewal to a client, it piggy-backs the list of the client's unacknowledged invalidations using a Group-Object-Version header field. Clients receiving such a message must process all invalidations in it before processing the volume lease renewal.
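The server side of this mechanism can be sketched as follows. The class and message encoding are simplifications of our own (the real Group-Object-Version field is an HTTP header, not a Python list):

```python
class InvalidationChannel:
    """Per-client reliable invalidation delivery, independent of TCP
    connections: unacknowledged invalidations are piggy-backed on every
    volume lease renewal, preserving the staleness invariant."""

    def __init__(self, transmit):
        self.transmit = transmit          # best-effort send(client, message)
        self.unacked = {}                 # url -> version, awaiting client ack

    def invalidate(self, client, url, version):
        self.unacked[url] = version       # remember until acknowledged
        self.transmit(client, {"Invalidation": (url, version)})

    def on_ack(self, url):
        self.unacked.pop(url, None)       # Invalidation-Ack from the client

    def renew_volume(self, client, lease_secs):
        # A client must never hold a fresh volume lease while missing an
        # invalidation, so the renewal carries every unacked invalidation.
        self.transmit(client, {
            "Volume-Lease-For": lease_secs,
            "Group-Object-Version": list(self.unacked.items()),
        })
```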


Three aspects of this protocol are worth noting.

First, the invalidation delivery requirement in volume leases is weaker than strict reliable in-order delivery, and the system can take advantage of that. In particular, a system need only retransmit invalidations when it transmits a volume lease renewal. At one extreme, the system can mark all packets as volume lease renewals to keep the client's volume lease fresh, but at the cost of potentially retransmitting more invalidations than necessary. At the other extreme, the system can send only periodic heartbeat messages and handle all retransmission at the end of each volume lease interval [Yu et al. 1999].

Second, the queue of unacknowledged invalidations provides the basis for an important performance optimization: delayed invalidation [Yin et al. 1998]. Servers can significantly reduce their average and peak load by not transmitting invalidation messages to idle clients whose volume leases have expired. Instead, servers place these invalidation messages into the idle clients' unacknowledged invalidation buffer (also called the delayed invalidation buffer) and do not transmit them across the network. If a client becomes active again, it first asks the server to renew its volume lease, and the server transmits these invalidations with the volume lease renewal message. The unacknowledged invalidation list thus provides a simple, fast reconnection protocol.

Third, the mechanism for transmitting multiple invalidations in a single message is also useful for atomically invalidating a collection of related objects. Our protocol for caching dynamic data supports documents that are constructed of multiple fragments, and atomic invalidation of multiple objects is a key building block [Challenger et al. 2000].

The system also implements a protocol for resynchronizing client and server state when a server discards callback state about a client, which occurs after a server crash or when a server deliberately discards state for idle clients. The system includes epoch numbers [Yin et al. 1998] in messages to detect loss of synchronization due to crashes, and servers maintain unreachable lists, lists of clients whose state has been discarded, to detect when such clients reconnect. If a reply to a client request includes an unexpected epoch number or a header indicating that the client is on the unreachable list, the client invalidates all object leases for the volume and renews them on demand. Implementing a bulk revalidation protocol is a subject of future work.
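The client-side detection of lost server state can be sketched as below. The dictionary keys and reply fields are our own names for the epoch number and unreachable-list marker described above:

```python
def check_reply(client, reply):
    """Detect that the server discarded or lost our callback state.

    A crashed server comes back with a new epoch number; a server that
    deliberately dropped an idle client flags the reply with an
    unreachable-list marker. Either way, all object leases for the
    volume become suspect and must be renewed on demand."""
    if reply.get("epoch") != client["epoch"] or reply.get("unreachable"):
        for entry in client["objects"].values():
            entry["lease_expiry"] = 0     # drop object leases; renew on demand
        client["epoch"] = reply.get("epoch")
```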

4.3 Evaluation

We evaluate our implementation with a standard benchmark of the Web caching industry: the first semi-annual Web caching bake-off workload (http://cacheoff.ircache.net/N01/). Our testbed includes four computers. Two of them run the workload. The consistency server and the consistency client, which are Squid proxies augmented with server-driven consistency, run on the two other machines. The consistency server is placed in front of the workload server; it delivers data requested by clients after retrieving it from the workload server, and it also issues leases and sends invalidation messages.


Our initial evaluation shows that our implementation of server-driven consistency, compared to the standard Squid cache, increases the load on the consistency server by less than 3% and increases read latency by less than 5%, while sustaining a throughput of 70 requests per second. In the future, we plan to implement an architecture that decouples the consistency module from the other parts of the Web or proxy server that deliver data, and to quantify the computing resources (CPU and memory) needed to maintain server-driven consistency for the traces examined here.

5. RELATED WORK

Many researchers have observed that Web cache consistency is critical to Web performance. For example, Padmanabhan and Qiu [2000] examined a large-scale enterprise Web trace and concluded that traditional client polling is unsatisfactory for their workload.

Owing to its significant impact on end-to-end Web performance, Web cache consistency has generated much interest in the research community. Based on a simulation study with write traces collected from Web servers and synthetic read traces, Gwertzman and Seltzer [1996] concluded that adaptive TTL can provide good performance for applications that can tolerate 4% stale reads. Evaluating a prototype with real-world traces, Liu and Cao [1997] found that much of the bandwidth-saving benefit of adaptive TTL derives from clients reading stale data, and that server-driven consistency protocols can improve the freshness of client reads without introducing significant overhead. However, they identified two challenges in deploying server-driven consistency protocols: scalability and fault tolerance. In particular, they point to (i) the bursts of server load caused by invalidations sent when popular objects are written; (ii) the growth of the state that the server maintains to track the caches of its clients; and (iii) the fact that if partial failures prevent servers from contacting clients, then clients may continue to return stale data. These three challenges are addressed in this article.

Several researchers have studied how to use multicast to scale server-driven consistency protocols. With multicast, invalidation messages can be delivered precisely, in which case only the invalidation messages required by a client are delivered to it, or imprecisely, in which case extra invalidations are delivered. Yu et al. [1999] proposed a scalable cache consistency architecture that integrates the ideas of invalidation, volume leases, and unreliable imprecise multicast. They used synthetic workloads of single pages and focused their evaluation on the network performance of server-driven consistency. Li and Cheriton [1999] proposed using reliable, precise multicast to deliver invalidations and updates for frequently modified objects. The workloads in their study included client traces, proxy traces, and synthetic traces.

Other studies examined specific design optimizations to reduce overhead or read latency. Mogul [1996] independently proposed a notion of grouping files into volumes to reduce the overhead of client polling. Duvvuri et al. [1999] examined adapting object leases to reduce server state and messages. These techniques can also be employed in our protocol to improve scalability. Cohen


et al. [1998] studied the use of volumes for prefetching and consistency. The consistency algorithms they examined are best-effort algorithms based on client polling. Krishnamurthy and Wills [1998] examined ways to improve polling-based consistency by piggy-backing optional invalidation messages on other traffic between a client and server. While their study does not provide worst-case staleness bounds, several of its techniques can also be exploited to improve the performance and scalability of server-driven consistency. For example, our protocol allows servers to send delayed invalidations to clients by piggy-backing them on other traffic between servers and clients. In the same paper, they also proposed grouping related objects into volumes and sending invalidations for all objects contained in a volume, instead of just invalidations for the objects that a client caches. This imprecise unicast idea obviates the need for the server to track the callback state of each client, resulting in a scheme that server-driven consistency protocols may use in some extreme cases to limit server state. Cohen and Kaplan [2001] observed that polling cached objects when TTLs expire, while the cached objects are still valid, can significantly increase read latency; they therefore proposed proactively refreshing TTLs. This technique is similar to the volume lease prefetching we discuss in Section 3.4, but it effectively prefetches individual object leases rather than resynchronizing an entire volume with one lease renewal.

Finally, we observe that cache consistency protocols have long been studied for distributed file systems [Howard et al. 1988; Mogul 1994; Nelson et al. 1988; Sandberg et al. 1985]. In particular, Howard et al. [1988] compared NFS's polling to AFS's callback-based strategy and concluded that polling incurs higher server load and stale hit rates. The notion of invalidation and that of leases for fault tolerance have been examined in this context [Gray and Cheriton 1989]. Baker [1994] studied methods for fast resynchronization of callback state and showed that server-driven recovery can be used to avoid a "recovery storm," the overwhelming recovery load at a server immediately after a server reboot. Background revalidation in our framework allows the server to control the load during revalidation, in the same spirit as server-driven recovery.

6. CONCLUSIONS

Although server-driven consistency can provide significant performance advantages over traditional client-polling systems, the feasibility of deploying such a system depends on the scalability and performance of server-driven consistency algorithms over a wide range of applications. Large-scale services delivering both static and dynamically generated data are an important class of applications to consider, since the objects served by such applications change unpredictably and frequently, and the scale of such services presents many challenges. In this study we find that server-driven consistency can meet the scalability, performance, and consistency requirements of these services. First, we find that we can limit callback state growth with little performance penalty and that we can smooth out the server load bursts introduced by invalidations without significantly increasing average staleness. Second, we find that


how long servers and clients are kept synchronized can greatly influence performance and overhead; for these workloads, however, there is little performance benefit in keeping servers and clients synchronized longer than 1000 seconds after a read. Third, we find that delayed invalidation is the most efficient fault-recovery protocol for the most common failures in today's Internet. Overall, server-driven consistency can offer excellent performance for both static and dynamically generated data in large-scale Web services.

ACKNOWLEDGMENTS

The authors would like to thank Jim Challenger and Paul Dantzig for their help in acquiring and interpreting the IBM Sporting and Event Web logs. The authors would also like to thank Jeffrey Mogul, Craig Wills, and the anonymous reviewers for their suggestions, which helped in enhancing the quality of this article.

This work was supported in part by DARPA/SPAWAR grant N66001-98-8911, NSF CISE grant CDA-9624082, the Texas Advanced Technology Program, an IBM Faculty Partnership Award, and Tivoli. Alvisi was also supported by an NSF CAREER award (CCR-9734185) and an Alfred P. Sloan Fellowship. Dahlin was also supported by an NSF CAREER award (CCR-9733842) and an Alfred P. Sloan Fellowship.

REFERENCES

BAKER, M. 1994. Fast crash recovery in distributed file systems. Ph.D. thesis, University of California at Berkeley.

BRESLAU, L., CAO, P., FAN, L., PHILLIPS, G., AND SHENKER, S. 1998. On the implications of Zipf's law for Web caching. Technical Report 1371, University of Wisconsin (Apr. 1998).

BREWINGTON, B. AND CYBENKO, G. 2000. How dynamic is the Web? In World Wide Web (May 2000).

http://cacheoff.ircache.net/N01/.

CATE, V. 1992. Alex—a global filesystem. In Proceedings of the 1992 USENIX File System Workshop (May 1992).

CHALLENGER, J., DANTZIG, P., AND IYENGAR, A. 1998. A scalable and highly available system for serving dynamic data at frequently accessed Web sites. In Proceedings of ACM/IEEE SC98 (Nov. 1998).

CHALLENGER, J., IYENGAR, A., AND DANTZIG, P. 1999. A scalable system for consistently caching dynamic Web data. In Proceedings of IEEE INFOCOM '99 (Mar. 1999).

CHALLENGER, J., IYENGAR, A., WITTING, K., FERSTAT, C., AND REED, P. 2000. A publishing system for efficiently creating dynamic Web content. In Proceedings of IEEE INFOCOM 2000 (Mar. 2000).

CHANDRA, B., DAHLIN, M., GAO, L., AND NAYATE, A. 2001. End-to-end WAN service availability. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS '01).

CHO, J. AND GARCIA-MOLINA, H. 2000. Synchronizing a database to improve freshness. In VLDB 2000.

COHEN, E. AND KAPLAN, H. 2001. Refreshment policies for Web content caches. In INFOCOM 2001.

COHEN, E., KRISHNAMURTHY, B., AND REXFORD, J. 1998. Improving end-to-end performance of the Web using server volumes and proxy filters. In SIGCOMM '98.

http://httpd.apache.org/docs/logs.html.

DUVVURI, V., SHENOY, P., AND TEWARI, R. 1999. Adaptive leases: A strong consistency mechanism for the World Wide Web. In INFOCOM 2000.

FIELDING, R., GETTYS, J., MOGUL, J., FRYSTYK, H., MASINTER, L., LEACH, P., AND BERNERS-LEE, T. 1999. Hypertext Transfer Protocol—HTTP/1.1. Request for Comments 2616, Network Working Group (June 1999).


GRAY, C. AND CHERITON, D. 1989. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, 202–210.

GWERTZMAN, J. AND SELTZER, M. 1996. World-Wide Web cache consistency. In Proceedings of the 1996 USENIX Technical Conference (Jan. 1996).

HOWARD, J., KAZAR, M., MENEES, S., NICHOLS, D., SATYANARAYANAN, M., SIDEBOTHAM, R., AND WEST, M. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), 51–81.

IYENGAR, A. AND CHALLENGER, J. 1997. Improving Web server performance by caching dynamic data. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (Dec. 1997).

IYENGAR, A., SQUILLANTE, M., AND ZHANG, L. 1999. Analysis and characterization of large-scale Web server access patterns and performance. In World Wide Web (June 1999).

KRISHNAMURTHY, B. AND WILLS, C. 1998. Piggyback server invalidation for proxy cache coherency. In Proceedings of the Seventh International World Wide Web Conference, 185–193.

LI, D. AND CHERITON, D. R. 1999. Scalable Web caching of frequently updated objects using reliable multicast. In Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS '99) (Oct. 1999).

LIU, C. AND CAO, P. 1997. Maintaining strong cache consistency in the World-Wide Web. In Proceedings of the Seventeenth International Conference on Distributed Computing Systems (May 1997).

MOGUL, J. 1994. Recovery in Spritely NFS. Comput. Syst. 7, 2 (Spring 1994), 201–262.

MOGUL, J. 1996. http://www.roads.lut.ac.uk/lists/http-caching/1996/01/0002.html.

MOGUL, J. 1996. A design for caching in HTTP 1.1, preliminary draft. Tech. Rep., Internet Engineering Task Force (IETF), Jan. 1996. Work in progress.

NELSON, M., WELCH, B., AND OUSTERHOUT, J. 1988. Caching in the Sprite network file system. ACM Trans. Comput. Syst. 6, 1 (Feb. 1988).

PADMANABHAN, V. AND QIU, L. 2000. The content and access dynamics of a busy Web site: Findings and implications. In SIGCOMM 2000.

SANDBERG, R., GOLDBERG, D., KLEIMAN, S., WALSH, D., AND LYON, B. 1985. Design and implementation of the Sun network filesystem. In Proceedings of the Summer 1985 USENIX Conference (June 1985), 119–130.

TEWARI, R., DAHLIN, M., VIN, H., AND KAY, J. 1999. Design considerations for distributed caching on the Internet. In Proceedings of the Nineteenth International Conference on Distributed Computing Systems (May 1999).

WOLMAN, A., VOELKER, G., SHARMA, N., CARDWELL, N., BROWN, M., LANDRAY, T., PINNEL, D., KARLIN, A., AND LEVY, H. 1999a. Organization-based analysis of Web-object sharing and caching. In Proceedings of the Second USENIX Symposium on Internet Technologies and Systems (Oct. 1999).

WOLMAN, A., VOELKER, G., SHARMA, N., CARDWELL, N., KARLIN, A., AND LEVY, H. 1999b. On the scale and performance of cooperative Web proxy caching. In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles (Dec. 1999).

YIN, J., ALVISI, L., DAHLIN, M., AND LIN, C. 1998. Using leases to support server-driven consistency in large-scale systems. In Proceedings of the Eighteenth International Conference on Distributed Computing Systems (May 1998).

YIN, J., ALVISI, L., DAHLIN, M., AND LIN, C. 1999a. Hierarchical cache consistency in a WAN. In Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS '99).

YIN, J., ALVISI, L., DAHLIN, M., AND LIN, C. 1999b. Volume leases to support consistency in large-scale systems. IEEE Trans. Knowl. Data Eng. (Feb. 1999).

YU, H., BRESLAU, L., AND SCHENKER, S. 1999. A scalable Web cache consistency architecture. In SIGCOMM '99 (Sept. 1999).

Received September 2001; revised April 2002; accepted May 2002

ACM Transactions on Internet Technology, Vol. 2, No. 3, August 2002.