
Beehive: Exploiting Power Law Query Distributions for O(1) Lookup Performance in Peer to Peer Overlays

Venugopalan Ramasubramanian and Emin Gün Sirer

Abstract

Structured peer-to-peer hash tables provide decentralization, self-organization, failure-resilience, and good worst-case lookup performance for applications, but suffer from high latencies (O(log N)) in the average case. Such high latencies prohibit them from being used in many relevant, demanding applications such as DNS. In this paper, we present a proactive replication framework that can achieve O(1) lookup performance for common Zipf-like query distributions. This framework is based around a closed-form solution that achieves O(1) lookup performance with low storage requirements, bandwidth overhead and network load. Simulations show that this replication framework can realistically achieve good latencies, outperform passive caching, and adapt efficiently to sudden changes in object popularity, also known as flash crowds. This framework provides a feasible substrate for high-performance, low-latency applications, such as peer-to-peer domain name service.

1 Introduction

Peer-to-peer distributed hash tables (DHTs) have recently emerged as a building-block for distributed applications. Unstructured DHTs, such as Freenet and the Gnutella network [5, 1], offer decentralization and simplicity of system construction, but may take up to O(N) hops to perform lookups in networks of N nodes. Structured DHTs, such as Chord, Pastry, Tapestry and others [24, 22, 26, 21, 18, 17, 14], are particularly well-suited for large scale distributed applications because they are self-organizing, resilient against denial-of-service attacks, and provide O(log N) lookup performance in both the worst and the average case. However, for large-scale, high-performance, latency-sensitive applications, such as the domain name service (DNS) and the world wide web, this logarithmic performance bound translates into high latencies. Previous work on serving DNS using a peer-to-peer lookup service concluded that high average-case lookup costs render current structured DHTs unsuitable for latency-sensitive applications, such as DNS [8].

In this paper, we describe how proactive replication can be used to achieve O(1) lookup performance efficiently on top of a standard O(log N) peer-to-peer distributed hash table for certain, commonly-encountered query distributions. It is well-known that the query distributions of several popular applications, including DNS and the web, follow a power law distribution [15, 2]. Such a well-characterized query distribution presents an opportunity to optimize the system according to the expected query stream. The critical insight in this paper is that, for query distributions based on a power law, proactive (model-driven) replication can enable a DHT system to achieve a small constant lookup latency on average. In contrast, we show that common techniques for passive (demand-driven) replication, such as caching objects along a lookup path, fail to make a significant impact on the average-case behavior of the system.

We outline the design of a replication framework, called Beehive, with the following three goals:

• High Performance: Enable O(1) average-case lookup performance, effectively decoupling the performance of peer-to-peer DHT systems from the size of the network. Provide O(log N) worst-case lookup performance.

• High Scalability: Minimize the background traffic in the network to reduce aggregate network load and per-node bandwidth consumption. Ensure that the amount of memory and/or disk space required of each peer in the network is kept to a minimum.

• High Adaptivity: Promptly adjust the performance of the system in response to changes in the aggregate popularity distribution of objects. Further, cheaply track and maintain the popularity of individual objects in the system to quickly respond when a certain object becomes highly popular, as with flash crowds and the “slashdot effect.”

Beehive achieves these goals through efficient proactive replication. By proactive replication, we mean actively propagating copies of objects among the nodes participating in the network. There is a fundamental tradeoff between replication and resource consumption: more copies of an object will generally improve lookup performance at the cost of space, bandwidth and aggregate network load. In the limit, proactively copying all objects in the DHT to all nodes would enable every query to be satisfied in constant time. However, this would not scale to large systems since it would require prohibitive amounts of space on each node, the network would be overloaded during replica creation, and changes to mutable objects would require O(N) updates. In contrast, Beehive performs this tradeoff through an analytical model that provides a closed-form, optimal solution that achieves O(1) lookup performance for power law query distributions while minimizing the number of object copies, and hence reducing storage, bandwidth and load, in the network.

Beehive relies on cheap, local measurements and efficient lease-based protocols for replica coordination. Each node in Beehive continually performs local measurements to determine the relative popularity of the objects in the system, as well as to estimate global properties of the aggregate query distribution function. Beehive nodes decide how many replicas of each object should be propagated by combining the closed-form solutions from the analytical model with their measurements of the aggregate query distribution function and estimates of object rank. This estimation is performed independently and periodically at each node, while a replica management protocol efficiently propagates or removes cached objects without excessive messaging, global synchronization or agreement.

Objects in Beehive may be modified dynamically. In general, mutable objects pose cache-coherency problems for any replication technique, as older, out-of-date copies of an object may remain cached throughout a system and keep clients from accessing more recent versions. To provide up to date views in the presence of updates, a system needs to track all replicas of an object and either invalidate old copies or propagate the changes when the object is modified. In Beehive, the structured nature of the underlying DHT allows the system to keep track of the placement of all replicas with a single integer. This enables Beehive to efficiently find and update all replicas when an object is modified. Consequently, objects may be updated at any time in Beehive, and lookups performed after an update has completed will return the latest copy of the object.

While this paper describes the Beehive proactive replication framework in its general form, we use the domain name system as a target application, perform our evaluation with DNS data, and demonstrate that serving DNS lookups with a peer-to-peer distributed hash table is feasible. Several shortcomings of the current, hierarchical structure of DNS make it an ideal application candidate for Beehive. First, DNS is highly latency-sensitive, and poses a significant challenge to serve efficiently. Second, the hierarchical organization of DNS leads to a disproportionate amount of load being placed at the higher levels of the hierarchy. Third, the higher nodes in the DNS hierarchy serve as easy targets for distributed denial-of-service attacks and form a security vulnerability for the entire system. Finally, nameservers required for the internal leaves of the DNS hierarchy incur expensive administrative costs, as they need to be manually administered, secured and constantly online. Peer-to-peer DHTs address all but the first critical problem; we show in this paper that Beehive's replication strategy can address the first.

We have implemented a prototype Beehive-based DNS server layered on top of the Pastry peer-to-peer hash table [22]. Our prototype implementation is compatible with current client resolver libraries deployed around the Internet. We envision that the DNS nameservers that are currently used to serve small, dedicated portions of the naming hierarchy would form a Beehive network and collectively manage the entire namespace. Our implementation supports the existing naming scheme by falling back on legacy DNS when Beehive-DNS lookups fail. Unlike legacy DNS, which relies on cache timeouts for loose coherency and incurs ongoing cache expiration and refill overheads, Beehive-DNS enables resource records to be updated at any time. While we use DNS as a guiding application for evaluating our system, we note that a full treatment of the implementation of an alternative peer-to-peer DNS system is beyond the scope of this paper, and focus instead on the general-purpose Beehive framework for proactive replication. The framework is sufficiently general to achieve O(1) lookup performance in other settings, including web caching, where the aggregate query distribution follows a power law, similar to DNS.

Overall, this paper describes the design of a replication framework that enables O(1) lookup performance in structured DHTs for common query distributions, applies it to a P2P DNS implementation, and makes the following contributions. First, it proposes proactive replication of objects and provides a closed-form analytical solution for the number of replicas needed to achieve constant-time lookup performance with low costs. The storage, bandwidth and load placed on the network by this scheme are modest. In contrast, we show that simple caching strategies based on passive replication incur large ongoing costs. Second, it outlines the design of a complete system based around this analytical model. This system is layered on top of Pastry, an existing peer-to-peer substrate. It includes techniques for estimating the requisite inputs for the analytical model, mechanisms for replica propagation and deletion, and a strategy for mapping between the continuous solution in the analytical model and the discrete implementation in Pastry. Finally, it presents results from a prototype implementation of a peer-to-peer DNS service to show that the system achieves good performance, has low overhead, and can adapt quickly to flash crowds. In turn, these approaches enable the benefits of P2P systems, such as self-organization and resilience against denial of service attacks, to be applied to latency-sensitive applications, such as DNS.

The rest of this paper is organized as follows. Section 2 provides a broad overview of our approach and describes the storage- and bandwidth-efficient replication components of Beehive in detail. Section 3 describes our implementation of Beehive over Pastry. Section 4 presents the results and expected benefits of using Beehive to serve DNS queries. Section 5 surveys different DHT systems and summarizes other approaches to caching and replication in peer-to-peer systems. Section 6 describes future work and Section 7 summarizes our contributions.

2 The Beehive System

Beehive is a general replication framework that can be applied to structured DHTs based on prefix-routing [19], such as Chord, Pastry, Tapestry, and Kademlia. These DHTs operate in the following manner. Each node has a unique randomly assigned identifier in a circular identifier space. Each object also has a unique randomly selected identifier assigned from the same space and is stored at the closest node, called the home node. Routing is performed by successively matching a prefix of the object identifier against node identifiers. Generally, each step in the query processing takes the query to a node that has one more matching prefix than the previous node. A query traveling i hops reaches a node that has i matching prefixes. (Strictly speaking, the nodes encountered towards the end of the query routing process in a sparsely populated DHT may not share progressively more prefixes with the object, but remain numerically close. This detail does not significantly impact either the time complexity of standard DHTs or our replication algorithm. Section 3 discusses the issue in more detail.) Since the search space is reduced exponentially, this query routing approach provides O(log_b N) lookup performance on average, where N is the number of nodes in the DHT and b is the base, or fanout, used in the system.

Figure 1: This figure illustrates the levels of replication in Beehive. A query for object 0124 takes 3 hops from node Q to node E, the home node of the object. By replicating the object at level 2, that is at D and F, the query latency can be reduced to 2 hops. In general, an object replicated at level i incurs at most i hops for a lookup.

The central observation behind Beehive is that the length of the average query path will be reduced by one hop when an object is proactively replicated at all nodes logically preceding that node on all query paths. We can apply this iteratively to disseminate objects widely throughout the system. Replicating an object at all nodes i hops or fewer from the home node will reduce the lookup latency by i hops. The Beehive replication mechanism is a general extension of this observation to find the appropriate amount of replication for each object based on its popularity. Beehive strives to create the minimal number of replicas such that the expected number of nodes traversed during a query will match a targeted constant, C. It uses an analytical model to derive the number of replicas required to achieve O(1) lookup performance while minimizing per-node storage, bandwidth requirements and network load. We note, however, that the model is driven by estimates of object popularity and, in a real implementation like the one we describe, may deviate from the optimal due to sampling errors.

Beehive controls the extent of replication in the system by assigning a replication level to each object. An object at level i is replicated on all nodes that have at least i matching prefixes with the object. Queries to objects replicated at level i incur a lookup latency of at most i hops. Objects stored only at their home nodes are at level k = log_b N, while objects replicated at level 0 are cached at all the nodes in the system. Figure 1 illustrates the concept of replication levels.

The goal of Beehive's replication strategy is to find the minimal replication level for each object such that the average lookup performance for the system is a constant number of hops. Naturally, the optimal strategy involves replicating more popular objects at lower levels (on more nodes) and less popular objects at higher levels. By judiciously choosing the replication level for each object, we can achieve constant lookup time with minimal storage and bandwidth overhead.
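To make the level/replica tradeoff concrete, the following sketch (not from the paper; the base, network size, and helper name are illustrative assumptions) prints, for each replication level, the worst-case hop count and the approximate number of copies of an object at that level.

```python
# Illustrative sketch: how an object's replication level trades lookup hops
# against the number of copies, for an assumed DHT configuration.
import math

b, N = 32, 10_000                      # assumed base (fanout) and network size
k = math.ceil(math.log(N, b))          # default level: object lives only at its home node

for level in range(k + 1):
    replicas = max(1, math.ceil(N / b ** level))   # nodes sharing >= `level` prefixes
    print(f"level {level}: <= {level} hops per lookup, ~{replicas} replicas")
```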

Beehive employs several mechanisms and protocols to find and maintain appropriate levels of replication for its objects. First, an analytical model provides Beehive with closed-form optimal solutions indicating the appropriate levels of replication for each object. Second, a monitoring protocol based on local measurements and limited aggregation estimates relative object popularity, and the global properties of the query distribution. These estimates are used, independently and in a distributed fashion, as inputs to the analytical model which yields the locally desired level of replication for each object. Finally, a replication protocol proactively makes copies of the desired objects around the network. The rest of this section describes each of these components in detail.

2.1 Analytical Model

In this section, we provide a model that analyzes Zipf-like query distributions and provides closed-form optimal replication levels for the objects in order to achieve constant average lookup performance with low storage and bandwidth overhead.

In Zipf-like, or power law, query distributions, the number of queries to the i-th most popular object is proportional to i^(-α), where α is the parameter of the distribution. The query distribution has a heavier tail for smaller values of the parameter α. A Zipf distribution with parameter 0 corresponds to a uniform distribution. The total number of queries to the most popular m objects, Q(m), is approximately (m^(1-α) - 1)/(1 - α) for α ≠ 1, and Q(m) ≃ ln m for α = 1. Using the above estimate for the number of queries received by objects, we pose an optimization problem to minimize the total number of replicas with the constraint that the average lookup latency is a constant C.

Let b be the base of the underlying DHT system, M the number of objects, and N the number of nodes in the system. Initially, all the M objects in the system are stored only at their home nodes, that is, they are replicated at level k = log_b N. Let x_i denote the fraction of objects replicated at level i or lower. From this definition, x_k is 1, since all objects are replicated at level k. The Mx_0 most popular objects are replicated at all the nodes in the system. Each object replicated at level i is cached on N/b^i nodes. Mx_i - Mx_{i-1} objects are replicated on nodes that have exactly i matching prefixes. Therefore, the average number of objects replicated at each node is given by Mx_0 + M(x_1 - x_0)/b + ... + M(x_k - x_{k-1})/b^k. Simplifying this expression, the average per node storage requirement for replication is:


M [ (1 - 1/b)(x_0 + x_1/b + ... + x_{k-1}/b^{k-1}) + 1/b^k ]    (1)

The fraction of queries, Q(Mx_i), that arrive for the most popular Mx_i objects is approximately ((Mx_i)^(1-α) - 1) / (M^(1-α) - 1). The number of objects that are replicated at level i is Mx_i - Mx_{i-1}, 1 ≤ i ≤ k. Therefore, the number of queries that travel i hops is Q(Mx_i) - Q(Mx_{i-1}), 1 ≤ i ≤ k. The average lookup latency of the entire system is given by Σ_{i=1}^{k} i (Q(Mx_i) - Q(Mx_{i-1})). The constraint on the average latency is Σ_{i=1}^{k} i (Q(Mx_i) - Q(Mx_{i-1})) ≤ C, where C is the required constant lookup performance. After substituting the approximation for Q(m) and simplifying, we arrive at the following optimization problem.

Minimize  x_0 + x_1/b + ... + x_{k-1}/b^{k-1},  such that    (2)

x_0^(1-α) + x_1^(1-α) + ... + x_{k-1}^(1-α) ≥ k - C(1 - 1/M^(1-α))    (3)

and  x_0 ≤ x_1 ≤ ... ≤ x_{k-1} ≤ 1    (4)

Note that the second constraint effectively reduces to x_{k-1} ≤ 1, since any optimal solution to the problem with just constraint 3 would satisfy x_0 ≤ x_1 ≤ ... ≤ x_{k-1}.

We can use the Lagrange multiplier technique to find an analytical closed-form optimal solution to the above problem with just constraint 3, since it defines a convex feasible space. However, the resulting solution may not guarantee the second constraint x_{k-1} ≤ 1. If the obtained solution violates the second constraint, we can force x_{k-1} to 1 and apply the Lagrange multiplier technique to the modified problem. We can obtain the optimal solution by repeating this process iteratively until the second constraint is satisfied. However, the symmetric property of the first constraint facilitates an easier analytical approach to solve the optimization problem without iterations.

Assume that in the optimal solution to the problem, x_0 ≤ x_1 ≤ ... ≤ x_{k'-1} < 1, for some k' ≤ k, and x_{k'} = x_{k'+1} = ... = x_k = 1. Then we can restate the optimization problem as follows:

Minimize  x_0 + x_1/b + ... + x_{k'-1}/b^{k'-1},  such that    (5)

x_0^(1-α) + x_1^(1-α) + ... + x_{k'-1}^(1-α) ≥ k' - C',    (6)

where C' = C(1 - 1/M^(1-α)).

Using the Lagrange multiplier technique to solve this optimization problem, we get the following closed-form solution:

x_i* = [ d^i (k' - C') / (1 + d + ... + d^{k'-1}) ]^(1/(1-α)),   ∀ 0 ≤ i < k'    (7)

x_i* = 1,   ∀ k' ≤ i ≤ k    (8)

where d = b^((1-α)/α). We can derive the value of k' by satisfying the condition that x*_{k'-1} < 1, that is, d^{k'-1}(k' - C') / (1 + d + ... + d^{k'-1}) < 1.

As an example, consider a DHT with base 32, α = 0.9, 10,000 nodes, and 1,000,000 objects. Applying this analytical method to achieve an average lookup time, C, of 1 hop, we obtain k' = 2 and the solution x_0 = 0.001102, x_1 = 0.0519, and x_2 = 1. Thus, the most popular 1102 objects would be replicated at level 0, the next most popular 50,798 objects would be replicated at level 1, and all the remaining objects at level 2. The average per node storage requirement of this system would be 3,700 objects.
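The closed-form solution is straightforward to evaluate numerically. The sketch below is illustrative code, not the authors' implementation; the function and variable names are our own. It implements equations (7) and (8) for α < 1 along with the per-node storage estimate of equation (1); with the parameters of the example above it yields k' = 2, x_0 ≈ 0.0011, x_1 ≈ 0.052 and a per-node storage of a few thousand objects, consistent with the example up to rounding.

```python
# Sketch of the closed-form solution (equations (7)-(8)), assuming alpha < 1.
import math

def optimal_replication(b, N, M, alpha, C):
    """Return [x_0, ..., x_{k-1}] and k' for a Zipf parameter alpha < 1."""
    k = math.ceil(math.log(N, b))                 # home-node-only replication level
    c_prime = C * (1 - 1 / M ** (1 - alpha))      # C' from constraint (6)
    d = b ** ((1 - alpha) / alpha)
    for k_prime in range(k, 0, -1):               # largest k' with x*_{k'-1} < 1
        denom = sum(d ** j for j in range(k_prime))
        x = [(d ** i * (k_prime - c_prime) / denom) ** (1 / (1 - alpha))
             for i in range(k_prime)]
        if x[-1] < 1:
            return x + [1.0] * (k - k_prime), k_prime
    return [1.0] * k, 0                           # degenerate case: replicate everything

x, k_prime = optimal_replication(b=32, N=10_000, M=1_000_000, alpha=0.9, C=1)
print(k_prime, [round(v, 6) for v in x])          # -> 2 [~0.00111, ~0.0524, 1.0]

# Per-node storage from equation (1): M[(1 - 1/b)(x_0 + x_1/b + ...) + 1/b^k]
b, M, k = 32, 1_000_000, len(x)
storage = M * ((1 - 1 / b) * sum(v / b ** i for i, v in enumerate(x)) + 1 / b ** k)
print(round(storage))                             # a few thousand objects per node
```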

The optimal solution obtained by this model applies only to the case α < 1. For α ≥ 1, the closed-form solution will yield a level of replication that will achieve the target lookup performance, but the amount of replication may not be optimal, because the feasible space is no longer convex. For α = 1, we can obtain the optimal solution by using the approximation Q(m) ≃ ln m and applying the same technique. The optimal solution for the case α = 1 is as follows:

x_i* = (1/M^(C'/k')) · b^(i - (k'-1)/2),   ∀ 0 ≤ i < k'    (9)

x_i* = 1,   ∀ k' ≤ i ≤ k,   with k' given by x*_{k'-1} < 1    (10)

This analytical solution has three properties that are useful for guiding the extent of proactive replication. First, the analytical model provides a solution to achieve any desired constant lookup performance. The system can be tailored, and the amount of overall replication controlled, for any level of performance by adjusting C over a continuous range. Since structured DHTs preferentially keep physically nearby hosts in their top-level routing tables, and since they consequently pay the highest per-hop latency costs as they get closer to the home node, selecting even a large target value for C can dramatically speed up end-to-end query latencies [4]. Second, for a large class of query distributions (α ≤ 1), the solution provided by this model achieves the optimal number of object replicas required to provide the desired performance. Minimizing the number of replicas reduces per-node storage requirements, bandwidth consumption and aggregate load on the network. Finally, k' serves as an upper bound for the worst case lookup time for any successful query, since all objects are replicated at least at level k'.

We make two assumptions in the analytical model: all objects incur similar costs for replication, and objects do not change very frequently. For applications such as DNS, which have essentially homogeneous object sizes and whose update-driven traffic is a very small fraction of the replication overhead, the analytical model provides an efficient solution. Applying the Beehive approach to applications such as the web, which has a wide range of object sizes and frequent object updates, may require an extension of the model to incorporate size and update frequency information for each object.

2.2 Popularity and Zipf-Parameter Estimation

The analytical model described in the previous section requires the knowledge of the parameter α of the query distribution and the relative popularities of the objects. In order to obtain accurate estimates of the popularity of objects and the parameter of the query distribution, Beehive needs efficient mechanisms to continuously monitor the access frequency of the objects. Beehive employs a combination of local measurement and limited aggregation to keep track of the changing parameters and adapt the replication appropriately.

Each node locally measures the number of queries received by an object replicated at that node in order to estimate its relative popularity. If objects are replicated only at their home nodes, all the queries for an object are routed to the home node, and local measurement of access frequency is sufficient to estimate the relative popularity. However, if the object is replicated at level i, the queries for that object are distributed across approximately N/b^i nodes in a base b DHT with N nodes. In order to estimate the relative popularity with the same accuracy, we need an N/b^i fold increase in the measurement interval. But this prevents the system from reacting quickly to changes in the popularity of the objects. Beehive performs limited aggregation in order to alleviate this problem and improve the responsiveness of the system.

Aggregation in Beehive takes place periodically, once every aggregation interval. Each node A sends to node B in the i-th level of its routing table an aggregation message containing the access frequency of each object replicated at level i or lower and having i + 1 matching prefixes with B. Node B receives the aggregation messages from A as well as other nodes at level i with which it shares i prefixes. It then aggregates the estimates for access frequencies received from these nodes with its own local estimate, and sends the aggregated access frequency to all nodes in the (i + 1)-th level of its routing table during the next round of aggregation. After log_b N - i rounds of aggregation, the home node of an object replicated at level i obtains an accurate estimate of the access frequency.

In Beehive, each node is responsible for replicating an object at most one level lower. That is, nodes at level i + 1 are responsible for replicating an object at level i. The nodes at level i + 1 need to get the aggregated access frequencies of objects replicated at level i from the home nodes. We enable this reverse information flow by sending the aggregated access frequencies in response to aggregation messages. The home node of an object sends the latest aggregated estimate of the access frequency in response to an aggregation message from a node A. When node B receives an aggregation message from A, it sends a reply containing the aggregated access frequency of the objects listed in the aggregation message. In this manner, the access frequency of an object is aggregated at the home node and the aggregated estimate is disseminated to all the nodes containing a replica of the object. For an object replicated at level i, it takes 2(log_b N - i) rounds of aggregation to complete the information flow.
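The timing of this information flow can be computed directly; the small sketch below (illustrative only, with assumed parameter values) evaluates the one-way and round-trip aggregation rounds for an object at a given level.

```python
# Sketch: rounds of aggregation needed for an object replicated at `level`.
import math

def rounds_to_aggregate(level, b, N):
    """One-way rounds for the home node to obtain an accurate count, and the
    round-trip count for the aggregated estimate to reach every replica."""
    k = math.ceil(math.log(N, b))
    one_way = k - level
    return one_way, 2 * one_way

print(rounds_to_aggregate(level=1, b=16, N=1024))   # (2, 4) with these assumed values
```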

In addition to the popularity of the objects, the analytical model needs an estimate of the parameter of the query distribution. The Zipf parameter, α, is also estimated using local measurement and limited aggregation. Each node locally computes α using the aggregated access frequency for different objects replicated at the node. We estimate α using linear regression techniques to compute the slope of the best fit line, since a Zipf-like popularity distribution is a straight line in log scale. Since this local estimate is based on a small subset of the objects in the system, the estimate is refined by aggregating it with the local estimates of other nodes it communicates with during aggregation.

There will be fluctuations in the estimation of access frequency and the Zipf parameter due to randomness in the query distribution. In order to avoid large discontinuous changes to these estimates, we age them as follows: new estimate = ρ · old estimate + (1 - ρ) · current measurement, with ρ = 0.5.
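As an illustration of the estimation machinery described above, the sketch below (not the paper's code; function names are our own) fits the Zipf parameter by linear regression on the log-log popularity curve and applies the exponential aging rule with ρ = 0.5.

```python
# Sketch: Zipf-parameter estimation and aging of popularity estimates.
import math

def estimate_alpha(access_counts):
    """Least-squares slope of log(frequency) vs. log(rank); alpha is its negation."""
    freqs = sorted((c for c in access_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope

def age(old_estimate, new_measurement, rho=0.5):
    """Exponential aging of an estimate, as in Section 2.2 (rho = 0.5)."""
    return old_estimate * rho + new_measurement * (1 - rho)

print(estimate_alpha([1000 // r for r in range(1, 200)]))   # roughly 1 for a 1/r curve
```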

2.3 Replication Protocol

Beehive requires a protocol to replicate objects at the levels computed by the analytical model. In order to be deployable in wide area networks, the replication protocol should be asynchronous and not require expensive mechanisms such as distributed consensus or agreement. In this section, we develop an efficient protocol that enables Beehive to replicate objects across a DHT.

Beehive's replication protocol uses an asynchronous and distributed algorithm to implement the optimal solution provided by the analytical model. Each node is responsible for replicating an object on other nodes at most one hop away from itself; that is, at nodes that share one less prefix than the current node. Initially, each object is replicated only at its home node at level k = log_b N, where N is the number of nodes in the system and b is the base of the DHT; the home node shares k prefixes with the object. If an object needs to be replicated at the next level k - 1, the home node pushes the object to all nodes that share one less prefix with the home node. Each of the level k - 1 nodes at which the object is currently replicated may independently decide to replicate the object further, and push the object to other nodes that share one less prefix with it. Nodes continue the process of independent and distributed replication until all the objects are replicated at appropriate levels. In this algorithm, nodes that share i + 1 prefixes with an object are responsible for replicating that object at level i, and are called level i deciding nodes for that object. For each object replicated at level i at some node A, the level i deciding node is that node in its routing table at level i that has i + 1 matching prefixes with the object. For some objects, the deciding node may be the node A itself.

This distributed replication algorithm is illustrated in Figure 2. Initially, an object with identifier 0124 is replicated at its home node E at level 3 and shares 3 prefixes with it. If the analytical model indicates that this object should be replicated at level 2, node E pushes the object to nodes B and I with which it shares 2 prefixes. Node E is the level 2 deciding node for the object at nodes B, E, and I. Based on the popularity of the object, the level 2 nodes B, E, and I may independently decide to replicate the object at level 1. If node B decides to do so, it pushes a copy of the object to nodes A and C with which it shares 1 prefix and becomes the level 1 deciding node for the object at nodes A, B, and C. Similarly, node E may replicate the object at level 1 by pushing a copy to nodes D and F, and node I to G and H.


Figure 2: This figure illustrates how the object 0124 at its home node E is replicated to level 1. For nodes A through I, the numbers indicate the prefixes that match the object identifier at different levels. Each node pushes the object independently to nodes with one less matching digit.

Our replication algorithm does not require any agreement in the estimation of relative popularity among the nodes. Consequently, some objects may be replicated partially due to small variations in the estimate of the relative popularity. For example, in Figure 2, node I might decide not to push object 0124 to level 1. We tolerate this inaccuracy to keep the replication protocol efficient and practical. In the evaluation section, we show that this inaccuracy in the replication protocol does not produce any noticeable difference in performance.

Beehive implements this distributed replication algorithm in two phases, an analysis phase and a replicate phase, that follow the aggregation phase. During the analysis phase, each node uses the analytical model and the latest known estimate of the Zipf parameter α to obtain a new solution. Each node then locally changes the replication levels of the objects according to the solution. The solution specifies, for each level i, the fraction of objects x_i that need to be replicated at level i or lower. Hence, an x_i/x_{i+1} fraction of the objects replicated at level i + 1 or lower should be replicated at level i or lower. Based on the current popularity, each node sorts all the objects at level i + 1 or lower for which it is the level i deciding node. It chooses the most popular x_i/x_{i+1} fraction of these objects and locally changes the replication level of the chosen objects to i, if their current replication level is i + 1. The node also changes the replication level of the objects that are not chosen to i + 1, if their current replication level is i or lower.
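A minimal sketch of the analysis-phase decision at a single node is shown below; the data layout, helper names, and the hysteresis constant are assumptions for illustration, not the actual implementation (hysteresis itself is discussed later in this section).

```python
# Sketch: pick which objects to promote to level i and which to demote.
def analysis_phase(objects, x, level_i, hysteresis=0.05):
    """objects: (object_id, current_level, popularity) tuples at level <= i + 1
    for which this node is the level-i deciding node; x: solution fractions."""
    def score(obj):
        _, level, popularity = obj
        # Favor objects already replicated at level i or lower (hysteresis).
        return popularity * (1 + hysteresis) if level <= level_i else popularity

    ranked = sorted(objects, key=score, reverse=True)
    cutoff = round(len(ranked) * x[level_i] / x[level_i + 1])   # keep x_i / x_{i+1} fraction
    promote = [o[0] for o in ranked[:cutoff] if o[1] == level_i + 1]
    demote = [o[0] for o in ranked[cutoff:] if o[1] <= level_i]
    return promote, demote

objs = [("a", 2, 90), ("b", 2, 80), ("c", 1, 75), ("d", 2, 10)]
print(analysis_phase(objs, x=[0.1, 0.5, 1.0], level_i=1))   # (['a', 'b'], ['c'])
```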

After the analysis phase, the replication level of some objects could increase or decrease, since the popularity of objects changes with time. If the replication level of an object decreases from level i + 1 to i, it needs to be replicated on nodes that share one less prefix with it. If the replication level of an object increases from level i to i + 1, the nodes with only i matching prefixes need to delete the replica. The replicate phase is responsible for enforcing the correct extent of replication for an object as determined by the analysis phase. During the replicate phase, each node A sends to each node B in the i-th level of its routing table a replication message listing the identifiers of all objects for which B is the level i deciding node. When B receives this message from A, it checks the list of identifiers and pushes to node A any unlisted object whose current level of replication is i or lower. In addition, B sends back to A the identifiers of objects no longer replicated at level i. Upon receiving this message, A removes the listed objects.

Beehive nodes invoke the analysis and the replicate phases periodically. The analysis phase is invoked once every analysis interval and the replicate phase once every replication interval. In order to improve the efficiency of the replication protocol and reduce load on the network, we integrate the replication phase with the aggregation protocol. We perform this integration by setting the same durations for the replication interval and the aggregation interval and combining the replication and the aggregation messages as follows: when node A sends an aggregation message to B, the message implicitly contains the list of objects replicated at A whose level i deciding node is B. Similarly, when node B replies to the replication message from A, it adds the aggregated access frequency information for all objects listed in the replication message.

The analysis phase estimates the relative popularity of the objects using the estimates for access frequency obtained through the aggregation protocol. Recall that, for an object replicated at level i, it takes 2(log_b N - i) rounds of aggregation to obtain an accurate estimate of the access frequency. In order to allow time for the information flow during aggregation, we set the replication interval to at least 2 log_b N times the aggregation interval.

Random variations in the query distribution will lead to fluctuations in the relative popularity estimates of objects, and may cause frequent changes in the replication levels of objects. This behavior may increase the object transfer activity and impose substantial load on the network. Increasing the duration of the aggregation interval is not an efficient solution because it decreases the responsiveness of the system to changes. Beehive limits the impact of fluctuations by employing hysteresis. During the analysis phase, when a node sorts the objects at level i based on their popularity, the access frequencies of objects already replicated at level i - 1 are increased by a small fraction. This biases the system towards maintaining already existing replicas when the popularity difference between two objects is small.

The replication protocol also enables Beehive to maintain appropriate replication levels for objects when new nodes join and others leave the system. When a new node joins the system, it obtains the replicas of objects it needs to store by initiating a replicate phase of the replication protocol. If the new node already has objects replicated from when it was previously part of the system, then these objects need not be fetched again from the deciding nodes. A node leaving the system does not directly affect Beehive. If the leaving node is a deciding node for some objects, the underlying DHT chooses a new deciding node for these objects when it repairs the routing table.

2.4 Mutable Objects

Beehive directly supports mutable objects by proactively disseminating object updates to the replicas in the system. The semantics of read and update operations on objects is an important issue to consider while supporting object mutability. Strong consistency semantics require that once an object is updated, all subsequent queries to that object only return the modified object. Achieving strong consistency is challenging in a distributed system with replicated objects, because each copy of the replicated object should be updated or invalidated upon object modification. In Beehive, we exploit the structure of the underlying DHT to efficiently disseminate object updates to all the nodes carrying replicas of the object. Our scheme guarantees that when an object is modified, all replicas will be consistently updated within a very short time if the system is stable, that is, nodes are not joining and leaving the system.

Beehive associates a 64 bit version number with each object to identify modified objects. An object replica with a higher version number is more recent than a replica with a lower version number. The owner of an object in the system can modify the object by inserting a fresh copy of the object with a higher version number at the home node. The home node proactively multicasts the update to all the replicas of the object using the routing table. If the object is replicated at level i, the home node sends a copy of the updated object to each node B in the i-th level of the routing table. Node B then propagates the update to each node in the (i + 1)-th level of its routing table.

The update propagation protocol ensures that each node sharing at least i prefixes with the object obtains a copy of the modified object. The object update reaches a node following exactly the same path that a query issued at the object's home node for that node's identifier would follow. Because of this property, all nodes with a replica of the object get exactly one copy of the modified object. Hence, this scheme is both efficient and provides guaranteed cache coherency in the absence of nodes leaving the system.
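The update multicast can be expressed as a short recursion over routing-table rows. The sketch below is illustrative only; the node objects, the store() method, and the routing_table() callback are hypothetical stand-ins for the DHT interfaces, not the actual implementation.

```python
# Sketch of update dissemination (Section 2.4), over assumed DHT interfaces.
def propagate_update(home_node, obj_id, new_copy, level, routing_table, max_row):
    """The home node pushes the update to row `level` of its routing table; each
    recipient forwards it one row further, so every replica gets exactly one copy."""
    def push(node, row):
        node.store(obj_id, new_copy)              # overwrite any older version locally
        if row > max_row:
            return
        for peer in routing_table(node, row):     # peers sharing `row` matching prefixes
            push(peer, row + 1)
    push(home_node, level)
```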

Nodes leaving the system may cause temporary inconsistencies in the routing table. Consequently, updates may not reach some nodes where objects are replicated. Similarly, nodes joining the system but having older versions of the object replicated at them need to update the copy of their objects. We modify Beehive's replication protocol slightly to disseminate updates to nodes that have older versions due to churn in the system. During the replicate phase, each node includes the version number in addition to the object identifiers listed in the replication message. Upon receiving this message, the deciding node of an object pushes a copy of the object if it has a more recent version of the object.

3 Implementation

Beehive is a general replication mechanism that can be applied to any prefix-based distributed hash table. We have layered our implementation on top of Pastry, a freely available DHT with log(N) lookup performance. Our implementation is structured as a transparent layer on top of FreePastry 1.3, supports a traditional insert/modify/delete/query DHT interface for applications, and required no modifications to the underlying Pastry. However, converting the preceding discussion into a concrete implementation of the Beehive framework, building a DNS application on top, and combining the framework with Pastry required some practical considerations and identified some optimization opportunities.

Beehive needs to maintain some additional, modest amount of state in order to track the replication level, freshness, and popularity of objects. Each Beehive node stores all replicated objects in an object repository. Beehive associates the following meta-information with each object in the system, and each Beehive node maintains the following fields within each object in its repository:

• Object-ID: A 128-bit field uniquely identifies the object and helps resolve queries. The object identifier is derived from the hash key at the time of insertion, just as in Pastry.

• Version-ID: A 64-bit version number differentiates fresh copies of an object from older copies cached in the network.

• Home-Node: A single bit specifies whether the current node is the home node of the object.

• Replication-Level: A small integer specifies the current, local replication level of the object.

• Access-Frequency: A small integer monitors the number of queries that have reached this node. It is incremented by one for each locally observed query, and reset at each aggregation.

• Aggregate-Popularity: A small integer used in the aggregation phase to collect and sum up the access frequencies from all dependent nodes for which this node is the deciding node. We also maintain an older aggregate popularity count for aging.

In addition to the state associated with each object, Beehive nodes also maintain a running estimate of the Zipf parameter. The updates to this estimate are batched, and occur relatively infrequently compared to the query stream. Overall, the storage cost consists of several bytes per object, and the processing cost of keeping the meta-data up to date is small.
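For illustration, the per-object metadata could be represented as follows; field names follow the list above, while the types and structure are our own assumptions rather than the prototype's actual layout.

```python
# Sketch of the per-object metadata kept by each Beehive node (illustrative).
from dataclasses import dataclass

@dataclass
class BeehiveObjectMeta:
    object_id: int                     # 128-bit identifier derived from the hash key
    version_id: int                    # 64-bit version number for update dissemination
    is_home_node: bool                 # whether this node is the object's home node
    replication_level: int             # current, local replication level
    access_frequency: int = 0          # queries seen locally since the last aggregation
    aggregate_popularity: int = 0      # summed access frequencies from dependent nodes
    old_aggregate_popularity: int = 0  # previous aggregate, retained for aging
```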

Pastry's query routing deviates from the model described earlier in the paper because it is not entirely prefix-based and uniform. Since Pastry maps each object to the numerically closest node in the identifier space, it is possible for an object to not share any prefixes with its home node. For example, in a network with two nodes 298 and 315, Pastry will store an object with identifier 304 on node 298. Since a query for object 304 propagated by prefix matching alone cannot reach the home node, Pastry completes the query with the aid of an auxiliary data structure called the leaf set. The leaf set is used in the last few hops to directly locate the numerically closest node to the queried object. Pastry initially routes a query using entries in the routing table, and may route the last couple of hops using the leaf set entries. This required us to modify Beehive's replication protocol to replicate objects at the leaf set nodes as follows. Since the leaf set is most likely to be used for the last hop, we replicate objects in the leaf set nodes only at the highest replication levels. Let k = log_b N be the highest replication level for Beehive, that is, the default replication level for an object replicated only at its home node. As part of the maintain phase, a node A sends a maintenance message to all nodes B in its routing table as well as its leaf set with a list of identifiers of objects replicated at level k - 1 whose deciding node is B. B is the deciding node of an object homed at node A if B would forward a query to that object to node A next. Upon receiving a maintenance message at level k - 1, node B would push an object to node A only if node A and the object have at least k - 1 matching prefixes. Once an object is replicated on a leaf set node at level k - 1, further replication to lower levels follows the replication protocol described in Section 2. This slight modification to Beehive enables it to work on top of Pastry. Other routing metrics for DHT substrates, such as the XOR metric [18], have been proposed that do not exhibit this non-uniformity, and where the Beehive implementation would be simpler.

Pastry's implementation provides two opportunities for optimization, which improve Beehive's impact and reduce its overhead. First, Pastry nodes preferentially populate their routing tables with nodes that are in physical proximity [4]. For instance, a node with identifier 100 has the opportunity to pick either of two nodes 200 and 210 when routing based on the first digit. Pastry selects the node with the lowest network latency, as measured by the packet round-trip time. As the prefixes get longer, node density drops and each node has progressively less freedom to find and choose between nearby nodes. This means that a significant fraction of the lookup latency experienced by a Pastry lookup is incurred on the last hop, and that selecting even a large number of constant hops, C, as Beehive's performance target will have a significant effect on the real performance of the system. While we pick C = 1 in our implementation, note that C is a continuous variable and may be set to a fractional value, to get average lookup performance that is a fraction of a hop. C = 0 yields a solution that will replicate all objects at all nodes, which is suitable only if the total hash table size is small.

The second optimization opportunity stems from the maintenance messages used by Beehive and Pastry. Beehive requires some inter-node communication for replica dissemination and data aggregation. This communication is confined to pairs of nodes where one member of the pair appears in the other member's routing table. This highly stylized communication pattern suggests a possible optimization. Pastry nodes periodically send heart-beat messages to nodes in their routing table and leaf set to detect node failures. They also perform periodic network latency measurements to nodes in their routing table in order to obtain closer routing table entries. We can improve Beehive's efficiency by combining the periodic heart-beat messages sent by Pastry with the periodic maintenance messages sent by Beehive. By piggy-backing the i-th row routing table entries on to the Beehive maintenance message at replication level i, a single message can simultaneously serve as a heartbeat message, a Pastry maintenance message, and a Beehive maintenance message.

We have built a prototype DNS name server on top of Beehive in order to evaluate the caching strategy proposed in this paper. Beehive-DNS uses the Beehive framework to proactively disseminate DNS resource records containing name to IP address bindings. The Beehive-DNS server currently supports UDP-based name (A) queries, is compatible with widely-deployed resolver libraries and is designed to provide a migration path from legacy DNS. Queries that are not satisfied within the Beehive system are looked up in the legacy DNS by the home node and are inserted into the Beehive framework. The Beehive system stores and disseminates resource records to the appropriate replication levels by monitoring the DNS query stream. Clients are free to route their queries through any node that is part of the Beehive-DNS. Since the DNS system relies entirely on aggressive caching in order to scale, it provides very loose coherency semantics, and limits the rate at which updates can be performed. Recall that the Beehive system enables resource records to be modified at any time, and disseminates the new resource records to all caching name servers as part of the update operation. However, for this process to be initiated, name owners would have to directly notify the home node of changes to the name to IP address binding. We expect that, for some time to come, Beehive will be an adjunct system layered on top of legacy DNS, and therefore name owners who are not part of Beehive will not know to contact the system. For this reason, our current implementation delineates between names that exist solely in Beehive and resource records originally inserted from legacy DNS. In the current implementation, the home node checks the validity of each legacy DNS entry by issuing a DNS query for the domain when the time-to-live field of that entry has expired. If the DNS mapping has changed, the home node detects the update and propagates it as usual. Note that this strategy preserves DNS semantics and is quite efficient because only the home nodes check the validity of each entry, while replicas retain all mappings unless invalidated.
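A minimal sketch of the lookup path just described is given below; all helper names (local_store, route_to_home, legacy_dns_lookup, insert_into_beehive) are hypothetical, and the real server additionally performs the TTL-based revalidation at the home node described above.

```python
# Sketch of a Beehive-DNS resolution path with legacy-DNS fallback (illustrative).
def resolve(name, local_store, route_to_home, legacy_dns_lookup, insert_into_beehive):
    record = local_store.get(name)
    if record is not None:
        return record                      # served from a proactively placed replica
    home = route_to_home(name)             # O(log N) DHT routing in the worst case
    record = home.lookup(name)
    if record is None:
        record = legacy_dns_lookup(name)   # fall back to legacy DNS at the home node
        insert_into_beehive(name, record)  # later queries are served within Beehive
    return record
```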

Overall, the Beehive implementation adds only a modest amount of overhead and complexity to peer-to-peer distributed hash tables. Our prototype implementation of Beehive-DNS is only 3500 lines of code, compared to the 17500 lines of code for Pastry.

4 Evaluation

In this section, we evaluate the performance costs and benefits of the Beehive replication framework. We examine Beehive's performance in the context of a DNS system and show that Beehive can robustly and efficiently achieve its targeted lookup performance. We also show that Beehive can adapt to sudden, drastic changes in the popularity of objects as well as global shifts in the parameter of the query distribution, and continue to provide good lookup performance.

We compare the performance of Beehive with that of pure Pastry and Pastry enhanced by passive caching. By passive caching, we mean caching objects along all nodes on the query path, similar to the scheme proposed in [23]. We impose no restrictions on the size of the cache used in passive caching. We follow the DNS cache model to handle mutable objects, and associate a time to live with each object. Objects are removed from the cache upon expiration of the time to live.

4.1 Setup

We evaluate Beehive using simulations, driven by a DNS survey and trace data. The simulations were performed using the same source code as our implementation. Each simulation run was started by seeding the network with just a single copy of each object, and then querying for objects according to a DNS trace. We compared the proactive replication of Beehive to passive caching in Pastry (PC-Pastry), as well as regular Pastry.

Since passive caching relies on expiration times for coherency, and since both Beehive and Pastry need to perform extra work in the presence of updates, we conducted a large-scale survey to determine the distribution of TTL values for DNS resource records and to compute the rate of change of DNS entries. Our survey spanned July through September 2003, and periodically queried web servers for the resource records of 594,059 unique domain names, collected by crawling the Yahoo! and the DMOZ.ORG web directories. We used the distribution of the returned time-to-live values to determine the lifetimes of the resource records in our simulation. We measured the rate of change in DNS entries by repeating the DNS survey periodically, and derived an object lifetime distribution.

We used the DNS trace [15] collected at MIT between 4 and 11 December 2000. This trace spans 4,160,954 lookups over 7 days featuring 1233 distinct clients and 302,032 distinct fully-qualified names. In order to reduce the memory consumption of the simulations, we scale the number of distinct objects to 40,960, and issue queries at the same rate of 7 queries per sec. The rate of issue for requests has little impact on the hit rate achieved by Beehive, which is dominated mostly by the performance of the analytical model, parameter estimation, and rate of updates. The overall query distribution of this trace follows an approximate Zipf-like distribution with parameter 0.91. We separately evaluate Beehive's robustness in the face of changes in this parameter.

We performed our evaluations by running the Beehive implementation on Pastry in simulator mode with 1024 nodes. For Pastry, we set the base to be 16, the leaf-set size to be 24, and the length of identifiers to be 128, as recommended in [22]. In all our evaluations, the Beehive maintenance interval was 48 minutes and the replication interval was 480 minutes. The replication phases at each node were randomly staggered to approximate the behavior of independent, non-synchronized hosts. We set the target lookup performance of Beehive to an average of 1 hop.

Beehive Performance

Figure 3 shows the average lookup latency for Pastry, PC-Pastry, and Beehive over a query period spanning 40 hours. We plot the lookup latency as a moving average over 48 minutes. The average lookup latency of pure Pastry is about 2.34 hops. The average lookup latency of PC-Pastry drops steeply during the first 4 hours and averages 1.54 after 40 hours. The average lookup performance of Beehive decreases steadily and converges to about 0.98 hops, within 5% of the target lookup performance. Beehive achieves the target performance in about 16 hours and 48 minutes, the time required for two replication phases followed by a maintain phase at each node. These three phases, combined, enable Beehive to propagate the popular objects to their respective replication levels. Once all level 1 objects have been disseminated, Beehive's proactive replication achieves the expected payoff. In contrast, PC-Pastry provides limited benefits, despite an infinite-sized cache. There are two reasons for the relative ineffectiveness of passive caching. First, the heavy tail in Zipf-like distributions implies that there will be many objects for which there will be few requests, and queries will take many disjoint paths in the network until they collide on a node on which the object has been cached. Second, PC-Pastry relies on time-to-live values for cache coherency, instead of tracking the location of cached objects. The time-to-live values need to be set conservatively in order to reflect the worst case scenario under which the record may be updated, as opposed to the expected lifetime of the object. Consequently, passive caching suffers from a low hit rate as entries are evicted due to small values of TTL set by name owners.

Figure 3: Latency (hops) vs Time. The average lookup performance of Beehive converges to the targeted C = 1 hop after two replication phases.

Figure 4: Object Transfers (cumulative) vs Time. The total amount of object transfers imposed by Beehive is significantly lower compared to caching. Passive caching incurs large costs in order to check freshness of entries in the presence of conservative timeouts.

Next, we examine the bandwidth consumed and network load incurred by PC-Pastry and Beehive for caching objects, and show that Beehive generates significantly lower background traffic due to object transfers compared to passive caching. Figure 4 shows the total number of objects transferred by Beehive and PC-Pastry since the beginning of the experiment. PC-Pastry has a rate of object transfer proportional to its lookup latency, since it transfers an object to each node along the query path. Beehive incurs a high rate of object transfer during the initial period; but once Beehive achieves its target lookup performance, it incurs considerably lower overhead, as it needs to perform transfers only in response to changes in object popularity and, relatively infrequently for DNS, to object updates. Beehive continues to perform limited amounts of object replication, due to fluctuations in the popularity of the objects as well as estimation errors not dampened by hysteresis.

Figure 5: Storage Requirement vs. Latency. This graph shows the average per-node storage (objects per node) required by Beehive and the estimated latency (ms) for different target lookup performance (hops), capturing the tradeoff between the overhead incurred by Beehive and the lookup performance achieved.

The average number of objects stored at each node at the end of 40 hours is about 380 for Beehive and about 420 for passive caching. PC-Pastry caches more objects than Beehive even though its lookup performance is worse, due to the heavy-tailed nature of Zipf distributions. Our evaluation shows that Beehive provides 1-hop average lookup latency with low storage and bandwidth overhead.

Beehive efficiently trades off storage and bandwidth for improved lookup latency. Our replication framework enables administrators to tune this tradeoff by varying the target lookup performance of the system. Figure 5 shows the tradeoff between storage requirement and estimated latency for different target lookup performance. We used the analytical model described in Section 2 to estimate the storage requirements. We estimated the expected lookup latency from the round-trip times obtained by pinging all pairs of nodes in PlanetLab, adding a small fixed cost for accessing the local DNS resolver. The average one-hop round-trip time between nodes in PlanetLab is roughly 200 ms. In our large-scale DNS survey, the average DNS lookup latency was 255.9 ms. Beehive with a target performance of 1 hop can therefore provide better lookup latency than DNS.
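
As a rough sanity check on Figure 5, the expected end-to-end latency can be approximated as the target hop count times the average inter-node round-trip time plus the local resolver cost; the numbers below are placeholders consistent with the approximate measurements reported above:

    # Illustrative arithmetic only; RTT and resolver costs are stand-ins for measured values.
    avg_rtt_ms = 200.0      # approximate average one-hop RTT between PlanetLab nodes
    resolver_ms = 0.5       # assumed small fixed cost of contacting the local resolver
    dns_avg_ms = 255.9      # average DNS lookup latency observed in our survey

    for target_hops in (0.5, 1.0, 1.5, 2.0):
        est_ms = target_hops * avg_rtt_ms + resolver_ms
        verdict = "better" if est_ms < dns_avg_ms else "worse"
        print("target %.1f hops -> ~%.0f ms (%s than average DNS)" % (target_hops, est_ms, verdict))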

Figure 6: Latency (hops) vs. Time. This graph shows that Beehive quickly adapts to changes in the popularity of objects and brings the average lookup performance to 1 hop.

Flash Crowds

Next, we examine the performance of proactive and passive caching in response to changes in object popularity. We modify the trace to suddenly reverse the popularities of all the objects in the system. That is, the least popular object becomes the most popular object, the second least popular object becomes the second most popular object, and so on. This represents a worst-case scenario for proactive replication, as objects that are least replicated suddenly need to be replicated widely, and vice versa, simulating, in essence, a set of flash crowds for the least popular objects. The switch occurs at t = 40 hours, and we issue queries from the reversed popularity distribution for another 40 hours.
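
A minimal sketch (ours) of how the reversal can be applied to the workload: an object at popularity rank i among M objects is remapped to rank M + 1 - i, so the least and most popular objects swap roles:

    def reverse_popularity(queries, num_objects=40960):
        """Map popularity rank i to rank (num_objects + 1 - i), swapping the least
        and most popular objects to emulate a set of flash crowds."""
        return [num_objects + 1 - r for r in queries]

    # Rank 1 (most popular) becomes rank 40960 (least popular) and vice versa.
    assert reverse_popularity([1, 40960]) == [40960, 1]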

Figure 6 shows the lookup performance of Pastry, PC-Pastry and Beehive in response to flash crowds. The popularity reversal causes a temporary increase in average latency for both Beehive and PC-Pastry. Beehive adjusts the replication levels of its objects appropriately and reduces the average lookup latency to about 1 hop after two replication intervals. The lookup performance of passive caching also decreases, but only to about 1.6 hops.

Figure 7: Rate of Object Transfers vs. Time. This graph shows that when the popularity of the objects changes, Beehive temporarily imposes extra bandwidth overhead to replicate the newly popular objects and maintain constant lookup time.

Figure 7 shows the instantaneous rate of object transfers induced by the popularity reversal for Beehive and PC-Pastry. The popularity reversal causes a temporary increase in the object transfer activity of Beehive as it adjusts the replication levels of the objects appropriately. Even though Beehive incurs this high rate of activity in response to a worst-case scenario, it consumes less bandwidth and imposes less aggregate load than passive caching.

Figure 8: Latency (hops) vs. Time. This graph shows that Beehive quickly adapts to changes in the parameter of the query distribution and brings the average lookup performance to 1 hop. (The second axis shows the Zipf parameter α.)

Figure 9: Objects stored per node vs. Time. This graph shows that when the parameter of the query distribution changes, Beehive adjusts the number of replicated objects to maintain O(1) lookup performance with storage efficiency. (The second axis shows the Zipf parameter α.)

Zipf Parameter Change

Finally, we examine how Beehive adapts to global changes in the parameter of the overall query distribution. We issue queries from Zipf-like distributions generated with different values of the parameter α at each 24-hour interval. We start with α = 0.8, increase it to 0.9 after 24 hours, decrease it to 0.7 at t = 48, and finally return it to the starting value of 0.8 at t = 72. In order to shorten the completion time of our simulations, we performed this experiment with 4,096 objects and issued queries at the rate of 4.5 queries per second.

Figure 8 shows the lookup performance of Beehive as it adapts to changes in the parameter of the query distribution. After the experiment starts, the average query latency converges rapidly to the target of 1 hop. At 24 hours, the increase in the value of α to 0.9 causes a temporary decrease in the average query latency, but Beehive adapts to the change in the Zipf parameter and brings the lookup performance close to the target. Similarly, Beehive refines the replication levels of the objects to meet the target lookup performance when the Zipf parameter changes to 0.7 at 48 hours and back to 0.8 at 72 hours.

Figure 9 shows the average number of objects replicated at each node in the system by Beehive. When the parameter of the query distribution is 0.8, Beehive achieves 1-hop lookup performance by replicating about 106 objects at each node on average. When Beehive observes the increase in the Zipf parameter to 0.9, it decreases the per-node storage requirement to about 73 objects in order to meet the target lookup performance efficiently. Similarly, when the parameter decreases to 0.7, Beehive increases the number of objects stored to about 167 per node in order to achieve the target. Overall, continuously monitoring and estimating the α of the query distribution enables Beehive to adjust the extent and level of replication to compensate for any global changes.


Summary

In this section, we have evaluated the performance of the Beehive replication framework for different scenarios in the context of DNS. Our evaluation indicates that Beehive achieves O(1) lookup performance with low storage and bandwidth overhead. In particular, it outperforms passive caching in terms of average latency, storage requirements, network load and bandwidth consumption. Beehive continuously monitors the popularity of the objects and the parameter of the query distribution, and quickly adapts its performance to changing conditions.

5 Related Work

Peer-to-peer lookup systems proposed to date fall into two categories, namely, unstructured systems, where the DHT constructs an unconstrained graph among the participating nodes, and structured systems, where the DHT imposes some structure on the underlying network.

Unstructured peer-to-peer systems, such as Freenet [5] and Gnutella [1], perform lookups for objects using graph traversal algorithms. Gnutella uses a flooding-based breadth-first search, while Freenet uses an iterative depth-first search technique. Both Gnutella and Freenet cache queried objects along the search path to improve the efficiency of the search algorithms. However, their lookup protocols are inefficient, do not scale well, and do not provide bounds on the average- or worst-case lookup performance.

Structured peer-to-peer systems are appealing because they can provide a worst-case bound on lookup performance. Several structured peer-to-peer systems have been designed in recent years. CAN [21] maps both objects and nodes onto a d-dimensional torus and provides O(dN^{1/d}) lookup performance by searching in a multi-dimensional space. Plaxton et al. [19] introduce a randomized lookup algorithm based on prefix matching to locate objects in a distributed network in O(log N) probabilistic time. Chord [24], Pastry [22], and Tapestry [26] use consistent hashing to map objects to nodes and route lookup requests using Plaxton's prefix-matching algorithm to search for objects. An internal database of O(log N) entries enables these systems to route lookup requests and achieve O(log N) worst-case lookup performance. Kademlia [18] also provides O(log N) lookup performance using a similar search technique, but uses the XOR metric to compute the closeness of objects and nodes. Viceroy [17] provides O(log N) lookup performance with a constant-degree routing graph. De Bruijn graphs [16, 25] can achieve O(log N) lookup performance with 2 neighbors per node, and O(log N / log log N) with log N degree per node. Beehive can be applied to any overlay based on prefix matching.

A few recently introduced DHTs provide O(1) lookup performance by tolerating increased storage and bandwidth consumption. Kelips [12] provides O(1) lookup performance with probabilistic guarantees by replicating each object on O(√N) nodes. It divides the nodes into O(√N) groups of O(√N) nodes each and maintains information about network membership and object updates using gossip-based protocols. It maps each object to a group and replicates the object on all nodes in the group, regardless of popularity. The background gossip communication consumes a constant amount of bandwidth, but incurs long convergence times; consequently, Kelips may not disseminate object updates to all replicas quickly. An alternative method to achieve one-hop lookups is described in [13], and relies on maintaining full routing state (i.e., a complete description of system membership) at each node. The space and bandwidth costs of this approach scale linearly with the size of the network. Farsite [10] also routes in a constant number of hops, but does not address rapid membership changes. Beehive differs from these systems in three fundamental ways. First, Beehive operates as a separable layer on many DHTs without requiring structural changes. Second, it exploits the popularity distribution of objects to minimize the amount of replication: unpopular objects are not replicated, reducing storage overhead, bandwidth consumption and network load. Finally, Beehive provides fine-grained control of the tradeoff between lookup performance and overhead by allowing users to choose the target lookup performance from a continuous range.

Several peer-to-peer applications have examined caching and replication to improve lookup performance, increase availability, and provide better failure resilience. PAST [23] and CFS [9] are examples of file backup applications built on top of Pastry and Chord, respectively. Both reserve a part of the storage space at each node to cache queried results on the lookup path and provide faster lookups. They also maintain a constant number of replicas of each object in the system in order to improve fault tolerance. These passive caching schemes do not provide any performance bounds.

Some systems employ a combination of caching with proactive object updates. In [6], the authors describe a proactive cache for DNS records. Whenever a cached DNS record is about to expire, the cache issues a fresh query to check the validity of the DNS record, and the result of the query is stored in the cache. While this technique reduces the impact of short expiration times on lookup performance, it introduces a large amount of overhead due to background object transfers, without providing bounded lookup performance.

CUP, Controlled Update Propagation [20], is a demand-based caching mechanism with proactive object updates. In CUP, the process of querying for an object and updating cached replicas of that object forms a tree-like structure rooted at the home node of the object. CUP nodes propagate object updates away from the home node in accordance with a popularity-based incentive that flows from the leaf nodes towards the home node. There are several similarities between the replication protocols of CUP and Beehive. However, the decisions to cache objects and propagate updates in CUP are based on heuristics, while the replication in Beehive is driven by an analytical model that enables it to provide constant lookup performance for power law query distributions.

The closest work to Beehive is [7], which presents a study of optimal strategies for replicating objects in unstructured peer-to-peer systems. This paper employs an analytical approach to find the best possible replication strategy for unstructured peer-to-peer systems, subject to storage constraints. The observations in this work are not directly applicable to structured DHTs, because it assumes that the lookup time for an object depends only on the number of replicas and not on the placement strategy. Beehive exploits the structure of the overlay to place replicas at appropriate locations in the network to achieve the desired performance level.

6 Future Work

This paper has investigated the potential performance benefits of model-driven proactive caching and has shown that it is feasible to use peer-to-peer systems in cooperative low-latency, high-performance environments. Deploying full-blown applications, such as a complete peer-to-peer DNS replacement, on top of this substrate will require substantial further effort. Most notably, security issues need to be addressed before peer-to-peer systems can be deployed widely. At the application level, this involves using an authentication technique, such as DNSSEC [11], to securely delegate name service to nodes in a peer-to-peer system. At the underlying DHT layer, secure routing techniques [3] are required to limit the impact of malicious nodes on the DHT. Both of these techniques will add additional latency, which may be offset, at the cost of additional bandwidth, storage and load, by setting Beehive's target performance level to a lower, fractional value. At the Beehive layer, the proactive replication layer needs to be protected from nodes that misreport the popularity of objects. Since a malicious peer in Beehive can replicate an object, or indirectly cause an object to be replicated, only at nodes that have that malicious node in their routing tables, we expect that the amount of damage attackers can cause through misreported object popularities can be limited.

7 Conclusion

Structured DHTs offer many unique properties desirable for a large class of applications, including self-organization, failure resilience, high scalability, and a worst-case performance bound. However, their O(log N) average-case performance has prohibited them from being deployed for latency-sensitive applications, including DNS. In this paper, we outline a framework for proactive replication that can improve the average-case lookup performance of prefix-based DHTs to O(1) for a frequently encountered class of query distributions.

The Beehive framework consists of three components, layered on top of a standard DHT substrate such as Pastry. An analytical model provides a closed-form solution for computing the requisite level of replication in order to achieve a targeted lookup performance. This analytical solution is optimal in the number of replicas for Zipf-like distributions with α < 1. An estimation technique, based on local measurements and limited aggregation to address statistical fluctuations, derives the input parameters for the model. The estimation process is integrated with background traffic already present in the DHT. Computing the level of replication for each object is performed independently at each node, without costly consensus or synchronization. A replication algorithm proactively disseminates the objects throughout the system, along the routing tables already maintained by the underlying DHT.

Analysis of Beehive's performance in the context of a DNS application indicates that it can achieve a targeted performance level with low overhead. Beehive adapts quickly to flash crowds, which can alter the relative popularities of the objects in the system. It detects qualitative shifts in the global query distribution and adjusts its replication parameters accordingly to compensate. The implementation is small, and the Beehive approach can be applied to other latency-sensitive applications. Overall, the system derives its efficiency by taking advantage of the underlying structure of the lower-layer DHT, and makes it feasible to use DHTs in low-latency applications where the query distribution follows a power law by decoupling lookup performance from the size of the network.

References

[1] "The Gnutella Protocol Specification v.0.4." http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, March 2001.

[2] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. "Web Caching and Zipf-like Distributions: Evidence and Implications." IEEE International Conference on Computer Communications (INFOCOM) 1999, New York NY, March 1999.

[3] Miguel Castro, Peter Druschel, Ayalvadi Ganesh, Antony Rowstron, and Dan Wallach. "Secure Routing for Structured Peer-to-Peer Overlay Networks." Symposium on Operating Systems Design and Implementation, OSDI 2002, Boston MA, December 2002.

[4] Miguel Castro, Peter Druschel, Charlie Hu, and Antony Rowstron. "Exploiting Network Proximity in Peer-to-Peer Overlay Networks." Technical Report MSR-TR-2002-82, Microsoft Research, May 2002.

[5] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore Hong. "Freenet: A Distributed Anonymous Information Storage and Retrieval System." Lecture Notes in Computer Science, vol. 2009, pp. 46-66, 2001.

[6] Edith Cohen and Haim Kaplan. "Proactive Caching of DNS Records: Addressing a Performance Bottleneck." Symposium on Applications and the Internet, SAINT 2001, San Diego-Mission Valley CA, January 2001.

[7] Edith Cohen and Scott Shenker. "Replication Strategies in Unstructured Peer-to-Peer Networks." ACM SIGCOMM 2002, Pittsburgh PA, August 2002.

[8] Russ Cox, Athicha Muthitacharoen, and Robert Morris. "Serving DNS using a Peer-to-Peer Lookup Service." International Workshop on Peer-To-Peer Systems 2002, Cambridge MA, March 2002.


[9] Frank Dabek, Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. "Wide-area cooperative storage with CFS." ACM Symposium on Operating System Principles, SOSP 2001, Banff Alberta, Canada, October 2001.

[10] John R. Douceur, Atul Adya, William J. Bolosky, Dan Simon, and Marvin Theimer. "Reclaiming Space from Duplicate Files in a Serverless Distributed File System." International Conference on Distributed Computing Systems, ICDCS 2002, Vienna, Austria, July 2002.

[11] D. Eastlake. "Domain Name System Security Extensions." Request for Comments (RFC) 2535, 3rd ed., March 1999.

[12] Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, and Robbert van Renesse. "Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead." Second International Peer-To-Peer Systems Workshop, IPTPS 2003, Berkeley CA, February 2003.

[13] Anjali Gupta, Barbara Liskov, and Rodrigo Rodrigues. "One Hop Lookups for Peer-to-Peer Overlays." Ninth Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, May 2003.

[14] Nicholas Harvey, Michael Jones, Stefan Saroiu, Marvin Theimer, and Alec Wolman. "SkipNet: A Scalable Overlay Network with Practical Locality Properties." Fourth USENIX Symposium on Internet Technologies and Systems, USITS 2003, Seattle WA, March 2003.

[15] Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris. "DNS Performance and Effectiveness of Caching." ACM SIGCOMM Internet Measurement Workshop 2001, San Francisco CA, November 2001.

[16] Frans Kaashoek and David Karger. "Koorde: A Simple Degree-Optimal Distributed Hash Table." Second International Peer-To-Peer Systems Workshop, IPTPS 2003, Berkeley CA, February 2003.

[17] Dahlia Malkhi, Moni Naor, and David Ratajczak. "Viceroy: A Scalable and Dynamic Emulation of the Butterfly." ACM Symposium on Principles of Distributed Computing, PODC 2002, Monterey CA, August 2002.

[18] Petar Maymounkov and David Mazieres. "Kademlia: A Peer-to-peer Information System Based on the XOR Metric." First International Peer-To-Peer Systems Workshop, IPTPS 2002, Cambridge MA, March 2002.

[19] Greg Plaxton, Rajmohan Rajaraman, and Andrea Richa. "Accessing nearby copies of replicated objects in a distributed environment." Theory of Computing Systems, vol. 32, pp. 241-280, 1999.

[20] Mema Roussopoulos and Mary Baker. "CUP: Controlled Update Propagation in Peer-to-Peer Networks." USENIX 2003 Annual Technical Conference, San Antonio TX, June 2003.

[21] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. "A Scalable Content-Addressable Network." ACM SIGCOMM 2001, San Diego CA, August 2001.

[22] Antony Rowstron and Peter Druschel. "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems." IFIP/ACM International Conference on Distributed Systems Platforms, Middleware 2001, Heidelberg, Germany, November 2001.

[23] Antony Rowstron and Peter Druschel. "Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility." ACM Symposium on Operating System Principles, SOSP 2001, Banff Alberta, Canada, October 2001.

[24] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan. "Chord: A scalable Peer-to-peer Lookup Service for Internet Applications." ACM SIGCOMM 2001, San Diego CA, August 2001.

[25] Udi Wieder and Moni Naor. "A Simple Fault Tolerant Distributed Hash Table." Second International Peer-To-Peer Systems Workshop, IPTPS 2003, Berkeley CA, February 2003.

[26] Ben Zhao, Ling Huang, Jeremy Stribling, Sean Rhea, Anthony Joseph, and John Kubiatowicz. "Tapestry: A Resilient Global-scale Overlay for Service Deployment." IEEE Journal on Selected Areas in Communications, JSAC, 2003.
