DNS Performance and the Effectiveness of...

15
DNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris MIT Laboratory for Computer Science 200 Technology Square Cambridge, MA 02139 jyjung, sit, hari, rtm @lcs.mit.edu Abstract—This paper presents a detailed analysis of traces of DNS and associated TCP traffic collected on the Inter- net links of the MIT Laboratory for Computer Science and the Korea Advanced Institute of Science and Technology (KAIST). The first part of the analysis details how clients at these institutions interact with the wide-area DNS system, focusing on performance and prevalence of failures. The sec- ond part evaluates the effectiveness of DNS caching. In the most recent MIT trace, 23% of lookups receive no answer; these lookups account for more than half of all traced DNS packets since they are retransmitted multiple times. About 13% of all lookups result in an answer that in- dicates a failure. Many of these failures appear to be caused by missing inverse (IP-to-name) mappings or NS records that point to non-existent or inappropriate hosts. 27% of the queries sent to the root name servers result in such failures. The paper presents trace-driven simulations that explore the effect of varying TTLs and varying degrees of cache sharing on DNS cache hit rates. The results show that re- ducing the TTLs of address (A) records to as low as a few hundred seconds has little adverse effect on hit rates, and that little benefit is obtained from sharing a forwarding DNS cache among more than 10 or 20 clients. These re- sults suggest that the performance of DNS is not as depen- dent on aggressive caching as is commonly believed, and that the widespread use of dynamic, low-TTL A-record bindings should not degrade DNS performance. I. I NTRODUCTION The Domain Name System (DNS) is a distributed database mapping names to network locations, providing information critical to the operation of most Internet ap- plications and services. Internet applications depend on the correct operation of DNS, and users often wait while browsing because of delays in name resolution rather than This research was sponsored by Defense Advanced Research Projects Agency (DARPA) and the Space and Naval Warfare Systems Center San Diego under contract N66001-00-1-8933, and by IBM Corpora- tion. object download. Yet, despite its importance, little has been published over the last several years on how well this essential service performs. The last large-scale published study of DNS was Danzig et al.’s analysis of root server logs in 1992 [1], which found a large number of implementation errors that caused DNS to consume about twenty times more wide-area net- work bandwidth than necessary. Since 1992, both the In- ternet and DNS have changed in significant ways: for ex- ample, today’s traffic is dominated by the Web, and the past few years have seen DNS being used in unanticipated ways, such as in Web server selection. This paper seeks to understand the performance and be- havior of DNS and explain the reasons for its scalability. Specifically: 1. How reliable and efficient is DNS from the point of view of clients? 2. What role do its scaling mechanisms—hierarchical or- ganization and aggressive caching—play in ensuring scal- able operation? It is widely believed that two factors contribute to the scalability of DNS: hierarchical design around administra- tively delegated name spaces, and the aggressive use of caching. Both factors reduce the load on the root servers at the top of the name space hierarchy, while successful caching reduces delays and wide-area network bandwidth usage. The DNS caching design favors availability over freshness. In general, any DNS server or client may main- tain a cache and answer queries from that cache, allow- ing the construction of hierarchies of shared caches. The only cache coherence mechanism that DNS provides is the time-to-live field (TTL), which governs how long an entry may be cached. One way to gauge the effectiveness of caching is to ob- serve the amount of DNS traffic in the wide-area Internet. Danzig et al. report that 14% of all wide-area packets were DNS packets in 1990, compared to 8% in 1992 [1]. The corresponding number from a 1995 study of the NSFNET by Frazer was 5% [2], and from a 1997 study of the MCI backbone by Thompson et al. was 3% [3]. This downward trend might suggest that DNS caching is working well.

Transcript of DNS Performance and the Effectiveness of...

Page 1: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

DNS PerformanceandtheEffectivenessof CachingJaeyeonJung,Emil Sit, Hari Balakrishnan,andRobertMorris

MIT Laboratoryfor ComputerScience200TechnologySquareCambridge,MA 02139�

jyjung, sit, hari, rtm � @lcs.mit.edu

Abstract—This paperpresentsadetailedanalysisof tracesof DNS and associatedTCP traffic collected on the Inter -net links of the MIT Laboratory for Computer Scienceandthe Korea Advanced Institute of Scienceand Technology(KAIST). The first part of the analysis details how clientsat theseinstitutions interact with the wide-areaDNSsystem,focusingon performanceandprevalenceof failur es.Thesec-ond part evaluatesthe effectivenessof DNS caching.

In the most recent MIT trace, 23% of lookups receiveno answer; theselookups account for more than half of alltraced DNS packets since they are retransmitted multipletimes. About 13% of all lookups result in an answerthat in-dicatesa failur e. Many of thesefailur esappear to becausedby missing inverse (IP-to-name) mappings or NS recordsthat point to non-existentor inappropriate hosts.27% of thequeriessentto the root nameservers result in suchfailur es.

The paper presentstrace-driven simulations that explorethe effect of varying TTLs and varying degreesof cachesharing on DNS cachehit rates. The resultsshow that re-ducing the TTLs of address(A) records to as low as a fewhundred secondshas little adverseeffect on hit rates, andthat little benefit is obtained fr om sharing a forwardingDNS cacheamong more than 10 or 20 clients. Thesere-sults suggestthat the performance of DNS is not as depen-dent on aggressivecachingasis commonlybelieved,and thatthe widespreaduseof dynamic, low-TTL A-record bindingsshouldnot degradeDNS performance.

I . INTRODUCTION

The Domain Name System (DNS) is a distributeddatabasemappingnamesto network locations,providinginformationcritical to the operationof most Internetap-plicationsand services. Internetapplicationsdependonthe correctoperationof DNS, andusersoften wait whilebrowsingbecauseof delaysin nameresolutionratherthan

ThisresearchwassponsoredbyDefenseAdvancedResearchProjectsAgency (DARPA) and the SpaceandNaval WarfareSystemsCenterSanDiego undercontractN66001-00-1-8933,andby IBM Corpora-tion.

object download. Yet, despiteits importance,little hasbeenpublishedover thelastseveralyearsonhow well thisessentialserviceperforms.

Thelastlarge-scalepublishedstudyof DNSwasDanziget al.’s analysisof root server logs in 1992 [1], whichfoundalargenumberof implementationerrorsthatcausedDNS to consumeabouttwenty timesmorewide-areanet-work bandwidththannecessary. Since1992,both the In-ternetandDNS have changedin significantways: for ex-ample, today’s traffic is dominatedby the Web, and thepastfew yearshaveseenDNSbeingusedin unanticipatedways,suchasin Webserver selection.

Thispaperseeksto understandtheperformanceandbe-havior of DNS andexplain the reasonsfor its scalability.Specifically:

1. How reliable and efficient is DNS from the point ofview of clients?2. What role do its scalingmechanisms—hierarchicalor-ganizationandaggressive caching—playin ensuringscal-ableoperation?

It is widely believed that two factorscontribute to thescalabilityof DNS:hierarchicaldesignaroundadministra-tively delegatednamespaces,and the aggressive useofcaching.Both factorsreducethe load on the root serversat the top of the namespacehierarchy, while successfulcachingreducesdelaysandwide-areanetwork bandwidthusage. The DNS cachingdesignfavors availability overfreshness.In general,any DNSserver or clientmaymain-tain a cacheandanswerqueriesfrom that cache,allow-ing the constructionof hierarchiesof sharedcaches.Theonly cachecoherencemechanismthatDNSprovidesis thetime-to-live field (TTL), whichgovernshow longanentrymaybecached.

Oneway to gaugetheeffectivenessof cachingis to ob-serve theamountof DNS traffic in thewide-areaInternet.Danzigetal. reportthat14%of all wide-areapacketswereDNS packets in 1990,comparedto 8% in 1992[1]. Thecorrespondingnumberfrom a 1995studyof theNSFNETby Frazerwas5% [2], andfrom a 1997studyof theMCIbackboneby Thompsonetal. was3%[3]. ThisdownwardtrendmightsuggestthatDNS cachingis workingwell.

Page 2: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

However, the19973%numbershouldbeputin perspec-tiveby consideringit relative to network traffic asawhole.The samestudy showed that DNS accountsfor 18% ofall flows,wherea flow is definedasuni-directionaltrafficstreamswith uniquesourceanddestinationIP addressesandport numbersandIP protocolfields. If oneassumesthat applicationstypically precedeeachTCP connectionwith a call to thelocal DNS resolver library, this suggestsa DNS cachemissrateof a little lessthan25%. However,most TCP traffic is Web traffic, which tendsto producegroupsof aboutfour connectionsto thesameserver [4]; ifoneassumesoneDNS lookupfor every four TCPconnec-tions, the“session-level” DNS cachemissrateappearstobecloserto 100%. While anaccurateevaluationrequiresmore preciseconsiderationof the numberof TCP con-nectionsper sessionandthe numberof DNS packetsperlookup, this quick calculationsuggeststhat DNS cachingis notvery effective at suppressingwide-areatraffic.

An analysisof the effectivenessof DNS cachingis es-peciallyimportantin light of severalrecentchangesin theway DNS is used.Contentdistribution networks (CDNs)andpopularWeb siteswith multiple servers are increas-ingly usingDNS asa level of indirectionto help balanceloadacrossservers,or for fault tolerance,or to directeachclient requestto a topologically nearbyserver. BecausecachedDNS recordslimit theefficacy of suchtechniques,many of thesemultiple-server systemsuseTTL valuesassmall as a few secondsor minutes. Another exampleisin mobilenetworking, wheredynamicDNS togetherwithlow-TTL bindingscanprovide thebasisfor hostmobilitysupportin the Internet[5], [6]. Again, this useof DNSconflictswith caching.

A. Summaryof Results

Thispaperexploresthefollowing questions:1. Whatperformance,in termsof latency andfailures,doDNS clientsperceive?2. What is the impacton cachingeffectivenessof choiceof TTL anddegreeof cachesharing?

Thesequestionsareansweredusinga novel methodofanalyzingtracesof TCPtraffic alongwith therelatedDNStraffic. To facilitatethis,we capturedall DNS packetsandTCPSYN, FIN , andRSTpacketsat two differentlocationson theInternet.Thefirst is at thelink thatconnectsMIT’ sLaboratoryfor ComputerScience(LCS) andArtificial In-telligenceLaboratory(AI) to the restof the Internet.Thesecondis ata link thatconnectstheKoreaAdvancedInsti-tuteof ScienceandTechnology(KAIST) to therestof theInternet.Weanalyzetwo differentMIT datasets,collectedin JanuaryandDecember2000,andoneKAIST datasetcollectedin May 2001.

23%of all client lookupsin themostrecentMIT tracefail to elicit any answer, even a failure indication. Thequerypacketsfor theseunansweredlookups,includingre-transmissions,accountfor morethanhalf of all DNSquerypacketsin thetrace.3%of all unansweredlookupsaredueto loopsin nameserverresolution,andtheseloopsonaver-agecauseover10querypacketsto besentfor each(unan-swered)lookup. In contrast,theaverageansweredlookupsendsabout1.3querypackets.

In the sametrace,13% of lookupsresult in an answerthat indicatesan error. Most of theseerrorsindicatethatthe desirednamedoesnot exist. While no single causeseemsto predominate,inverselookups(translatingIP ad-dressesto names)often causefailures,asdo NS recordsthatpoint to non-existentservers. 27%of lookupssenttoroot serversresultedin errors.

While mediannameresolutionlatency waslessthan100ms, the latency of the worst 10% grew substantiallybe-tweenJanuaryandDecember2000. Roughly15% of alllookupsrequirea querypacket to besentto a root or top-level domainserver.

Therelationshipbetweennumbersof TCPconnectionsandnumbersof DNS lookupsin the MIT tracessuggeststhatthehit rateof DNScachesinsideMIT is between70%and80%. This includesclient andapplicationcaches,in-cludingthecachingdoneby Webbrowserswhenthey openmultipleTCPconnectionsto thesameserver. Thusthishitratecannotbeconsideredgood.

Finally, the percentageof TCP connectionsmade tonameswith low TTL valuesincreasedfrom 12% to 25%betweenJanuaryandDecember2000,probablydueto theincreaseddeployment of DNS-basedserver selectionforpopularsites.

ThecapturedTCPtraffic helpsusperformtrace-drivensimulationsto investigatetwo importantfactorsthataffectcachingeffectiveness:(i) the TTL valueson namebind-ings,and(ii) thedegreeof aggregationdueto sharedclientcaching.Our trace-drivensimulationsshow thatA recordswith 10-minuteTTLs yield almostthe samehit ratesassubstantiallylongerTTLs. Furthermore,the distributionof nameswasZipf-like in ourtraces;consequently, wefindthata cachesharedby asfew astenclientshasessentiallythesamehit rateasa cachesharedby thefull tracedpop-ulation of over 1000 clients. TheseresultssuggestthatDNS worksaswell asit doesdespiteineffective A-recordcaching,andthat thecurrenttrendtowardsmoredynamicuseof DNS (andlower TTLs) is not likely to beharmful.On the other hand,we also find that NS-recordcachingis critical to DNS scalabilityby reducingloadon therootservers.

Page 3: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

Therestof thispaperpresentsourfindingsandsubstan-tiatestheseconclusions.SectionII presentsan overviewof DNSandsurveyspreviouswork in analyzingits perfor-mance.SectionIII describesour traffic collectionmethod-ology andsomesalientfeaturesof our data. SectionIVanalyzestheclient-perceived performanceof DNS, whileSection V analyzesthe effectivenessof caching usingtrace-driven simulation. We concludewith a discussionof ourfindingsin SectionVI.

I I . BACKGROUND

In thissection,wepresentanoverview of DNSandsur-vey relatedwork.

A. DNSOverview

Thedesignof the InternetDNS is specifiedin [7], [8],[9]. We summarizethe importantterminologyandbasicconceptshere.

The basicfunction of DNS is to provide a distributeddatabasethat mapsbetweenhuman-readablehost names(suchaschive.lcs.mit. edu) andIP addresses(suchas18.31.0.35 ). It alsoprovidesotherimportantinfor-mationaboutthe domainor host, including reversemapsfrom IP addressesto hostnamesandmail-routinginforma-tion. Clients(or resolvers) routinely querynameserversfor valuesin thedatabase.

TheDNSnamespaceis hierarchicallyorganizedsothatsub-domainscan be locally administered. The root ofthe hierarchyis centrally administered,and served froma collectionof thirteen(in mid-2001)root servers. Sub-domainsaredelegatedto otherserversthatareauthorita-tive for their portionof thenamespace.This processmayberepeatedrecursively.

At thebeginning of our study, mostof the root serversalsoserved the top-level domains,suchas .com . At theend,thetop-level domainswerelargely servedby a sepa-ratesetof abouta dozendedicated“generictop-level do-main” (gTLD) servers.

Mappingsin the DNS namespacearecalled resourcerecords. Two commontypesof resourcerecordsaread-dressrecords(A records)and nameserver records(NSrecords). An A recordspecifiesa name’s IP address;anNS recordspecifiesthe nameof a DNS server that is au-thoritative for aname.Thus,NSrecordsareusedto handledelegationpaths.

Sinceachieving goodperformanceis animportantgoalof DNS,it makesextensiveuseof cachingto reduceserverloadandclient latency. It is believedthatcacheswork wellbecauseDNS datachangesslowly anda smallamountofstalenessis tolerable. On this premise,many serversarenotauthoritative for mostdatathey serve,but merelycache

�����

����� � ����� ����� ���� ������� � ! "#� $

% & '#!��� & ��$ � ( "��)+* ��� � $ "�� $

,�- .0/ 1+-�/ .

% & ' !�& ' & 2��

�� �� � $ "#� $3 � 4��� � $ "�� $

576 8� � $ "#� $

9: ;:< => ?@ A

B 3 C � � '#� DE�F! & '#!�� � $ "#� $�G $0'�4�4#$ � � �F G�H�H�H 3 I ( � 3 � 4#�J 3 % & '#! )�* ��� � $ "�� $04# � � K�L �ED�K� HNM#'#� DE��$ � 3�F � $ � G � $ ��� +' 3 � 4#��� � $ "#� $O 3 8 2E� 3 � 4#��� � $ "�� $F$ � G � $ �F� +' K 5�6 8 � � $ "#� $P 3 8 2E� 576 8 � � $ "#� $�$ � � �� KE4�� H+( � 27'#K�'#4�4�$ � � �Q 3 8 2E��! & '#!�� � $ "�� $�& '#& 2E� �F$ � � �� KE� �

RS

TU

V

Figure1. Exampleof a DNS lookupsequence.

responsesandserve as local proxiesfor resolvers. Suchproxy serversmay conductfurther querieson behalfof aresolver to completeaqueryrecursively. Clientsthatmakerecursive queriesareknown asstubresolvers in theDNSspecification.Ontheotherhand,aquerythatrequestsonlywhat the server knows authoritatively or out of cacheiscalledan iterativequery.

Figure 1 illustratesthesetwo resolutionmechanisms.The client applicationusesa stub resolver and queriesa local nearbyserver for a name(say www.mit.edu ).If this server knows absolutelynothing else, it will fol-low the stepsin the figure to arrive at the addressesforwww.mit.edu . Requestswill begin atawell-known rootof theDNS hierarchy. If thequeriedserver hasdelegatedresponsibilityfor aparticularname,it returnsa referral re-sponse,which is composedof nameserver records. Therecordsarethesetof serversthathave beendelegatedre-sponsibilityfor thenamein question.Thelocalserverwillchooseoneof theseserversandrepeatits question.Thisprocesstypically proceedsuntil aserverreturnsananswer.

Cachesin DNS aretypically not size-limitedsincetheobjectsbeing cachedare small, consistingusually of nomorethanahundredbytesperentry. Eachresourcerecordis expired accordingto the time set by the originator ofthename.Theseexpiration timesarecalledTimeTo Live(TTL)values.Expiredrecordsmustbefetchedafreshfromtheauthoritative origin server on query. Theadministratorof a domaincan control how long the domain’s recordsarecached,andthushow longchangeswill bedelayed,byadjustingTTLs. Rapidly changingdatawill have a shortTTL, tradingoff latency andserver loadfor freshdata.

To avoid confusion,theremainderof thispaperusestheterms“lookup,” “query,” “response,” and“answer”in spe-cific ways. A lookuprefersto theentireprocessof trans-lating a domainnamefor a client application. A queryrefersto a DNS requestpacket sentto a DNS server. Aresponserefersto a packet sentby a DNS server in replyto a querypacket. An answeris a responsefrom a DNS

Page 4: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

server that terminatesthe lookup, by returningeither therequestedname-to-recordmappingor a failure indication.Valid responsesthatarenotanswersmustbereferrals.

This means,for example, that a lookup may involvemultiple query and responsepackets. The queriesof alookuptypically askfor thesamedata,but from differentDNS servers; all responsesbut the last one(the answer)aretypically referrals.This distinctioncanbeseenin Fig-ure 1; the packets in steps1–4 are all part of the samelookup(drivenby therequestfrom theapplication);how-ever, eachsteprepresentsaseparatequeryandresponse.

B. RelatedWork

In 1992, Danzig et al. [1] presentedmeasurementsofDNS traffic at a root nameserver. Their mainconclusionwas that the majority of DNS traffic is causedby bugsandmisconfiguration.They consideredthe effectivenessof DNSnamecachingandretransmissiontimeoutcalcula-tion, andshowedhow algorithmsto increaseresilienceledto disastrousbehavior whenserversfailedor whencertainimplementationfaultsweretriggered.Implementationis-suesweresubsequentlydocumentedby Kumaretal. [10],who notethatmany of theseproblemshave beenfixed inmorerecentDNSservers.Danzigetal. alsofoundthatonethird of wide-areaDNS traffic that traversedthe NSFnetwasdestinedto oneof the (at the time) seven root nameservers.

In contrastto Danziget al.’swork, ourwork focusesonanalyzingclient-sideperformancecharacteristics.In theprocess,we calculatethe fraction of lookupsthat causedwide-areaDNS packets to be sent,and the fraction thatcauseda root or gTLD server to becontacted.

In studiesof wide-areatraffic in general,DNS is oftenincludedin thetraffic breakdown [2], [3]. As notedin Sec-tion I, thehigh ratioof DNS to TCPflows in thesestudiesmotivatedour investigationof DNS performance.

It is likely that DNS behavior is closely linked toWeb traffic patterns,sincemostwide-areatraffic is Web-relatedandWebconnectionsareusuallyprecededby DNSlookups.Oneresultof Webtraffic studiesis thatthepopu-larity distribution of Webpagesis heavy-tailed [11], [12],[13]. In particular, Breslauet al. concludethat the Zipf-like distribution of Web requestscauseslow Web cachehit rates[14]. We find that the popularitydistribution ofDNS namesis also heavy-tailed, probablyas a result ofthe sameunderlyinguserbehavior. On the other hand,DNS cacheperformancemight beexpectedto differ fromWebcacheperformance:DNS cachestypically do not in-cur cachemissesbecausethey runoutof capacity, andex-clusively useTTL-basedcaching;DNS alsocachesinfor-mationabouteachcomponentof ahierarchicalnamesepa-

rately(NSrecords);andmany Webdocumentsarepresentundera single DNS name. However, we find that DNScachesareneverthelesssimilarto Webcachesin theirover-all effectiveness.

A recentstudy by Shaikhet al. shows the impact ofDNS-basedserverselectiononDNS[15]. ThisstudyfindsthatextremelysmallTTL values(on theorderof seconds)are detrimentalto latency, and that clients are often notclose in the network topology to the nameservers theyuse,potentiallyleadingto sub-optimalserver selection.Incontrast,our study evaluatesclient-perceived latency asa function of the numberof referrals,and analyzestheimpact of TTL and sharingon cachehit rates. We alsostudytheperformanceof theDNS protocolandDNS fail-uremodes.

Wills andShangstudiedNLANR proxy logsandfoundthatDNSlookuptimecontributedmorethanonesecondtoapproximately20%of retrievalsfor theWebobjectsonthehomepageof largerservers.They alsofoundthat20%ofDNS requestsarenot cachedlocally [16], which we con-sidera large percentagefor the reasonsexplainedbefore.CohenandKaplanproposeproactive cachingschemestoalleviate the latency overheadsof synchronouslyrequest-ing expiredDNSrecords[17]; theiranalysisis alsoderivedfrom NLANR proxy log workload. Unfortunately, proxylogsdo not capturetheactualDNS traffic; thusany anal-ysis must rely on on measurementstaken after the datais collected. This will not accuratelyreflect the networkconditionsat thetime of therequest,andtheDNS recordscollectedmay also be newer. Our dataallows us to di-rectly measurethe progressof the DNS lookup as it oc-cured. Additionally, our datacapturesall DNS lookupsand their relatedTCP connections,not just thoseassoci-atedwith HTTP requests.

I I I . THE DATA

Our study is basedon threeseparatetraces. The firsttwo were collected in Januaryand December2000 re-spectively, at the link that connectsthe MIT LCS andAIlabs to the restof the Internet. At the time of the study,therewere24internalsubnetworkssharingtherouter, usedby over 500 usersand over 1200 hosts. The third tracewascollectedin May 2001at oneof two links connect-ing KAIST to the rest of the Internet. At the time ofdatacollectiontherewereover 1000usersand5000hostsconnectedto the KAIST campusnetwork. The tracein-cludesonly internationalTCP traffic; KAIST sendsdo-mestic traffic on a path that was not traced. However,the tracedoesincludeall externalDNS traffic, domesticandinternational:theprimarynameserver of thecampus,ns.kaist.ac.kr , wasconfiguredto forward all DNS

Page 5: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

Collection

Subnet #2machine

Subnet #1

Subnet #3

Subnet #24

Router

MIT LCS and AI

External network

(a) MIT LCS: Thereare24 internalsubnetworkssharingtheborderrouter.

Collectionmachine

ns.kreonet.re.krHPCNet

Router B

Router A ns.kaist.ac.kr

traffic

Domestic traffic

KAIST campusInternational

(b) KAIST: The collection machine is lo-cated at a point that capturesall DNS traf-fic, but only international traffic of othertypes. This because the primary KAISTname server, ns.kaist.ac.kr , forwardsall DNS queries through the traced link tons.kreonet.re.kr .

Figure2. Schematictopologyof thetracednetworks.

queriesto ns.kreonet.re. kr along a route that al-lowedthemto betraced.Figure2 showstheconfigurationsof thetwo networks.

The first trace,mit-jan00 , wascollectedfrom 2:00A .M . on3 January2000to 2:00A .M . on10January2000;the second,mit-dec00 , was collectedfrom 6:00 P.M .on 4 Decemberto 6:00 P.M . on 11 December2000. Thethird set,kaist-may01 , wascollectedat KAIST from5:00 A .M . on 18 May to 5:00 A .M . on 24 May 2001. AlltimesareEST.

A. CollectionMethodology

We filtered the traffic observed at the collection pointto producea dataset useful for our purposes.As manyprevious studieshave shown, TCP traffic (and in partic-ular, HTTP traffic) comprisesthe bulk of wide-areatraf-fic [3]. TCP applicationsusuallydependon the DNS toprovide therendezvousmechanismbetweentheclientandthe server. Thus,TCP flows canbe viewed asthe majordriving workloadfor DNS lookups.

In our study, we collectedboth the DNS traffic anditsdriving workload.Specifically, we collected:1. OutgoingDNS queriesandincomingresponses,and

2. OutgoingTCP connectionstart (SYN) and end (FINand RST) packets for connectionsoriginating inside thetracednetworks.In themit-jan00 trace,only thefirst 128bytesof eachpacketwerecollectedbecausewewereunsureof thespacerequirements.However, becausewefoundthatsomeDNSresponseswerelongerthanthis,we capturedentireEther-netpacketsin theothertraces.

Thetracecollectionpointsdonotcaptureall clientDNSactivity. Queriesansweredfrom cachesinsidethe tracednetworks do not appearin the traces. Thusmany of ourDNS measurementsreflect only those lookups that re-quired wide-areaqueriesto be sent. Sincewe correlatethesewith the driving workloadof TCP connections,wecanstill draw someconclusionsaboutoverallperformanceandcachingeffectiveness.

In addition to filtering for useful information,we alsoeliminatedinformationto preserveuser(client)privacy. IntheMIT traces,any userwho wishedto beexcludedfromthe collectionprocesswasallowed to do so, basedon anIP addressthey provided; only threehostsoptedout, andwereexcludedfrom all our traces. We alsodid not cap-turepacketscorrespondingto reverseDNS lookups(PTRqueries)for a small numberof nameswithin MIT, onceagainto preserve privacy. In addition, all packets wererewritten to anonymize the sourceIP addressesof hostsinside the tracednetwork. This was done in a pseudo-randomfashion—eachsourceIP addresswasmappedus-ing a keyed MD5 cryptographichashfunction [18] to anessentiallyunique,anonymizedone.

Our collection software was derived from Minshall’stcpdpriv utility [19]. tcpdpriv anonymizeslibp-cap -formattraces(generatedby tcpdump ’s packet cap-ture library). It cancollect tracesdirectly or post-processthemaftercollectionusinga tool suchastcpdump [20].We extendedtcpdpriv to support the anonymizationschemedescribedabove for DNS traffic.

B. AnalysisMethodology

We analyzedthe DNS tracesto extract variousstatis-tics about lookupsincluding the numberof referralsin-volved in a typical lookup andthe distribution of lookuplatency. To calculatethelatency in resolvinga lookup,wemaintaineda sliding window of the lookupsseenin thelastsixty seconds;anentryis addedfor eachquerypacketfrom aninternalhostwith a DNS queryID differentfromany lookup in the window. Whenan incomingresponsepacket is seen,the correspondinglookup is found in thewindow. If the responsepacket is an answer(asdefinedin SectionII-A), the time differencebetweentheoriginalquerypacket andthe responseis the lookup latency. The

Page 6: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

mit-jan00 mit-dec00 kaist-may011 Dateandplace 00/01/03-10,MIT 00/12/04-11,MIT 01/05/18-24,KAIST2 Total lookups 2,530,430 4,160,954 4,339,4733 Unanswered 595,290(23.5%) 946,308(22.7%) 873,514(20.1%)4 Answeredwith success 1,627,772(64.3%) 2,648,025(63.6%) 1,579,852(36.4%)5 Answeredwith failure 281,855(11.1%) 545,887(13.1%) 1,834,942(42.2%)6 Zeroanswer 25,513(1.0%) 20,734(0.5%) 51,165(1.2%)7 Total iterative lookups 2,486,104 4,107,439 396,0388 Answered 1,893,882 3,166,353 239,8749 Totalquerypackets 6,039,582 10,617,796 5,326,52710 Distinct secondlevel domains 58,638 84,490 78,32211 Distinct fully-qualifiednames 263,984 302,032 219,14412 Distinct internalquerysources 221 265 40513 Distinct externalnameservers 48,537 61,776 8,11414 TCPconnections 4,521,348 5,347,003 665,36115 #TCP: #valid A answers,sansblack-lists 4.62 3.53 –16 Distinct TCPclients 992 1,233 5,75417 Distinct TCPdestinations 59,588 204,192 11,511

Table1. Basictracestatistics.Thepercentagesarewith respectto total numberof lookupsin eachtrace.

actualend-userDNS requestlatency, however, is slightlylongerthanthis, sincewe seepacketsin mid-flight. If theresponseis notananswer, we incrementthenumberof re-ferralsof thecorrespondinglookupby one,andwait untilthefinal answercomes.To keeptrackof thenameserverscontactedduringa lookup,we maintaina list of all theIPaddressesof nameserversinvolvedin theresolutionof thelookup.

This methodcorrectlycapturesthe list of serverscon-tactedfor iterative lookups,but not for recursive lookups.Mostlookupsin theMIT tracesareiterative;weeliminatedthe small numberof hostswhich sentrecursive lookupsto nameserversoutsidethe tracednetwork. The KAISTtracescontainmostlyrecursive lookupssentto a forward-ing server justoutsidethetracepoint;hence,while wecanestimatelowerboundsonnameresolutionlatency, wecan-not derive statisticson thenumberof referralsor thefrac-tion of accessesto a top-level server.

C. Data Summary

Table1 summarizesthebasiccharacteristicsof ourdatasets.Wecategorizelookupsbasedon theDNS codein theresponsethey elicit, asshown in rows 3-6 of Table1. Alookupthatgetsaresponsewith non-zeroresponsecodeisclassifiedasansweredwith failure, asdefinedin theDNSspecification[8]. A zero answeris authoritative andindi-catesno error, but hasno ANSWER, AUTHORITY, or AD-DITIONAL records[10]. A zeroanswercanarise,for ex-ample,whenanMXlookupis donefor a namethathasnoMXrecord,but doeshave otherrecords.A lookup is an-swered with successif it terminateswith a responsethat

mit-jan00 mit-dec00 kaistA 60.4% 61.5% 61.0%PTR 24.6% 27.2% 31.0%MX 6.8% 5.2% 2.7%ANY 6.4% 4.6% 4.1%

Table2. Percentageof DNS lookupsacrossthe mostpopularquerytypes.

hasa NOERRORcodeandoneor moreANSWERrecords.All otherlookupsareareconsideredunanswered.

Clientscanmake a variety of DNS queries,to resolvehostnamesto IP addresses,find reverse-mappingsbetweenIP addressesand hostnames,find mail-recordbindings,etc. There are twenty query types definedin the DNSspecification[8]. Table 2 lists the four most frequentlyrequestedquery typesin eachof our traces. About 60%of all lookupswerefor hostname-to-addressA recordsandbetween24%and31%werefor thereversePTRbindingsfrom addressesto hostnames.

Although mostansweredA lookupsare followed by aTCPconnectionto thehostIP addressspecifiedin there-turnedresourcerecord,thereareasmallnumberof excep-tions.Themostimportantof theseareto so-calledreverseblack-lists, which in our traceswere A-record lookupsmadeto rbl.maps.vix.co m to ask if an IP addressa.b.c.d (in standarddottednotation) is on the list ofknown mail spammersmaintainedat maps.vix.com .To do this, the client sendsa A-record lookup for thenamed.c.b.a.rbl.map s. vix .c om. If the corre-

Page 7: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

spondingIP addressis on thelist, theserver answerswitha TXT record; if it is not on the list, it sendsan NX-DOMAINanswer. Therefore,A-recordanswersresultingfrom thosequeriesdo not have any associatedTCP con-nection,andshouldbeexcludedin calculatingtheratio ofTCP connectionsto DNS lookups(in estimatingcachingeffectiveness).We found8,397suchlookups(0.33%)formit-jan00 , 6,456(0.15%)for mit-dec00 , and3,226(0.07%)for kaist-may01 .

Oneof themajormotivationsfor ourwork wastheratioof DNS lookupsto TCP connectionsin thewide-areaIn-ternet,asdescribedin SectionI. Thedatain Table1 (rows4 and14) andTable2 (A-recordpercentage)allow us toestimatethis ratio for the tracedtraffic, asthe ratio of thenumberof TCPconnectionsto thenumberof successfullyansweredlookupsthat areA recordsandarenot on listslike rbl.maps.vix.c om. Thesenumbersare shownfor mit-jan00 andmit-dec00 in row 15, suggestingaDNScachehit ratio (for A-records)for theMIT tracesofbetween70%and80%. As explainedin SectionI, this hitrateis not particularlyhigh, sinceit includesthe cachingdoneby Web browserswhenthey openmultiple connec-tionsto thesameserver.

The total number of successfully answered DNSlookupswasover1.5million in thekaist-may01 trace,but only about0.67 million TCP connectionsshowed upin the trace. This tracerecordedall DNS traffic from thecampus,but only the internationalTCPtraffic.

IV. CLIENT-PERCEIVED PERFORMANCE

Thissectionanalyzesseveralaspectsof client-perceivedDNS performance. We start by discussingthe distribu-tion of time it took clients to obtain answers. We thendiscussthe behavior of the DNS retransmissionprotocolandthe situationsin which client lookupsreceive no an-swer. We alsostudythefrequency andcausesof answersthat are error indicationsand the prevalenceof negativecaching. Finally, we look at interactionsbetweenclientsandroot/gTLD servers.

A. Latency

Figure3 shows thecumulative DNSlookuplatency dis-tribution for our datasets.Themedianis 85 msfor mit-jan00 and 97 ms for mit-dec00 . Worst-caseclientlatenciesbecamesubstantiallyworse—thelatency of the90th-percentileincreasedfrom about 447 ms in mit-jan00 to about1176msin mit-dec00 . In thekaist-may01 data,about35% of lookupsreceive responsesinlessthan 10 ms and the medianis 42 ms. The KAISTtracehasmore low-latency lookupsthan the MIT tracesbecausetherequestedresourcerecordissometimescached

0

10

20

30

40

50

60

70

80

90

100

1 10 100 1000 10000

CD

FW

Latency (ms)

mit-jan00 mit-dec00

kaist-may01

Figure3. Cumulativedistributionof DNS lookuplatency.

0

10

20

30

40

50

60

70

80

90

100

1 10 100 1000 10000 100000

CD

FW

Latency (ms)

0 referral1 referral2 referrals

overall

Figure4. Latency distribution vs. numberof referralsfor themit-dec00 trace.

at ns.kreonet.re. kr , which is closeto the primarynameserver for the campus(seeFigure2(b)). However,the worst 50% of the KAIST distribution is significantlyworsethanthat of MIT. Many of thesedatapointscorre-spondto lookupsof namesoutsideKorea.

Latency is likely to be adverselyaffectedby the num-ber of referrals. Recall that a referral occurs when aserverdoesnotknow theanswerto aquery, but doesknow(i.e., thinks it knows) wheretheanswercanbe found. Inthat case,it sendsa responsecontainingoneor moreNSrecords,andtheagentperformingthelookupmustsendaqueryto oneof the indicatedservers. Table3 shows thedistribution of referralsperlookup.About80%of lookupsareresolvedwithoutany referral,whichmeansthey getananswerdirectly from theserver first contacted,while onlya tiny fraction(0.03%–0.04%for MIT) of lookupsinvolvefour or morereferrals.

Figure 4 shows the latency distribution for differentnumbersof referralsfor the mit-dec00 dataset. Forlookupswith onereferral,60%of lookupsareresolved inlessthan100msandonly 7.3%of lookupstakemorethan1 second.However, morethan95%of lookupstake morethan 100 ms, and 50% take more than1 second,if theyinvolve two or morereferrals.

Page 8: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

mit-jan00 mit-dec00 kaist-may010 74.62% 81.17% 86.09%1 24.07% 17.86% 10.43%2 1.16% 0.87% 2.10%3 0.11% 0.07% 0.38%XZY

0.04% 0.03% 1.00%

Table3. Percentageof lookupsinvolving variousnumbersofreferrals.The numberof lookupsusedin this analysisforeachtraceis shown in row 8 of Table1. Theaveragenum-berof queriesto obtainananswer, notcountingretransmis-sions,was1.27,1.2,and1.2,respectively.

0

10

20

30

40

50

60

70

80

90

100

1 10 100 1000 10000 100000

CD

FW

Latency (ms)

NS cache hitNS cache miss

Figure5. Distribution of latenciesfor lookupsthat do anddonot involvequeryingrootservers.

To illustratethe latency benefitsof cachedNS records,we classify eachtracedlookup as either a hit or a missbasedon the first server contacted.We assumea miss ifthe first querypacket is sentto oneof the root or gTLDserversandelicits a referral. Otherwise,we assumethatthereis ahit for anNSrecordin a localDNScache.About70% of lookupsin the MIT tracesarehits in this sense.Figure5 shows the latency distribution for eachcase. Itshows that cachingNS recordssubstantiallyreducestheDNS lookuplatency eventhoughit mayinvolve somere-ferralsto completethelookup. CachedNSrecordsarees-peciallybeneficialbecausethey greatlyreducetheloadontheroot servers.

B. Retransmissions

This sectionconsiderslookupsthatresultin no answer,andlookupsthat requireretransmissionsin orderto elicitan answer. This is interestingbecausethe total numberof querypackets is much larger thanthe total numberoflookups;theprevious section(andTable3) show that theaveragenumberof query packets for a successfullyan-sweredqueryis 1.27(mit-jan00 ), 1.2 (mit-jan00 ),and1.2 (kaist-may01 ). However, theaveragenumber

mit-jan00 mit-dec00Zeroreferrals 139,405(5.5%) 388,276(9.3%)Non-zeroreferrals 332,609(13.1%) 429,345(10.3%)Loops 123,276(4.9%) 128,687(3.1%)

Table4. Unansweredlookupsclassifiedby type.

of DNS querypacketsin thewide-areaperDNS lookupissubstantiallylargerthanthis.

We cancalculatethis ratio, [ , asfollows. Let the totalnumberof lookupsin a tracebe \ , of which ] areitera-tively performed.This separationis usefulsincethetraceis lesslikely to show all the retransmittedquerypacketsthat traversethe wide-areaInternet for a recursively re-solved query. We do not observe several of thesequerypackets becausea \_^`] lookups are done recursively;however, we may observe retransmissionsfor recursivelookupsin somecases.Let the numberof querypacketscorrespondingto retransmissionsof recursive lookupsbea

. Let b bethetotal numberof querypacketsseenin thetrace.Then,\c^d]feg[h]gi`bj^ a or [gi_klejmNbj^n\c^ apo�q ] .Thevaluesof \ , ] , and b for thetracesareshown in rows2, 7, and9 of Table1.

The valueof [ is relatively invariantacrossour traces:2.40 for mit-jan00 (

a i r Ytsvuhwlx ), 2.57 for mit-dec00 (

a i k xys�Yzuhx ), and 2.36 for kaist-may01(a i YlY{xys rl|l} ). Notice that in eachcase[ is substan-

tially largerthantheaveragenumberof querypacketsfor asuccessfullyansweredlookup.This is becauseretransmis-sionsaccountfor a significantfractionof all DNS packetsseenin thewide-areaInternet.

A queryingnameserver retransmitsa query if it doesnotgetaresponsefrom thedestinationnameserverwithina timeoutperiod. This mechanismprovidessomerobust-nessto UDP packet lossor server failures. Furthermore,eachretransmissionis often targetedat a different nameserver if oneexists,e.g.,a secondaryfor thedomain.De-spiteretransmissionsandserver redundancy, about24%oflookupsin theMIT tracesand20%of in theKAIST tracesreceived neithera successfulanswernor an error indica-tion asshown in thethird row of Table1.

Webreaktheunansweredlookupsinto threecategories,asshown in Table4. Lookupsthat elicited zero referralscorrespondto thosethat did not receive even one refer-ral in response.Lookupsthat elicited oneor morerefer-rals but did not lead to an eventualanswerareclassifiedasnon-zero referrals. Finally, lookupsthatledto loopsbe-tweennameserverswherethequerieris referredto asetoftwoormorenameserversformingaqueryingloopbecauseof misconfiguredinformationareclassifiedas loops. We

Page 9: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

distinguishthezero referrals andnon-zero referrals cate-goriesbecausethe formerallows us to isolateandunder-standtheperformanceof theDNS retransmissionmecha-nism.Wedonot reportthisdatafor kaist-may01 sincemostlookupsin that tracewererecursively resolved by aforwarderoutsidethetracepoint.

The packet load causedby unansweredqueriesis sub-stantialfor two reasons:first, theratherpersistentretrans-missionstrategy adoptedby many queryingnameservers,andsecond,referralloopsbetweennameservers.

Onaverage,eachlookupthatelicitedzeroreferralsgen-eratedaboutfivetimes(in themit-dec00 trace)asmanywide-areaquerypacketsbeforethequeryingnameservergave up, asshown in Figure6. This figurealsoshows thenumberof retransmissionsfor queriesthatwereeventuallyanswered(thecurvesat thetopof thegraph)—over 99.9%of theansweredlookupsincurredat mosttwo retransmis-sions,and over 90% involved no retransmissions.Whatis especiallydisturbingis that the fractionof suchwastedquerypacketsincreasedsubstantiallybetweenJanuaryandDecember2000; in themit-jan00 trace,theworst5%of unansweredlookups caused6 retransmissions,whilethey caused12 retransmissionsin themit-dec00 trace.

Given that the queriescorrespondingto theselookupsdo not elicit a response,andthatmostqueriesthatdo geta responseget onewithin a small number(two or three)of retransmissions,we concludethat many DNS nameserversaretoo persistentin their retry strategies. Our re-sultsshow that it is betterfor themto give up sooner, af-ter two or threeretransmissions,and rely on client soft-ware to decidewhat to do. We believe this is appropri-atebecausetheseretransmissionsarebeingdoneby nameserversthatarerecursively resolvingqueriesfor client re-solver libraries that areoften on differenthoststhat mayalsoberetryingnamerequests.

Interestingly, between12% (January2000) and 19%(December2000)of unansweredlookupsinvolved no re-transmissionwithin 1minute.Thissuggestsaninappropri-ateoptionsettingof thenumberof retriesor anextremelylarge timeoutvalueof morethan60 seconds.We believethisconclusionis correctbecauseall theseretransmissionsarebeingdoneby nameserversthatarerecursively resolv-ing querieson behalfof clientsthatareon otherhosts,soeven if the client were to terminatethe connection(e.g.,becauseno progresswasbeingmade),therecursive nameresolutionby the nameservers would still continueas ifnothinghappened.

Figure7 shows theCDFsof thenumberof querypack-ets generatedfor the non-zero referrals and loops cate-goriesof unansweredlookups.As expected,thenon-zeroreferrals(which do not have loops) did not generateas

0

20

40

60

80

100

0 2 4 6 8 10 12 14

CD

FW

Number of retransmissions

mit-jan00 (Answered)mit-dec00 (Answered)

mit-jan00 (Zero referral)mit-dec00 (Zero referral)

Figure6. Cumulativedistributionof numberof retransmissionsfor answered(top-mostcurves)andunansweredlookups.

0

20

40

60

80

100

10 20 30 40 50 60 70 80 90 100

CD

FW

Number of referrals

mit-jan00 (Non-zero referrals)mit-dec00 (Non-zero referrals)

mit-jan00 (Loops)mit-dec00 (Loops)

Figure7. Toomany referralsof thelookupsin loops: thisgraphcomparescumulative distributionsof numberof referralsfor the lookupsthat causeloopsand that causenon-zeroreferrals.

many packets as the loops, which generatedon averageaboutten query packets. Although unansweredlookupscausedby loopscorrespondto only about4.9%and3.1%of all lookups,they causea largenumberof querypacketsto begenerated.

This analysisshows that a large fraction of the tracedDNS packets are causedby lookupsthat end up receiv-ing no response. For example, mit-dec00 included3,214,646lookupsthat received an answer;the previoussectionshowed that the averagesuch lookup sends1.2querypackets.Thisaccountsfor 3,857,575querypackets.However, Table1 showsthatthetracecontains10,617,796query packets. This meansthat over 63% of the tracedDNS query packets were generatedby lookupsthat ob-tainedno answer! The correspondingnumberfor mit-jan00 is 59%.Obviously, someof thesewererequiredtoovercomepacket losseson Internetpaths.Typical averagelossratesarebetween5% and10%[4], [21]; thenumberof redundantDNS querypacketsobserved in our tracesissubstantiallyhigherthanthis.

Page 10: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

Lookups Namemit-jan00

7,368 loopback5,200 ns1.cvnet.com3,078 jupiter.isq.pt2,871 shark.trendnet.com.br2,268 213.228.150.38.in-addr.arpa1,913 mail.geekgirl.cx1,887 swickhouse.w3.org1,642 ns3.level3.net1,547 mickey.payne.org1,377 239.5.34.206.in-addr.arpa

mit-dec0026,308 33.79.165.208.in-addr.arpa11,503 195.70.45.111,270 195.70.35.3410,271 112.221.96.206.in-addr.arpa9,440 104.196.168.208.in-addr.arpa5,772 110.221.96.206.in-addr.arpa5,290 216.4.7.226.ehomecare.com5,235 216.4.7.227.ehomecare.com4,137 auth01.ns.u.net3,317 ns.corp.home.net

Table5. The ten mostcommonnamesthat producedNXDO-MAIN errorsin themit-jan00 trace(top) andthemit-dec00 trace(bottom). A total of 194,963lookupspro-ducedNXDOMAINin mit-jan00 ; 47,205distinctnameswereinvolved. A total of 464,761lookupsproducedNX-DOMAINin mit-dec00 ; 85,353distinct nameswerein-volved.

C. Failures

As shown in Table1, between10%and42%of lookupsresult in an answerthat indicatesan error. Most of theseerrorsareeitherNXDOMAINor SERVFAIL. NXDOMAINsignifiesthat the requestednamedoesnot exist. SERV-FAIL usuallyindicatesthata server is supposedto beau-thoritative for a domain,but doesnot have a valid copy ofthedatabasefor thedomain;it may alsoindicatethat theserver hasrunoutof memory.

Table 5 shows the ten most commonnamesthat re-sultedin NXDOMAINerrorsin themit-jan00 andmit-dec00 . Thelargestcauseof theseerrorsareinverse(in-addr.arpa ) lookups for IP addresseswith no inversemappings.

For mit-jan00 in-addr.arpa accounted for33,272out of 47,205distinct invalid names,and 79,734of the 194,963total NXDOMAINresponses. Similarly,for mit-dec00 in-addr.arpa accountedfor 67,568out of 85,353distinct invalid names,and250,867of the

464,761total NXDOMAINresponses. Other significantcausesfor NXDOMAINresponsesincludeparticularinvalidnamessuchas loopback , andNS andMXrecordsthatpoint to namesthatdo notexist. However, no singlenameor eventypeof nameseemsto dominatetheseNXDOMAINlookups.

SERVFAILs accountedfor 84,906of the answersinmit-jan00 (out of 4,762distinctnames)and61,498oftheanswersin mit-dec00 (outof 5,102distinctnames).3,282of the namesand 24,170of the lookupswere in-verselookupsin mit-jan00 , while 3,651of thenamesand41,465of the lookupswereinverselookupsin mit-dec00 . Mostof thelookupswereaccountedfor by arela-tively smallnumberof names,eachlookedupalargenum-ber of times;presumablythe NS recordsfor thesenamesweremisconfigured.

D. NegativeCaching

The large numberof NXDOMAINresponsessuggeststhat negative cachingmay not be as widely deployed asit shouldbe. To seehow many nameservers implementnegative caching,we senttwo consecutive queriesfor anon-existentname(a.b.c.com ) to 53,774distinctnameserversfoundin thetracesandrecordedtheresponse.Therecursion-desiredbit was set in the first query to trig-gercompletenameresolutionto thetargetednameserver.Then, the secondquery was sentwithout the recursion-desiredbit. Theideais thatif negativecachingwereimple-mented,thenthesecondquerywould returnNXDOMAIN,while it would returnareferralif it werenot implemented.

Table 6 shows the results of this experiment. Thefirst NXDOMAINanswerimpliesthatcorrespondingnameservershonoredtherecursion-desiredbit anddid recursivelookupsfor that query. We found that 90.8%of the con-tactednameservers respondedwith NXDOMAINfor thesecondof theconsecutive queries,suggestingthatat leastthis fraction of the contactednameservers implementedsomeform of negative caching.On theotherhand,2.4%nameservers returneda referral with a list of authorita-tive serversof .com domainwith theNOERRORresponsecode,which indicatesthat they wererunningan old ver-sionof anameserver withoutnegative caching.

Theoverallconclusionfrom thisexperimentis thatover90%of thecontactednameserversspreadacrosstheInter-netimplementnegative caching.

E. Interactionswith RootServers

Table7 shows the percentageof lookupsforwardedtoroot andgTLD serversandthepercentageof lookupsthatresultedin an error. We observe that 15% to 18% oflookupscontactedrootor gTLD serversandthepercentage

Page 11: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

1stanswer 2ndanswerNXDOMAIN NXDOMAIN 48831(90.8%)NOERROR NOERROR 1874( 3.5%)NXDOMAIN NOERROR 1272( 2.4%)REFUSED REFUSED 1124( 2.1%)

Table 6. Responseswhen two consecutive a.b.c.comlookups were sent to 53,774nameservers in December2000from MIT LCS.

mit-jan00 mit-dec00RootLookups 406,321(16%) 270,413(6.4%)RootErrors 59,862(2.3%) 73,697(1.7%)gTLD Lookups 41,854(1.6%) 353,295(8.4%)gTLD Errors 2,676(0.1%) 16,341(0.3%)

Table7. The total numberof lookupsthat contactedroot andgTLD servers,andthe total numberof failure answersre-ceived.Thepercentagesareof thetotal numberof lookupsin thetrace.

slightly decreasedbetweenJanuaryandDecember2000.This wasprobablycausedby anincreasein thepopularityof popularnames(seeSectionV andFigure9), which de-creasedDNS cachemissrates.The tablealsoshows thatloadonrootservershasbeenshiftedto gTLD serversovertime. ThegTLD serversatendof 2000wereservingmorethanhalf of all top-level domainqueries.

Between15%and27%of thelookupssentto rootnameservers resultedin failure responses.Most of theseap-pear to be mis-typednames(e.g. prodiy.net ), barehostnames(e.g. loopback andelectric-banana ),or othermistakes(e.g. index.htm ). Thepercentagein-creasedbetweenJanuaryandDecemberbecausemuchofthe load of servinglegitimatenamesmoved to the gTLDservers.

V. EFFECTIVENESS OF CACHING

The previous sectionsanalyzedthe collectedtracestocharacterizethe actual client-perceived performanceofDNS. This sectionexplores DNS performanceunder arangeof controlledconditions,usingtrace-driven simula-tions.Thesimulationsfocuson thefollowing questionsinthecontext of A-records:1. How useful is it to shareDNS cachesamongmanyclient machines? The answerto this questiondependson the extent to which differentclients look up the samenames.2. What is thelikely impactof choiceof TTL on cachingeffectiveness?Theanswerto this questiondependson lo-

mit-jan00

mit-dec00

kaist-may01

Fully-qualified 0.88 0.91 0.94Second-level 1.09 1.11 1.18

Table8. Popularityindex ~ for the tail of the domainnamedistribution.

cality of referencesin time.We startby analyzingour tracesto quantifytwo impor-

tant statistics:(i) thedistribution of namepopularity, and(ii) thedistribution of TTL valuesin thetracedata.Thesedetermineobservedcachehit rates.

A. NamePopularityandTTLDistribution

To deducehow the popularity of namesvaries in ourclient traces,we plot accesscountsas a function of thepopularity rank of a name,first consideringonly “fullyqualifieddomainnames.” This graph,on a log-log scale,is shown in Figure8(a).To understandthebehavior of thetail of this distribution, andmotivatedby previousstudiesthatshowedthatWebobjectpopularityfollows aZipf-likedistribution [11], we representtheaccesscountasa func-tion � q���� , where � is termedthepopularityindex.. If thisis a valid form of the tail, thena straightline fit throughthetail wouldbeagoodone,andthenegative of theslopewould tell uswhat � is. This straightline is alsoshown inthefigure,with ��������|�k .

We also considerwhether this tail behavior changeswhen names are aggregated according to their do-mains. Figure 8(b) shows the correspondinggraphfor second-level domain names, obtained by tak-ing up to two labels separated by dots of thename; for example, foo.bar.mydomai n. co m andfoo2.bar2.mydom ai n. co m would both be aggre-gated together into the second-level domain mydo-main.com . The � for this is closerto 1, indicatingthatthetail fallsoff alittle fasterthanin thefully qualifiedcase,althoughit is still apower-law. Theslopescalculatedusinga least-squarefit for eachtraceareshown in Table8.1

Figure9 illustratestheextent to which lookupsareac-countedfor by popularnames. The

a-axis indicatesa

fraction of the most populardistinct names;the � -axisindicatesthecumulative fractionof answeredlookupsac-countedfor by thecorresponding

amostpopularnames.

For example,themostpopular10%of namesaccountformorethan68%of totalanswersfor eachof thethreetraces.�

Wecalculatedtheleast-squarestraightline fit for all pointsignoringthe first hundredmostpopularnamesto moreaccuratelyseethe tailbehavior.

Page 12: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

1

10

100

1000

10000

100000

1e+06

1 10 100 1000 10000 100000

Acc

ess

coun

ts

Object number

full namealpha = 0.90583

(a)Full name

1

10

100

1000

10000

100000

1e+06

1 10 100 1000 10000 100000

Acc

ess

coun

ts

Object number

second level domain namealpha = 1.11378

(b) Secondlevel domainname

Figure8. Domainnamepopularityin themit-dec00 trace.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Fra

ctio

n of

req

uest

s

Fraction of query names

mit-jan00 mit-dec00

kaist-may01

Figure9. Cumulativefractionof requestsaccountedfor byDNSname,most popularfirst. The popularnamesappeartohave becomeeven morepopularin December2000com-paredto January2000,althoughthey arenot necessarilythesamenames.

However, it alsohasa long tail, anda large proportionofnamesthatareaccessedpreciselyonce.For instance,outof 302,032distinctnamesinvolvedin successfulA lookupsin mit-dec00 , there were 138,405unique namesac-cessedonly once,which suggeststhat a significantnum-ber of root querieswill occur regardlessof the cachingscheme.

Figure10showsthecumulativedistributionof TTL val-uesfor A andNS records.NS recordstendto have muchlongerTTL valuesthanA records.Thishelpsexplainwhyonly about20% of DNS responses(including both refer-rals andanswersin Table1) comefrom a root or gTLDserver. If NS recordshad lower TTL values,essentiallyall of theDNS lookuptraffic observed in our tracewouldhave goneto a root or gTLD server, whichwould have in-creasedthe loadon themby a factorof aboutfive. GoodNS-recordcachingis thereforecritical to DNSscalability.

Figure 10 shows how TTL valuesare distributed, but

0

10

20

30

40

50

60

70

80

90

100

1 10 100 1000 10000 100000

CD

FW

TTL (sec)

1 day

1 hour

ANS

Figure10. TTL distribution in themit-dec00 trace.

0

10

20

30

40

50

60

70

80

90

100

1 10 100 1000 10000 100000

CD

FW

TTL (sec)

5 minutes

15 minutes

1 hour

1 daymit-jan00mit-dec00

kaist-may01

Figure 11. TTL distribution weightedby accesscountsformit-jan00 , mit-dec00 andkaist-may01 traces.

doesnot considerhow frequentlyeachnameis accessed.If it turnsout (asis plausible)thatthemorepopularnameshave shorterTTL values,thenthecorrespondingeffect oncachingwouldbeevenmorepronounced.Figure11showsthe TTL distribution of names,weightedby the fractionof TCP connectionsthat were madeto eachname. Weshow this for both mit-jan00 and mit-dec00 , anddraw two key conclusionsfrom this. First, it is indeedthe

Page 13: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

casethatshorter-TTL namesaremorefrequentlyaccessed,which is consistentwith the observation that DNS-basedload-balancing(the typical reasonfor low TTL values)makessenseonly for popularsites. Second,the fractionof accessesto relatively short(sub-15minute)TTL valueshasdoubled(from 12% to 25%) in 2000 from our site,probablybecauseof the increaseddeployment of DNS-basedserver selectionandcontentdistribution techniquesduring2000.

B. Trace-drivenSimulationAlgorithm

To determinethe relative benefits of per-client andsharedDNS cachingof A-records,we conducteda tracedriven simulation of cachebehavior underdifferent ag-gregation conditions. First, we pre-processedthe DNSanswersin the traceto form two databases.The “namedatabase”mapsevery IP addressappearingin anA answerto the domain namein the correspondinglookup. The“TTL database”mapseachdomainnameto the highestTTL appearingin anA recordfor thatname.After build-ing thesedatabases,thefollowing stepswereusedfor eachsimulationrun.1. Randomlydivide theTCPclientsappearingin thetraceinto groupsof size � . Give eachgroupits own simulatedsharedDNScache,asif thegroupsharedasingleforward-ing DNSserver. Thesimulatedcacheis indexedbydomainname,andcontainsthe (remaining)TTL for that cachedname.2. For eachnew TCP connectionin the trace,determinewhich client is involved by looking at the “inside” IP ad-dress;let that client’s groupbe � . Usethe outsideIP ad-dressto index into the namedatabaseto find the domainname� that theclient would have looked up beforemak-ing theTCPconnection.3. If � exists in � ’s cache,and the cachedTTL hasnotexpired,recorda “hit.” Otherwise,recorda “miss.”4. On a miss,make anentry in � ’s cachefor � , andcopytheTTL from theTTL databaseinto the � ’s cacheentry.

At the end of eachsimulationrun, the hit rate is thenumberof hitsdividedby thetotalnumberof queries.

This simulationalgorithmis drivenby theIP addressesobserved in the tracedTCP connections,ratherthan do-mainnames,becauseDNS queriesthathit in local cachesdo not appearin the traces. This approachsuffers fromtheweaknessthatmultiple domainnamesmaymapto thesameIP address,assometimesoccursatWebhostingsites.This may causethe simulationsto overestimatethe DNShit rate. Thesimulationalsoassumesthateachclient be-longs to a single cachinggroup, which may not be trueif a client usesmultiple local forwarding DNS servers.However, becauseDNSclientstypically queryserversin a

0

20

40

60

80

100

10 20 30 40 50 60 70 80 90 100

Hit

rate

(%

)�

Group size

mit-jan00mit-dec00

kaist-may01

Figure12. Effect of the numberof clientssharinga cacheoncachehit rate.

strictly sequentialorder, thismaybeareasonableassump-tion to make.

C. Effectof Sharingon Hit Rate

Figure12 shows the hit ratesobtainedfrom the simu-lation, for a rangeof differentcachinggroupsizes.Eachdatapoint is the averageof four independentsimulationruns. With a groupsizeof 1 client (no sharing),theaver-ageper-connectioncachehit rateis 64%for mit-jan00 .At the oppositeextreme,if all 992 tracedclientsshareasinglecache,theaveragehit rateis 91%for mit-jan00 .However, mostof thebenefitsof sharingareobtainedwithasfew as10or 20clientspercache.

The fact that domainnamepopularity hasa Zipf-likedistribution explains theseresults. A small numberofnamesarevery popular, andeven small degreesof cachesharingcantake advantageof this. However, the remain-ing namesarelarge in numberbut areeachof interesttoonly a tiny fraction of clients. Thus very large numbersof clientsarerequiredbeforeit is likely that two of themwouldwish to look upthesameunpopularnamewithin itsTTL interval. Most cacheablereferencesto thesenamesarelikely to besequentialreferencesfrom thesameclient,which areeasily capturedwith per-client cachesor eventheper-applicationcachesoftenfoundin Webbrowsers.

D. Impactof TTL onHit Rate

The TTL valuesin DNS recordsaffect cacheratesbylimiting the opportunitiesfor reusingcacheentries. If aname’s TTL is shorterthanthe typical inter-referencein-terval for thatname,cachingwill not work for thatname.Oncea name’s TTL is significantlylongerthanthe inter-referenceinterval, multiple referencesarelikely to hit inthecachebeforetheTTL expires.Therelevantinterval de-pendson thename’s popularity:popularnameswill likelybe cachedeffectively even with shortTTL values,whileunpopularnamesmaynot benefitfrom cachingevenwith

Page 14: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

0

20

40

60

80

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

Hit

rate

(%

)�

TTL (sec)

group size = 1group size = 25

group size = 992

(a) mit-jan00

0

20

40

60

80

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

Hit

rate

(%

)�

TTL (sec)

group size = 1group size = 25

group size = 1233

(b) mit-dec00

0

20

40

60

80

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

Hit

rate

(%

)�

TTL (sec)

group size = 1group size = 25

group size = 5754

(c) kaist-may01

Figure13. Impactof TTL on hit rate.

very longTTL values.In turn,aname’s popularityamonga groupof clientsthatsharea cacheis to someextentde-terminedby thenumberof clientsin thegroup.

To gaugethe effect of TTL on DNS cachehit rate,weperformsimulationsusinga small modificationto theal-gorithm describedin SectionV-B. Insteadof usingTTLvaluestakenfrom theactualDNS responsesin thetraces,thesesimulationssetall TTL valuestospecificvalues.Fig-ure 13 shows the results,with onegraphfor eachof thethreetraces.Eachgraphshows the hit rateasa function

of TTL. Sincetheresultsdependon thenumberof clientsthatshareacache,eachgraphincludesseparatecurvesforper-client caches,groupsof 25 clientspercache,andonecachesharedamongall clients.We usea groupof size25becauseSectionV-C showedthatfor theactualTTL distri-bution observed in our traces,a groupsizeof 25 achievesessentiallythesamehit-rateastheentireclient populationaggregatedtogether.

As expected,increasingTTL valuesyield increasinghitrates.However, theeffecton thehit rateis noticeableonlyfor TTL valueslessthanabout1000seconds.Most of thebenefitof cachingis achieved with TTL valuesof only asmallnumberof minutes.This is becausemostcachehitsareproducedby singleclientslooking up thesameservermultipletimesin quicksuccession,apatternprobablypro-ducedby Webbrowsersloadingmultiple objectsfrom thesamepageor usersviewing multiple pagesfrom thesameWebsite.

Theseresultssuggestthat giving low TTL valuesto Arecordswill not significantly harm hit rates. Thus, forexample,the increasinglycommonpracticeof using lowTTL valuesin DNS-basedserver selectionprobablydoesnotaffecthit ratesmuch.

In general,cachingof A recordsappearsto have limitedeffectivenessin practiceandin potential.Eveneliminatingall A-recordcachingwouldincreasewide-areaDNStrafficby at mosta factorof four, almostnoneof which wouldhit a root or gTLD server. Eliminating all but per-clientcachingwould little more thandoubleDNS traffic. Thisevidencefavors recentshifts towardsmoredynamic(andlesscacheable)usesof DNS,suchasmobilehostlocationtrackingandsophisticatedDNS-basedserver selection.

We concludethis sectionby notingthatwe do not sug-gestthat it is a goodideato make theTTL valueslow onNS-records.On thecontrary, doingsowould increasetheloadontherootandgTLD serversby abouta factorof fiveandsignificantlyharmDNSscalability.

VI. CONCLUSION

This paperpresenteda detailedanalysisof tracesofDNS and associatedTCP traffic collectedon the Inter-net links of the MIT Laboratoryfor ComputerScienceand the KoreaAdvancedInstitute of Scienceand Tech-nology. We analyzedtheclient-perceived performanceofDNS, includingthelatency to receive answers,theperfor-manceof theDNSprotocol,theprevalenceof failuresanderrors,and the interactionswith root/gTLD servers. Weconductedtrace-driven simulationsto studytheeffective-nessof DNS cachingasa functionof TTL anddegreeofcachesharing.

A significantfraction of lookupsnever receive an an-

Page 15: DNS Performance and the Effectiveness of Cachingconferences.sigcomm.org/imc/2001/imw2001-papers/89.pdfDNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari

swer. For instance,in themostrecentMIT trace,23%oflookupsreceivenoanswer;theselookupsaccountfor morethanhalf of all tracedDNS packetsin thewide-areasincethey areretransmittedquitepersistently. In addition,about13%of all lookupsresultin ananswerthatindicatesafail-ure.Many of thesefailuresappearto becausedby missinginverse(IP-to-name)mappingsor NSrecordsthatpoint tonon-existent or inappropriatehosts. We also found thatover a quarterof thequeriessentto theroot nameserversresultin suchfailures.

Our trace-driven simulationsyield two findings. First,reducingthe TTLs of address(A) recordsto as low asafew hundredsecondshaslittle adverseeffect on hit rates.Second,little benefitis obtainedfrom sharinga forward-ing DNS cacheamongmorethan10 or 20 clients. Theseresultssuggestthat theperformanceof DNS is not asde-pendenton aggressive cachingas is commonlybelieved,andthatthewidespreaduseof dynamic,low-TTL A-recordbindingsshouldnot degradeDNS performance.The rea-sonsfor thescalabilityof DNS areduelessto thehierar-chicaldesignof its namespaceor goodA-recordcaching(asseemsto bewidely believed), thanto thecacheabilityof NS-recordsthatefficiently partitionthenamespaceandavoid overloadingany singlenameserver in theInternet.

ACKNOWLEDGMENTS

WearegratefultoGarrettWollmanandMaryAnnLadd,who run thenetwork infrastructureat MIT LCS, for facil-itating this collectionprocessandfor their prompthelpatvarioustimes.WethankDorothyCurtisfor helpwith datacollection.This studywould not have beenpossiblewith-out the cooperationof the LCS andAI Lab communitiesat MIT. We arealsogratefulto Heonkyu Park andTaejinYoonwho helpedcollect tracesat KAIST. We alsothankSarahAhmedandDavid Andersenfor severalusefuldis-cussions,andNick Feamsterfor usefulcomments.

REFERENCES

[1] P. Danzig,K. Obraczka,andA. Kumar, “An Analysisof Wide-Are NameServer Traffic: A Studyof theInternetDomainNameSystem,” in Proc ACM SIGCOMM, Baltimore,MD, Aug. 1992,pp.281–292.

[2] K. Frazer, “NSFNET: A Partnership for High-SpeedNet-working,” http://www.merit.edu/merit/archive/nsfnet/final.report/ , 1995.

[3] K. Thompson,G. Miller, andR. Wilder, “Wide-areaTraffic Patt-ternsandCharacteristics,” IEEE Network, vol. 11,no.6, pp.10–23,November/December1997.

[4] H. Balakrishnan,V. Padmanabhan,S. Seshan,M. Stemm,andR. Katz, “TCP Behavior of a Busy Web Server: AnalysisandImprovements,” in Proc. IEEE INFOCOM, SanFrancisco,CA,Mar. 1998,vol. 1, pp.252–262.

[5] A. SnoerenandH. Balakrishnan,“An End-to-EndApproachtoHostMobility,” in Proc.6thACMMOBICOM, Boston,MA, Aug.2000,pp.155–166.

[6] J. Loughney and J. Yu, Roaming Support with DNS, July2000, Internet-Draft draft-loughney-enum-roaming-00.txt;expires January2001. Available from http://www.ietf.org/internet-drafts/draft-loughney-enum -roaming-00.txt .

[7] P. Mockapetris, DomainNames—Conceptsand Facilities, Nov.1987,RFC1034.

[8] P. Mockapetris,Domainnames—ImplementationandSpecifica-tion, Nov. 1987,RFC1035.

[9] P. Mockapetrisand K. Dunlap, “Developmentof the DomainNameSystem,” in Proc. ACM SIGCOMM, Stanford,CA, 1988,pp.123–133.

[10] A. Kumar, J.Postel,C. Neuman,P. Danzig,andS.Miller, Com-monDNSImplementationErrorsandSuggestedFixes, Oct.1993,RFC1536.

[11] L. Breslau,P. Cao, L. Fan, G. Phillips, and S. Shenker, “Onthe implicationsof Zipf ’s law for web caching,” TechnicalRe-port CS-TR-1998-1371,Universityof Wisconsin,Madison,Apr.1998.

[12] M. Crovella andA. Bestavros, “Self-similarity in World WideWebTraffic: EvidenceandPossibleCauses,” IEEE/ACM Trans-actionsonNetworking, vol. 5, no.6, pp.835–846,Dec.1997.

[13] A. Wolman,G. Voelker, N. Sharma,N. Cardwell,A. Karlin, andH. Levy, “On thescaleandperformanceof cooperativewebproxycaching,” in 17thACM SOSP, KiawahIsland,SC,Dec.1999,pp.16–31.

[14] S. Glassman, “A CachingRelay for the World-Wide Web,”Computer Networks and ISDN Systems, vol. 27, pp. 165–173, 1994, http://www.research.compaq.com/SRC/personal/steveg/CachingTheWeb/paper.htm l .

[15] A. Shaikh,R. Tewari, andM. Agrawal, “On theEffectivenessofDNS-basedServer Selection,” in Proc. IEEE INFOCOM, April2001.

[16] C. Wills andH. Shang,“The Contributionof DNSLookupCoststo WebObjectRetrieval,” Tech.Rep.TR-00-12,WorcesterPoly-technic Institute, July 2000, http://www.cs.wpi.edu/˜cew/papers/tr00-12.ps.gz .

[17] E. CohenandH. Kaplan, “Proactive Cachingof DNS Records:Addressinga PerformanceBottleneck,” in Proc.Symp.onAppli-cationsandtheInternet(SAINT), SanDiego,CA, January2001.

[18] R.Rivest,TheMD5 Message-DigestAlgorithm, April 1992,RFC1321.

[19] G. Minshall, “Tcpdpriv,” http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html , Aug. 1997.

[20] V. Jacobson,C. Leres,andS. McCanne, “tcpdump,” http://www.tcpdump.org/ .

[21] V. Paxson, “End-to-EndInternetPacket Dynamics,” in Proc.ACM SIGCOMM, Cannes,France,Sept.1997,pp.139–152.