
Cache-and-Query for Wide Area Sensor Databases

Amol Deshpande    Suman Nath    Phillip B. Gibbons    Srinivasan Seshan

Intel Research Pittsburgh    U.C. Berkeley    Carnegie Mellon University

ABSTRACT

Webcams, microphones, pressure gauges and other sensors provide exciting new opportunities for querying and monitoring the physical world. In this paper we focus on querying wide area sensor databases, containing (XML) data derived from sensors spread over tens to thousands of miles. We present the first scalable system for executing XPATH queries on such databases. The system maintains the logical view of the data as a single XML document, while physically the data is fragmented across any number of host nodes. For scalability, sensor data is stored close to the sensors, but can be cached elsewhere as dictated by the queries (auto-tuning). Our design enables self-starting distributed queries that jump directly to the lowest common ancestor of the query result, dramatically reducing query response times. We present a novel query-evaluate-gather technique (using XSLT) for detecting (1) which data in a local database fragment is part of the query result, and (2) how to gather the missing parts. We define partitioning and cache invariants that ensure that even partial matches on cached data are exploited and that correct answers are returned, despite our dynamic query-driven caching. Experimental results demonstrate that our techniques dramatically increase query throughputs and decrease query response times in wide area sensor databases.

1. INTRODUCTION

From webcams to smart dust, pervasive sensors are becoming a reality, providing opportunities for new sensor-based services. Consider for example a Parking Space Finder service for locating available parking spaces near a user's destination. The driver enters into a PDA (or a car navigation system) her destination and her criteria for desirable parking spaces — e.g., within two blocks of her destination, at least a four hour meter, minimum price, etc. She gets back directions to an available parking space satisfying her criteria. If the space is taken before she arrives, the directions are automatically updated to head to a new parking space.

What is required to create such a Parking Space Finder service for a large metropolitan area? First, sensors are needed that can determine whether or not a parking space is available. We envision a large collection of webcams overlooking parking spaces, pressure sensors on the spots themselves, or some combination of both. Second, a sensor database is needed, which stores the current availability information along with static attributes of the parking spaces such as the price and hours of their meters. Finally, a web-accessible query processing system is needed, to provide answers to high-level user queries.

Given the large number of sensors spread over a wide area, and the desire to support high query volumes, what is a good sensor database system architecture? Our group has developed a wide area sensor database system as part of the Internet-scale Resource-Intensive Sensor Network services (IrisNet) project, with the following architectural design.

• For high update and query throughputs, the sensor database is partitioned across multiple sites operating in parallel, and the queries are directed to the sites containing the answers. We assume that sites are Internet-connected (powered) PCs. A site manager runs on each site.

• Communication with the sensors is via sensor proxies [26], which collect nearby sensor data, process/filter it to extract the desired data (e.g., parking space availability information), and send update queries to the site that owns the data (and possibly other sites). As in [26], we assume that sensor proxies run on Internet-connected (powered) PCs.[1] A large collection of sensor proxies operating in parallel ensure that the system scales to a larger number of sensors. Moreover, a potentially high volume of data (webcam feeds) is reduced at the sensor proxy to a much smaller volume of data (e.g., availability information) sent to site managers.

• To make posing queries easy for users, IrisNet supports a standard data representation (XML), a standard high-level query language (XPATH), and a logical view of its distributed database as a single centralized database (data transparency).[2] XML is used to accommodate a heterogeneous and evolving set of data types, aggregate fields, etc., best captured by self-describing tags. Moreover, each sensor takes readings from a geographic location, so it is natural to organize sensor data into a geographic/political-boundary hierarchy (as depicted in Figure 1) — again, a good match for XML. We use an off-the-shelf XML database system, and interact with the database using standard APIs (XPATH or XSLT for querying, and SixDML or XUpdate for updates), in order to take advantage of future improvements in XML database technology.

[1] For example, webcams are typically attached to PCs that can host a sensor proxy. For battery-powered motes, the sensor proxy is the powered base station communicating with the motes [26].
[2] There is one exception to our data transparency: user queries can specify a degree of tolerance for answers based on stale (cached) data, as discussed in Section 4.


Figure 1: Logical Site Hierarchy (a geographic hierarchy of usRegion, state, city, neighborhood, and block nodes)

Although we have used Parking Space Finder as a motivating application, we envision IrisNet as a general platform for wide area sensor services. For example, we are currently working with a team of oceanographers to deploy IrisNet along the Oregon coastline. Moreover, we believe that future wide area sensor databases (not just IrisNet) will build upon a similar design, in order to meet the desired performance requirements.[3] Thus providing techniques for correctly and efficiently processing queries on such databases is crucial to the widespread use of wide area sensor systems. However, as outlined below, there are a number of challenges to providing fast execution of XPATH queries on such a system; this paper is the first to address these challenges.

In this paper, we present techniques for query processing and caching in wide area sensor databases. The contributions of this paper are:

• We present the first scalable system for executing XPATH queries on wide area sensor databases. The system uses a logical hierarchy of sites dictated by the XML document; these logical sites are mapped to a smaller collection of physical sites, as dictated by the system administrator or the queries themselves.

• We give a technique for self-starting distributed queries, which jump directly to the lowest common ancestor (LCA) of the query result, by using DNS-style site names extracted from the query itself. A key feature is that although a query can be posed anywhere in the Internet, no global information is needed to produce the LCA site name for the query.

• We show how general XPATH queries can be evaluated on a single XML document when the document itself is fragmented across machines, and the fragmentation is constantly changing. We propose a novel query-evaluate-gather (QEG) technique for detecting (1) which data in the database fragment at a site is part of the query result, and (2) how to gather the missing parts. To our knowledge, no effective solution to this problem was known prior to our work.

• Our QEG technique uses XSLT programs, and we show how these programs can be generated directly from the XPATH query, without accessing the database. Moreover, we show how to avoid the overheads of query-time compilation of XSLT programs by generating them (almost) already compiled.

• We present a scheme for caching query results at sites as dictated by the queries (auto-tuning). Owned data and cached data are stored in the same site database, with different tags, unifying the query processing at a site. Moreover, we support query-based consistency, in which each user query may specify its tolerance for using stale (cached) data to quickly answer the query.

• We define partitioning and cache invariants supporting partial match caching, which ensures that even partial matches on cached data can be exploited and that correct answers are returned, despite our dynamic query-driven caching. Our approach is based on leveraging the data residing in the site database, which differs from previous approaches based on leveraging a collection of views.

• We present experimental results demonstrating the effectiveness of our techniques in dramatically increasing update and query throughputs and decreasing query response times in wide area sensor databases.

[3] For example, an object-relational wide area sensor database would face many of the same issues and would greatly benefit from the techniques presented in this paper (see Section 6).

/usRegion[@id='NE']/state[@id='NY']/city[@id='New York']
/neighborhood[@id='Soho' OR @id='Tribeca']
/block[@id='1']/parkingSpace[available='yes']

Figure 2: An XPATH query requesting all available parking spaces in Soho block 1 or Tribeca block 1.

<usRegion @id='NE'>
  <state @id='NY'>
    <city @id='New York'>
      <neighborhood @id='Soho'>
        <block @id='1'>
          <parkingSpace @id='1'>
            <available>yes</available>
          </parkingSpace>
          <parkingSpace @id='2'>
            <available>no</available>
          </parkingSpace>
        </block>
      </neighborhood>
    </city>
  </state>
</usRegion>

Figure 3: An XML fragment at the New York site.

The remainder of this paper is organized as follows. Section 2 describes the query processing challenges. Section 3 presents our basic query processing and caching techniques. Then in Section 4, we describe extensions to more general XPATH queries, cache consistency, and ownership migration. Experimental results are in Section 5. Section 6 describes related work, and conclusions are in Section 7.

2. QUERY PROCESSING CHALLENGES

In this section, we consider a core problem in query processing in wide area sensor databases, and demonstrate the challenges in solving this problem.

We would like to evaluate an XPATH query on a single XML document that has been fragmented across multiple sites. Consider the example query in Figure 2 and the document fragment (at the New York site) in Figure 3. The query asks for all available parking spaces in two adjacent blocks of Soho and Tribeca. If this query is posed to the New York site, parking space 1 in Soho block 1 will be returned. The challenge is in determining whether this is the entire answer. In particular, are there other parking spaces in block 1 of Soho? Moreover, no parking spaces were returned from Tribeca: was that because they are all taken or the site database was missing Tribeca? This information cannot be determined solely from the query answer.

How might we try to solve this problem? First, we might try splitting the XPATH query into two queries, one for Soho and one for Tribeca, but we would not learn anything more. Second, we might try augmenting the database with placeholders (e.g., for Tribeca) that tag missing data. However, unless the site database contains placeholders for all data in New York, which is not a scalable solution, the XPATH query would not return all the needed placeholders. E.g., adding

<neighborhood @id='Tribeca' @tag='placeholder'>

to the New York site would not change the answer, because the query calls for specific data within the Tribeca neighborhood. Finally, we might try maintaining meta information about what fragment of the XML document is in the site database. There is a trade-off between how much metadata we would need and how flexible the partitioning is — this approach may require significant restrictions on the fragmentation, otherwise the metadata may be as large as the database itself! Moreover, given an XPATH query, it is not clear how to combine this metadata outside the database with what is inside the database. For example, suppose that neighborhoods had a numberOfFreeSpots attribute and parking spaces had a price attribute, and the query asks for available no cost spots in block 1 of Soho and Tribeca:

/usRegion[@id='NE']/state[@id='NY']/city[@id='New York']
/neighborhood[@id='Soho' OR @id='Tribeca']
    [@numberOfFreeSpots > 0]
/block[@id='1']/parkingSpace[available='yes'][@price='0']

If the site database contained this attribute for both neighborhoods, and the metadata reflected this, then whether or not we need to visit the site(s) containing Tribeca block data depends on the value of this attribute. In order to make this decision, we would need to determine that a specific subquery requesting just this attribute should be posed to the site database, and then combine its answer with the metadata – clearly this would not be easy. Given the generality of XPATH queries, this type of scenario may arise frequently and in more complicated contexts.

A better approach is to include inside the database any meta information on what fragment of the XML document is present at the site, as tag attributes associated with the relevant portions of the database. These attributes must indicate not only what data is present or missing, but which sites to visit to find the missing data. As illustrated above, XPATH is insufficiently powerful to effectively use these tags. In the next section, we present our novel solution to these challenges, based on the more powerful XSLT language. Our technique solves these fragmentation challenges, and moreover, it enables very flexible query result caching at the sites.

3. OUR SOLUTION

In this section, we describe our solution to the challenges outlined in the previous section. We begin with an overview of the system before describing the various components in detail.

3.1 Overview

Our query processing system starts with the XML document defined by the service. For simplicity, assume that the document is fixed: only the values of attributes and fields are changing.[4] The document defines a logical hierarchy on the data (see Figure 1). At any point in time, we have partitioned ownership for fragments of the document to a collection of sites (discussed in Section 3.2). A site may also cache data owned by other sites (Section 3.3).

Users pose XPATH queries on the view of the data as a single XML document. The query selects data from a set of nodes in the hierarchy. The query is sent directly to the lowest common ancestor of these nodes, using our technique for self-starting distributed queries (Section 3.4). There it begins a potentially recursive process, which we denote query-evaluate-gather (Section 3.5). Upon receiving a query, the site manager queries its local database and cache, and evaluates the result. If necessary, it gathers missing data by sending subqueries to other sites, who may recursively query additional sites, and so on. Finally the subquery answers return from the other sites, and the combined results are sent back to the user.

XPATH queries supported. Throughout this paper, we take the common approach of viewing an XML document as unordered, in that we ignore any ordering based solely on the linearization of the hierarchy into a sequential document. For example, although siblings may appear in the document in a particular order, we assume that siblings are unordered, as this matches our data model. Thus we focus on the unordered fragment of XPATH, ignoring the few operators (such as position()) or axes (such as following-sibling) that are inappropriate for unordered data. We support (and have implemented) the entire unordered fragment of XPATH 1.0.

3.2 Data partitioning

There are two distinct aspects to data partitioning: data ownership, which dictates who owns what data, and data storage, which describes the actual data stored at each site. Our partitioning scheme is based on a series of partitioning rules, tags (i.e., special attributes), and invariants that must be maintained to ensure correct answers. Our scheme uses the following definitions.

IDable nodes. We associate with certain (element) nodes in a document an "id" attribute (see Figure 3). This id attribute is in the spirit of a DTD's ID type.[5] However, we require only that the id be unique among its siblings with the same element name (e.g., the parkingSpace siblings within a block), whereas ID type attributes must have globally unique values. This distinction is quite helpful for us, because our document is fragmented across many sites, so it would be difficult to ensure globally unique names. Moreover, the values for our id attributes are short names that make sense to the user query (e.g., New York).

DEFINITION 3.1. A node in a document is called an IDable node only if (1) it has a unique id among its siblings with the same element name, and (2) its parent is an IDable node. The root node of a document is also an IDable node. The ID of an IDable node is defined to be its ⟨element name, id⟩ pair.

[4] The more general scenarios are addressed in Section 4.
[5] A DTD defines the schema for XML documents, including restrictions on attribute values.


Figure 4: IDable Nodes in a Document (figure: an example document rooted at a neighborhood node with id 'Soho' and zipcode '10012', with block children '1' and '2', an available-spaces field of 8, and pSpace nodes carrying id, in-use, GPS, and price fields; the IDable nodes are shown in bold)

Figure 4 shows an example document, and its IDable nodes (denoted in bold). Note that each IDable node can be uniquely identified by the sequence of IDs on the path from the root to the node. We will frequently exploit this feature in our system. For instance, IDable nodes can be placed on different sites than their parents or siblings, without any ambiguity.

DEFINITION 3.2. The local information of an IDable node n is defined to be the document fragment containing: (1) all the attributes of n, (2) all its non-IDable children and their descendants, and (3) the IDs of its IDable children. The local ID information of an IDable node n is defined to be the document fragment that contains (1) the ID of the node, and (2) the IDs of all its IDable children.

Thus the local ID information of an IDable node is a subset of the local information of that node. For example, the local information of the Soho node in Figure 4 is:

<neighborhood @id='Soho' @zipcode='10012'>
  <block @id='1'></block>
  <block @id='2'></block>
  <available-spaces>8</available-spaces>
</neighborhood>

Its local ID information is:

<neighborhood @id='Soho'>
  <block @id='1'></block>
  <block @id='2'></block>
</neighborhood>

Note that the document fragments corresponding to the local informations of the IDable nodes form a nearly-disjoint partitioning of the original document, with the only overlap being the IDs of the IDable nodes. Also, the notion of an IDable node could be extended to include any node with a uniquely named root-to-node path (such as available-spaces in Figure 4); however, this would complicate our technique below for self-starting queries.
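To make Definitions 3.1 and 3.2 concrete, the following sketch computes the IDable nodes of a small document and extracts the local information and local ID information of one of them. This is an illustrative Python sketch using ElementTree, not part of IrisNet; the element names mirror Figure 4, and a plain id attribute stands in for the paper's @id notation.

# Illustrative sketch (not the IrisNet implementation): IDable nodes and
# local (ID) information, following Definitions 3.1 and 3.2.
import xml.etree.ElementTree as ET

DOC = """
<neighborhood id="Soho" zipcode="10012">
  <block id="1"><parkingSpace id="1"><in-use>no</in-use></parkingSpace></block>
  <block id="2"><parkingSpace id="1"><price>25 cents</price></parkingSpace></block>
  <available-spaces>8</available-spaces>
</neighborhood>
"""

def idable_nodes(root):
    """Return the set of IDable nodes: unique id among same-name siblings,
    and parent also IDable (the document root is IDable by definition)."""
    idable = {root}
    def visit(parent):
        if parent not in idable:
            return
        for child in parent:
            same = [c for c in parent if c.tag == child.tag]
            cid = child.get("id")
            if cid is not None and sum(1 for c in same if c.get("id") == cid) == 1:
                idable.add(child)
            visit(child)
    visit(root)
    return idable

def local_information(node, idable):
    """Attributes of the node, its non-IDable subtrees, and the IDs
    (element name plus id only) of its IDable children."""
    frag = ET.Element(node.tag, node.attrib)
    for child in node:
        if child in idable:
            frag.append(ET.Element(child.tag, {"id": child.get("id")}))
        else:
            frag.append(child)          # whole subtree of a non-IDable child
    return frag

def local_id_information(node, idable):
    """Just the node's own ID plus the IDs of its IDable children."""
    frag = ET.Element(node.tag, {"id": node.get("id", "")})
    for child in node:
        if child in idable:
            frag.append(ET.Element(child.tag, {"id": child.get("id")}))
    return frag

root = ET.fromstring(DOC)
idable = idable_nodes(root)
print(ET.tostring(local_information(root, idable), encoding="unicode"))
print(ET.tostring(local_id_information(root, idable), encoding="unicode"))

Run on the Soho document above, the two printed fragments correspond to the local information and local ID information shown for the Soho node.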

Data ownership. We permit each site to own an arbitrary set of nodes from the document, under the following constraints:

• Each node in the document must be owned by exactly one site.

• A node may have a different owner than its parent only if it is an IDable node.

This enables considerable flexibility for partitioning. For example, a site may own a node, a few of its grandchildren and cousins, but not the intervening nodes in the hierarchy. Our query processing algorithms must ensure correct answers in the presence of any such partitionings.

The owner of a node is ultimately responsible for any queries on that node. By requiring that only an IDable node may be on a different site than its parent, we ensure that any such node is globally addressable. (We describe in Section 3.4 how the system locates the site corresponding to the node.)

Data stored at each node. The data that is stored at each site is essentially the union of the local informations and the local ID informations of a set of nodes. There are two invariants we maintain on the stored data: (I1) each site must store the local information for the nodes it owns, and (I2) if (at least) the ID of a node is stored, then the local ID information of its parent must also be stored. Note that this implies that the local ID information of all ancestors of such a node is also stored.

A special attribute called status (meaningful only for IDable nodes) is used to summarize the stored data for a node, in order to ensure correct and efficient query processing. It has four possible values:

• owned: The site owns this node. By invariants (I1) and (I2), it has the local information of the node and at least the local ID information of all its ancestors.

• complete: The site has the same information stored as owned, except that it does not own the node.

• ID-complete: The site has the local ID information for this node and the local ID information of all its ancestors, but it does not have all the local information for the node.

• incomplete: For this node, the site has only its ID.

Our system maintains the invariant that each node at a site falls into one of these four categories (a non-IDable node is implicitly assumed to have the same status as its lowest IDable ancestor).

Intuition behind this structure. What have we accomplished by these partitioning rules, special attributes, and invariants? If a site has information about a node (beyond just its ID), it knows at least the IDs of all its IDable children, and also the IDs of all its ancestors and their siblings. Thus when a query is routed to this site, it can either answer it using the document fragment it has, or it knows which parts are missing (the missing parts will always be IDable nodes and the information in the subtrees below them). It can then contact the appropriate sites that own the missing parts (because it has their unique names) and get the information from them.

As such, any given site has in its document fragment the information needed to gather an answer to a query, even if it does not have all the data itself.

3.3 Caching

Our goals are (1) to have considerable flexibility for replicating data at multiple sites on the fly, and (2) to enable efficient and correct query processing despite this dynamic caching. The key observation is that the scheme outlined in Section 3.2 is well-suited to accomplishing these goals.

Storing additional data. A site can add to its current document any document fragment that satisfies: (C1) the document fragment is a union of local informations or local ID informations for a certain set of nodes, and (C2) if the document fragment contains local information or local ID information for a node, it also contains the local ID information for its parent (hence all its ancestors). Then by merging this new document fragment with the existing document, we are guaranteed to maintain invariants (I1) and (I2) above. Moreover, updating the status attributes is straightforward.

Caching query results. An important special case of the above is the caching of query results. Recall that we route a query to its LCA and then recursively progress down the hierarchy to pull together the answer. In our current prototype, whenever a (partial) answer is returned to a site, we cache the answer at the site. Thus a site manager aggressively caches any data that it sees. This has two benefits. First, subsequent queries on the same data can be answered by the site, thereby avoiding the expense of gathering up the answer again.[6] Second, it automatically tunes the creation of additional copies to the query workload; such copies can distribute the workload over multiple sites. To make this caching possible, we generalize the subqueries sent to sites, making them fetch the smallest superset of the answer that satisfies (C1) and (C2) above. (This does not alter the answer returned to the user. Details are in Section 3.5.)

One could choose to use our generalized subqueries and caching techniques only when the workload seems to dictate it. However, we found that the cost of initially transferring additional data was minimal compared to the significant gains achieved by caching (see Section 5).

Partial match caching. A key feature of our caching scheme is its ability to support partial match caching, which ensures that even partial matches on cached data can be exploited and that correct answers are returned. Because our invariants ensure that we can always use whatever data we have in a site database, and fetch any missing parts of the answer, we can provide very flexible partial match caching. For example, the query in Figure 2 may use data for Soho cached at the New York site, even though this data is only a partial match for the new query. Similarly, if distinct Soho and Tribeca queries result in the data being cached at New York, the query may use the merged cached data to immediately return an answer. Even if the earlier queries have different predicates, our generalization of subqueries may enable the later queries to use the cached data. Note also that a site can determine when a collection of queries has resulted in all the IDable siblings being cached, and hence respond to queries over all such siblings (subsumption). For example, suppose there are three neighborhoods downtown, midtown, and uptown in a city Brooktown, all on different sites than the city node. Then independent queries that cause the three neighborhoods to be cached at the Brooktown node would lead to the query

... /city[@id='Brooktown']/neighborhood/block
    /parkingSpace[@available = 'yes']

being correctly answered from just the Brooktown site. (This query requests all available parking spaces in Brooktown.) This is because invariant (I1) above ensures that Brooktown always maintains the IDs of all its neighborhoods, so it can detect when it has all of its neighborhoods cached.
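The subsumption check relies only on local state: by invariant (I1) the parent's local ID information lists every IDable child, so a site simply compares that list against the child copies it holds with a usable status. A minimal Python sketch of this test follows; the dictionary representation and the choice of 'owned'/'complete' as usable statuses are illustrative assumptions, not IrisNet code.

# Illustrative sketch: can a site answer a query over *all* IDable children
# of a cached parent (e.g., all neighborhoods of Brooktown) locally?
# The parent's local ID information always lists every IDable child, so
# subsumption holds when each listed child is cached as owned or complete.

def can_answer_over_all_children(parent):
    """parent: dict with 'child_ids' (from its local ID information) and
    'children': {id: status} for the child copies this site actually holds."""
    usable = {"owned", "complete"}
    return all(parent["children"].get(cid) in usable
               for cid in parent["child_ids"])

brooktown = {
    "child_ids": ["downtown", "midtown", "uptown"],
    "children": {"downtown": "complete", "midtown": "complete",
                 "uptown": "complete"},
}
print(can_answer_over_all_children(brooktown))   # True: answer locally
brooktown["children"]["uptown"] = "ID-complete"
print(can_answer_over_all_children(brooktown))   # False: must ask a subquery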

Evicting data. Any data can be removed as long as it is always removed in units of local informations or local ID informations of IDable nodes and the conditions outlined above are still valid after the removal.

In summary, our scheme makes it easy to provide flexible caching. The main challenge is to do the above operations efficiently, without having to fetch the entire document into memory and without touching any more of the document than necessary. As it turns out, this task is accomplished using the mechanism for query processing in general, which we discuss in Section 3.5.

[6] Issues of staleness of cached data are discussed in Section 4.

3.4 Finding sites

In this subsection, we discuss how the system can determine the IP address for any site needed during query processing. Recall that the mapping of IDable nodes to sites is arbitrary and dynamically changing. However, there are only two situations in which IP addresses are needed during query processing: (1) when the query initially is posed by a user somewhere in the Internet, and (2) when a site manager determines that it needs to pose a subquery to a specific IDable node. We consider each situation in turn.

Self-starting distributed queries. Users anywhere on the Internet can pose queries. For scalability, we clearly do not want to send all queries to the site(s) hosting the root node of the hierarchy. Instead, our goal is to send the query directly to the lowest common ancestor (LCA) of the query result. But how do we find the site that owns the LCA node, given the large number of nodes and the dynamic mapping of nodes to sites? Our solution is (1) to have DNS-style names for nodes that can be constructed from the queries themselves, (2) to create a DNS server hierarchy identical to the IDable node hierarchy, and (3) to use DNS lookups to determine the IP addresses of the desired sites. Recall that each IDable node is uniquely identified by the sequence of IDs on the path from the root to the node. Thus our DNS-style names are simply the concatenation of these IDs. For example, for the query in Figure 2, its LCA node is New York. We construct the DNS-style name

city-new york.state-ny.usregion-ne.parking.ourdomain.net

perform a DNS lookup to get the IP address of the New York site, and route the query there.

A key feature is that no global information is needed to produce this DNS-style name: it is extracted directly from the query! We have a simple parser that processes the query string from its beginning, and as long as the parser finds a repeated sequence of /elementname[@id=x], it prepends to the DNS name. The DNS lookup may need several hops to find the appropriate DNS entry, but then this entry is cached in a DNS server near to the query, so subsequent lookups will find the IP address in the nearby DNS server. Note that no information about the XML document (or its schema) is needed by the parser.
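To illustrate the parser, the following Python sketch scans the leading /elementname[@id='x'] steps of a query and prepends them to build the DNS-style LCA name. It is only a sketch of the idea, not IrisNet's parser; the service suffix parking.ourdomain.net is taken from the example above, and the regular expression handles only single id-equality predicates.

# Illustrative sketch: derive the DNS-style name of a query's LCA node
# from the query string alone, with no knowledge of the document or schema.
import re

# Matches a leading step of the form /elementname[@id='value'] (a single
# id-equality predicate); anything else ends the LCA prefix.
STEP = re.compile(r"/(\w+)\[@id='([^']+)'\]")

def lca_dns_name(xpath_query, suffix="parking.ourdomain.net"):
    name, pos = suffix, 0
    while True:
        m = STEP.match(xpath_query, pos)
        if not m:
            break
        element, id_value = m.group(1), m.group(2).lower()
        name = f"{element.lower()}-{id_value}.{name}"   # prepend this ID
        pos = m.end()
    return name

query = ("/usRegion[@id='NE']/state[@id='NY']/city[@id='New York']"
         "/neighborhood[@id='Soho' OR @id='Tribeca']"
         "/block[@id='1']/parkingSpace[available='yes']")
print(lca_dns_name(query))
# city-new york.state-ny.usregion-ne.parking.ourdomain.net

The scan stops at the neighborhood step because its predicate is not a single id-equality test, which is exactly why the LCA for the Figure 2 query is the New York city node.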

Sending a subquery. When a site manager determines that a query requires data not in its site database, then by our invariants, it has the root-to-node ID path for the IDable node it needs to contact. To see this, observe that each piece of missing data is in the local information of some IDable node. Consider one such IDable node n. By invariant (I1), this node is owned by a different site. By invariant (I2), regardless of n's status value, we have its ID, and the IDs of all its ancestors. Thus we can produce the DNS-style name for any needed IDable node solely from the information in the site database, and then perform the lookup to get the IP address. A key feature of this design is that the mapping of IDable nodes to IP addresses is encapsulated entirely in the DNS entries, and not in any site databases. This makes it relatively easy to change the mapping as desired for load balancing and other purposes.

3.5 Query-Evaluate-Gather

We now describe our query-evaluate-gather technique for detecting (1) which data in a local database fragment is part of the query result, and (2) how to gather the missing parts. As noted above, our invariants guarantee that the site has all the information required to answer a query (including whether it is required to contact other sites). Handling arbitrary XPATH queries turns out to be quite challenging though, because of the richness of the language. As an example, consider the following (only moderately complex) query:

/usRegion[@id='NE']/state[@id='NY']/city[@id='New York']
/neighborhood[@id='Soho']/block[@id='1']
/parkingSpace[not(price > ../parkingSpace/price)]

This query requests the least pricey parking spot in a particular block in Soho (XPATH 1.0 does not have a min operator). Consider a scenario where the individual parkingSpaces are owned by different sites and moreover, each site only stores the local information for the parkingSpace it owns (this is a permissible configuration). Such a configuration is problematic for this query, because none of the sites have sufficient information to evaluate the predicate. This motivates the following definition.

DEFINITION 3.3. The nesting depth of an XPATH query is defined to be the maximum depth at which a location path that traverses over IDable nodes occurs in the query.

We will illustrate this definition through a few examples:

/a[@id=x]/b[@id=y]/c → nesting depth = 0
/a[@id=x]//c → nesting depth = 0
/a[./b/c]/b → nesting depth = 1 (if b is IDable) or 0 (otherwise)
/a[count(./b/c) = 5]/b → nesting depth = 1 (if b is IDable) or 0 (otherwise)
/a[count(./b[./c[@id=1]])] → nesting depth = 2 (if c is IDable) or 1 (if c is not IDable, but b is) or 0 (otherwise)

The complexity of evaluating a query increases with the nesting depth of the query. Queries with nesting depth 0 are the easiest to solve, because the predicates can always be evaluated using the local information (which is always present at the site that owns the node). However, as the examples in Section 2 showed, even this case is challenging, and there were no good previously known solutions.

The basicQEG scheme.In theremainderof this section,we de-scribeourapproach,assumingnestingdepth0 (Extensionsto largernestingdepthsarediscussedin Section4.) BecauseXPATH is in-sufficiently powerful, weuseXSLT to querythedatabase,evaluatewhat is there,andsendsubqueriesto gather the missingpartsoftheanswer. Weshow how theXSLT programsusedby ourschemecanbegenerateddirectly from theXPATH query.

Given an XPATH query Q , let LOCAL -INFO-REQUIRED be thesetof elementnames(tags)suchthat the final answershouldin-cludetheentirelocal informationfor any IDablenodewith oneofthesetags,if thenodesatisfiesQ . As anexample,for thefollowingqueryon thedatabaseshown in Figure4,... /neighborhood[@id=’Soho’]/block

LOCAL -INFO-REQUIRED = R block, parkingSpace S Thequery

... /neighborhood[@id=’Soho’]/block/parkingSpace

on the other hand, only requireslocal information about park-ingSpace. Notethat,thisisconsistentwith thesemanticsof XPATH,becauseXPATH returnsentiresubtreesin thedocumentrootedatthenodesselectedby thequery.

When a site manager receives a query Q, it generates an XSLT program from that query that executes the following algorithm:

1. Let n_cur be the node in the document under consideration at any time, tag_cur be the element name (tag) of n_cur, and let P be the set of predicates on tag_cur in the query.

2. Set n_cur to be the root of the document at the site.

3. Depending on the status of n_cur in the document, there are four possibilities:

(a) status = incomplete: In this case, see if P can be divided into two predicates P_id and P_rest, such that P_id contains only predicates on the id attribute, and P = P_id AND P_rest. If this is possible, evaluate P_id against the current node. If it evaluates to true, then form a subquery for evaluating the rest of the query and note this by adding an ask-subquery tag to the answer being formed. A post-processing step will then send this subquery to its LCA, in order to gather missing parts of the answer. If P_id evaluates to false, it is also noted in the answer being formed, so that the post-processor knows that a subquery does not need to be asked. If the division of P into two such predicates is not straightforward, we assume that the node may be part of the answer and form a subquery for evaluating the rest of the query as above (i.e., we were unable to avoid this subquery).

(b) status = ID-complete: The actions performed in this case are similar to the above case, except that if tag_cur is not in LOCAL-INFO-REQUIRED and P = P_id, then we can recurse on the children nodes without having to form any subquery. On the other hand, if tag_cur is in LOCAL-INFO-REQUIRED, then the local information for this node is required in the answer, and as such, we must ask a subquery to gather that information.

(c) status = owned: In this case, we have complete information to evaluate the predicate P. If P is satisfied, then recurse on the IDable children of n_cur, and also copy the local information into the answer being formed depending on whether tag_cur is in LOCAL-INFO-REQUIRED. Only local ID information needs to be copied if tag_cur is not in LOCAL-INFO-REQUIRED.

(d) status = complete: The actions performed in this case are similar to that for the above case, except for any predicates that specify consistency requirements. We will discuss this case in the next section.

4. EXTENSIONSIn this section,we discussextensionsto our scheme,including

cacheconsistency issues,handlinglargernestingdepths,ownershipchanges,schemachanges,andspeedingupXSLT processing.

Query-basedconsistency. Due to delaysin the network andtheuseof cacheddata,answersreturnedto userswill not reflect theabsolutelymost recentdata. Instead,we provide a very flexiblemechanismin which each querymayspecifyits tolerancefor staledata. We storetimestampsalongwith the data,indicating whenthe datawascreated.An XPATH queryspecifyinga toleranceisautomaticallyroutedto thedataof appropriatefreshness.In partic-ular, eachquerywill take advantageof cacheddataonly if thedata

6

is sufficiently fresh. For example,a consistency predicatesuchas[timestampo > now - 30] in a query Q meansthat Q can be an-sweredusingany datawhosetimestampis within 30 secondsofthetime Q wasposed.

Althoughallowing usersto specifyadegreeof stalenessviolatesstrictdatatransparency, webelieveit providesaneasilyunderstand-ablemeansfor queriesthat tradeoff freshnessfor possiblylargeperformanceimprovements.For example,whena useris severalmiles from her destination,theParkingSpaceFinderservicemayfire off queriesthat tolerateminutes-oldavailability information.As theuserapproachesherdestination,theservicefiresoff queriesthatinsistuponthemostrecentdata.

Thefollowing changesto theXSLT programareneededto han-dle consistency predicatesin queries.If status= owned, thenweignoreconsistency predicates,becausethe owner of the datahasthe freshestcopy. Thus the semanticsof the above consistencypredicateallows for returning the freshestdataeven if that datais more than 30 secondsold. This ensuresthat usersget an an-swer. Alternatively, the systemcould returnan error message.Ifstatus= complete, andwe canseparateout theconsistency predi-cates]cY)p�q$e _ e9frd9q\Y)s from ] , thenwe first check ]c[�d)e9f — therestofthepredicatesin P. If thatevaluatesto false,thenthereis no needto checkfor theconsistency predicates.If thatevaluatesto trueand]tY)p�q$e _ e9f5d)q\Y9s evaluatesto false,thenwe adda asksubquerytag atthispoint to signalthepost-processor. As before,if ] Y?p�q\e _ e9f5d9q,Y9s isnot readilydividedout, we fall backto addinga asksubquerytag,in orderto querytheowner.
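As a small illustration (not IrisNet code), the following sketch shows how a site might apply such a consistency predicate to a copy of the data before deciding between answering locally and asking the owner; the field names and the 30-second tolerance come from the example above.

# Illustrative sketch: apply a [timestamp > now - tolerance] consistency
# predicate to a copy of the data before using it to answer a query.
import time

def use_local_copy(node, tolerance_secs, now=None):
    """node: {'status': ..., 'timestamp': seconds since the epoch}.
    Owned data is always usable (the owner has the freshest copy);
    cached (complete) data is usable only if it is fresh enough."""
    now = time.time() if now is None else now
    if node["status"] == "owned":
        return True
    if node["status"] == "complete":
        return node["timestamp"] > now - tolerance_secs
    return False          # ID-complete / incomplete: must ask a subquery

now = time.time()
fresh = {"status": "complete", "timestamp": now - 10}
stale = {"status": "complete", "timestamp": now - 120}
print(use_local_copy(fresh, 30, now))   # True: answer from the cache
print(use_local_copy(stale, 30, now))   # False: ask the owner instead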

Larger nesting depths. The main challenge with XPATH queries with nesting depths greater than 0 is that the query may specify predicates that cannot be evaluated locally (recall the example query in Section 3.5). Many such queries are quite natural in the kinds of applications we are interested in.

There are two approaches to solving such queries. The first approach is to collect all the data needed to evaluate the predicate at one site. The main question here is: what data needs to be fetched and at which site should the data be collected? Referring to our example query from Section 3.5,

.../block[@id='1']
/parkingSpace[not(price > ../parkingSpace/price)]

even though the predicate with nesting depth 1 is associated with the parkingSpace tag, because of the upward reference in the predicate (".."), the data needed to check this predicate is the entire subtree under the block tag.

Currently, we solve such a query by analyzing the query to find out the earliest tag that is referred to in such a nested predicate in the query. In this example query, this tag would be the block tag. During query execution, when the XSLT program encounters a node with this tag, it stops the execution at that point, issues a subquery to fetch all the data under that block (in this case, using the query .../block[@id='1']/), and proceeds with the execution when the answer to the subquery returns. This approach guarantees that whenever the predicate is evaluated, all the data that it requires is always present locally.

This approach may not be optimal for certain queries. For example, consider a (frivolous) query requesting all available spaces in all cities that have Soho as a neighborhood:

... /city[./neighborhood[@id='Soho']]/ ...

Fetching all the data below the city nodes at the corresponding sites (as will be done by the above approach) may be overkill for this query. It would be preferable to just evaluate the predicates directly, which can be done by firing off subqueries of the form

boolean(... /city/neighborhood[@id='Soho'])

We are planning to implement and experiment with this approach in the future.

Ownership changes. For the purposes of load balancing or accommodating arriving or departing sites, it is useful to be able to dynamically change the ownership of a set of IDable nodes. The transition must appear simultaneous to the rest of the system. The steps to be done to transfer an IDable node are (1) the site taking ownership of an IDable node fetches a copy of the local information from the owner, (2) any sensor proxy reporting to the previous owner is asked to report to the new owner, and (3) the new owner sets the status for that node to owned while the old owner sets the status to complete. In addition, the DNS entry needs to be updated to the IP address of the new owner. Various relaxations in simultaneity are possible, due to the fact that fresh data obsoletes the old data. The DNS update can be done lazily because the old site can reject (or possibly forward) messages targeting the transferred IDable node.
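The transfer can be read as a short handover routine. The sketch below is a self-contained toy version of that sequence; the Site class, the DNS dictionary, and the proxy registry are simplified stand-ins for machinery the paper does not spell out as an API.

# Illustrative, self-contained sketch of transferring ownership of an IDable
# node between two sites. Site, DNS, and PROXY_TARGET are toy in-memory
# stand-ins, not IrisNet components.

DNS = {}                     # DNS-style name -> owning site's address
PROXY_TARGET = {}            # node name -> site its sensor proxy reports to

class Site:
    def __init__(self, address):
        self.address = address
        self.fragments = {}  # node name -> (local information, status)

    def store(self, node, fragment, status):
        self.fragments[node] = (fragment, status)

    def set_status(self, node, status):
        fragment, _ = self.fragments[node]
        self.fragments[node] = (fragment, status)

def transfer_ownership(node, old_owner, new_owner):
    # (1) The site taking ownership fetches a copy of the local information.
    fragment, _ = old_owner.fragments[node]
    new_owner.store(node, fragment, status="owned")
    # (2) The node's sensor proxy is asked to report to the new owner.
    PROXY_TARGET[node] = new_owner
    # (3) The old owner keeps its copy only as cached ('complete') data.
    old_owner.set_status(node, "complete")
    # The DNS entry is updated (possibly lazily) to the new owner's address.
    DNS[node] = new_owner.address

old, new = Site("10.0.0.1"), Site("10.0.0.2")
name = ("block-1.neighborhood-soho.city-new york."
        "state-ny.usregion-ne.parking.ourdomain.net")
old.store(name, "<block @id='1'>...</block>", status="owned")
DNS[name] = old.address
PROXY_TARGET[name] = old

transfer_ownership(name, old, new)
print(DNS[name], new.fragments[name][1], old.fragments[name][1])
# 10.0.0.2 owned complete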

Schema changes. Schema changes that do not affect the hierarchy of IDable nodes can be done locally by the site manager that owns the relevant fragment of the data. Such schema changes include adding attributes to, or deleting attributes from, the data, and adding or removing non-IDable nodes. This might lead to transient inconsistencies as the site manager has no way of knowing who else might have cached that part of the data. But in a continuously changing environment such as ours, we expect this inconsistency to be corrected quickly.

Some of the schema changes that affect the hierarchy of the IDable nodes can be handled similarly. For example, addition or deletion of IDable nodes is performed by the site manager that owns the parent of the affected IDable node. More drastic schema changes, such as a restructuring of the hierarchy, may have to be performed by collecting the entire document at one site, making the changes, and redistributing the document back amongst the nodes. Once again, such a change might lead to transient inconsistencies in the data that, although undesirable, are permissible for the kinds of applications we are looking at. More stringent consistency guarantees can also be implemented by maintaining auxiliary information about the copies of the data as we will discuss in Section 6.

Speeding up XSLT processing. Recall that the QEG processing at a site involves creating an XSLT program from the XPATH query, compiling the XSLT program, and then executing the compiled program against the document. As demonstrated in Section 5, there is significant overhead in the compilation step. We now describe our technique for eliminating most of this overhead by directly generating mostly compiled XSLT programs.

When a site manager starts up, one of the first things it does is to create and compile an XSLT program using a dummy XPATH query. Once this is done, it identifies the parts of this compiled XSLT query that depend on the XPATH query. Subsequently, when the site manager needs to create the XSLT program for an actual XPATH query, it simply modifies this XSLT program directly to set the query-dependent information. Some compilation is still required, because the query-dependent structures are in XPATH and need to be recompiled. But the cost is much lower than the cost of compiling the entire XSLT program.
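The paper's optimization patches the generated program directly. A simpler way to see why avoiding per-query compilation pays off is to compile one stylesheet once and pass the query-dependent values in as XSLT parameters, as in the Python/lxml sketch below; this is an approximation of the idea, not IrisNet's Xalan-based mechanism, and the stylesheet and data are made up for the example.

# Illustrative alternative (not IrisNet's mechanism): compile one XSLT
# stylesheet at startup and reuse it, passing the query-dependent values
# as XSLT parameters instead of recompiling a new program per query.
from lxml import etree

STYLESHEET = etree.XML(b"""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="blockid" select="'1'"/>
  <xsl:template match="/">
    <result>
      <xsl:copy-of
        select="//block[@id=$blockid]/parkingSpace[available='yes']"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
""")
TRANSFORM = etree.XSLT(STYLESHEET)        # compiled once, at startup

DOC = etree.XML(b"""
<neighborhood id="Soho">
  <block id="1">
    <parkingSpace id="1"><available>yes</available></parkingSpace>
    <parkingSpace id="2"><available>no</available></parkingSpace>
  </block>
  <block id="2">
    <parkingSpace id="1"><available>yes</available></parkingSpace>
  </block>
</neighborhood>
""")

# Per query, only the parameter changes; no stylesheet compilation happens.
for block in ("1", "2"):
    answer = TRANSFORM(DOC, blockid=etree.XSLT.strparam(block))
    print(etree.tostring(answer, pretty_print=True).decode())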

5. PERFORMANCE STUDY

In this section, we present a preliminary performance study demonstrating the features of our system, and the flexibility and effectiveness of our architecture. The salient points of our experimental study can be summarized as follows:

• Our flexible architecture allowing arbitrary logical-to-physical mappings can harvest the processing power in the system more effectively than any traditional solution.

• Caching can be very effective in both reducing latencies of queries, and in offloading work to increase overall throughput.

• Compiling XSLT queries directly leads to huge savings in query processing time.

Figure 5: Webcams monitoring toy parking lots (photograph omitted)

Figure 6: Sensor Database Architectures (diagrams of the four alternatives (i)-(iv) described in Section 5.3)

5.1 Experimental setup

For most of our experiments, we use a homogeneous cluster of 9 2GHz Pentium IV machines running Redhat Linux 7.3 connected by a local area network. In our current prototype, we have 10 sensor proxies that each have an associated sensor (webcam) that monitors a toy parking lot (Figure 5). For these larger-scale experiments, we simulate as many sensor proxies as required by running fake sensor proxies that produce random data updates. As our backend, we use the Apache Xindice 1.0 [1] native XML database. Xindice currently does not support XSLT processing (though it is a planned feature). Hence, in our current prototype, we use the Xalan XSLT processor [2] for that purpose (Xalan is also used by Xindice for processing XPath).

We use an artificially generated database for our parking space finder application consisting of a total of 2400 parking spaces using a hierarchy similar to the one shown in Figure 1. This database models a small part of a nationwide database and contains 2 cities, 3 neighborhoods per city, 20 blocks per neighborhood, and 20 parking spaces per block. We envision that the queries in such a database will typically ask for available parking spaces geographically close to a particular location. As described in Section 3.4, the queries are initially routed to the site manager that owns the lowest common ancestor of the data, and as such, we distinguish between the queries in our workload based on the level in this hierarchy to which they are first routed.

• Type 1: These queries ask for data from one block, specifying the exact path to the block from the root.

• Type 2: These queries ask for data from two blocks from a single neighborhood.

• Type 3: These queries ask for data from two blocks from two different neighborhoods. (Such a query may be asked by a user if her destination is near the boundary of the two neighborhoods.)

• Type 4: These queries ask for data from two blocks from two different cities. (The destination is near the boundary between two cities.)

We expect that type 3 and type 4 queries will be relatively uncommon, and most of the queries will be of type 1 or 2. Hence, we will also show results for mixed workloads that include more queries of the first two types.

5.2 Handling sensor updates

An update from a sensing agent is typically routed directly to the site manager that owns the relevant node in the hierarchy. On the site manager side, processing a sensor update involves updating the document at the site with the new availability information, as well as timestamping the data. A single site manager is typically able to handle 200 updates a second in our current prototype. The total number of updates that can be handled by the system scales linearly with the number of site managers among which the data is distributed.

5.3 Architectural comparison

With our first set of experiments, we demonstrate the flexibility of our architecture in harvesting the processing power available in the system. We consider four different architectures, each of which is a viable alternative for our application. We show results for five query workloads: QW-1, QW-2, QW-3, QW-4, consisting of randomly generated queries of types 1 to 4 respectively, and QW-Mix, a mixed query workload that asks 40% queries of type 1 and 2 each, 15% queries of type 3, and 5% queries of type 4.

Figure 7: Query Throughputs for Various Architectures (bar chart of average throughput, in queries/sec, for Architectures 1-4 on the five workloads)

• Centralized – Figure 6(i): In this architecture, all the data is located at a central server, and data updates as well as queries are sent to a central server. Such an architecture cannot be expected to scale very well as it can handle very few sensor updates (200 updates per second).

• Centralized querying, distributed update – Figure 6(ii): In this scenario, we offload the sensor updates to other machines by distributing the blocks among the rest of the machines. The queries are still sent to the centralized server, because the server is the sole repository for the mapping from blocks to physical machines. This scenario is intended to simulate a simple distributed object-relational database, where the blocks form an object-relational table that is distributed and the hierarchy is maintained as another set of tables that are all stored at the central server. Of course, this is not the only possible design using an object-relational database, but most such designs will suffer from similar flaws as this design. (Object-relational sensor databases are discussed further in Section 6.)

• Distributed querying, distributed update, fixed two-level organization – Figure 6(iii): This scenario is similar to the above scenario, except that we use the DNS server to store the mapping from blocks to physical machines. This helps in solving the queries of type 1 significantly, but does not help much with other kinds of queries.

• Distributed querying, distributed update, hierarchical organization – Figure 6(iv): A more logical organization of the data, considering the nature of the query workloads, would be to arrange it hierarchically in a geographic fashion. We do this by assigning the 6 neighborhoods to 6 different sites, assigning the 2 cities to two different sites, and assigning the rest of the hierarchy to one site. This corresponds to the scenario of choice in IrisNet.

Note that all architectures use the same number of sensor proxies, and the latter three architectures use the same number of sites.

Figure 7 shows the query throughputs for these four architectures for the five query workloads. As we can see, the centralized solution does not scale very well for querying either, and can handle very few queries efficiently. Although distributing only the updates (Architecture 2) increases the number of sensor data updates the system is able to handle, it only improves query throughput by a factor of 2 over the centralized solution, because all queries go through the centralized server. Using DNS for self-starting distributed queries (Architecture 3) shows the effectiveness of this technique, as the throughput for type 1 queries increases by nearly an order of magnitude. However, all other queries still go through the central server, which is the bottleneck for the other types of queries, and also for the mixed workload. The hierarchical distribution of data (Architecture 4) turns out to perform the best overall as it can handle all kinds of queries quite well. It does perform 25% worse than Architecture 3 for type 1 queries, because it is using 25% fewer machines in processing these queries. However, it performs at least 66% better than the other architectures on the mixed workload.

Because the mapping of logical nodes in the hierarchy to the physical sites is not fixed in our architecture, our system is able to handle the skewed query workloads arising in our application much more effectively than other architectures. For example, during business hours, a large percentage of the queries may be asking for information for blocks in the Downtown neighborhood. Such a skewed query workload can be much better handled by redistributing those blocks across all available machines, as opposed to them being on a single node as in the above mapping. Figure 8 shows the results of a simple experiment where we skewed the query workload to consist of 90% queries targeting a single neighborhood for type 1 and type 2 queries. As we can see, the original distribution of the data (Architecture 4) does not perform very well, whereas a more balanced architecture that distributes the blocks in that neighborhood across all sites has a factor of 4 higher throughput for this workload.

5.4 Dynamic load balancing

As discussed in the earlier section, our system is capable of changing the mapping of logical nodes in the hierarchy to the physical sites dynamically while still being able to answer queries.

We show the effectiveness of dynamic load balancing through an experiment that traced the average throughput of the system over time. For this experiment, we started multiple querying clients all asking queries of type 1 with 90% of the queries directed to a fixed neighborhood N, and 10% of the queries asking for a block in a randomly chosen neighborhood. Figure 9 shows the average throughput of the system over time. As we can see, initially when all the blocks in neighborhood N are located on a single site, the average throughput of the system was quite low. At 206 seconds into the experiment (the first dashed line), we started redistributing the data on that site to other sites by explicitly asking the site manager to delegate its blocks to other nodes one by one. In our current prototype, this has to be done by sending a request for delegating ownership of each block one at a time. These requests were sent at even intervals until 373 seconds (the second dashed line). At that time, the blocks under neighborhood N were distributed across all the machines evenly. As we can see, the average throughput of the system increased by nearly a factor of 3 even with this crude load-balancing scheme, while the system was still able to answer queries.

5.5 CachingCachingof queryresultshastwo benefits:it canbe usedto re-

9

QW-1} QW-2} QW-Mix2~Query Workloads�

0

20

40

60

80

100

Ave

rage

Thr

ough

put

Original Distribution�Balanced Distribution

Figure 8: Load Balancing (QW-Mix2consistsof 50% type 1 and 50% type2 queries)

0�

200�

400�

600�

Time (secs)

0

50

100

150

200

Num

ber

of q

uerie

s fin

ishe

d in

pre

cedi

ng 5

sec

��� �(� � �2�(�

Figure9: Dynamic Load Balancing

QW-1� QW-2� QW-3� QW-4� QW-Mix

Query Workload�0

20

40

60

80

100

Ave

rage

Thr

ough

put (

quer

ies/

sec)

No Caching�Caching with no hits�Caching with 50% hits�Caching with 100% hits�

Figure10: CachingThr oughputs(Ar chitecture4)

duceresponsetimesby bringing thedatacloseto thequeries,andit canbeusedto off-loadwork from thesitesthatown populardatato lessloadedsites(auto-tuning). Figure10 shows how cachingcanhelpin increasingoverall throughputof thesystemby offload-ing work. We usethefourth architectureandshow resultswithoutcaching,with cachingbut no hits (this demonstratestheoverheadsof caching),with cachingand50% hits, andfinally, with cachingand100%hits. As we cansee,cachinginducesminimal overheadin the system. Cachingdoesnot affect the throughputsfor type1 andtype 2 in this scenariobecausethesequeriesarealwaysdi-rectedto themachinethathasthefull data.On theotherhand,fortype3 andtype4 queries,we seethatastheprobabilityof hits in-creases,theoverall throughputof thesystemreducessignificantly.This happensbecauseafter the initial few queries,all the queriesare completelyansweredby the top-level sites(sitesthat are as-signedto thecity andcountynodes),andthesenodesbecomethebottleneck.As wewill seein Section5.6,thetimetakento forwarda query to anothernodeis much lessthanthe time taken to pro-cessthequerywhentheansweris presentat a node.This suggeststheneedfor bypassingthecacheunderheavy loadimbalance.Ontheotherhand,cachingimprovesthroughputby up to 33%for themorerealisticmixedworkload,becausetheotherwiseidle top-level

Figure 11: Caching Latencies (Architecture 4). The graph plots average latency (ms) for workloads QW-1 through QW-4 and QW-Mix under the same four caching scenarios as Figure 10.

The effects of caching are more pronounced in terms of latency. Unfortunately, because we are not using a wide area network in our experiments, it is hard to see these effects. Figure 11 shows the average latencies for the five workloads under the different caching scenarios discussed above. Even with near-zero network latencies, we can see that query latencies are reduced by 10–33% for type 3 and type 4 queries, and for the mixed workload.
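To make the offload-versus-bypass decision concrete, the following minimal sketch answers a query at the current site only when the cached data covers the query and the site is not already saturated, and forwards it toward the owning site otherwise. The Site interface and its method names are hypothetical illustrations, not the prototype's API.

// Illustrative sketch of the cache-offload decision discussed above; the
// interface and method names are hypothetical, not the prototype's API.
interface Site {
    boolean cacheCovers(String queryLca);   // does cached data answer queries rooted here?
    boolean overloaded();                   // e.g., request queue above a threshold
    String evaluateLocally(String xpath);   // run the (XSLT) query on local data
    String forwardTowardOwner(String xpath, String queryLca);
}

public class OffloadOrBypass {
    // Answer at this site when doing so offloads work from the owner;
    // bypass the cache (forward directly) when this site is the bottleneck.
    public static String answer(Site site, String xpath, String queryLca) {
        if (site.cacheCovers(queryLca) && !site.overloaded()) {
            return site.evaluateLocally(xpath);
        }
        return site.forwardTowardOwner(xpath, queryLca);
    }
}

The overloaded() test is what would let a heavily loaded top-level site shed type 3 and type 4 queries instead of becoming the bottleneck, as observed in Figure 10.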

5.6 Micro-benchmarks

To understand how the time to answer a query is distributed amongst various tasks, we ran some micro-benchmarks against our system. Figure 12 shows the breakdown of query processing time depending on the level in the hierarchy at which the query was asked. The query used in this experiment was a query of type 1 asking for one particular block. Even though this query will always be routed to the site that owns the neighborhood, we artificially routed the query to sites higher up in the hierarchy to see the effect of the number of hops taken by the query. Three settings were studied: small database with naive XSLT creation, small database with fast XSLT creation, and large database with fast XSLT creation.

As we can see, for all of the scenarios, the total processing time consumed by the query is reduced significantly (by over 50%) if the query is routed directly to the site that has the data, once again demonstrating the effectiveness of self-starting distributed queries.


Figure 12: Microbenchmarks: when the query is routed to the site manager that owns (i) the county node, (ii) the city node, (iii) the neighborhood node. For each of the three settings (small database with naive XSLT creation, small database with fast XSLT creation, large database with fast XSLT creation), the graph breaks the average time (msec) into creating the XSLT query, executing the XSLT query, and communication.

This experiment also shows why we chose to optimize the XSLT query creation time. As we can see, if the XSLT query is generated and compiled using traditional interfaces, then the time taken for this completely dominates the overall query processing time. Using direct compilation to XSLT from the original XPATH query reduces the overall query processing time by over 50%.
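For illustration, the sketch below shows one generic way to keep stylesheet compilation off the per-query critical path using the standard JAXP interfaces: the XSLT text is wrapped around the XPATH expression by string concatenation, compiled once into a Templates object, and memoized by the XPATH string. This is a minimal sketch of the general idea, not the XSLT-generation code actually used in our prototype.

import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: wrap an XPATH expression in a trivial stylesheet and
// memoize the compiled Templates object so that the expensive compilation
// step is not repeated for every query.
public class XsltTemplateCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map compiled = new HashMap();   // xpath string -> Templates

    public synchronized Templates forXPath(String xpath) throws Exception {
        Templates t = (Templates) compiled.get(xpath);
        if (t == null) {
            String xslt =
                  "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'>"
                + "<result><xsl:copy-of select=\"" + xpath + "\"/></result>"
                + "</xsl:template></xsl:stylesheet>";
            t = factory.newTemplates(new StreamSource(new StringReader(xslt)));
            compiled.put(xpath, t);
        }
        return t;
    }
}

A caller would obtain a Transformer from the returned Templates object (templates.newTransformer()) and apply it to the local document fragment; only the first query with a given XPATH pays the compilation cost.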

To see how our query processing mechanism scales with respect to the database size, we increased the total size of the database by a factor of 8, by doubling the number of neighborhoods, the number of blocks in a neighborhood, and the number of parking spaces in a block. As we can see, the processing time increased by less than 20% at each of the nodes.

These micro-benchmarks also show where the bottlenecks in our current prototype are. As we can see, most of the time is spent in executing the XSLT query and in CPU processing for communication. The communication part also includes the cost of constructing and deconstructing the messages. We believe that much of this is because we are using Java 1.3; using JIT (just-in-time compilation) and newer XSLT processing packages should significantly reduce this time.

6. RELATED WORK

Previous work in sensor databases has focused primarily on networks of closely co-located sensors with limited resources [5, 26, 27]. The sensor query processing system operates directly on continuous, never-ending streams of sensor data that are pushed into the query plans for continuous queries. The stream of data is viewed as a streaming relation (e.g., in Fjords [26]) or a time series (e.g., in Cougar [5]). These efforts have developed techniques for answering continuous queries over streaming data and for performing certain query processing tasks in the sensor network itself, in order to eliminate communication and extend sensor battery lifetimes.

This paper complements this previous work by addressing fundamental challenges in distributed query processing over wide area sensor databases. Based on the applications we were considering, we sought to provide a more familiar abstraction of the database as the collection of values corresponding to the most recent updates (e.g., the currently available parking spaces). Even in this more traditional setting, there were plenty of challenges to overcome. The distributed database infrastructure in IrisNet shares much in common with a variety of large-scale distributed databases. For example, DNS [28] relies on a distributed database that uses a hierarchy based on the structure of host names, in order to support name-to-address mapping. LDAP [35] addresses some of DNS's limitations by enabling richer standardized naming using hierarchically-organized values and attributes. However, a key difference is that these efforts target a very narrow set of lookup queries (with very limited predicates), not the richness of a query language like XPATH.

Abiteboul et al. [19] present techniques for materializing and maintaining views over semistructured data. Answering queries from views is a hard problem in general. Our caching infrastructure differs significantly from this work in that we do not store or use the queries that resulted in the caching of the data, only the data itself. Our approach is more data-driven in nature. We generalize subqueries, tag the data, and maintain certain invariants, all to make our partial-match caching scheme tractable. There has been a lot of work on caching in distributed databases and object-oriented databases. Franklin and Carey [22] present techniques for client-server caching in object-oriented databases. Various work [13, 12, 30, 3, 25, 4, 6, 29] discusses issues of data replication and replica management in distributed databases. Much of this work has focused on maintaining replicas consistent with the original data, with various levels of consistency guarantees. In our work, we take the approach of not providing any strict guarantees of consistency. We believe that for the kinds of applications for which wide area sensor databases will be used, such strict guarantees are not required. Depending upon the requirements of an application, it is certainly possible to provide such guarantees by maintaining auxiliary information about the replicas of the data in the system. Our approach is more akin to DNS, in that it is based on specifying consistency requirements with the queries, and on using a time-to-live (ttl) field associated with the cached copies in order to determine staleness.
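A minimal sketch of such a ttl-based check, assuming cached copies are stamped with the time they were obtained and each query carries the maximum staleness it will tolerate (both names are illustrative, not the prototype's API):

// Minimal sketch of query-based consistency: a cached copy may be used only
// if its age is within the tolerance that the query itself specifies.
public class Freshness {
    // cachedAtMillis: when the cached copy was obtained from the owning site;
    // toleranceMillis: maximum staleness the query is willing to accept.
    public static boolean usable(long cachedAtMillis, long toleranceMillis) {
        long age = System.currentTimeMillis() - cachedAtMillis;
        return age <= toleranceMillis;
    }
}

A query with a small tolerance effectively bypasses stale caches and is answered by the owning site, while a query with a large tolerance can be satisfied by any sufficiently recent cached copy.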

Recently, there has been a lot of interest in streaming data sources and continuous query processing [9, 20, 18]. This work is related to ours in that the sensor proxies can be thought of as producing data streams, and continuous queries over such streams are quite natural in such a system. To our knowledge, most of this work focuses on centralized systems, whereas distributed data storage and query processing is one of our main design goals. As a result, the query processing challenges we face in our system are quite different. Also, these systems assume that the streaming data sources are relational in nature, whereas we use XML as our data model.

Although we propose a hierarchical, native XML storage approach to wide area sensor databases, an alternative would be to use a distributed object-relational database [32] to store the leaves of the XML document (as discussed in Section 5). In our parking space finder application, these would correspond to either the blocks or the parking spaces. The hierarchy information can be maintained either at a central server or along with the leaves themselves. This approach has several critical disadvantages. First, the hierarchy information becomes a bottleneck resource, as demonstrated in our performance study. Approaches to avoid this bottleneck would likely entail mimicking much of our hierarchical approach, and hence would benefit from the techniques presented in this paper. Second, the richness of XML allows transparent schema changes, and the use of highly expressive languages such as XPATH, XSLT, or XQuery. Many queries that can be naturally described using these languages are not easily expressible in SQL. Third, the use of such a database seriously restricts how data can be partitioned among the available sites, limiting opportunities for load balancing. Our architecture also enables powerful caching semantics naturally; we are not aware of any work on caching in object-relational databases that is equally as powerful. Much work has also been done on storing XML using object-relational databases, and on publishing object-relational data as XML [15, 24, 33, 31].


This work is orthogonal to the issues we discuss here, as the challenges in our query processing come mainly from the single document view of the data and the distributed nature of our system.

Recent work on peer-to-peer databases [14, 7, 21, 16] is quite closely related to our work. Although our data is organized hierarchically, and for performance reasons we expect the participating sites to also have a hierarchical organization, this is not required by our architecture. As such, the participating sites can be thought of as peers cooperating with each other to store and query data. In [14], distributed hash tables (DHTs) perform a role analogous to that of the DNS server in our architecture, in that both are used to find the relevant data satisfying a query. DNS is more attractive in our scenario because of the hierarchical nature of our data. Our work differs considerably in the actual query processing itself because of our use of XML and the XPATH query language. [21, 16] discuss issues of data placement and caching in peer-to-peer networks. The OLAP caching framework presented in [16] relates quite closely to our caching framework, but handles different kinds of data and queries.

There is a large body of literature on load balancing techniques for parallel and distributed systems (e.g., [17, 8, 23, 11, 34, 10]). Our current system provides a natural mechanism for performing load balancing, but we have not yet determined effective load balancing policies for our setting.

7. CONCLUSIONS

In this paper, we motivated the view of a wide area sensor database as a distributed hierarchical database with timestamped updates arriving at the leaves. We showed the advantages of using XML as a data representation, constructing a logical site hierarchy matching the XML document hierarchy, mapping this logical hierarchy onto a smaller hierarchy of physical sites, and providing for flexible partitioning and caching that adapts to query workloads. We described the many challenges in providing efficient and correct XPATH query processing in such an environment, and proposed novel solutions to address these challenges in an effective, flexible, unified, and scalable manner. New techniques were presented for self-starting distributed queries, query-evaluate-gather, partial-match caching, and query-based consistency. Experimental results on our IrisNet prototype demonstrated the significant performance advantages of our approach even for a small collection of sites. We anticipate that these advantages will only increase when IrisNet is deployed over hundreds of sites and thousands of miles.

8. REFERENCES

[1] Apache Xindice Database. http://www.dbxml.org.
[2] Xalan-Java. http://xml.apache.org/xalan-j.
[3] D. Agrawal and S. Sengupta. Modular synchronization in distributed, multi-version databases: Version control and concurrency control. IEEE TKDE, 1993.
[4] R. Alonso, D. Barbara, and H. Garcia-Molina. Data caching issues in an information retrieval system. ACM TODS, 1990.
[5] P. Bonnet, J. E. Gehrke, and P. Seshadri. Towards sensor database systems. In MDM, 2001.
[6] S. W. Chen and C. Pu. A structural classification of integrated replica control mechanisms. ACM TODS, 1992.
[7] A. Crespo and H. Garcia-Molina. Routing indices for peer-to-peer systems. In ICDCS, 2002.
[8] A. Fox et al. Cluster-based scalable network services. In SOSP, 1997.
[9] D. Carney et al. Monitoring streams - A new class of data management applications. In VLDB, 2002.
[10] D. Ferguson et al. An economy for managing replicated data in autonomous decentralized systems. In ISADS, 1993.
[11] D. R. Karger et al. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In ACM STOC, 1997.
[12] J. Gray et al. The dangers of replication and a solution. In SIGMOD, 1996.
[13] J. Sidell et al. Data replication in Mariposa. In ICDE, 1996.
[14] M. Harren et al. Complex queries in DHT-based peer-to-peer networks. In IPTPS, 2001.
[15] M. J. Carey et al. XPERANTO: Publishing object-relational data as XML. In WebDB, 2000.
[16] P. Kalnis et al. An adaptive peer-to-peer network for distributed caching of OLAP results. In SIGMOD, 2002.
[17] R. Blumofe et al. Cilk: An efficient multithreaded runtime system. In PPoPP, 1995.
[18] R. Motwani et al. Query processing, approximation, and resource management in a data stream management system. In CIDR, 2003.
[19] S. Abiteboul et al. Incremental maintenance for materialized views over semistructured data. In VLDB, 1998.
[20] S. Chandrasekaran et al. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003.
[21] S. Gribble et al. What can databases do for peer-to-peer. In WebDB, 2001.
[22] M. Franklin and M. Carey. Client-server caching revisited. In IWDOM, 1992.
[23] G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 1993.
[24] M. Klettke and H. Meyer. XML and object-relational database systems — enhancing structural mappings based on statistics. LNCS, 1997:151, 2001.
[25] N. Krishnakumar and A. Bernstein. Bounded ignorance in replicated systems. In PODS, 1991.
[26] S. Madden and M. J. Franklin. Fjording the stream: An architecture for queries over streaming sensor data. In ICDE, 2002.
[27] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TAG: A tiny aggregation service for ad hoc sensor networks. In OSDI, 2002.
[28] P. V. Mockapetris and K. J. Dunlap. Development of the Domain Name System. In SIGCOMM, 1988.
[29] C. Olston and J. Widom. Best-effort cache synchronization with source cooperation. In SIGMOD, 2002.
[30] C. Pu and A. Leff. Replica control in distributed systems: An asynchronous approach. In SIGMOD, 1991.
[31] T. Shimura, M. Yoshikawa, and S. Uemura. Storage and retrieval of XML documents using object-relational databases. In Database and Expert Systems Applications, pages 206-217, 1999.
[32] M. Stonebraker and G. Kemnitz. The POSTGRES next generation database management system. CACM, 1991.
[33] B. Surjanto, N. Ritter, and H. Loeser. XML content management based on object-relational database technology. In Web Info. Sys. Eng., pages 70-79, 2000.
[34] B. W. Wah. File placement on distributed computer systems. IEEE Computer, 1984.
[35] M. Wahl, T. Howes, and S. Kille. Lightweight Directory Access Protocol (v3). Technical report, IETF, 1997. RFC 2251.
