
IT Licentiate theses 2007-004

Tools and methods for evaluation of overlay networks

OLOF RENSFELT

UPPSALA UNIVERSITY
Department of Information Technology


BY

OLOF RENSFELT

September 2007

DIVISION OF COMPUTER SYSTEMS

DEPARTMENT OF INFORMATION TECHNOLOGY

UPPSALA UNIVERSITY

UPPSALA

SWEDEN

Dissertation for the degree of Licentiate of Philosophy in Computer Science at Uppsala University 2007


Olof Rensfelt

Division of Computer Systems
Department of Information Technology

Uppsala University
Box 337

SE-751 05 Uppsala
Sweden

http://www.it.uu.se/

© Olof Rensfelt 2007
ISSN 1404-5117

Printed by the Department of Information Technology, Uppsala University, Sweden

Abstract

Overlay networks are a popular method for deploying new functionality that does not currently exist in the Internet. Such networks often use the peer-to-peer principle, where users act as both servers and clients at the same time. We evaluate how overlay networks perform in a mix of strong and weak peers. The overlay system studied in this thesis is Bamboo, which is based on a distributed hash table (DHT).

For the performance evaluation we use both simulations in NS-2 and emulations in the testbed PlanetLab. One of our contributions is an NS-2 implementation of the Bamboo DHT. To simulate nodes joining and leaving, NS-2 is modified to be aware of the identity of overlay nodes.

To control experiments on PlanetLab we designed Vendetta. Vendetta is both a tool to visualize network events and a tool to control the individual peer-to-peer nodes on the physical machines. PlanetLab does not support the bandwidth limitations needed to emulate weak nodes. Therefore, we designed a lightweight connectivity tool called Dtour.

Both the NS-2 and the PlanetLab experiments indicate that a system like Bamboo can handle as much as 50 % weak nodes and still serve requests. However, the lookup latency and the number of successful lookups suffer as network dynamics increase.

Acknowledgments

I would like to thank my supervisor Lars-Åke Larzon. He has always been very helpful and supportive and makes it fun to go to work. I would also like to thank my secondary supervisor Per Gunningberg, not only for all the help with the thesis but also for creating such a creative work environment in the communication research group (CoRe).

I have very much enjoyed working with Sven Westergren, and without him this thesis would have been very different. His PlanetLab skills have been invaluable and I am very grateful to him. I would also like to thank Peter Drugge for all his work on Vendetta and Magnus Rundlöf for implementing Dtour. It has been great fun working with all of them.

I would also like to thank Arnold Pears for his feedback on the thesis, as well as the other CoRe group members, who have helped a lot through discussions and feedback: Christian Rohner, Erik Nordström, Oskar Wibling, Laura Feeney, Thabotharan Kathiravelu, Ioana Rodhe, and Fredrik Bjurefors.

Past members whom I would also like to thank for helping me when I was a new student are Henrik Lundgren and Richard Gold.


Included papers

Paper A: A bandwidth study of a DHT in a heterogeneous environment
Olof Rensfelt and Lars-Åke Larzon
Uppsala University Technical report no. 2007-017

Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds
Olof Rensfelt, Lars-Åke Larzon and Sven Westergren
In the proceedings of TridentCom 2007, May, Orlando
© 2007 IEEE. Reprinted, with permission

Paper C: Evaluating a DHT in a heterogeneous environment
Olof Rensfelt, Sven Westergren and Lars-Åke Larzon
Submitted for publication

Comments on my participation

Paper A: I implemented the overlay system in NS-2 and performed the simulations. I also analyzed the data and was the main author of the report.

Paper B: I participated in the design process and implemented the C client of Vendetta. Vendetta was implemented as a Master's thesis, which I supervised.

Paper C: I had a major role in designing the approach of doing connectivity modeling in a preloaded library, and I implemented the generic filter support. I worked on both the simulations and the PlanetLab experiments, and I was the main author of the paper.


List of work not included in the thesis

A LUNAR over Bluetooth
Olof Rensfelt, Richard Gold and Lars-Åke Larzon
Proceedings of the 4th Scandinavian Workshop on Wireless Ad-Hoc Networks 2004, May, Johannesberg

B LUNAR - A Lightweight Underlay Network Ad-hoc Routing Protocol and Implementation
Christian Tschudin, Richard Gold, Olof Rensfelt and Oskar Wibling
Next Generation Teletraffic and Wired/Wireless Advanced Networking (NEW2AN'04) 2004, February, St. Petersburg

C Addressing heterogeneity in Peer-to-Peer networks
Olof Rensfelt and Lars-Åke Larzon
poster
Proceedings of the Swedish National Computer Networking Workshop 2004, November, Karlstad

D NoteNet
Sven Westergren, Peter Drugge, Olof Rensfelt and Lars-Åke Larzon
demonstration
MobiSys 2006, June, Uppsala

E A bandwidth study of a DHT in a heterogeneous environment
Olof Rensfelt and Lars-Åke Larzon
poster
Swedish National Computer Networking Workshop 2006, October, Luleå

F Dtour - An Approach to Reproducibility on PlanetLab
Olof Rensfelt, Lars-Åke Larzon and Sven Westergren
poster
ACM SIGCOMM 2007, August, Kyoto


Contents

1 Thesis introduction
1.1 Introduction
1.2 Overlay networks
1.2.1 Peer to peer networks
1.2.2 Unstructured overlay networks
1.2.3 Structured overlay networks
1.2.4 Distributed Hash Tables
1.2.5 Metrics
1.3 Evaluation methods
1.3.1 Simulation
1.3.2 Network emulation
1.3.3 Real experiments
1.4 Summary of papers
1.5 Ongoing work
1.5.1 Experimental setup
1.5.2 Investigating network instability
1.6 Conclusions and future work
Bibliography

2 Paper A: A bandwidth study of a DHT in a heterogeneous environment
2.1 Introduction
2.2 Overlays
2.3 Distributed Hash Tables
2.3.1 Bamboo
2.3.2 Management traffic
2.4 Implementation
2.4.1 NS-2 specifics
2.4.2 Packet handler
2.4.3 Router
2.4.4 Agent
2.4.5 Data storing
2.4.6 Other differences to Bamboo
2.5 Evaluation
2.5.1 Physical network layout
2.5.2 Overlay network layout
2.5.3 Stabilization time
2.5.4 Measurements
2.5.5 Simulation specifics
2.6 Network variables
2.6.1 Size
2.6.2 Node capacities
2.6.3 Churn rate
2.7 Discussion
Bibliography

3 Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds
3.1 Introduction
3.2 Vendetta
3.2.1 Vendetta and PlanetLab
3.3 Case: A DHT testbed
3.3.1 DHT canvas
3.3.2 Monitor configuration
3.3.3 Working with the monitor
3.3.4 Node client configuration
3.3.5 Using Vendetta with the DHT testbed
3.4 Discussion
3.5 Conclusion and Future Work
Bibliography

4 Paper C: Evaluating a DHT in a heterogeneous environment
4.1 Introduction
4.2 Related work
4.3 Experiment setup
4.3.1 Simulation setup
4.3.2 PlanetLab setup
4.4 Results
4.4.1 Comparing simulations to PlanetLab measurements
4.4.2 PlanetLab results
4.5 Discussion
4.6 Future work
4.7 Conclusions
Bibliography

Chapter 1

Thesis introduction

1.1 Introduction

As new networking technologies constantly evolve, we need to gain new understanding of how to evaluate them. A dominant trend in the Internet is that a wider spectrum of devices is connected. The change from a network environment consisting mainly of desktop machines to a network with mobile users, cellphones, and new phenomena like peer to peer networks stretches the capabilities of the current design of the Internet. Weak Internet nodes become more common, strong nodes become stronger, and network heterogeneity increases as access technologies range from low-bandwidth wireless networks to Gigabit Ethernet.

It is not only the nodes that are changing but also how nodes communicate. New communication models appear: nodes acting as both clients and routers in ad hoc networks, intermittent connectivity in delay tolerant networks, data centric communication in sensor networks, and the use of overlay systems to provide indirection points.

New applications need new functionality in the Internet, but deploying new functionality in the Internet itself is a very slow process. The overlay concept avoids this by adding the new functionality in a layer between the application and the existing Internet.

Many overlay services are peer to peer networks, which create traffic patterns that were not foreseen during the design of the Internet.

This new functionality, combined with the increasing heterogeneity, creates a need for new ways to evaluate the performance of the system. Although much of the experience gathered from evaluating applications and protocols in fixed networks is still valuable, it does not cover how to study the impact of node mobility, or how a system handles nodes joining and leaving the network in an unpredictable manner.

The main question explored in this thesis is whether it is possible to have nodes with limited connectivity, for example cell phones, as members of applications based on distributed hash tables (DHTs), and how a mix of different nodes would influence the performance of such a system. The work is motivated by the increasing number of 3G enabled devices and the availability of flat rate pricing for such devices, which makes it more likely that users would run a bandwidth intensive application. The contribution is an increased understanding of how a DHT performs in a heterogeneous environment. Different methods are used to explore the impact of heterogeneity. Another novelty is our design of the evaluation tools, including the extension of previous tools to this environment.

Previous work on the DHT Pastry in heterogeneous environments [4] suggests that management traffic could be a problem because it causes the access links of mobile nodes to become congested. The network management was redesigned in Bamboo [22], the follow-up DHT to Pastry, which performs updates periodically rather than when changes occur. One advantage of this is that it can handle higher network dynamics and avoids feedback loops.

Our first evaluation approach uses simulations, which allow us to configure link parameters like bandwidth and delay. We use the NS-2 simulator [9], the “standard” simulator in the research community. Studying a DHT in simulation means that the DHT has to be re-implemented from scratch, which is a very time consuming task (paper A). The simulations of Bamboo indicate that mobile nodes can participate in the DHT, but the complexity of the system makes it impossible to simulate scenarios long enough to make any firmer claims. The most valuable outcome of the simulations was the understanding of how to design scenarios with thin nodes and mobility that place stress on the system. The insight that the complexity of the problem made simulations hard to perform was also valuable.

Figure 1.1: A logical network on top of the existing network

Our second approach was to perform experiments on PlanetLab [6]. As a part of this work, our tool Vendetta was developed to manage and analyze the experiments (Paper B). Vendetta consists of two parts, the client and the monitor. The client runs on every node that participates in the experiment and controls the application that is evaluated. The monitor is a central control and visualization application. When the application runs, the client captures its output and continuously parses it for predefined log entries. When a log entry is found, the client performs a configurable action, for example stopping the application and sending a message to the monitor. The monitor receives such messages from the nodes in the experiment and can visualize them in a 3-D canvas. The contribution of Vendetta is not the individual functions but how they are incorporated into one powerful tool, and Vendetta has proved very valuable for understanding unexpected behavior in Bamboo.
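The client's parse-and-react loop can be sketched roughly as below. This is a hypothetical Python illustration, not Vendetta's C client: the log formats (`JOIN node=...`, `FATAL`), the rule table `RULES`, the action names, and the `send_to_monitor` callback are assumptions chosen for the example.

```python
import re

# Hypothetical rule table: a regular expression for a predefined log entry,
# mapped to the action the client performs when an output line matches it.
RULES = [
    (re.compile(r"JOIN node=(\S+)"), "report"),
    (re.compile(r"FATAL"), "stop"),
]


def process_output(lines, send_to_monitor):
    """Scan application output line by line and apply the first matching rule.

    Every match is reported to the monitor; returns True if the
    application should be stopped.
    """
    for line in lines:
        for pattern, action in RULES:
            match = pattern.search(line)
            if match is None:
                continue
            send_to_monitor({"action": action, "line": line,
                             "groups": match.groups()})
            if action == "stop":
                return True
            break  # only the first matching rule is applied to a line
    return False


# Usage: collect the messages that would be sent to the monitor.
sent = []
output = ["starting up", "JOIN node=10.0.0.7:4000",
          "FATAL: routing table corrupt", "never read"]
stopped = process_output(output, sent.append)
print(stopped, len(sent))  # True 2
```

Keeping the rules as data rather than code mirrors the idea that actions are configurable per experiment.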

A reason for using simulations in paper A was the need to limit link capacities and to have complete control of the network topologies. When you do experiments on a testbed like PlanetLab, you do not have to worry about the accuracy of the network model, as the physical network you use is the Internet. To create a heterogeneous environment on PlanetLab, the bandwidth of some of the participating nodes needs to be limited. Unfortunately, that functionality is not currently offered by PlanetLab. That need prompted us to design the Dtour tool. Dtour is a connectivity emulation system residing in user space, implemented as a shared library. When an evaluated application uses system calls to send and receive traffic, the system calls are redirected to Dtour. There the traffic is filtered and then either dropped or forwarded. The bandwidth limitation is implemented using a token bucket that the packets need to pass before going into the network stack. The design is very lightweight, with no need to modify the operating system.
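The token bucket idea can be illustrated with a minimal sketch. It is written in Python for readability and is not Dtour's code (Dtour intercepts system calls in a shared library); the class name `TokenBucket`, its parameters, and the explicit `now` argument are hypothetical choices that keep the example deterministic.

```python
class TokenBucket:
    """Hypothetical token bucket where one token equals one byte."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_send(self, packet_len: int, now: float) -> bool:
        """Return True if a packet of packet_len bytes may pass at time now."""
        self._refill(now)
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True
        return False  # the caller may queue or drop the packet


# Usage: a weak node limited to 4000 bytes/s with a 1500-byte burst.
bucket = TokenBucket(rate_bytes_per_s=4000, burst_bytes=1500)
print(bucket.try_send(1500, now=0.0))  # True: the bucket starts full
print(bucket.try_send(1500, now=0.1))  # False: only 400 bytes refilled
print(bucket.try_send(1500, now=0.4))  # True: refilled up to the 1500 cap
```

The burst size bounds how much traffic can leave at once, while the rate bounds the long-term throughput, which is what makes a strong PlanetLab machine behave like a weak access link.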

1.2 Overlay networks

An overlay network is a logical network built on top of an existing network infrastructure such as the Internet. The strength of overlay networks is that you do not need to modify anything in the existing Internet to deploy them, as you only use the Internet as a transport service between nodes in the logical network. Overlay networks are often used to provide functionality that is missing in the existing Internet, for example to support mobility [23], media services [12], or virtual private networks (VPNs). They are also a faster way to deploy new services than incorporating them into the Internet design.

In networking terms, an overlay network uses the Internet as a globally distributed link layer, that is, as a “link” between two nodes. When an overlay node sends traffic to another overlay node, it looks like the traffic is sent directly to the other node, although it may pass through many physical machines. Many overlay networks offer a lookup service to their users. A lookup consists of a query from a user that gets a response from the network.

1.2.1 Peer to peer networks

Peer to peer (p2p) networks are overlay networks where all participating nodes have the same initial functionality. In a p2p system, all nodes are peers in the sense of having equivalent roles and responsibilities. The p2p model is in contrast to the classic client-server model used, for example, in web services, where different nodes have clear roles in the system. All decisions made within a pure p2p network are distributed, since there is no central decision point.

Some p2p networks do, however, select supernodes, which are nodes with high performance and stable network connections [8]. Such nodes are typically used to perform network management tasks. Even though supernodes might seem a contradiction to the p2p philosophy, all nodes still have the possibility (or risk) of being chosen to become a supernode.

1.2.2 Unstructured overlay networks

Overlay networks can be classified as either structured or unstructured. An unstructured overlay network builds a random graph between the participating nodes and uses algorithms like random walk [29] or flooding to distribute queries through the network. If the flooding is not complete, there is no guarantee that an answer will be found. Unstructured overlay networks have good scalability properties because a node only needs to know a few other nodes in the network. For that reason, unstructured p2p networks are often used for file sharing [11], where a user wants to find a certain file, not all copies of it.
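Why incomplete flooding gives no guarantee can be seen in a minimal sketch of flooding with a hop limit (TTL). The function `flood_search` and the example topology are hypothetical; real unstructured systems differ in details such as how the TTL is chosen and how duplicate queries are suppressed.

```python
from collections import deque


def flood_search(graph, start, has_file, ttl):
    """Breadth-first flooding with a hop limit (TTL).

    graph: dict mapping node -> list of neighbor nodes (hypothetical overlay)
    has_file: set of nodes that store the requested file
    Returns the set of nodes holding the file that the query reached.
    """
    hits = set()
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node in has_file:
            hits.add(node)
        if remaining == 0:
            continue  # TTL expired: do not forward the query further
        for neighbor in graph[node]:
            if neighbor not in visited:  # nodes ignore duplicate queries
                visited.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits


# A line-shaped overlay 0 - 1 - 2 - 3 - 4 with the file only on node 4.
overlay = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(flood_search(overlay, 0, {4}, ttl=2))  # set(): the flood stopped early
print(flood_search(overlay, 0, {4}, ttl=4))  # {4}
```

A larger TTL raises the chance of a hit but also the number of messages, which is the trade-off unstructured overlays live with.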


Figure 1.2: The dotted lines show a leafset in a ring based DHT. A leafset is a set of pointers to neighboring nodes in the key space. The dashed lines show a routing table.

1.2.3 Structured overlay networks

Structured overlays assign keys to data items and have a mapping function that maps a key onto a node in the overlay. Having such a mapping function makes lookups of the data efficient, as every node in the network knows where to forward requests. It also makes it possible for a node to insert certain data into the overlay, and for another node to be sure to retrieve it at a later time. There are of course times when a structured overlay cannot return values previously inserted, but that is an error state; in unstructured overlays, by contrast, a failed lookup can occur simply because the flooding was incomplete. To decrease the risk of failed lookups due to nodes leaving the network, many structured overlay networks replicate data among multiple nodes.

1.2.4 Distributed Hash Tables

Structured overlays are mainly implemented as DHTs. A DHT offers a storage service to its users in which data can be inserted and later retrieved from anywhere in the network. A DHT handles key-value pairs where the key is often a hash of the value or of meta-data describing it. A key-value pair might be a name and a telephone number; the key is then derived from the name, and the telephone number can be retrieved by using the name. Such a service is a useful building block when designing distributed systems. Examples of where DHTs are used are Azureus [1], to find torrent files, systems supporting mobility [23], where they serve as building blocks, and grid computing [3].


Figure 1.3: The PUT-GET semantics of DHTs

Four different proposals for DHTs were published in 2001: Chord [24], Pastry [21], Tapestry [28], and CAN [16]. Their algorithmic background is consistent hashing [13], which has the property that when bins are added to or removed from a hash table, only a limited number of keys need to be moved between bins. If the number of bins in an ordinary hash table is changed, a majority of the keys need to change bin. The hash function is expected to distribute keys evenly over the key space. Consistent hashing was initially used for load balancing among web servers, but it is also a good way to partition the key space.

Chord, Pastry, Tapestry, and Bamboo [18] organize keys and nodes in a circular key space, using SHA-1 [7] as the hash function. CAN uses a more complex key space, an n-dimensional Cartesian coordinate space on a multi-torus. The 2-dimensional CAN key space is presented in figure 1.5.

The keys in a DHT are flat identifiers, meaning that they do not hold any hierarchical information. This is in contrast to an IPv4 address, which is tightly coupled to a physical location in the Internet, since the location can be derived from the hierarchy of the address. Because of this difference, DHTs are useful building blocks in systems that want to separate location from user identity. Such systems can enable transparent mobility because a user can keep her ID as she moves around in the physical network.
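The difference between an ordinary hash table and consistent hashing can be checked with a small experiment. This is an illustrative sketch: bins are placed on a SHA-1 ring by hashing hypothetical labels such as `bin-0`, and a key goes to the first bin point at or after its hash. The function names are our own.

```python
import hashlib
from bisect import bisect_left


def h(value: str) -> int:
    """Map a string to an integer in [0, 2**160) using SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")


def modulo_bin(key: str, n_bins: int) -> int:
    # Ordinary hash table: bin = hash mod number of bins.
    return h(key) % n_bins


def consistent_bin(key: str, bin_ids: list) -> int:
    # Consistent hashing: each bin owns a point on the ring, and a key goes
    # to the first bin point at or after the key's hash (wrapping at the end).
    points = sorted(bin_ids)
    i = bisect_left(points, h(key))
    return points[i % len(points)]


keys = [f"key-{i}" for i in range(10000)]
bins_before = [h(f"bin-{b}") for b in range(10)]
bins_after = bins_before + [h("bin-10")]  # add an 11th bin

moved_mod = sum(modulo_bin(k, 10) != modulo_bin(k, 11) for k in keys)
moved_cons = sum(consistent_bin(k, bins_before) != consistent_bin(k, bins_after)
                 for k in keys)
print(moved_mod / len(keys), moved_cons / len(keys))
```

With modulo hashing the fraction of moved keys comes out close to 10/11, because a key stays only if its hash gives the same remainder modulo 10 and 11. With consistent hashing, only the keys in the arc taken over by the new bin move, and all of them move to that new bin.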

There are also approaches where keys are given a hierarchical meaning, for example a global geonotes system where the geographical location is part of the key [25]. The DHT is then made aware of this hierarchy to increase efficiency. Others have built data structures on top of DHTs [5], which fits well with the idea of offering a DHT as a service [19, 2] to nodes not participating in the overlay.

The separation between physical location and logical position in the overlay network makes DHTs robust against network disturbances. It is highly unlikely that two adjacent overlay nodes are located close to each other in the underlying physical network, where they would risk being affected by the same local network outages. However, if the underlying network gets partitioned, a DHT might be divided into two different networks. Therefore, most DHTs have functionality for merging networks.

Data insertion and retrieval

When data items are inserted into a DHT, they are given an identifier value in the network by the hash function. The hash function is applied to the data or meta-data, and the result is called a key. The set of all possible keys is called the key space and depends on the hash function used. Because of the properties of hash functions, every data item has one unique place in the key space, and that place can be located by any member of the network. The most common hash function used in DHTs today is SHA-1, which distributes data among $2^{160}$ bins. A one-dimensional key space can be thought of as a ring into which values are put, which in the case of SHA-1 are all values between 0 and $2^{160} - 1$ (figure 1.2). This thesis concentrates on ring-based DHTs.

When values are inserted into the key space, you need to decide which node should be responsible for which values. To do that, nodes are also put into the key space, often by applying the same hash function to their IP address and port. We will use the term node ID to indicate the place where a node is put into the key space. In Chord, for instance, a key is assigned to its successor, the first node whose node ID is equal to or follows the key, so a node is responsible for all values between the node ID of the previous node in key space and its own node ID. In some other DHTs, the node whose node ID is numerically closest to the key is responsible for the attached data [21, 18].
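The two responsibility rules can be compared in a short sketch over the SHA-1 ring. The function names and the hashing of an `"IP:port"` string into a node ID are illustrative assumptions; the small integer node IDs in the usage example are chosen only to make the wrap-around visible.

```python
import hashlib

RING = 2 ** 160  # size of the SHA-1 key space


def sha1_id(text: str) -> int:
    """Hash a string, e.g. a hypothetical "IP:port", onto the ring."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


def successor(key_id: int, node_ids: list) -> int:
    """Chord-style rule: the first node ID equal to or following the key."""
    # (n - key_id) % RING is the clockwise distance from the key to node n.
    return min(node_ids, key=lambda n: (n - key_id) % RING)


def numerically_closest(key_id: int, node_ids: list) -> int:
    """The node ID closest to the key in either direction around the ring."""
    def ring_distance(n: int) -> int:
        d = abs(n - key_id)
        return min(d, RING - d)  # the shortest way may wrap around zero
    return min(node_ids, key=ring_distance)


nodes = [10, 200, 2 ** 159]
print(successor(30, nodes), numerically_closest(30, nodes))    # 200 10
print(successor(RING - 5, nodes), numerically_closest(RING - 5, nodes))  # 10 10
```

The same key can thus be stored on different nodes depending on the rule, which is why every node in a given DHT must apply the same one.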

Figure 1.4: Different approaches for lookups in a DHT

The purpose of a DHT is to offer a service for handling key-value pairs. Inserting data is called a PUT and later retrieving it is called a GET (figure 1.3). When a user inserts data into, or requests data from, a DHT, messages need to be forwarded between the DHT nodes for the PUT or GET to reach the responsible node. Finding the responsible node is called a lookup, which can be done in two different ways. In the first way, the initiating node asks a node for a pointer to the next suitable node in order to forward the lookup. When an answer arrives, the initiating node issues a new request to the node pointed to by the previous node, and the process continues until the right node is found. In the second way, the initiating node sends a request to another node to do a lookup, and if the receiving node is not the responsible node, it forwards the request through the DHT (figure 1.4). The first approach, iterative routing, gives the initiating node control of the lookup, which circumvents the potential problem of a malicious node dropping requests. The second approach, recursive routing, on the other hand performs lookups with lower latency and also tackles the problem of non-transitive connectivity [10].
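The latency difference between the two approaches can be estimated with a simple model. In this sketch (our own assumptions, with hypothetical function names) a lookup follows a fixed overlay path, processing time is ignored, and in recursive routing the responsible node replies directly to the initiator; some systems instead send the reply back along the path.

```python
def iterative_latency(path, delay):
    """Iterative routing: one request/response round trip from the
    initiator (path[0]) to every subsequent node on the path."""
    initiator = path[0]
    return sum(2 * delay(initiator, hop) for hop in path[1:])


def recursive_latency(path, delay):
    """Recursive routing: the request is forwarded hop by hop and the
    responsible node (path[-1]) replies directly to the initiator."""
    forward = sum(delay(a, b) for a, b in zip(path, path[1:]))
    return forward + delay(path[-1], path[0])


# Usage: a 4-hop lookup where every pair of nodes is 50 ms apart.
uniform = lambda a, b: 50
path = ["A", "B", "C", "D", "E"]
print(iterative_latency(path, uniform), recursive_latency(path, uniform))  # 400 250
```

With uniform delays, k overlay hops cost about 2k one-way delays iteratively but only k + 1 recursively.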

Management traffic

As DHTs should work in dynamic network environments, nodes need to communicate with each other in order to know which nodes are still members of the network. The nodes that a certain node communicates with directly are called its neighbors. The status of neighbor nodes is often tested using some kind of echo-reply communication. The interval at which a neighbor is contacted affects how high a rate of network dynamics the system can handle. This is a relevant setting in our scenario, where mobile users are expected to join and leave at a high rate.

A design decision when creating a DHT is how a node should select neighbors. The crude approach is to have all nodes communicate with all other nodes in the DHT. Such an approach is feasible for small networks and gives good lookup performance, as all values can be reached with only one request, that is, in O(1) hops. However, such an approach becomes inefficient when the number of nodes participating in the network grows large, and it is therefore common to use more advanced methods when selecting neighbors.
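The trade-off behind the echo-reply interval can be made concrete with a small sketch. `NeighborMonitor` and its parameters are hypothetical, and the detection rule (nothing heard for longer than one interval plus one timeout) is a simplifying assumption rather than Bamboo's exact mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborMonitor:
    """Hypothetical periodic echo-reply liveness check for overlay neighbors."""
    ping_interval: float  # seconds between pings to each neighbor
    timeout: float        # seconds to wait for a reply
    last_reply: dict = field(default_factory=dict)

    def on_reply(self, neighbor: str, now: float) -> None:
        self.last_reply[neighbor] = now

    def dead_neighbors(self, now: float) -> list:
        # Declare a neighbor dead if nothing was heard from it for longer
        # than one ping interval plus one timeout.
        limit = self.ping_interval + self.timeout
        return sorted(n for n, t in self.last_reply.items() if now - t > limit)

    def worst_case_detection(self) -> float:
        # A neighbor failing right after a reply is noticed at the latest
        # one full interval plus one timeout later.
        return self.ping_interval + self.timeout

    def probe_rate(self, n_neighbors: int) -> float:
        # Management traffic: pings sent per second to all neighbors.
        return n_neighbors / self.ping_interval


monitor = NeighborMonitor(ping_interval=5.0, timeout=1.0)
monitor.on_reply("a", 0.0)
monitor.on_reply("b", 4.0)
print(monitor.dead_neighbors(7.0))  # ['a']
```

Halving `ping_interval` roughly halves the detection time but doubles `probe_rate`, which is the management traffic a weak node has to carry.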

Figure 1.5: A 2-dimensional CAN key space divided among eleven nodes

In DHTs that use a circular key space, it is common to let nodes keep track of a certain number of nodes before and after their node ID. The set of such nodes is sometimes referred to as a leafset (figure 1.2). The leafset ensures that messages can be passed between any two nodes in a stable network via intermediate nodes. Using only a leafset results in a long lookup path of O(n) hops, where n is the number of participating nodes. Therefore, although a leafset is sufficient to ensure correct lookups, it is not efficient at handling lookups in big networks. To increase performance, nodes can keep information about other nodes far away in the key space. That information is often called a routing table (figure 1.2).
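The effect of a routing table on path length can be illustrated under strong simplifications: nodes sit evenly at positions 0 to n-1 on the ring, lookups only move clockwise, and the routing table is modeled as Chord-style fingers at distances 1, 2, 4, 8 and so on. The fingers are a swapped-in stand-in; other DHTs organize their routing tables differently, for example by ID prefix. The function name is hypothetical.

```python
def lookup_hops(n_nodes, start, target, use_fingers, leaf_half=2):
    """Count overlay hops for a clockwise greedy lookup on an idealized ring
    of n_nodes evenly spaced nodes (positions 0..n_nodes-1)."""
    hops, current = 0, start
    while current != target:
        remaining = (target - current) % n_nodes  # clockwise distance left
        # Leafset: the leaf_half nearest successors (predecessors unused here).
        steps = list(range(1, leaf_half + 1))
        if use_fingers:
            # Routing table modeled as fingers at distances 1, 2, 4, 8, ...
            steps += [2 ** i for i in range(n_nodes.bit_length())]
        # Greedy: take the longest known step that does not pass the target.
        step = max(s for s in steps if s <= remaining)
        current = (current + step) % n_nodes
        hops += 1
    return hops


print(lookup_hops(1024, 0, 1023, use_fingers=False))  # 512
print(lookup_hops(1024, 0, 1023, use_fingers=True))   # 10
```

With only the leafset the hop count grows linearly with the distance in the ring, while with fingers each hop at least halves the remaining distance, giving O(log n) hops.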

1.2.5 Metrics

There are certain metrics commonly used when evaluating the efficiency of DHTs. An obvious metric is how quickly the DHT services requests, often called lookup latency. Lookup latency and the fraction of successful lookups are two metrics with which to evaluate DHT services. However, considering only lookups overlooks the internal processes causing delays and failures. To see how efficiently an overlay uses network resources, you often measure the ratio between physical network hops and overlay hops, the overlay stretch. A high overlay stretch increases the risk of something going wrong during the lookup and can therefore lead to a decrease in successful lookups.

The price of providing an overlay service can also be evaluated. It mainly consists of the traffic that needs to be sent between nodes regardless of whether requests are served or not.
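Computing these metrics from a lookup log is straightforward; the sketch below shows one possible way. The record layout `LookupRecord` is hypothetical, and aggregating the stretch as a ratio of totals over successful lookups (rather than a mean of per-lookup ratios) is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class LookupRecord:
    latency_ms: Optional[float]  # None if the lookup failed or timed out
    overlay_hops: int            # hops between overlay nodes
    physical_hops: int           # IP-level hops summed over the overlay path


def summarize(records: List[LookupRecord]) -> dict:
    """Compute success ratio, mean lookup latency and overlay stretch."""
    if not records:
        raise ValueError("no lookups recorded")
    ok = [r for r in records if r.latency_ms is not None]
    summary = {"success_ratio": len(ok) / len(records),
               "mean_latency_ms": None, "stretch": None}
    if ok:
        summary["mean_latency_ms"] = mean(r.latency_ms for r in ok)
        overlay = sum(r.overlay_hops for r in ok)
        if overlay > 0:
            # Stretch as defined in the text: physical hops per overlay hop.
            summary["stretch"] = sum(r.physical_hops for r in ok) / overlay
    return summary


records = [LookupRecord(120.0, 3, 27), LookupRecord(80.0, 2, 14),
           LookupRecord(None, 4, 30)]
# Success ratio 2/3, mean latency 100.0 ms, stretch 41/5 = 8.2
print(summarize(records))
```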


1.3 Evaluation methods

With overlay services becoming more widely deployed, the need to evaluate the performance of such systems has increased. Experimental evaluation tools include simulators, emulators, and testbeds, while theoretical methods involve statistics and formal methods. The theoretical methods are beyond the scope of this thesis, so only the experimental methods will be discussed here.

1.3.1 Simulation

The most used method to evaluate overlay systems is simulation. Simulations can be very useful, as you have complete control of the environment in which you evaluate an application. You often need to implement an application or a protocol specifically for a simulator to evaluate it. That might be good, as you can make simplifications that make the model less complex. Such a simplified implementation might make it possible to simulate a bigger network or longer scenarios. Simulation does not typically run in real time, so it is also feasible to study quite long scenarios in a short time if the complexity is kept low.

When a scenario is designed, there are a multitude of configurable parameters which need to be assigned. Parameters might control network topology, network dynamics, and timers in the evaluated application or protocol implementations. There are numerous tools for creating network topologies according to Internet models, but you still always need to estimate the relevance of the created topologies for your experiments.

The simulation environment makes it possible to quickly change the topologies and even model configurations that are rare in real life. This also means that the evaluation results depend directly on the accuracy of the models, e.g. of the network topology and the application behavior. This can make it hard to draw general conclusions outside the assumptions of the models and the parameters used.

Simulators

The most common network simulator within the academic research community is NS-2 [9], which is an event-driven packet level simulator. By simulating every packet's path through a network topology, links and queues can be simulated. Such a detailed network model is needed when evaluating transport protocols. However, such high detail in simulations is computationally expensive, which makes it time consuming to evaluate large scale network configurations.

Another approach to simulating big networks is to use a network model that only models delays. It is fairly simple to implement an event driven simulator with such a network model, so many researchers implement their own. Such simulators are often used to verify functionality rather than to evaluate performance, because they cannot model bandwidth or packet loss caused by full network queues. The practice of developers and researchers of overlay systems implementing their own simulators unfortunately makes it hard to compare different results, implementations, and algorithms.

Our choice to implement Bamboo in NS-2 was based on the need to set link parameters to model wireless access links. If we had only wanted to model high network dynamics caused by mobile users, a simpler simulator could have been used.
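How little is needed for such a delay-only simulator can be seen in the sketch below. `DelaySimulator` and its methods are hypothetical; the core is a priority queue of timestamped events, and a message is delivered exactly one link delay after it is sent, so bandwidth, queueing and loss are not modeled at all.

```python
import heapq
import itertools


class DelaySimulator:
    """Hypothetical minimal event-driven simulator whose network model only
    adds a per-link delay: no bandwidth, queues, or packet loss."""

    def __init__(self, delay):
        self.delay = delay             # function (src, dst) -> seconds
        self.now = 0.0
        self._events = []              # heap of (time, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker for simultaneous events

    def schedule(self, at, callback, *args):
        heapq.heappush(self._events, (at, next(self._seq), callback, args))

    def send(self, src, dst, handler, msg):
        # The message arrives exactly delay(src, dst) after it is sent.
        self.schedule(self.now + self.delay(src, dst), handler, dst, msg)

    def run(self, until=float("inf")):
        # Process events in time order, advancing the simulated clock.
        while self._events and self._events[0][0] <= until:
            self.now, _, callback, args = heapq.heappop(self._events)
            callback(*args)


# Usage: a ping-pong exchange between A and B over a 50 ms link.
log = []
sim = DelaySimulator(delay=lambda a, b: 0.05)


def on_receive(node, msg):
    log.append((round(sim.now, 3), node, msg))
    if msg == "ping":
        sim.send(node, "A", on_receive, "pong")


sim.schedule(0.0, sim.send, "A", "B", on_receive, "ping")
sim.run()
print(log)  # [(0.05, 'B', 'ping'), (0.1, 'A', 'pong')]
```

Because events are simply popped in time order, the simulation runs as fast as the callbacks allow, independent of real time.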

1.3.2 Network emulation

In emulation, parts of the real system and models are combined. The modeledpart is used for different reasons. It could replace a complicated part whichis difficult to provide, such as a large network. It could be used to providea repeatable environment, such as a radio network which otherwise has anunpredictable component. In both examples, parameters for the model couldbe systematically changed during an experiment. With an network emulatorwe mean that the actual application is used and that the emulator providesthe same interfaces as the real network. Still, a designer of a network emulatorneeds to design a communication scenario for the emulator.

Emulation has been used a lot within wireless network research due tothe ability to model mobility as connectivity changes[27]. For research onsystems in wired network there are hardware network emulators available,as well as publicly available emulation testbeds like Emulab[26]. Emulabsupports both wireless and wired experiments as well as mobility.

Recent work aims at using measured network phenomena in an emulated network, where the properties of the emulated network are affected by measurements from a real network [20].

1.3.3 Real experiments

To evaluate how a networked application would behave in a real deployment, real experiments are the most valuable method, since applications commonly behave unexpectedly when exposed to real network dynamics. Real experiments are often hard to perform due to problems with coordination, time synchronization, and hardware. Unlike in simulation and some emulation, there is no global clock with which to time measurements. There are tools available to support real-life experiments. They typically help in choreographing node behavior, synchronizing the experiment start, and later gathering logfiles and other data [15].

A fundamental property of real experiments is that the environment varies between experiments. The background noise may vary over time when doing wireless measurements, and so may the cross traffic in the Internet when doing overlay experiments. On one hand, such variations reflect what a real deployed system would have to cope with, so results gathered under such circumstances are highly relevant. On the other hand, if the variations are large, they can make results hard to compare or reproduce.

In overlay network research, where you typically want to evaluate systems with many nodes spread over the whole Internet, real experiments are costly. Some companies have testbeds that can be used to perform experiments [5], but the most commonly used testbed is PlanetLab [6]. PlanetLab is a cooperation, mainly between research institutions, that provides a global testbed. The testbed currently consists of 777 machines at 378 physical locations around the world. Users of the testbed can get shell accounts on the machines. Access to that many machines distributed over the world enables many interesting experiments, but it also creates new problems, for example clock skew, machine crashes, and other experiments competing for resources.

Running experiments on PlanetLab involves problems that you do not have in simulation or emulation. First, you need to distribute software, and possibly different configuration files, to all participating nodes, which, unlike in simulation, is a time-consuming task. When the nodes have the right software installed, the experiment needs to start synchronously on all nodes, which is hard to achieve on PlanetLab. While the experiment is running, it is useful to be able to monitor how it proceeds, but it is often hard to get a good picture of the experiment by looking at logfiles on different nodes.

Currently, the PlanetLab testbed offers no way to control network properties such as intermittent connectivity or limited bandwidth. Being able to control such properties of the nodes in an experiment can be valuable when evaluating overlay systems that should function in network environments other than the fixed Internet.

To evaluate a DHT running on the Internet with low-bandwidth nodes participating, we designed Dtour, a lightweight connectivity emulation library that allows us to emulate weak access links on PlanetLab.


1.4 Summary of papers

This thesis consists of the following papers.

Paper A: A bandwidth study of a DHT in a heterogeneous environment
Olof Rensfelt and Lars-Åke Larzon
Uppsala University Technical report no: 2007-017

This technical report documents the work of implementing a version of the Bamboo DHT in NS-2. It describes how NS-2 was modified to better handle node churn, as well as how the heterogeneous scenario was modeled. It also presents simulation results indicating that mobile phones might actually work as full members of a DHT. The choice to use NS-2 might in retrospect be questioned, as reimplementing the system turned out to be extremely time consuming. However, the experience of what scenarios to study and how to model them has proved very valuable when doing PlanetLab experiments.

Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds
Olof Rensfelt, Lars-Åke Larzon and Sven Westergren
In the proceedings of TridentCom 2007, May, Orlando

In this paper, the Vendetta monitoring and management tool is described. Vendetta is used both to interactively control experiments and to visualize events that occur in an overlay network. The system consists of two parts: first, a small piece of software, called the client, that runs on every node in a testbed; and second, a monitor where an experiment can be set up, monitored, and controlled. A main contribution is the framework for handling logfile parsing during experiments, which, in combination with the generic event queue, allows a great deal of flexibility when controlling experiments.

Paper C: Evaluating a DHT in a heterogeneous environment
Olof Rensfelt, Sven Westergren and Lars-Åke Larzon
Submitted for publication

Using both the NS-2 implementation of a DHT and real experiments on PlanetLab, the impact of weak nodes on a DHT was evaluated. To model heterogeneity on PlanetLab, a lightweight emulation library called Dtour was designed. Dtour decides whether a packet should be forwarded or dropped. It is implemented by intercepting system calls like send() and sendto(), combined with a filter mechanism. The packets are sent through a token bucket filter to limit bandwidth. The results show a good match between simulation results and PlanetLab measurements, and that a DHT like Bamboo can actually cope quite well with bandwidth-limited nodes and high churn rates. However, the DHT enters an oscillating state that needs to be addressed. The oscillation does not occur when nodes without bandwidth limitation churn in the same pattern as the bandwidth-limited nodes in the oscillating experiment. Nor does it occur when bandwidth-limited nodes participate in the network without churning, so the combination of churn and bandwidth limitations seems problematic.
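Dtour itself intercepts send() and sendto() in native code. The Python sketch below shows only the per-packet forward/drop decision of a token bucket filter, under assumed parameters. The class name, the 1500-byte bucket size, and the explicit `now` argument are illustrative choices and are not taken from Dtour.

```python
class TokenBucket:
    """Per-packet forward/drop decision of a token bucket filter.
    Tokens are bytes; they accrue at `rate` bytes/s up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self.last = now

    def allow(self, packet_len: int, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True                 # conforming packet: forward it
        return False                    # bucket too empty: drop it


# Usage: a 64 kb/s uplink is 8000 bytes/s; hypothetical 1500-byte bucket.
tb = TokenBucket(rate=8000, capacity=1500)
decisions = [tb.allow(1000, now=0.0), tb.allow(1000, now=0.0), tb.allow(1000, now=0.1)]
```

In a real interposition layer, `now` would come from a monotonic clock such as `time.monotonic()`. It is passed explicitly here so that the behavior is deterministic: the second back-to-back packet is dropped, and 100 ms later enough tokens have accumulated to forward the third.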

1.5 Ongoing work

Since PlanetLab allowed us to evaluate longer scenarios than simulation did, we were able to observe strange behavior that did not appear in simulation.

1.5.1 Experimental setup

The experiments running on PlanetLab match the simulations from Paper A. The only difference is that, on PlanetLab, there is no extra delay on the weak nodes' access links. Nodes are either weak or strong: weak nodes model mobile terminals, and strong nodes model nodes with a broadband connection. The weak nodes are bandwidth limited according to measurements from a commercially available 3G service, where the downlink was measured to 384 kb/s and the uplink to 64 kb/s [14]. While the strong nodes stay connected to the network for the duration of the experiment, weak nodes join and leave at short intervals. The network size is kept fixed by letting a new node join as soon as a node leaves. Joins and leaves are modeled by a Poisson process, which yields exponentially distributed connection times with a mean of 3 minutes.
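A churn schedule with these properties can be sketched as follows. This is an illustration under assumptions, not the experiment scripts used in the thesis. The function name, the "slot" abstraction (one weak node per slot at a time), and the fixed seed are hypothetical.

```python
import random
from typing import List, Tuple


def churn_schedule(num_weak: int, duration_s: float, mean_session_s: float = 180.0,
                   seed: int = 1) -> List[Tuple[float, str, int]]:
    """Return time-ordered (time, event, slot) tuples for weak-node churn.

    Each slot holds one weak node at a time. Session lengths are drawn from an
    exponential distribution with mean `mean_session_s`, so departures in each
    slot form a Poisson process; a fresh node joins the same slot the instant
    the old one leaves, which keeps the network size fixed."""
    rng = random.Random(seed)
    events = []
    for slot in range(num_weak):
        t = 0.0
        while True:
            t += rng.expovariate(1.0 / mean_session_s)   # next session length
            if t >= duration_s:
                break
            events.append((t, "leave", slot))
            events.append((t, "join", slot))            # immediate replacement
    events.sort(key=lambda e: e[0])   # stable sort keeps each leave before its join
    return events


# Usage: 30 weak slots churning for one hour with a 3-minute mean session.
events = churn_schedule(num_weak=30, duration_s=3600.0)
leaves = [e for e in events if e[1] == "leave"]
```

With a 3-minute mean, each slot sees roughly 20 replacements per hour. Joining in the same instant as the departure is what keeps the network size constant.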

1.5.2 Investigating network instability

In figure 1.6 we present the performance of the DHT over time. All experiments with weak nodes show significant variations in mean lookup latency over the course of the experiment. Figure 1.6(a) shows that the latency oscillates slowly, with a period of about five hours.

[Figure 1.6, three panels plotted against time (h), 0 to 35 h: (a) Mean lookup latency over time, latency (s); (b) Success ratio over time; (c) Used bandwidth over time, Tx and Rx (kB/s).]

Figure 1.6: Performance and cost over time for Bamboo, 30% weak nodes.

Because adding bandwidth limitations led to the oscillation, we expected to see an increase in dropped packets during the latency peaks. However, when we studied the drop rate, we found that it instead decreases, which indicates that nodes reduce their sending rates. Figure 1.6(c) also shows that the mean bandwidth used is below 4 kB/s for received and sent data combined, which is about half the upstream limit of 64 kb/s.
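The comparison above mixes bytes and bits. The conversion below assumes 1 kB = 1000 bytes and 1 kb = 1000 bits, as is conventional for data rates:

```latex
4\ \text{kB/s} \times 8\ \frac{\text{bit}}{\text{byte}} = 32\ \text{kb/s} = \frac{1}{2} \times 64\ \text{kb/s}
```

With 1 kB = 1024 bytes, the result is about 32.8 kb/s, which does not change the conclusion.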

The impact of churn on a DHT can be substantial [18]. Churn not only causes failed lookups when nodes leave while forwarding a lookup. It also makes routing tables non-optimal, which in turn causes higher lookup latencies. Churn can also increase management traffic, since newly joined nodes need to synchronize neighbor information and stored data. Such traffic might congest links between nodes.

To investigate the impact of churn, we ran experiments without any bandwidth limitation, but let 30% of the nodes churn like weak nodes. Apart from the lack of bandwidth limitation, the setup was identical to the previous experiments. In the measurements from this experiment, we observed some variations in latency during the first few hours, after which the latency stabilized.

The results from this experiment indicate that churn alone is not to blame. We also ran an experiment with 30% weak nodes that did not churn, and obtained similar results. This experiment performed at a stable level throughout, without any significant variations in latency or success ratio. This result also reduces the risk that a programming error, for example in Dtour, is causing the oscillations.

Since neither churn nor bandwidth limitation on its own made the network unstable, we are led to believe that the cause must be the combination of the two. Our current hypothesis is that the congestion mechanism implemented on top of UDP is causing the oscillation. The main reason for this suspicion is the decrease in total traffic sent and received during the latency peaks seen in figure 1.6(c). We find it interesting that added dynamics with a 3-minute mean interval can cause dynamics on time scales of more than 5 hours. Because the congestion mechanism reacts to dropped packets, we would like to find out which packets are dropped, and when, during the experiment. Unfortunately, it seems hard to make Dtour aware of which packets are dropped. The traffic does not follow a well-defined protocol but consists of serialized objects sent between Java virtual machines. We do not currently know how to solve that problem.

1.6 Conclusions and future work

While evaluating a DHT in heterogeneous networks, we have found a need to improve the available tools, since both simulation and testbeds have limitations. During our work we have extended existing tools, both by modifying NS-2 and by implementing an emulation library that can be used on PlanetLab.

Our results from both simulation and PlanetLab experiments indicate that a DHT could work with a high percentage of bandwidth-limited nodes, even if the time they are attached to the network is short. Performance obviously suffers, but the system is able to satisfy requests even under extreme conditions.

On a longer time scale, there are two main directions that seem interesting to pursue. First, it would be very interesting to see whether Dtour and Vendetta could be used to evaluate networks other than overlay networks. It seems likely that Vendetta could also be very useful for managing sensor network testbeds, and a tool like Dtour might also be useful when evaluating DTN solutions. Extending Dtour with functionality to delay traffic would also enable other interesting uses.

Bibliography

[1] BitTorrent client. Online: http://azureus.sourceforge.net/, 2001.

[2] Hari Balakrishnan, Scott Shenker, and Michael Walfish. Peering peer-to-peer providers. In 4th International Workshop on Peer-to-Peer Systems (IPTPS ’05), Ithaca, NY, February 2005.

[3] Sujata Banerjee, Sujoy Basu, Shishir Garg, Sukesh Garg, Sung-Ju Lee, Pramila Mullan, and Puneet Sharma. Scalable grid service discovery based on UDDI. In MGC ’05: Proceedings of the 3rd International Workshop on Middleware for Grid Computing, pages 1–6, 2005.

[4] Fredrik Bjurefors, Lars-Åke Larzon, and Richard Gold. Performance of Pastry in a heterogeneous system. In Proceedings of the Fourth IEEE International Conference on Peer-to-Peer Computing, 2004.

[5] Yatin Chawathe, Sriram Ramabhadran, Sylvia Ratnasamy, Anthony LaMarca, Scott Shenker, and Joseph Hellerstein. A case study in building layered DHT applications. In SIGCOMM ’05: Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 97–108, New York, NY, USA, 2005.

[6] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. PlanetLab: An Overlay Testbed for Broad-Coverage Services. ACM SIGCOMM Computer Communication Review, 33(3):00–00, July 2003.

[7] Donald Eastlake 3rd and Paul E. Jones. US Secure Hash Algorithm 1 (SHA1). RFC 3174 (Informational), September 2001.

[8] Open Source Community. Fasttrack. Online: http://www.fasttrack.nu/, 2001.

[9] Sally Floyd and Steve McCanne. ns network simulator. Online: http://www.isi.edu/nsnam/ns, 2003.

[10] Michael J. Freedman, Karthik Lakshminarayanan, Sean Rhea, and Ion Stoica. Non-transitive connectivity and DHTs. In Proc. 2nd Workshop on Real, Large, Distributed Systems (WORLDS ’05), San Francisco, CA, December 2005.

[11] Wireless Network Topology Emulator. Online: http://sourceforge.net/projects/wnte/, 2001.

[12] Saikat Guha, Neil Daswani, and Ravi Jain. An experimental study of the Skype peer-to-peer VoIP system, 2006.

[13] David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, and Rina Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In ACM Symposium on Theory of Computing, pages 654–663, May 1997.

[14] Daniel Lanner. Comparison of TCP-performance in wireless 3G- and ad hoc-networks. Master’s thesis, Uppsala University, 2006.

[15] Erik Nordström, Per Gunningberg, and Henrik Lundgren. A testbed and methodology for experimental evaluation of wireless mobile ad hoc networks. In TRIDENTCOM ’05: Proceedings of the First International Conference on Testbeds and Research Infrastructures for the DEvelopment of NeTworks and COMmunities, pages 100–109, 2005.

[16] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of the 2001 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 161–172, 2001.

[17] Olof Rensfelt and Lars-Åke Larzon. A bandwidth study of a DHT in a heterogeneous environment. Technical Report 2007-017, Uppsala University, May 2007.

[18] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Handling churn in a DHT. In Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ’04), Boston, Massachusetts, June 2004.

[19] Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu. OpenDHT: A public DHT service and its uses. SIGCOMM Comput. Commun. Rev., 35(4):73–84, 2005.

[20] Robert Ricci, Jonathon Duerig, Pramod Sanaga, Daniel Gebhardt, Mike Hibler, Kevin Atkinson, Junxing Zhang, Sneha Kasera, and Jay Lepreau. The Flexlab approach to realistic evaluation of networked systems. In Proc. of the Fourth Symposium on Networked Systems Design and Implementation (NSDI 2007), Cambridge, MA, April 2007.

[21] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. Lecture Notes in Computer Science, 2218, 2001.

[22] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Handling churn in a DHT. Technical Report UCB/CSD-03-1299, University of California, Berkeley, December 2003.

[23] Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, and Sonesh Surana. Internet indirection infrastructure, 2002.

[24] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the 2001 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 149–160, 2001.

[25] Sven Westergren. Notenet - range queries in a DHT. Master’s thesis, Uppsala University, 2007.

[26] Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, December 2002.

[27] Open Source Community. Gnutella. Online: http://www.gnutella.com, 2001.

[28] Ben Y. Zhao, John D. Kubiatowicz, and Anthony D. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, UC Berkeley, April 2001.

[29] Ming Zhong and Kai Shen. Random walk based node sampling in self-organizing networks. SIGOPS Oper. Syst. Rev., 40(3):49–55, 2006.

Chapter 2

Paper A: A bandwidth study of a DHT in a heterogeneous environment


Abstract

We present an NS-2 implementation of a distributed hash table (DHT) modeled after Bamboo. NS-2 is used to evaluate the bandwidth costs involved in using a DHT in heterogeneous environments. Networks are modeled as mixed networks of desktop machines and 3G cellphones. We also document the modifications of NS-2 that were needed to simulate churn in large networks.

2.1 Introduction

In the design of distributed applications, there has been a strong trend during the last decade to use the Internet mainly for connectivity and to build an overlay network with its own node identifier space on top of IP. This effectively deals with problems that could otherwise occur due to dynamics in IP address allocation. Because the IP address is not used as an identifier for the service itself, the service can continue to function as long as Internet connectivity is maintained, even if IP addresses change over time.

A common approach to introducing a new identifier space is to use distributed hash tables (DHTs). A DHT is a distributed data structure that functions much like an ordinary hash table, except that the key space is distributed over several nodes rather than kept together at a single node. When a value is queried in a DHT, the query is routed to the node that maintains the corresponding part of the key space. Nodes continually exchange data to keep track of how the responsibility is divided among them. Most DHTs include some degree of replication to deal with nodes that may disappear without prior notice.
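The division of responsibility can be sketched with a global view of all node identifiers, which no real node has. It assumes a circular 160-bit identifier space in which a key belongs to the numerically closest node, as in Pastry-style DHTs. The tie-breaking rule and the function names are illustrative assumptions.

```python
from typing import List

ID_BITS = 160                  # assumption: SHA-1-sized identifier space
ID_SPACE = 1 << ID_BITS


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two identifiers on the circular key space."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def responsible_node(key: int, node_ids: List[int]) -> int:
    """The node whose identifier is numerically closest to `key`
    (ties broken towards the smaller identifier)."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


nodes = [10, 500, 1 << 159]
owner_a = responsible_node(300, nodes)            # closest is 500
owner_b = responsible_node(ID_SPACE - 5, nodes)   # wraps around to 10
```

In the DHT itself, no node sees the full list. The same answer is reached hop by hop through the routing state that each node maintains.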

Most existing evaluations of DHTs are done using simulators written for that specific purpose. These simulators model the data structure itself in detail, with a focus on how queries are carried out. The modeling of communication between the nodes that collaborate in a DHT tends to be simplistic at best; sometimes it is assumed that all messages sent are instantaneously received by the receiver. This assumption can be argued to be a reasonable simplification in an environment with fast computers communicating over fixed, high-bandwidth Internet connections, but it does not hold for more heterogeneous network environments.

In this report, we document the NS-2 implementation of a Bamboo-like [11] DHT and present simulation results on how it behaves under such conditions. The main contribution is that the implementation allows a more detailed network model than simpler simulators do.

2.2 Overlays

A recent trend is to use overlays to deploy functionality that the existing network infrastructure does not provide. An overlay network is a logical network built on top of the existing network, with its own addressing scheme, that uses the Internet as a link layer. Building an overlay network makes it possible to deploy functionality not available in the current Internet architecture, or to create services that are provided by the users of the service. One example of a user-provided service is BitTorrent, where a group of users cooperate to provide efficient mass distribution of large files. In some sense, the users pay for the service by participating in the overlay. Overlays are commonly used as the communication module of an application, sometimes referred to as BYOI (Bring Your Own Infrastructure). BYOI has proved very successful in file-sharing applications and, in some sense, in VoIP with Skype. Parts of the research community are, however, suggesting that companies provide overlays as services for multiple applications to share [1]. The benefit would be that the cost of management overhead can be shared among more participants. Also, such a service could be assumed to be more stable than a service where only the users collaborate to provide the overlay. Another benefit, or drawback depending on conviction, is that a payment system needs to be added, because users no longer pay for the service by participating in it.

2.3 Distributed Hash Tables

A common service provided by overlay networks is a lookup service handling flat identifiers with ordinary query-response semantics. Such a service is often implemented using DHTs (Distributed Hash Tables) [9, 16, 13, 18, 11]. A DHT allows you to insert values associated with keys, much like an ordinary hash table. A key is typically a hash of the stored value, or alternatively a hash of some metadata of the value. When the key is inserted, it is routed through the overlay network until it reaches the node that is responsible for storing the key. The key can later be used to retrieve the value from the DHT.
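Deriving a key can be sketched as below. It assumes SHA-1 (RFC 3174), a common choice for 160-bit DHT keys, and reads the digest as an unsigned integer. The metadata string format is hypothetical.

```python
import hashlib


def dht_key(data: bytes) -> int:
    """A 160-bit DHT key: the SHA-1 digest of `data`, read as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


value = b"hello overlay"
key_from_value = dht_key(value)                       # hash of the value itself
key_from_meta = dht_key(b"name=hello.txt;type=text")  # hash of hypothetical metadata
```

Hashing the value makes the key self-certifying, because a reader can verify the returned value against its key. Hashing metadata instead lets a reader find a value without already holding it.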

The flat address structure often used in overlays, and especially in DHTs, is appealing when you want addressing to be separated from your physical location in the network. Such a separation can, for instance, be a building block in systems supporting mobile nodes [15], where identifiers should remain the same regardless of the location of the node.

Despite the flat address space at the DHT level, it is still possible to add some form of hierarchy in the application. For example, in [17] we embed the geographic location of information in the key itself. Others have also built hierarchy on top of a DHT [3].

2.3.1 Bamboo

Bamboo is a DHT implementation first presented in [11]. It is referred to as a third-generation DHT, in which lessons learned from previous systems have been incorporated into the design. The Bamboo implementation has proved stable when used in OpenDHT [12], where it serves a system with good uptime.

Figure 2.1: The routing table. The white nodes are the middle white node's leafset if the leafset size is configured according to l=3. The dotted arcs show the routing table entries.

To continue the earlier studies [2] of how an overlay network behaves in a heterogeneous environment, we chose to implement a DHT in NS-2 [5]. We believe that many of the problems seen in [2] are addressed by Bamboo. For example, the problems encountered with Pastry in heterogeneous networks were mainly caused by management traffic congesting nodes, and a new approach to management traffic was presented in [11].

Network structure

Bamboo uses the routing logic of Pastry but has more developed mechanisms for maintaining the network structure in a dynamic environment. A large part of network dynamics is that nodes leave and new nodes join the network, which is called churn. Bamboo maintains two sets of neighbor information in each node (figure 2.1). The leafset consists of the successors and predecessors that are numerically closest in key space. A query is forwarded toward a node that has the key in its leafset. Using the leafset is enough to ensure correct lookups. However, if only the leafset were used, the number of hops per lookup would grow linearly with the network size, O(n). To improve the lookup complexity to O(log n), a routing table is used. The routing table is populated with nodes that share a common prefix with the local node, and routing table lookups are ordinary longest prefix matching.
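A simplified, Pastry-style next-hop rule can be sketched as follows. The sketch does not reproduce Bamboo's code, and several simplifying assumptions apply. Identifiers are short hex strings of equal length. The routing table is a dict keyed by (row, digit). Distances are linear, ignoring ring wrap-around. The fallback accepts any known node that is numerically closer to the key.

```python
from typing import Dict, List, Optional, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def next_hop(self_id: str, key: str, leafset: List[str],
             table: Dict[Tuple[int, str], str]) -> Optional[str]:
    """Pastry-style next hop for hex-digit identifiers of equal length.
    Returns None when this node is itself the closest node to the key."""
    k = int(key, 16)

    def dist(node: str) -> int:           # linear distance; ring wrap-around ignored
        return abs(int(node, 16) - k)

    members = leafset + [self_id]
    values = [int(m, 16) for m in members]
    # 1. Key inside the leafset's range: deliver to the numerically closest member.
    if min(values) <= k <= max(values):
        best = min(members, key=dist)
        return None if best == self_id else best
    # 2. Routing table: row = length of shared prefix, column = key's next digit.
    row = shared_prefix_len(self_id, key)
    entry = table.get((row, key[row]))
    if entry is not None:
        return entry
    # 3. Fallback: any known node numerically closer to the key than we are.
    known = leafset + list(table.values())
    closer = [n for n in known if dist(n) < dist(self_id)]
    return min(closer, key=dist) if closer else None


leafset = ["a3e0", "a3f8", "a401"]
table = {(0, "1"): "1abc", (1, "7"): "a7c2"}
hops = [next_hop("a3f0", key, leafset, table) for key in ("a3f9", "a700", "5555", "a3f1")]
```

For node a3f0, the four example keys exercise each branch. Key a3f9 falls in the leafset and goes to a3f8. Key a700 matches routing table entry (1, "7"). Key 5555 has no matching entry and falls back to the closer node 1abc. Key a3f1 stays at a3f0 itself.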

The major difference between Pastry and Bamboo is how they handle management traffic. In Pastry, management is initiated when a network change is detected, while in Bamboo all management is periodic regardless of network status. Periodic updates have been shown to be beneficial during churn [11], since they do not cause bursts of management traffic during congestion. Such traffic bursts can further increase network disturbances.

The Bamboo system has been evaluated both in simulation and as a deployed system on PlanetLab [4]. However, these evaluations have taken only network delay into account, not bandwidth or other node specifics. This is not a major problem if you want to evaluate scalability and lookup delays in uncongested networks. The nodes in PlanetLab are typically very powerful machines on academic or other very stable, high-bandwidth networks. They are therefore not suited for studying the scenario we are investigating.

2.3.2 Management traffic

To serve requests and maintain a consistent network view among its nodes, a DHT needs to perform network maintenance. This maintenance consists of network messages sent between nodes. In this section we describe the different types of maintenance performed by Bamboo. Periodic management traffic occurs in all layers of the Bamboo system (figure 2.2). In the data transfer layer, ping messages are used to measure RTTs (Round Trip Times) to peers. Routing table and leafset information is exchanged, and databases are synchronized. We have used [11, 10] as design documents, as well as the Java source code from [6].

Neighbor ping

The most basic type of management traffic makes sure that a node can still reach its one-hop neighbors in the overlay. This is normally done with echo/reply-style communication. In Pastry these messages are called probes, and other systems provide the same function under different names. The messages sent are not ICMP pings but UDP echo and reply packets. The major design decisions regarding neighbor pings are the ping interval and the number of unanswered pings after which a node treats a neighbor as unreachable or, as in Bamboo, as possibly down. In Bamboo, the neighbor pings are also used to maintain an RTT estimate that is used for retransmission timeout calculations.
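Per-neighbor bookkeeping of this kind can be sketched as follows. The sketch assumes a TCP-style estimator with the smoothing rules of RFC 6298 (alpha = 1/8, beta = 1/4, K = 4). Bamboo's actual constants may differ. The threshold of three missed pings is a hypothetical value, and the RFC's minimum RTO and clock-granularity terms are omitted.

```python
from typing import Optional


class NeighborState:
    """RTT estimate and liveness bookkeeping for one overlay neighbor.
    The smoothing rules follow the TCP retransmission timer in RFC 6298."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4
    INITIAL_RTO = 1.0        # seconds, used before any RTT sample exists
    MAX_MISSED = 3           # hypothetical threshold for "possibly down"

    def __init__(self) -> None:
        self.srtt: Optional[float] = None     # smoothed RTT (s)
        self.rttvar: Optional[float] = None   # RTT variation (s)
        self.missed = 0                       # consecutive unanswered pings

    def on_reply(self, rtt: float) -> None:
        if self.srtt is None or self.rttvar is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT itself.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.missed = 0

    def on_timeout(self) -> None:
        self.missed += 1

    def rto(self) -> float:
        if self.srtt is None or self.rttvar is None:
            return self.INITIAL_RTO
        return self.srtt + self.K * self.rttvar

    def possibly_down(self) -> bool:
        return self.missed >= self.MAX_MISSED


n = NeighborState()
first_rto = n.rto()          # 1.0 before any sample
n.on_reply(0.2)              # SRTT = 0.2, RTTVAR = 0.1
n.on_reply(0.2)              # SRTT = 0.2, RTTVAR = 0.075
for _ in range(3):
    n.on_timeout()
```

After two identical 200 ms samples, the timeout shrinks from 0.6 s to 0.5 s. Three consecutive unanswered pings mark the neighbor as possibly down, and a single reply clears that state.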

UDP is the preferred transport protocol in Bamboo because the benefits of reliable transfer do not justify the overhead of connection-oriented communication. A DHT is also asymmetric regarding neighbor knowledge: if node A has node B in its neighbor set, node B's neighbor set does not necessarily include node A. Because of this asymmetry, the number of nodes that know a certain node increases with the network size. If TCP were used as the transport protocol, the state a node needs to keep would increase significantly, since TCP requires both the receiving and the sending node to keep state information. A DHT could benefit from a transport protocol with properties like those of DCCP [7], as mentioned in [11]. DCCP offers UDP-like, unreliable datagram transfer with congestion control.

Leafset updates

Changes in node leafsets are propagated using an epidemic approach. Every node periodically chooses a random node from its leafset and performs a leafset push, followed by a leafset pull in response. Both messages involve sending the complete leafset to the synchronizing node, where the information is incorporated. It is important to both push and pull leafsets; otherwise, situations might arise where nodes are missing from the leafsets of their neighbors [10].
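The push-pull exchange can be sketched as a small gossip simulation. Several assumptions apply: a 16-bit ring, three nodes kept per side (matching l=3 in figure 2.1), no message loss, and a push and pull modeled as two direct method calls. The class and function names are hypothetical.

```python
import random
from typing import Dict, Iterable, Set

ID_SPACE = 1 << 16          # small identifier space, for illustration only


def build_leafset(self_id: int, candidates: Iterable[int], per_side: int) -> Set[int]:
    """Keep the closest successors and predecessors on the ring, per_side of each."""
    others = set(candidates) - {self_id}
    succ = sorted(others, key=lambda n: (n - self_id) % ID_SPACE)[:per_side]
    pred = sorted(others, key=lambda n: (self_id - n) % ID_SPACE)[:per_side]
    return set(succ) | set(pred)


class Node:
    def __init__(self, node_id: int, per_side: int = 3):
        self.id, self.per_side = node_id, per_side
        self.leafset: Set[int] = set()

    def incorporate(self, sender_id: int, sender_leafset: Set[int]) -> None:
        # Merge the received leafset (plus the sender) and trim back to size.
        merged = self.leafset | sender_leafset | {sender_id}
        self.leafset = build_leafset(self.id, merged, self.per_side)

    def gossip_round(self, nodes: Dict[int, "Node"], rng: random.Random) -> None:
        """One periodic push-pull exchange with a random leafset member."""
        if not self.leafset:
            return
        peer = nodes[rng.choice(sorted(self.leafset))]
        peer.incorporate(self.id, self.leafset)   # push our leafset
        self.incorporate(peer.id, peer.leafset)   # pull: the peer's reply


# Demo: 20 nodes that initially know only their ring successor.
ids = [100 * i for i in range(1, 21)]
nodes = {i: Node(i) for i in ids}
for pos, i in enumerate(ids):
    nodes[i].leafset = {ids[(pos + 1) % len(ids)]}
rng = random.Random(7)
for _ in range(200):
    for i in ids:
        nodes[i].gossip_round(nodes, rng)
converged = all(nodes[i].leafset == build_leafset(i, ids, 3) for i in ids)
```

The demo illustrates why both directions are needed. A node learns its predecessors only when they push to it, and it learns its successors' successors only by pulling from them.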

Local routing table updates

When a node has another node in its routing table, the two nodes by definition share one routing table level. The local routing table updates are used to exchange the node information in that level. If a node learns of other nodes that fit into its routing table, it probes them to test reachability and to get an RTT estimate. If a node is reachable and fits into an empty entry in the routing table, it is added. If the matching routing table entry is occupied, the node with the lowest latency is chosen. Other optimization schemes could be considered, such as optimizing for uptime, but optimizing for latency is the most common approach. An optimized routing table does not influence lookup correctness, only lookup latency.
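The insertion rule can be sketched as follows. The sketch assumes hex-string identifiers and a routing table stored as a dict from (row, digit) to (node id, RTT). A probe that went unanswered is represented by an RTT of `None`. These representations and the function name are illustrative, not Bamboo's.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[str, float]   # (node id, measured RTT in seconds)


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def consider_candidate(self_id: str, table: Dict[Tuple[int, str], Entry],
                       cand_id: str, rtt: Optional[float]) -> bool:
    """Insert a probed candidate; `rtt` is None if the probe went unanswered.
    Returns True if the table changed."""
    if rtt is None or cand_id == self_id:
        return False                     # unreachable (or ourselves): ignore
    row = shared_prefix_len(self_id, cand_id)
    slot = (row, cand_id[row])           # level = shared prefix, column = next digit
    current = table.get(slot)
    if current is None or rtt < current[1]:
        table[slot] = (cand_id, rtt)     # empty slot, or lower latency wins
        return True
    return False


table: Dict[Tuple[int, str], Entry] = {}
changes = [
    consider_candidate("a3f0", table, "a7c2", 0.08),   # empty entry: added
    consider_candidate("a3f0", table, "a711", 0.12),   # slower: rejected
    consider_candidate("a3f0", table, "a755", 0.03),   # faster: replaces a7c2
    consider_candidate("a3f0", table, "a799", None),   # probe unanswered: ignored
]
```

Replacing the latency comparison with another metric, such as observed uptime, would implement one of the alternative optimization schemes mentioned above, without affecting lookup correctness.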

Global routing table updates

Local routing table updates can only improve routing table levels that are not empty. To fill the empty levels, a node needs to exchange routing table information with nodes that it does not yet know of. To find such nodes, the routing functionality of Bamboo itself is used. To optimize a certain routing table entry, a lookup is made for a key that shares the prefix of that entry. If a suitable node exists in the network, the request will be routed to it, and that node becomes a candidate for the routing table entry. Unlike local updates, global updates can be used to optimize a specific routing table entry.
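The lookup key for a global update can be constructed as sketched below. The construction assumes hex-digit identifiers. Filling the remaining digits at random is our own choice, not something the text specifies, and the function name is hypothetical.

```python
import random

HEX_DIGITS = "0123456789abcdef"


def global_update_key(self_id: str, row: int, digit: str, rng: random.Random) -> str:
    """A lookup key that maps to routing table entry (row, digit): it shares the
    first `row` digits with self_id, has `digit` next, and random digits after."""
    if digit == self_id[row]:
        raise ValueError("column equals the node's own digit; not a routing entry")
    tail = "".join(rng.choice(HEX_DIGITS) for _ in range(len(self_id) - row - 1))
    return self_id[:row] + digit + tail


rng = random.Random(0)
probe_key = global_update_key("a3f0", row=1, digit="7", rng=rng)
```

The lookup for `probe_key` ends at the node numerically closest to it. If that node's identifier starts with "a7", it qualifies for entry (1, "7") and can be probed like any other candidate.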

Data storage updates

When data is stored in the DHT using the PUT command, the data is routed through the DHT to the node primarily responsible for storing it. When the responsible node gets the data, it stores copies within its leafset on 'desired replicas' neighbors in each direction. This replication does not happen immediately but is performed by the periodic replication functionality described below. The value 'desired replicas' is a configuration parameter, and with the default settings there are 7 copies of the data in the system. When nodes leave or join, the subset of nodes that should store a certain value changes. A mechanism is therefore needed to restore the distributed storage to the desired state. The default setting of 'desired replicas', and the resulting 7 copies of each data unit, increases the demand for storage space. If all nodes are primarily responsible for equal amounts of keys, every node needs to store seven times that amount.
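The set of nodes that should hold a given key can be sketched as follows. The sketch assumes that 'desired replicas' counts neighbors on each side of the responsible node. Under that reading, the default of 7 copies corresponds to a value of 3 (1 + 2 × 3); the text does not state the parameter's actual default. The function name and the toy ring are hypothetical.

```python
from typing import List


def replica_set(key: int, node_ids: List[int], desired_replicas: int,
                id_space: int) -> List[int]:
    """Nodes that should hold a copy of `key`: the numerically closest node
    plus `desired_replicas` leafset neighbours on each side of it."""
    ring = sorted(node_ids)

    def dist(n: int) -> int:
        d = (n - key) % id_space
        return min(d, id_space - d)

    root = min(range(len(ring)), key=lambda i: (dist(ring[i]), ring[i]))
    offsets = range(-desired_replicas, desired_replicas + 1)
    return sorted({ring[(root + off) % len(ring)] for off in offsets})


nodes = [10 * i for i in range(20)]                       # toy ring, ids 0..190
holders = replica_set(55, nodes, desired_replicas=3, id_space=1000)
wrapped = replica_set(995, nodes, desired_replicas=3, id_space=1000)
```

Whenever a node joins or leaves near the responsible node, this set shifts. That shift is what the maintenance operations described next must repair. With fewer than 2 × desired_replicas + 1 nodes, the set simply contains every node.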

In the first maintenance operation, a node periodically picks a random node in its leafset and synchronizes its stored keys with it. A synchronization starts with the initiating node picking a node to synchronize with and requesting a synchronization. The other node calculates the subset of its stored keys that it believes should also be stored at the initiating node, and sends those keys together with the hash values of the data. The initiating node receives the keys and hash values and matches them against what it has stored. If a received data unit is not already stored, the initiating node requests that data unit from the other node.
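The exchange can be sketched as follows. The sketch assumes that each stored data unit is identified by its key together with the SHA-1 hash of its value. The data request is replaced by a direct dictionary copy. The `k < 4` range test stands in for a real ownership check, and all names are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Set, Tuple

DataId = Tuple[int, bytes]            # (key, SHA-1 of the value)
Store = Dict[DataId, bytes]


def put_local(store: Store, key: int, value: bytes) -> None:
    store[(key, hashlib.sha1(value).digest())] = value


def offer(store: Store, belongs_to_initiator: Callable[[int], bool]) -> Set[DataId]:
    """Responder: keys and data hashes it believes the initiator should also store."""
    return {data_id for data_id in store if belongs_to_initiator(data_id[0])}


def synchronize(initiator: Store, responder: Store,
                belongs_to_initiator: Callable[[int], bool]) -> List[DataId]:
    """Initiator: compare the offer with local data and fetch what is missing."""
    wanted = sorted(offer(responder, belongs_to_initiator) - set(initiator))
    for data_id in wanted:
        initiator[data_id] = responder[data_id]   # stands in for a data request
    return wanted


responder: Store = {}
initiator: Store = {}
for key, value in [(1, b"a"), (2, b"b"), (5, b"c")]:
    put_local(responder, key, value)
put_local(initiator, 1, b"a")
fetched = synchronize(initiator, responder, lambda k: k < 4)   # hypothetical range test
```

Only hashes cross the network until a difference is found, so a synchronization between two nodes that already agree costs one exchange of key and hash lists.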

The second maintenance operation performed by the data storage layer is to move values that are no longer within a node's storage range. If a node has such a value stored, it performs a new PUT to the place where the value should be stored before deleting it locally.

2.4 Implementation

We have implemented a DHT in NS-2 [5] and made it as similar to Bamboo as we could in what we believe to be the relevant properties. However, since we did not run the Java code in simulation, there may be differences that we are not fully aware of. We state the known differences when describing the different parts of the system. During the implementation work we used the technical report [14] as a reference, as well as the source code, and later the doctoral thesis [10] when it became available. In the following text, we refer to our implementation as Bamboo-NS2 and to the original implementation as Bamboo.

Figure 2.2: Block diagram of the Bamboo-NS2 implementation

The NS-2 implementation consists of multiple modules that are constructed to fit the design of NS-2 rather than the design of Bamboo (figure 2.2). There are, however, many similarities in how Bamboo and Bamboo-NS2 are divided into modules.

2.4.1 NS-2 specifics

To be able to simulate large networks, we needed to make some simplifications. One simulation-specific method is that we can build the overlay network before the actual simulation starts. We refer to this as building the network offline. In section 2.5 we further discuss how this influences the evaluation.

As previously mentioned, we have not implemented storage of real data, in order to save memory. Instead of a hash of the data, we use a globally unique id for every data item that exists in the DHT. Since we control all data that is inserted into the DHT, we believe this to be a valid approach.

When we started to simulate churn, we ran into problems with memory leaks when trying to free NS-2 objects. This led us to reuse the same NS-2 agents for multiple overlay nodes. First we tried to have multiple NS-nodes for each overlay node, so that when an overlay node went down and a ’new’ overlay node came up, it came up on a different NS-node. The reason we did not simply reuse the same NS-node for the new overlay node is that node information about the old node remains in the system. A new node would then receive traffic meant for an old node, which would consume link bandwidth. We call this kind of traffic ’stale traffic’. We did not want to filter out traffic to inactive nodes at the sending node, because in a real deployment there is no way of knowing whether a node is active or not. The approach with multiple NS-nodes meant that we needed to simulate much bigger networks, since many more physical nodes than overlay nodes were needed. Even when we used three physical nodes per overlay node, stale traffic still turned up at newly joined nodes. Therefore we needed to find another method of getting rid of stale traffic.

The second method involved giving every overlay node a globally unique id (GID) in addition to its overlay address, and introducing a directly indexed lookup table with connection status. We then modified the NS-2 routing function to compare the next-hop IP from the routing logic with the final destination IP of the packet. If they are equal, it looks up whether the destination overlay node is active. If it is not, the packet is logged as stale traffic and dropped, and therefore does not load the last-hop link of a new node.
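The drop rule can be illustrated with a small sketch. `StaleFilter` and `Packet` are hypothetical names; in Bamboo-NS2 the check lives in the modified NS-2 routing function, and the sketch only mirrors its logic: the status lookup happens only on the last hop, when the next-hop IP equals the packet's final destination IP.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst_ip: int   # address of the NS-node hosting the destination overlay node
    dst_gid: int  # globally unique id (GID) of the destination overlay node


class StaleFilter:
    """Directly indexed status table: active[gid] is True while overlay node gid is up."""

    def __init__(self, max_gid: int):
        self.active = [False] * max_gid
        self.stale_packets = 0

    def may_forward(self, pkt: Packet, next_hop_ip: int) -> bool:
        """Called by the routing function for every hop of every packet."""
        last_hop = next_hop_ip == pkt.dst_ip
        if last_hop and not self.active[pkt.dst_gid]:
            self.stale_packets += 1  # log as stale traffic, then drop
            return False
        return True
```

Because intermediate hops are never checked, the sender still pays for stale traffic, as it would in a real deployment; only the new node's last-hop link is spared.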

2.4.2 Packet handler

The packet handler at a Bamboo-NS2 node consists of a list of known neighbors. Bamboo implements reliable transfer on top of UDP, using acknowledgments that are also used for RTT measurements. If traffic is not flowing between nodes, periodic probes are sent to keep the estimated RTT accurate. In Bamboo-NS2 we use the NS-2 Agent class, which we connect between nodes; the closest match to agents in Bamboo is UDP sockets. To keep memory usage low, we connect agents dynamically when needed. We encountered problems when we tried to free memory after the agents were no longer needed. A workaround was to implement an agent pool from which we could request agents in order to reuse them. An agent pair is only used to send data one way, because there were implementation benefits from having all traffic to a node go through one agent. We call the sender-side agents bamboo send agents, because they are of a different class than the receiving-side type described in section 2.4.4.
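A sketch of the agent-pool workaround, with hypothetical class names standing in for the NS-2 agents: released agents are disconnected and kept for reuse instead of being freed, so the number of agents ever created stays bounded by the number in use at the same time.

```python
class SendAgent:
    """Hypothetical stand-in for a bamboo send agent."""

    def __init__(self, serial):
        self.serial = serial
        self.peer = None  # destination node this agent is currently connected to


class AgentPool:
    """Hands out send agents and takes them back instead of freeing them."""

    def __init__(self):
        self._free = []
        self.created = 0

    def acquire(self, peer):
        if self._free:
            agent = self._free.pop()  # reuse a previously released agent
        else:
            self.created += 1
            agent = SendAgent(self.created)
        agent.peer = peer  # connect the agent to the destination node
        return agent

    def release(self, agent):
        agent.peer = None  # disconnect, but keep the object for reuse
        self._free.append(agent)
```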

We did not use cumulative acknowledgments, since we did not want to keep state at the receiver for every node that communicates with us. We do, however, need to keep a bamboo send agent for each node we communicate with, so the benefit of not using cumulative acknowledgments is limited in simulation. In a real deployment, the approach would be more beneficial.

2.4.3 Router

The Bamboo-NS2 router consists of three modules: the routing table, the leafset, and the routing logic. The routing table holds information about nodes spread over the key space, as well as functions to maintain and look up node information. By node information we mean a structure that contains, apart from a key value, information about the network connection point of the node.

The leafset consists of ordered node information about the numerically closest nodes in key space, which are the white nodes in figure 2.1, and functions to insert and remove nodes from the list. As previously mentioned, the routing table works as in Pastry. The routing table and the leafset are used by the routing logic to find the next-hop node when a key is looked up. When a routing request for a key is made to the routing logic, it first checks whether the key falls within the leafset. If it does, the numerically closest node is found and its node information is returned as the next hop. If the key is not within the leafset, a request is made to the routing table, which returns the closest node outside the leafset. If no such node exists, the next hop is whichever of the two outermost leafset nodes is numerically closest to the key, and the routing logic returns the information about that node.
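The three-step decision can be sketched as below. It assumes 16-bit identifiers, base-16 digits and a Pastry-style routing table indexed by (shared-prefix length, next digit of the key); these parameters and all function names are illustrative assumptions rather than the values used in Bamboo or Bamboo-NS2.

```python
BITS, DIGIT_BITS = 16, 4  # hypothetical: 16-bit ids split into base-16 digits
RING = 1 << BITS
NDIGITS = BITS // DIGIT_BITS


def digit(x, i):
    """i-th base-16 digit of id x, most significant digit first."""
    return (x >> (BITS - DIGIT_BITS * (i + 1))) & 0xF


def shared_prefix(a, b):
    """Number of leading digits that a and b have in common."""
    n = 0
    while n < NDIGITS and digit(a, n) == digit(b, n):
        n += 1
    return n


def ring_dist(a, b):
    """Numerical distance between two ids on the ring."""
    d = (a - b) % RING
    return min(d, RING - d)


def next_hop(self_id, key, leafset, routing_table):
    """leafset: ids in clockwise order from its counter-clockwise edge, self included.
    routing_table: dict mapping (row, column) -> node id."""
    lo, hi = leafset[0], leafset[-1]
    # 1. Key within the leafset span: numerically closest leafset node.
    if (key - lo) % RING <= (hi - lo) % RING:
        return min(leafset, key=lambda n: ring_dist(n, key))
    # 2. Otherwise ask the Pastry-style routing table.
    row = shared_prefix(self_id, key)
    entry = routing_table.get((row, digit(key, row)))
    if entry is not None:
        return entry
    # 3. Fallback: the outermost leafset node numerically closest to the key.
    return min((lo, hi), key=lambda n: ring_dist(n, key))
```

If the returned id is the node's own id, the key is stored locally and, as described in section 2.4.4, the agent does not forward the packet.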

2.4.4 Agent

The Bamboo-NS2 Agent is both the listening agent in NS-2 and the interface to the TCL scripts used to run simulations. The connection details of the listening agent are what is spread through the network for other nodes to connect to.

The behavior of a Bamboo-NS2 node can be controlled from the TCL script that defines the simulation. You can set the word and key length, issue PUTs and GETs, connect and disconnect, and so on. All incoming traffic to a node enters through the listening agent's recv() function. If a new packet is an acknowledgment, the packet handler is called to remove the acknowledged packet from its buffer and to calculate an RTT estimate. If the packet is not an acknowledgment, the packet handler acknowledges it and checks whether it is a new packet. If it is an old packet or a PING, the acknowledgment is the only action taken. If it is a new packet, it is sent to the router, which calculates the next hop and generates a new packet to send. If the next hop returned by the router is neither null nor the node itself, the agent sends the new packet to the next-hop node with the help of the packet handler module.
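The receive path can be summarized in a sketch. `Node`, `Msg` and the `route` callback are hypothetical stand-ins for the listening agent, the NS-2 packet and the router; for brevity the forwarded message keeps its original sequence number instead of being rebuilt as a new packet.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    src: str
    seq: int
    kind: str  # "ACK", "PING" or a data message type such as "GET"


class Node:
    def __init__(self, name, route):
        self.name = name
        self.route = route    # hypothetical router callback: msg -> next hop or None
        self.unacked = {}     # seq -> send time, kept by the packet handler
        self.seen = set()     # (src, seq) pairs already processed
        self.sent = []        # log of (kind, destination, seq)
        self.rtt_samples = []

    def recv(self, msg, now):
        if msg.kind == "ACK":
            sent_at = self.unacked.pop(msg.seq, None)  # remove acked packet from buffer
            if sent_at is not None:
                self.rtt_samples.append(now - sent_at)  # RTT estimate from the ack
            return
        self.sent.append(("ACK", msg.src, msg.seq))  # acknowledge every non-ack packet
        if msg.kind == "PING" or (msg.src, msg.seq) in self.seen:
            return  # ping or old packet: the acknowledgment is the only action
        self.seen.add((msg.src, msg.seq))
        hop = self.route(msg)
        if hop is not None and hop != self.name:
            self.sent.append((msg.kind, hop, msg.seq))  # forward via the packet handler
```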

When a Bamboo-NS2 node is attached to an NS-2 network node and has joined the overlay network using the agent's join command, PUTs and GETs can be issued to the agent from the TCL script. A PUT takes a key, an id, and the data size as arguments. The key determines where the value is stored, the id takes the place of a hash of the data, and the size states how big the data is. No actual data is put into the system; the size field is used to set the correct size of network packets during simulation, and the id is used to distinguish between different values. The GET command takes the requested key and records the time. If a GET matches multiple values in the DHT, only one is returned. This differs from Bamboo, which returns values together with a pointer that can be used to retrieve the remaining matching values with repeated GETs.
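A sketch of the payload-free storage model, assuming a hypothetical fixed per-message overhead (`HEADER_BYTES`) to show how the size field, rather than real data, determines the simulated packet size. Returning the smallest id is just one deterministic way to return a single value; the thesis does not state which value Bamboo-NS2 picks.

```python
from collections import defaultdict


class SimStore:
    """Hypothetical model of Bamboo-NS2 storage: no payload bytes, only (id, size)."""

    HEADER_BYTES = 40  # assumed per-message overhead, for illustration only

    def __init__(self):
        self.values = defaultdict(dict)  # key -> {id: size}

    def put(self, key, value_id, size):
        """Record the value and return the simulated packet size on the wire."""
        self.values[key][value_id] = size
        return self.HEADER_BYTES + size

    def get(self, key):
        """Return a single (id, size) pair; Bamboo would also return a pointer."""
        stored = self.values.get(key)  # .get does not create an empty entry
        if not stored:
            return None
        value_id = min(stored)
        return value_id, stored[value_id]
```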

To support different measurements of the system, two GET behaviors are implemented. The first resembles Bamboo, with keys stored and cached as described later in the section on data storing. The second is a special GET that looks up exact nodes in the network, in order to evaluate the pure routing functionality of the system without the noise of key management.

2.4.5 Data storing

The data storing module in our system does not implement all the functionality present in Bamboo. Synchronization between nodes is initiated by a node sending a list of its keys to another node. The receiving node builds a list of the keys in the received message that it does not have, and sends that list back to request those keys. Keys in the system have a TTL, but we do not use that function during our tests. A thorough study of the storage problem is given in [10].

Bamboo uses an improved synchronization method based on Merkle trees [8], which involves building a tree of hash values over the stored keys. The best case for this method is when the nodes are completely synchronized, in which case a single hash value needs to be exchanged to determine that. According to [10], the worst case of the Merkle tree approach is only O(n), where n is the number of keys. However, there is no evaluation of the time aspect of synchronization.
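To show why a completely synchronized pair needs only one hash comparison, here is a sketch of a Merkle-tree comparison. It assumes a binary tree over 2**depth equal slices of a 16-bit key space and SHA-1 hashes; Bamboo's exact tree layout is described in [10] and may differ.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build(keys, depth, ring_bits=16):
    """Merkle tree as a list of levels; level 0 holds the 2**depth leaf hashes.
    Leaf i hashes the sorted keys that fall in the i-th equal slice of the key space."""
    buckets = [[] for _ in range(2 ** depth)]
    for k in sorted(keys):
        buckets[k >> (ring_bits - depth)].append(k)
    level = [h(",".join(map(str, b)).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff(a, b):
    """Walk both trees from the root; return (differing leaf indices, hashes compared)."""
    compared, frontier = 0, [0]
    for lvl in range(len(a) - 1, -1, -1):
        nxt = []
        for i in frontier:
            compared += 1
            if a[lvl][i] != b[lvl][i]:
                # descend into both children, or report the leaf at level 0
                nxt.extend([2 * i, 2 * i + 1] if lvl else [i])
        frontier = nxt
    return frontier, compared
```

Identical stores stop after the root comparison; a single differing key costs two comparisons per level below the root.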


2.4.6 Other differences to Bamboo

Bamboo uses a concept of possibly down nodes: nodes that have not responded to 4 consecutive pings. Possibly down nodes are considered unreachable but are still pinged periodically, with a longer period. If a node in this set answers a ping, it becomes a known neighbor again. A big advantage of this is that a partitioned overlay network can rejoin. If, for instance, the connection between two continents is cut off, two separate overlay networks will form, and their knowledge of each other will fade away after 4 consecutive unanswered pings. With possibly down nodes, which a node keeps trying to reach for a long time, the partition can be healed. We have not implemented possibly down nodes, since we have not been interested in studying the influence of the intermediate network on the overlay, only the influence of connection technologies.
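The possibly-down bookkeeping for one neighbor can be sketched as a small state machine. The limit of 4 consecutive missed pings is taken from the text and the fast period matches the 4 s neighbor ping period in table 2.2; the 60 s slow period and the class name are hypothetical.

```python
class NeighborState:
    """One neighbor: known while it answers, possibly down after MISSED_LIMIT
    consecutive unanswered pings, and then probed with a longer period."""

    MISSED_LIMIT = 4
    FAST_PERIOD = 4.0   # seconds, as the neighbor ping period in table 2.2
    SLOW_PERIOD = 60.0  # seconds, hypothetical value for possibly down nodes

    def __init__(self):
        self.missed = 0
        self.possibly_down = False

    def on_ping_result(self, answered: bool):
        if answered:
            self.missed, self.possibly_down = 0, False  # known neighbor again
        else:
            self.missed += 1
            if self.missed >= self.MISSED_LIMIT:
                self.possibly_down = True

    def ping_period(self) -> float:
        return self.SLOW_PERIOD if self.possibly_down else self.FAST_PERIOD
```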

Our implementation does not handle multiple PUTs to the same key the way Bamboo does. However, we believe that for evaluating performance in heterogeneous environments the benefit of such a complete implementation is limited, compared to the need for it in a deployed system.

2.5 Evaluation

To evaluate the system in heterogeneous environments, we have set up scenarios in NS-2. In this section we first describe the simulation setup, then describe our evaluations of the impact of different network variables, and present the results of each evaluation.

2.5.1 Physical network layout

The physical network layout used in our simulations is modeled with the nodes in clusters connected by very high-bandwidth links with long delays (figure 2.3). We did not use a more advanced topology from a topology generator because we wanted to keep the variables influencing the simulation as static as possible. By using the same delay on all links between clusters, it has been easier to see their influence on the total delay of, for example, lookups. We used very high-bandwidth links between clusters so that they would introduce only delay and no packet loss. Overlay nodes are not attached to the cluster nodes directly, only to NS-nodes with links into the cluster nodes. The node characteristics are set on this last-hop link into the cluster.


Figure 2.3: An NS-2 network layout with 3 clusters presented using NAM, where the clusters model continents

               Strong link   Weak link   Link between clusters
Downlink       10 Mb/s       384 Kb/s    100 Gb/s
Uplink         10 Mb/s       64 Kb/s     100 Gb/s
Delay          5 ms          115 ms      50 ms

Table 2.1: Physical network specifications
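As a worked example of what table 2.1 implies, the sketch below computes the one-way delay of a single packet crossing from one cluster to another. It assumes empty queues, one inter-cluster link on the path, 1 Kb/s = 1000 bit/s, and a hypothetical 500-byte packet; the link names are illustrative.

```python
# Table 2.1 as (bandwidth in bit/s, one-way link delay in s) per link direction
LINKS = {
    "strong_up": (10e6, 0.005),
    "strong_down": (10e6, 0.005),
    "weak_up": (64e3, 0.115),
    "weak_down": (384e3, 0.115),
    "core": (100e9, 0.050),  # link between clusters
}


def one_way_delay(path, packet_bytes):
    """Sum of transmission time and link delay over the path, assuming empty queues."""
    bits = packet_bytes * 8
    total = 0.0
    for name in path:
        bandwidth, delay = LINKS[name]
        total += bits / bandwidth + delay
    return total


weak_to_weak = one_way_delay(["weak_up", "core", "weak_down"], 500)          # about 0.353 s
strong_to_strong = one_way_delay(["strong_up", "core", "strong_down"], 500)  # about 0.061 s
```

Under these assumptions the 64 Kb/s uplink alone adds 62.5 ms of transmission time per 500-byte packet, so a weak sender is slowed down more by its uplink than by the 50 ms inter-cluster link.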

2.5.2 Overlay network layout

The overlay is built “offline” in order to have a fixed, well-known starting state. While the network is being built, every node has knowledge about every other node. This creates a network where each node has as many nodes as possible in its routing table. The routing table is, however, not optimized for proximity, since no RTT measurements are made before the simulation starts. When the simulation starts, a node pings all the nodes in its routing table and leafset. This causes an initial burst of traffic that needs to be taken into account, so we decided not to collect data until the system was stable.


[Plot: lookup delay [s] as a moving average over a window of 200 samples vs. time [s]; 500 nodes, no churn; curves for 0 %, 30 % and 50 % weak nodes]

Figure 2.4: Smoothed lookup delay over the time of simulations for different percentages of weak nodes

2.5.3 Stabilization time

Because we build the network offline, the simulation starts in an unrealistic state. The main source of initial instability is that nodes need to ping their neighbors in order to calculate RTTs. To study how long it takes for the system to stabilize, we periodically performed GETs on a stable system and plotted a moving average of the lookup times. The stabilization time of course depends on the management traffic settings as well as on the overlay network behavior, but we decided to use the settings from [14] since they were tuned for a system under churn (table 2.2). From figure 2.4 we decided that the initial 80 seconds of the simulation should be considered start-up time. We also looked at the simulation runs with churn and concluded that 80 seconds still seemed to cover the initial turbulence (figure 2.5).
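A sketch of the smoothing and the warm-up cut, assuming a trailing window of 200 samples (the text does not state whether the window was trailing or centered) and the 80 s start-up period chosen above; the function names are hypothetical.

```python
from collections import deque


def moving_average(samples, window=200):
    """Trailing moving average over at most `window` most recent samples."""
    buf, total, out = deque(), 0.0, []
    for x in samples:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()  # slide the window forward
        out.append(total / len(buf))
    return out


def after_warmup(timed_samples, warmup=80.0):
    """Drop (time, value) samples taken during the start-up period."""
    return [(t, v) for t, v in timed_samples if t >= warmup]
```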

2.5.4 Measurements

Making measurements of a DHT is not as straightforward as it might first seem. The first problem is that the measurement traffic influences the system's performance by adding extra load. On the other hand, a DHT without lookups that influence performance is not a realistic scenario. There are two directions we could have taken with the lookups. One is to try to model store and lookup traffic realistically, for instance by using a stochastic process that causes bursts in network utilization. However, the bursts would make the analysis of lookup delays harder, since two samples taken at different times would have been measured in different environments and would therefore be hard to compare. We chose a periodic probing scheme to simplify the analysis of the data.

[Plot: lookup delay [s] as a moving average over a window of 200 samples vs. time [s]; 500 nodes with churn; curves for 0 %, 30 % and 50 % weak nodes]

Figure 2.5: Smoothed lookup delay over the time of simulations for different percentages of weak nodes with churn

In the tests performed on Bamboo in [11, 10], a majority procedure was used to decide whether a lookup was successful: 10 nodes requested the same key, and if they received different answers, the minority was considered wrong. Since we have global knowledge in simulation, we use single lookups. Another problem in deciding on success or failure is whether to use a timeout. If a timeout is used, information about how lookup times are distributed is removed and moved to the failure statistics. With a very long timeout you get a high success ratio but a higher mean lookup time. In [12] a timeout of 60 minutes is used, but we have set a timeout of 60 seconds, since we do not believe that a lookup that exceeds a minute can be considered a success.
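The two success criteria discussed above can be written down as follows. This is a sketch with hypothetical function names: `classify` applies the 60 s timeout to a single lookup whose correct answer is known through global knowledge, and `majority_answer` shows the majority procedure used in [11, 10].

```python
from collections import Counter

TIMEOUT = 60.0  # seconds; slower lookups are counted as failures


def classify(issued_at, answered_at, answer, expected):
    """Outcome of a single lookup when the correct answer is known."""
    if answered_at is None or answered_at - issued_at > TIMEOUT:
        return "timeout"
    return "success" if answer == expected else "wrong"


def majority_answer(answers):
    """Majority procedure: the most common answer among the requesting nodes
    is taken as correct, and the minority answers as wrong."""
    return Counter(answers).most_common(1)[0][0]
```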

The next decision to make is whether to look up nodes or keys. Lookups aimed at nodes avoid the extra complexity of a data storing system. On the other hand, you need to make sure that the requested node is available for lookups at the right time; otherwise you might decide that a lookup has failed when there was no way for it to succeed. Having to take lookups in transit into account when simulating churn complicates matters further. We therefore use the data storing layer, which allows us to simulate churn more freely, and make lookups for keys rather than nodes.

                                   Technical report   OpenDHT
Neighbor ping period               4                  20
Leafset maintenance                5                  10
Local routing table maintenance    5                  10
Global routing table maintenance   10                 20
Data storing maintenance           10                 1

Table 2.2: Management traffic periods in seconds

2.5.5 Simulation specifics

Simulations were made with management traffic settings according to [11], where churn was targeted, as well as with settings matching those used in the deployed DHT service OpenDHT [12]. During simulation, 10 of the strong nodes were used as bootstrap nodes. In the first scenario, the nodes are distributed over 3 clusters as seen in figure 2.3. The links between the clusters are modeled with extremely high bandwidth but an intercontinental delay. Each node is connected to one of the clusters with either a 10 Mb/s low-delay link (strong node) or a link with specifications based on measurements of 3G connectivity (weak node). The weak nodes have a downlink bandwidth of 384 Kb/s, an uplink bandwidth of 64 Kb/s and a link delay of 110 ms. Weak nodes are uniformly distributed over the network.

By choosing NS-2 we sacrificed the possibility of studying large networks (more than approximately 500 nodes), but it allows us to simulate link bandwidth and drops at link queues.

2.6 Network variables

There are multiple variables that influence the characteristics of the network. In a dynamic overlay these variables change over time, but in order to study how they influence network performance we have kept them fixed during each simulation. We present the simulation setup and results for each network variable.


[Plot panels against network size (100 to 500 nodes), each with curves for 0 and 0.3 ratio of weak nodes: (a) Lookup delay [s]; (b) Mean lookup path length; (c) Success ratio]

Figure 2.6: System performance as a function of network size, with 0 percent and 30 percent weak nodes

2.6.1 Size

The network size is the number of participating nodes at a given time. How the size of a DHT impacts performance has been evaluated previously, both in simulation and on testbeds using emulation [13]. We only study how size influences the network up to 500 nodes. The reason for varying the size in our initial simulations is to justify our decision to use a fixed network size of 500 nodes when studying bandwidth usage. The O(log n) complexity of the lookup path length ensures good scalability properties. The results from the simulations are presented in figure 2.6, where we use lookup delay, lookup path length, and lookup success ratio as measures of the system's performance.

From figure 2.6(a) we can conclude that the added weak nodes, and the resulting churn, affect the lookup times much more than the size of the network does. The lookup delay for networks with only strong nodes and no churn is only marginally affected by size, which might seem counter-intuitive given that figure 2.6(b) shows an increase in lookup path length. We believe this is caused by the routing table being optimized for communication latency in combination with how we model the core network. When communication within a cluster is very cheap compared to communication between clusters, and the routing tables are optimized for network proximity, an extra overlay hop might not increase the total lookup delay significantly. For instance, in the simulation with 500 nodes, where the nodes are randomly distributed among the three clusters, every node should have more than three candidates for each top-level routing table entry. With three candidates per top-level entry, on average one of them should be in the same cluster, and it is then chosen when the routing table is optimized.
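The candidate argument can be checked with a few lines of arithmetic. The base-16 digits (b = 4) are an assumption borrowed from common Pastry configurations, not a value given here; nodes are assumed to be uniformly spread over the key space and over the three clusters.

```python
import math

N, CLUSTERS = 500, 3
BASE = 16  # assumption: Pastry-style base-16 digits (b = 4)

expected_hops = math.log(N, BASE)        # O(log n) routing, about 2.24 hops
candidates_per_top_entry = N / BASE      # 31.25 nodes share a given first digit
same_cluster_among_three = 3 * (1 / CLUSTERS)             # expected local candidates: 1
p_at_least_one_local = 1 - (1 - 1 / CLUSTERS) ** 3        # 19/27, about 0.70
```

Under these assumptions roughly 31 nodes compete for each top-level entry, and even among only three random candidates the chance that at least one is in the local cluster is about 0.70.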

When weak nodes are introduced, a small increase in lookup delay can be seen (figure 2.6(a)), but when the size of the network reaches 300 nodes it levels out. We believe this is caused by the same mechanism as in the case of a static network. Information about the weak nodes does not spread through the network fast enough to make a big impact, and even when the information reaches other nodes, a weak node is unlikely to be the best candidate for a routing table entry.

[Plot panels against ratio of weak nodes (0 to 1): (a) Lookup delay [s]; (b) Mean lookup path length; (c) Success ratio]

Figure 2.7: Influence of heterogeneity on system performance in a 500-node network

[Plot panels showing the percentage of stale traffic: (a) Stale traffic vs. size (network size 100 to 500 nodes); (b) Stale traffic vs. ratio of weak nodes (0 to 1)]

Figure 2.8: How the percentage of stale traffic depends on size and ratio of weak nodes

In figure 2.6(b) we can see that having weak nodes in the network increases the mean lookup path length. Since the latency and bandwidth of links should not influence the lookup path length, the difference is probably caused by the churn introduced in the network. Churn makes routing tables non-optimal, which should increase lookup path lengths.

Finally, we can see from figure 2.6(c) that the success ratio of lookups is a constant 100 % for a static network, which is what should be expected in an uncongested network. We use a low request rate, so the network is not congested during these experiments.


[Plot panels, one column per run with 0 %, 30 % and 50 % weak nodes; nodes along the x-axis, weak nodes plotted as negative values: (a) to (c) Traffic [bytes/s]; (d) to (f) Uptime [s]; (g) to (i) Send / received ratio]

Figure 2.9: Traffic distribution, uptime and send / received ratio among nodes for various ratios of weak nodes

2.6.2 Node capacities

A common assumption, both in simulation and in real-world tests, is that all nodes are created equal. That assumption does not match the trend toward increasingly heterogeneous networks. We simplify network heterogeneity by introducing what we call weak and strong nodes. A weak node is modeled on a UMTS cellphone, as such phones are probably the first mobile devices that it makes sense to have as members of an overlay. The strong nodes are modeled on desktop computers with broadband connections. We use the term ratio for the fraction of nodes that are weak.

In these simulations we keep the network size fixed and vary the ratio of weak nodes. More weak nodes lead not only to more weak links but also to a more dynamic network. A more dynamic network increases the risk of lookups being lost in transit. Lookups fail for two different reasons. First, a lookup can be lost if a node leaves the network while the lookup is being routed through it. Second, a lookup fails if it reaches a destination node that has recently joined and whose data storage has not yet been synchronized. Bamboo has caching optimizations, but we have not implemented them because we believe that they hide the true performance in an experimental evaluation. Nevertheless, they make complete sense in a deployed system.

As we can see in figure 2.7(c), all lookups succeed when no weak nodes are present in the network. This is expected, because it means that the network is static. The success rate seems to have a close to linear relation to the ratio of weak nodes, which is promising.

Regarding lookup delays (figure 2.7(a)), there is a weak tendency toward non-linearity in the results, which we have also seen in other simulations. We believe that with a small number of weak nodes in the network, the weak nodes are unlikely to end up in routing tables, but as the ratio increases, more weak nodes start to forward traffic.

In figure 2.7(c) we can see that even with 50 % weak nodes the success ratio is well over 95 %, which seems quite good considering the introduced churn.

2.6.3 Churn rate

A system that is distributed over the Internet will experience churn. Churn can be caused by many different things, such as network problems, node crashes, or nodes that join and leave in a controlled fashion. We only simulate single nodes going up and down.

Whenever a node leaves the network it leaves silently, meaning that all its state is left in the network. Having only silent leaves is the worst-case scenario, but it is also how Bamboo handles leaves. When nodes leave silently, the node information related to them continues to spread throughout the network for some time. It does, however, fade out when nodes that receive the information unsuccessfully try to ping the dead node. The ping traffic to dead nodes, as well as neighbors trying to perform maintenance with dead nodes, causes what we call stale traffic within the network. We define stale traffic as traffic that is destined for a node that is no longer a member of the network. In figure 2.8(a) the percentage of stale traffic is plotted against the size of the network. The figure shows that the percentage of stale traffic does not increase with the size of the network. Stale traffic might have increased if nodes did not ping neighbors before adding them to leafsets and routing tables, but since they do, information about down nodes is not redistributed through the network.

We have simulated networks with churn and different ratios of weak nodes.

The size of the network in the simulations presented here is at most 500 nodes, which is close to the upper limit of what is feasible to simulate with the methods and tools we have chosen. Even though larger networks would be interesting to study, we believe that 500 nodes is enough to study the performance of the system, since an initial deployment of a DHT might, for instance, be on PlanetLab with some 200 nodes.

Weak nodes come and go in the system, while strong nodes are static. How long a weak node stays connected is determined by a Poisson process whose inter-event times give a mean online period of three minutes. Three minutes is a very short period of time, but since we model cell phones used by mobile users, we believe it is unlikely that many weak nodes stay online for extended periods.
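A sketch of the session model, assuming online periods are exponentially distributed with a mean of 180 s (the inter-event times of a Poisson process); the function names and the fixed seed are illustrative.

```python
import math
import random

MEAN_ONLINE = 180.0  # seconds: mean online period of a weak node


def sample_online_periods(n, seed=1):
    """Draw n online periods as exponential inter-event times of a Poisson process."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / MEAN_ONLINE) for _ in range(n)]


def fraction_shorter_than(t):
    """Analytical probability that an online period is shorter than t seconds."""
    return 1.0 - math.exp(-t / MEAN_ONLINE)
```

Under this model about 2.7 % of weak-node sessions last less than 5 s, which gives a sense of how many nodes a 5 s uptime cutoff removes.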

In figure 2.9 we present a visualization of the results gathered during three simulation rounds with different ratios of weak nodes. Each column of plots presents information about one run. All negative values are weak nodes and all positive values are strong nodes. The top plot shows the mean bandwidth utilization of all nodes in the simulation. The nodes are sorted by utilization, so a completely even distribution of used bandwidth would look like a horizontal line. By studying the columns, some relations can be seen. When all nodes are strong the distribution is almost even, but with weak nodes that introduce churn the distribution becomes less even. There are two major clusters of weak nodes at the extremes of utilized bandwidth. The uptime plot shows that the nodes using the least bandwidth have an uptime less than the maximum uptime. This means that those nodes joined during the simulation, and we believe that the reason for their smaller load is that information about them has not yet spread through the system. This could be very beneficial for a heterogeneous system: weak nodes are typically connected for shorter periods and could then get a smaller workload. At the other extreme are the weak nodes with the highest bandwidth utilization, and the uptime plot shows that they have typically been online for a very short period of time. From the bottom plot we observe that their ratio between received and sent bytes is very close to zero, which indicates that these nodes have just gone online and sent initial probes but have not yet received much response.

Because of the nature of a Poisson process some very short uptimes will occur, but extremely short node uptimes are not very realistic. A node might join the network in order to make a request and then leave, but we believe it to be unlikely that a node will take the cost of sending probes without getting the benefit of the response. To minimize the effect of the very short-lived nodes in our analysis, we added the condition that a node must have an uptime greater than 5 seconds to be presented, and plotted the same data as in figure 2.9 in figure 2.10.

[Plot panels laid out as in figure 2.9, restricted to nodes with an uptime greater than 5 s; one column per run with 0 %, 30 % and 50 % weak nodes: (a) to (c) Traffic [bytes/s]; (d) to (f) Uptime [s]; (g) to (i) Send / received ratio]

Figure 2.10: Traffic distribution, uptime and send / received ratio among nodes for various ratios of weak nodes

In figure 2.10(e) we still see a cluster of nodes that does not seem to beinfluenced by the extra condition on uptime.

2.7 Discussion

When we realized that a DHT’s management traffic could pose a problem in heterogeneous environments [2], we also realized that to evaluate how big the problem was, we needed to be able to simulate bandwidth. The most common approach when simulating DHTs is to simulate only a static delay between nodes. Such a network model does not introduce packet drops, reordering, or delay variations due to congestion in the network. To be able to model a more dynamic network, we needed a more expressive simulator. Our choice of NS-2 was based on the fact that it is a de facto standard within the community, and that it has good supporting tools such as topology generators.

Simulating a DHT in NS-2 proved time-consuming. Implementing the DHT functionality from scratch was necessary to be able to make simplifications, and simplifications were needed to study reasonably large networks. As memory is a constraint in simulations, we needed to implement extra support for dynamic allocation of simulation resources. Our experience is that NS-2 is not a good fit for simulations of large, dynamic overlay networks. For example, we have not been able to simulate more than 60 minutes of real time for a 200-node network with 30 % weak nodes on a machine with 2 GB of RAM. Fortunately, that time has been enough for the networks to stabilize, so we have been able to get data from stable networks. Setting up the networks offline (section 2.5.2) also shortened stabilization times compared to real experiments.

In conclusion, we think that more bandwidth studies of DHTs are needed, as DHTs are becoming more common as building blocks in distributed systems. The simulation approach can be valuable, but a simulator that is less detailed than NS-2 and more expressive than the delay-only simulators would be a good tool for such analysis.


Bibliography

[1] Hari Balakrishnan, Scott Shenker, and Michael Walfish. Peering Peer-to-Peer Providers. In 4th International Workshop on Peer-to-Peer Systems (IPTPS ’05), Ithaca, NY, February 2005.


[3] Yatin Chawathe, Sriram Ramabhadran, Sylvia Ratnasamy, Anthony LaMarca, Scott Shenker, and Joseph Hellerstein. A case study in building layered DHT applications. In SIGCOMM ’05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, pages 97–108, New York, NY, USA, 2005.



[6] The Bamboo distributed hash table. Online: http://bamboo-dht.org/, 2003.

[7] Eddie Kohler, Mark Handley, and Sally Floyd. Designing DCCP: congestion control without reliability. SIGCOMM Comput. Commun. Rev., 36(4):27–38, 2006.

[8] Ralph C. Merkle. A digital signature based on a conventional encryption function. In Carl Pomerance, editor, Proceedings of the annual international cryptology conference (CRYPTO), pages 369–378. Springer-Verlag, 1988.

55


[10] Sean Rhea. OpenDHT: A Public DHT Service. PhD thesis, Universityof California, Berkeley, August 2005.





[15] Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, and SoneshSurana. Internet indirection infrastructure, 2002.


[17] Sven Westergren. Notenet - range queries in a DHT. Master’s thesis,Uppsala University, 2007.

[18] Ben Y. Zhao, John D. Kubiatowicz, and Anthony D. Joseph. Tapestry:An infrastructure for fault-tolerant wide-area location and routing.Technical Report UCB/CSD-01-1141, UC Berkeley, April 2001.

56

Chapter 3

Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds


Abstract

Writing a powerful tool for monitoring and management of a testbed can benefit the research done on that testbed. Despite this, many testbeds rely on primitive scripts for data collection, code updates, and other basic tasks.

We introduce Vendetta, a flexible and powerful platform for monitoring and management of distributed testbeds. It is designed to be relatively easy to adapt to different testbeds: it has a modular design, is written in Java, and defines much of the testbed-specific behavior in two configuration files.

Compared with similar tools, Vendetta is novel in combining a GUI that supports 3D graphics with flexible monitoring and management in a single tool. We present the general design of Vendetta and then illustrate how it has been used for monitoring and management of an experimental DHT deployment running on PlanetLab. Our experience from this combination shows that a tool like Vendetta simplifies testbed management and makes it easier to discover and analyze different phenomena.

3.1 Introduction

The last decade has shown an increasing need for distributed testbeds to support networking research. One reason is that experimental results from an actual deployment, even in a limited testbed, can reveal phenomena that will never appear in a simulator. Another important factor is the increased availability of platforms on which distributed testbeds can be deployed. Emulab [6] and PlanetLab [2] are two of the most widely used examples, both for experiments and for deployment of new services.

When creating a distributed testbed, the focus tends to be on the functionality of the software to be deployed, and less effort goes into a good user interface for monitoring and management. Code distribution, experiment synchronization, data gathering, and similar tasks are often done with customized scripts specific to the testbed. However functional these script-based solutions are, they are not always very user-friendly. Complete data collection for post-mortem analysis is also common in distributed testbeds. This approach typically provides researchers with large amounts of data, which are then mined using third-party software.

We introduce Vendetta, which provides a flexible platform for developing user-friendly monitoring and management functionality in distributed testbeds. It is primarily aimed at medium-sized testbeds used in smaller projects, where resources and incentives to develop testbed-specific software with similar functionality are limited. Vendetta is designed to be flexible yet powerful by allowing users to create testbed-specific modules that work like plugins. The testbed-specific parts of Vendetta consist of two configuration files plus Java code to collect and parse data. It is also possible to construct a graphical canvas that illustrates testbed-specific details in a more intuitive way. The software includes highly customizable functionality for basic monitoring and management tasks. These can be used to build a powerful testbed-specific monitoring and management solution with relatively little programming effort.

This paper is organized as follows. In the following section, we present the design and features of Vendetta in more detail. We then present a case where Vendetta is used for monitoring and management of a distributed hash table (DHT) testbed running on PlanetLab. After a discussion of the features and limitations of a platform like Vendetta, the paper concludes with ongoing development efforts and planned future work.


Figure 3.1: Vendetta layout for a DHT testbed

3.2 Vendetta

Vendetta is not a testbed in itself. Instead, it is a tool that can be used to monitor and manage an existing testbed. In order to use Vendetta with a distributed testbed, two requirements must be met. First, the testbed should consist of several nodes that can be accessed via the network. This is in contrast to testbeds where data is collected post-mortem, in which case nodes need not be available at all times. Second, it must be possible to run a Java client on each node that needs to be monitored and/or managed.¹

The Vendetta software has two main components: the monitor and the node client. The monitor is an interactive program that runs on the user's machine and presents a graphical user interface. The node client is a small program that runs on all nodes in the testbed. Communication between the monitor and the node clients can use either TCP or UDP as transport protocol, depending on how time-critical and error-tolerant the information to be sent is.

¹ We have a C implementation of the client, but it has not yet been tested in real experiments.


The node client

The node client runs on all nodes in the distributed testbed and can be started from the monitor over an SSH connection. It acts as a middle-man between the monitor and the testbed components running on the node. The applications are started as processes, and their output is parsed by the node client. A benefit of this approach is that closed-source applications can also be evaluated. In that case, however, the resolution of the monitoring is limited to the amount of log information the application outputs. The node client collects log data from the testbed and stores it in a local log. Requested log data are sent to the monitor for presentation. The monitor can instruct the node client to filter its log data before sending it. However, the node client still stores all log events locally, so that they can be used for post-mortem analysis of the network's behavior. To support management tasks, the node client can also receive messages from the monitor, for example messages telling it to start or stop the monitored application.
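The sketch below illustrates the split between a complete local log and a filtered stream to the monitor. It is not Vendetta's code. The class name `NodeLogSketch`, the use of a plain type string per event, and the `Consumer` standing in for the network link are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch: keep every event locally, forward only the filtered ones. */
final class NodeLogSketch {
    private final List<String> localLog = new ArrayList<>();    // complete log for post-mortem analysis
    private final Set<String> forwardedTypes = new HashSet<>(); // filter set by the monitor
    private final Consumer<String> toMonitor;                    // stand-in for the network link

    NodeLogSketch(Consumer<String> toMonitor) {
        this.toMonitor = toMonitor;
    }

    /** Called when the monitor changes the filter for this node client. */
    void setFilter(Set<String> types) {
        forwardedTypes.clear();
        forwardedTypes.addAll(types);
    }

    /** Called for each parsed line of application output. */
    void onEvent(String type, String line) {
        localLog.add(type + " " + line);           // always stored locally
        if (forwardedTypes.contains(type)) {
            toMonitor.accept(type + " " + line);   // only requested events leave the node
        }
    }

    List<String> localLog() {
        return Collections.unmodifiableList(localLog);
    }
}
```

Filtering happens only on the forwarding path. Changing the filter from the monitor therefore never discards data that might later be needed for post-mortem analysis.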

It is important to note that the node client makes no assumptions about the testbed it serves. Instead, it acts as a kind of message-passing middle-man between the nodes in the testbed and the monitor. One benefit of this middle-man design is that if the monitored application dies during an experiment, the node client can restart it and notify the monitor. How data is collected, filtered, and communicated is defined in a configuration file read by the node client at startup. By specifying another configuration file, the node client can be reconfigured to interact with another type of testbed. In section 3.3, we present an example of what a configuration file for the node client can look like in a specific testbed.
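The restart behavior described above might look roughly as follows. The sketch hides the monitored application behind a hypothetical `ManagedApp` interface, so the restart logic does not depend on how the process is launched. In a real node client, the implementation would presumably wrap a `java.lang.Process`. The class names and the notification text are illustrative only.

```java
import java.util.function.Consumer;

/** Hypothetical abstraction of the monitored application, e.g. a wrapped java.lang.Process. */
interface ManagedApp {
    boolean isAlive();
    void start();
    void stop();
}

/** Sketch of restart-and-notify; check() would be called periodically by the node client. */
final class AppSupervisor {
    private final ManagedApp app;
    private final Consumer<String> notifyMonitor;
    private boolean shouldRun = false;   // true while the monitor wants the application up
    private int restarts = 0;

    AppSupervisor(ManagedApp app, Consumer<String> notifyMonitor) {
        this.app = app;
        this.notifyMonitor = notifyMonitor;
    }

    void startRequested() {
        shouldRun = true;
        app.start();
    }

    void stopRequested() {
        shouldRun = false;   // a deliberate stop must not trigger a restart
        app.stop();
    }

    /** Restarts the application if it died while it was supposed to be running. */
    void check() {
        if (shouldRun && !app.isAlive()) {
            restarts++;
            app.start();
            notifyMonitor.accept("application restarted, count=" + restarts);
        }
    }

    int restarts() {
        return restarts;
    }
}
```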

The monitor

The monitor is the program that interacts with the testbed and presents information to the user. The GUI is divided into three areas. The first is the node list, which is read from a node file at startup and contains all nodes that are part of the testbed to be monitored. The second area is for commands that can be sent to one or many nodes. Typically, the commands are implemented as buttons, possibly with a form for extra arguments. Some commands are instead chosen from drop-down menus, which is effective when the set of possible options is limited. The third area is the canvas area, which uses most of the screen to visualize the events received from node clients. The number and types of graphical canvases displayed in the canvas area are defined in the monitor configuration file. As canvases are dynamically loaded at runtime, it is not necessary to recompile Vendetta to support a new canvas.
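Loading classes by name at runtime is typically done with Java reflection. The sketch below shows that general technique. The helper `PluginLoader.load` is hypothetical, and we do not claim that Vendetta's loader looks like this. A class name from the `<CANVAS>` section of the configuration file, such as `vendetta.overlays.GlobeCanvas`, would be passed in together with the canvas base type.

```java
/** Hypothetical helper: instantiate a class whose name comes from a configuration file. */
final class PluginLoader {
    private PluginLoader() {}

    static <T> T load(String className, Class<T> expectedType) {
        try {
            Class<?> raw = Class.forName(className);
            // Throws ClassCastException if the class is not a subtype of expectedType
            Class<? extends T> typed = raw.asSubclass(expectedType);
            // Requires an accessible no-argument constructor
            return typed.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load plugin " + className, e);
        }
    }
}
```

Because the class name is only resolved when the configuration is read, a new canvas class on the classpath can be used without recompiling the monitor itself.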

When the monitor starts, it waits for node clients to report in. As node clients are heard from, their status is updated in the GUI and the user can start to interact with them. If the node client software is not running on the remote nodes, the user can start it from the GUI. Nodes report to the monitor by sending alive messages. The alive messages, or rather the lack of them, are also used to determine whether a node is unavailable. The monitor only waits for alive messages, and node clients do not expect any response from it. The monitor is therefore not a single point of failure. If the monitor crashes or is closed down, e.g. during longer tests, the user only needs to restart it to regain control of the testbed.
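The sketch below shows failure detection based on missing alive messages. The class name `LivenessTracker` and the timeout value are assumptions for illustration. In the DHT case study, the node clients are configured with a `<PING_INTERVAL>` of 15000 (figure 3.11, presumably milliseconds), so a monitor would need a timeout somewhat larger than that. Timestamps are passed in explicitly to keep the logic testable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: a node is considered unavailable if no alive message arrived within the timeout. */
final class LivenessTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();

    LivenessTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records an alive message from a node client; the monitor sends no reply. */
    void onAlive(String host, long nowMillis) {
        lastHeard.put(host, nowMillis);
    }

    boolean isUp(String host, long nowMillis) {
        Long t = lastHeard.get(host);
        return t != null && nowMillis - t <= timeoutMillis;
    }

    /** Nodes that have been heard from before but are now silent, sorted by name. */
    Set<String> unavailable(long nowMillis) {
        Set<String> down = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                down.add(e.getKey());
            }
        }
        return down;
    }
}
```

The state is rebuilt purely from incoming alive messages. A freshly restarted monitor would therefore recover its view of the testbed after roughly one ping interval, which is consistent with the point that the monitor is not a single point of failure.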

To reduce the amount of data sent over the network, the monitor can instruct node clients to use a specific filter. It can also request all logged data from one or more node clients to get a more complete picture. The user can decide whether a certain type of log event should be sent using UDP or TCP. Typically, periodic node state would be sent using UDP, whereas important but rare events are sent using TCP. The node client also buffers log events before sending them, which reduces the number of datagrams the monitor needs to handle. The buffer is sent either when it is full or when a configurable amount of time has passed.
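The sketch below shows a send buffer that is flushed either when full or when a configurable time has passed. The capacity, the delay, and the name `EventBatcher` are illustrative assumptions. Time is passed in explicitly; a real client would presumably call `tick` from a timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: batch log events into one message, flushing on size or on age. */
final class EventBatcher {
    private final int capacity;
    private final long maxDelayMillis;
    private final Consumer<List<String>> send;   // e.g. one UDP datagram or one TCP write
    private final List<String> pending = new ArrayList<>();
    private long firstEventAt = -1;              // arrival time of the oldest buffered event

    EventBatcher(int capacity, long maxDelayMillis, Consumer<List<String>> send) {
        this.capacity = capacity;
        this.maxDelayMillis = maxDelayMillis;
        this.send = send;
    }

    void add(String event, long nowMillis) {
        if (pending.isEmpty()) {
            firstEventAt = nowMillis;
        }
        pending.add(event);
        if (pending.size() >= capacity) {
            flush();                             // buffer full
        }
    }

    /** Called periodically; flushes if the oldest event has waited long enough. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - firstEventAt >= maxDelayMillis) {
            flush();
        }
    }

    private void flush() {
        send.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```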

Events can be monitored in several ways. Live monitoring gives an overview of what is currently going on in the testbed. As all received data is saved, it is also possible to pause live monitoring, rewind, and replay interesting sequences. When events are not monitored live, the visualization speed can be changed, either to fast-forward past uninteresting event sequences or to replay interesting ones slowly.
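Replaying stored events at an adjustable speed essentially rescales recorded timestamps. The sketch below takes a rewind position and a speed factor (for example 2.0 to fast-forward or 0.5 for slow replay). It computes when each remaining event should be shown. The class `ReplaySchedule` is a hypothetical illustration, not Vendetta's replay code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: map recorded event times to wall-clock display offsets for replay. */
final class ReplaySchedule {
    private ReplaySchedule() {}

    /**
     * @param eventTimes recorded timestamps in milliseconds, sorted ascending
     * @param startAt    rewind position in recorded time; earlier events are skipped
     * @param speed      playback speed, e.g. 2.0 = twice as fast, 0.5 = half speed
     * @return offsets (ms since replay start) at which to show each remaining event
     */
    static List<Long> displayOffsets(List<Long> eventTimes, long startAt, double speed) {
        if (speed <= 0) {
            throw new IllegalArgumentException("speed must be positive");
        }
        List<Long> offsets = new ArrayList<>();
        for (long t : eventTimes) {
            if (t >= startAt) {
                offsets.add(Math.round((t - startAt) / speed));
            }
        }
        return offsets;
    }
}
```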

There are two ways to send commands from the monitor to the node client: as a predefined command that is parsed by a Java class on the receiving node, or as SSH commands. The Java commands are sent in clear text, and their origin is not verified. The node client therefore accepts only requests for predefined actions. A button can trigger shell commands over SSH if more flexibility is needed. If a user needs complete control over a remote node, she can open a terminal over SSH to that node by selecting it and requesting a terminal. This is substantially easier than logging in on the remote node in the conventional way.
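The sketch below shows a whitelist dispatcher for predefined commands. The message names come from the configuration in figures 3.5 and 3.7. The `CommandDispatcher` class is hypothetical, and so is the wire format: we assume that arguments follow the command name after a space, modelled on the `msg=CTRL_NET_UP_REQ args=...` line in figure 3.7.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch: act only on predefined control messages; anything else is ignored. */
final class CommandDispatcher {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    /** Registers a predefined command name, e.g. "CTRL_CLEAR_QUEUE". */
    void register(String name, Consumer<String> handler) {
        handlers.put(name, handler);
    }

    /**
     * Treats the first token as the command name and passes the rest as arguments.
     * Returns false, without executing anything, if the name is not registered.
     */
    boolean dispatch(String message) {
        String trimmed = message.trim();
        int space = trimmed.indexOf(' ');
        String name = space < 0 ? trimmed : trimmed.substring(0, space);
        String args = space < 0 ? "" : trimmed.substring(space + 1).trim();
        Consumer<String> handler = handlers.get(name);
        if (handler == null) {
            return false;   // never fall back to interpreting the text, e.g. as a shell command
        }
        handler.accept(args);
        return true;
    }
}
```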

A playout buffer is used when monitoring live. Its length is configurable rather than adaptive. An adaptive playout buffer might improve the live experience. It has not been prioritized, however, because we believe the replay function is what will primarily be used for analysis.

Figure 3.2: Canvas showing location of nodes and selected testbed traffic

Besides the monitoring features, the monitor can also perform maintenance tasks. Examples include starting up and shutting down testbed nodes, executing remote commands on several nodes, and sending command scripts to choreograph the behavior of nodes. Most management tasks appear as clickable command buttons.

A key principle in the design of Vendetta is flexibility. Both the monitor and the node clients can be configured to fit into almost any distributed testbed where the participating nodes can run Java programs. In fact, much of the behavior of Vendetta is defined in two configuration files: one for the monitor and one for the node clients. At the monitor, the configuration file defines what elements to display on the screen, what actions to associate with GUI elements such as buttons, and what nodes to communicate with. In the node client, the configuration file defines how to collect and parse data to be sent to the monitor. In section 3.3, we present more details on how we have used Vendetta to interact with an experimental DHT deployment and what the corresponding configuration files might look like. This gives an indication of the flexibility offered.

3.2.1 Vendetta and PlanetLab

Not all behavior can be defined in configuration files. For some purposes, it may be necessary to write a Java class that performs a specialized task. One example of this is the PlanetLab module included in Vendetta.

PlanetLab [2] is a distributed testbed consisting of several hundred nodes around the world, accessible through SSH remote login. Running experiments on PlanetLab involves keeping track of which nodes are active, deploying compiled code to selected nodes, coordinating node actions, collecting results, and more. Over the years, PlanetLab users have developed and redeveloped scripts and tools for these tasks.

Figure 3.3: Application-specific canvas illustrating a DHT ring: (a) query path; (b) routing table and key distribution

Motivated by the popularity of PlanetLab, Vendetta includes modules for monitoring and management of testbeds based on PlanetLab. For visualization, a canvas showing the geographic locations of available nodes can be used to display node events. Associating an event at a node with a graphical representation makes the data more intuitive to interpret. For example, as illustrated in figure 3.2, query paths in a distributed system can be visualized on the globe. This makes it obvious when long communication paths cause inefficiencies.
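The globe canvas lets the eye spot long detours. A complementary calculation, which is our illustration and not a Vendetta feature, estimates the geographic length of a query path with the haversine great-circle formula. It assumes that the latitude and longitude of each hop's node are known, and it models the Earth as a sphere of radius 6371 km. The class name `PathLength` is hypothetical.

```java
/** Sketch: great-circle length of a multi-hop path, using a spherical-Earth approximation. */
final class PathLength {
    private static final double EARTH_RADIUS_KM = 6371.0;

    private PathLength() {}

    /** Haversine distance between two points given in degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dPhi / 2), 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(dLambda / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Sum of hop distances; hops[i] = {latitude, longitude} of the i-th node on the path. */
    static double pathKm(double[][] hops) {
        double total = 0;
        for (int i = 1; i < hops.length; i++) {
            total += distanceKm(hops[i - 1][0], hops[i - 1][1], hops[i][0], hops[i][1]);
        }
        return total;
    }
}
```

Comparing `pathKm` for the actual query path with the direct distance between source and destination gives a rough numeric counterpart to what the canvas shows visually.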

For management purposes, Vendetta includes a module that eases code deployment on PlanetLab. Nodes to monitor or manage are selected from a list or by clicking on the canvas. By clicking a button in the GUI, a custom command can be executed on the selected nodes. Filters to use, graphical canvas associations, commands to execute, and events to monitor are all specified in the configuration file that defines the testbed behavior.

3.3 Case: A DHT testbed

In this section, we present a case where Vendetta is used together with an experimental DHT deployment running on PlanetLab. We present the testbed-specific components needed and the configuration scripts, and we discuss what can be achieved when using Vendetta in this way. The purpose of this section is not to present the testbed itself, but to give an indication of how Vendetta can be configured to interact with it.

The testbed in our case is used to study a DHT-based overlay network with support for range queries. We have implemented range queries in the DHT routing mechanisms rather than in an application on top of the DHT. It is therefore interesting to study whether adding support for range queries affects actual DHT performance. The software used in the DHT testbed is a modified version of Bamboo [4], which is also used by OpenDHT [5], a service that runs on PlanetLab. To evaluate our DHT testbed, we need to be able to do the following:

• Distribute a new version of the DHT code to all nodes in the testbed. This feature is used frequently during the programming phase, when new versions of the code need to be deployed to all nodes before testing.

• Dynamically add and remove nodes to/from the DHT. This feature isused during the experiments to mimic churn in the system.

• Study the routing tables and leaf sets at a given node at a specific time.As convergence times of routing tables and leaf sets are interestingmetrics in churn-prone systems, we want to be able to monitor changesin the routing tables and leaf sets.

• Study query paths at different times. This is used to evaluate the performance in a churn-prone system, where query paths might change on relatively short time scales.

The first two tasks are standard management tasks, which are addressed by adding command buttons to the GUI. When a button is clicked, the associated command is forwarded to the selected client nodes, where it is executed. The labels of the command buttons as well as the associated commands are defined in the monitor configuration file. Adding a new command button is therefore usually a matter of adding four or five lines of text to the configuration file, without recompiling anything.

The latter two tasks also require commands to be sent, to request current routing tables and to track queries, respectively. However, this information is much more intuitive to display in a graphical canvas than as log messages. For this reason, we have implemented a special canvas that illustrates some testbed-specific events.


<BGCOLOR>
255
</BGCOLOR>
<CANVAS>
vendetta.overlays.canvases.DHTRingCanvas
vendetta.overlays.GlobeCanvas
</CANVAS>
<TABLECOLUMN>
GUID
Hostname
Bamboo GUID
Node ID
Num Keys
</TABLECOLUMN>

Figure 3.4: Graphical user interface configuration

3.3.1 DHT canvas

Our DHT canvas consists of a DHT ring, which is the common way to represent a one-dimensional address space for DHT networks. On this ring, each node in the testbed is represented as a point. Figure 3.3(a) shows the trace of a request sent into the testbed, while figure 3.3(b) illustrates the routing table of a node on the left side of the ring. The bars in figure 3.3(b) show the key distribution among nodes in the DHT ring. The figure clearly shows that keys are not evenly distributed among participating nodes at the observed time. This would have been hard to catch by simply studying log files. Watching the canvas over time shows that the keys are eventually spread over a larger set of nodes. The canvas thus makes it easy to get a first understanding of the convergence time of the key distribution. All these presentation modes are specific to our testbed and included in the DHT canvas. The globe canvas, seen in figure 3.2, shows the physical location of the nodes in the testbed, as well as selected network events. A useful feature of the globe canvas is that the user can select nodes by clicking on their location. It is also convenient to highlight a node in the overlay using the DHT ring canvas and directly see its physical location.
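Placing nodes on the ring amounts to mapping each identifier linearly onto a circle. The sketch below does this for a 2^bits identifier space. We assume 160-bit identifiers for Bamboo; the width is a parameter here. The angle is measured clockwise from the top of the canvas. The class and method names are hypothetical, and the real `DHTRingCanvas` may draw differently.

```java
import java.math.BigInteger;

/** Sketch: position a DHT identifier on a ring-shaped canvas. */
final class RingLayout {
    private RingLayout() {}

    /** Angle in degrees in [0, 360) for an identifier in a 2^bits address space. */
    static double angleDegrees(BigInteger id, int bits) {
        BigInteger space = BigInteger.ONE.shiftLeft(bits);
        if (id.signum() < 0 || id.compareTo(space) >= 0) {
            throw new IllegalArgumentException("identifier outside address space");
        }
        // Scale id into [0, 2^53) so the conversion to double is exact, then normalise
        double fraction = id.shiftLeft(53).divide(space).doubleValue() / Math.pow(2, 53);
        return 360.0 * fraction;
    }

    /** Screen coordinates on a ring of the given radius centred at (cx, cy); 0 degrees is at the top. */
    static double[] point(double angleDegrees, double cx, double cy, double radius) {
        double rad = Math.toRadians(angleDegrees);
        return new double[] {cx + radius * Math.sin(rad), cy - radius * Math.cos(rad)};
    }
}
```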

3.3.2 Monitor configuration

When Vendetta starts, it reads its configuration file and presents the GUI shown in figure 3.1. The part of the configuration file that determines the look of the monitor is presented in figure 3.4. Being able to set the background color has been convenient when switching between monitors and projectors to present the visualization.

The <CANVAS> tag specifies which canvases to use in the GUI: the DHT ring canvas and the globe canvas introduced above. The canvas area is shared equally between the canvases, so removing the DHT canvas from the configuration file would leave the whole top of the screen to the globe.

The bottom left part of the GUI includes a node table, with the columns defined in the <TABLECOLUMN> tag in the configuration file, as well as a status message window.

The <NODECMD>, <OVERLAYCMD>, and <LOGFILTER> tags define the management commands available in the GUI. When Vendetta parses the configuration, it creates the corresponding command panels shown in figures 3.6, 3.8, and 3.10.
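The command sections in figures 3.5 and 3.7 follow a simple pattern: an opening tag, `key=value` lines, and a closing tag. The sketch below is a minimal parser for that pattern. The grammar is inferred from the figures rather than taken from Vendetta's documentation, and `CommandConfig`/`Command` are hypothetical names. Each line is split only at its first `=`, so values such as `TIMEOUT=60000 ./scripts/updateclient <NODE_HOSTNAME>` survive intact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: parse <NODECMD>/<OVERLAYCMD>-style blocks into button definitions. */
final class CommandConfig {
    static final class Command {
        final String label;
        final String type;   // EXEC or TCP in the figures
        final String msg;

        Command(String label, String type, String msg) {
            this.label = label;
            this.type = type;
            this.msg = msg;
        }
    }

    private CommandConfig() {}

    static List<Command> parse(List<String> lines, String tag) {
        List<Command> commands = new ArrayList<>();
        Map<String, String> current = null;   // non-null while inside a matching block
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equals("<" + tag + ">")) {
                current = new LinkedHashMap<>();
            } else if (line.equals("</" + tag + ">") && current != null) {
                commands.add(new Command(current.get("label"), current.get("type"), current.get("msg")));
                current = null;
            } else if (current != null && line.indexOf('=') > 0) {
                int eq = line.indexOf('=');   // split at the first '=' only
                current.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return commands;
    }
}
```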

Node client commands

The configuration of the node commands needed in our case study is presented in figure 3.5. We define node commands as commands that most testbeds would need but that do not directly affect the tested application. Node commands can, for example, start or stop the remote node client, update the software running on the remote nodes, or flush the remote message queue. The configuration file shows that the node client software is updated using a local script, which simply contains an rsync command. In the configuration of commands, the user can use <NODE_HOSTNAME>, which is replaced with the hostname of the chosen node. If multiple nodes are chosen, this results in a number of parallel calls. Only a limited number of parallel calls is allowed. If too many nodes are called at once, the remaining calls are queued so that they do not overload the monitor. To start the node client, a remote shell command is issued over SSH. Once the node client has started, TCP can be used to send commands. For example, the command to clear the node client's message buffer is sent over TCP.
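The sketch below combines placeholder substitution with a cap on parallel calls. A fixed-size thread pool is one simple way to queue calls beyond the limit. The class name, the use of `/bin/sh -c`, and any particular pool size are our assumptions, not Vendetta's actual values. Building the command lines is kept separate, so it can be checked without running anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: expand <NODE_HOSTNAME> per selected node and run at most maxParallel calls at once. */
final class ParallelExec {
    static final String PLACEHOLDER = "<NODE_HOSTNAME>";

    private ParallelExec() {}

    /** One command line per selected node, with the placeholder substituted. */
    static List<String> expand(String template, List<String> hosts) {
        List<String> commands = new ArrayList<>();
        for (String host : hosts) {
            commands.add(template.replace(PLACEHOLDER, host));
        }
        return commands;
    }

    /** Runs the commands via /bin/sh; tasks beyond maxParallel wait in the pool's queue. */
    static List<Integer> runAll(List<String> commands, int maxParallel) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String cmd : commands) {
                // Each task starts one process, shares the monitor's console, and yields its exit code
                results.add(pool.submit(() -> new ProcessBuilder("/bin/sh", "-c", cmd)
                        .inheritIO().start().waitFor()));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : results) {
                exitCodes.add(f.get());   // blocks until that command has finished
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `runAll(expand("ssh uu_bamboo@<NODE_HOSTNAME> pkill java", selectedHosts), 4)` would run at most four such calls concurrently. Here `selectedHosts` is a hypothetical list of the chosen nodes.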

Application specific commands

The next tab contains the commands that directly affect the tested application. In our case, these start and stop the Bamboo overlay on a node, request the routing table, or request the leaf set. The configuration needed to create the buttons is presented in figure 3.7. The configuration file shows that all overlay commands are sent to the node client using TCP. At the node client, messages are parsed by the corresponding Java classes. We use commands over TCP rather than SSH because the SSH daemon on the PlanetLab nodes can have long response times. Response times of up to 20 seconds are not uncommon. When we want to control node behavior in detail, this extra delay is problematic.

<NODECMD>
label=Start VClient
type=EXEC
msg=ssh uu_bamboo@<NODE_HOSTNAME> daemonize.pl pandora.kicks-ass.org:4444
</NODECMD>
<NODECMD>
label=Kill Monitor Node
type=TCP
msg=CTRL_NODE_DOWN_REQ
</NODECMD>
<NODECMD>
label=pkill java
type=EXEC
msg=ssh uu_bamboo@<NODE_HOSTNAME> pkill java
</NODECMD>
<NODECMD>
label=Update VClient
type=EXEC
msg=TIMEOUT=60000 ./scripts/updateclient <NODE_HOSTNAME>
</NODECMD>
<NODECMD>
label=Update Bamboo
type=EXEC
msg=TIMEOUT=600000 ./scripts/updatebamboo <NODE_HOSTNAME>
</NODECMD>
<NODECMD>
label=Run Node Script
type=TCP
msg=CTRL_SCRIPT
</NODECMD>
<NODECMD>
label=Clear Node Queue
type=TCP
msg=CTRL_CLEAR_QUEUE
</NODECMD>

Figure 3.5: Configuration of node client commands

Figure 3.6: Node commands panel

<OVERLAYCMD>
label=Start Bamboo
type=TCP
msg=CTRL_NET_UP_REQ args=run-java args=bamboo.lss.DustDevil args=node.cfg
</OVERLAYCMD>
<OVERLAYCMD>
label=Kill Bamboo
type=TCP
msg=CTRL_NET_DOWN_REQ
</OVERLAYCMD>
<OVERLAYCMD>
label=Request RT
type=TCP
msg=CTRL_RT_REQ
</OVERLAYCMD>
<OVERLAYCMD>
label=Request Leafset
type=TCP
msg=CTRL_LS_REQ
</OVERLAYCMD>

Figure 3.7: Application-specific commands

Figure 3.8: Overlay commands panel

<LOGFILTER>
Put Started:LE_PUT_STARTED
Put Forwarded:LE_PUT_FORWARD
Put Delivered:LE_PUT_DELIVERED
Got Key from Root:LE_GOT_KEY_FROM_ROOT
num Stored Keys:LE_STORED_KEYS
</LOGFILTER>

Figure 3.9: The set of possible log events

Figure 3.10: Log filter panel

Event filters

The configuration file also defines the possible log events, shown in figure 3.9. Each log event can be sent to the monitor using UDP or TCP, or only logged locally by the node client. The corresponding panel, shown in figure 3.10, has drop-down boxes containing these three log event actions. Initially, the boxes do not show the current state of each log event, because the default behavior is defined in the node client's configuration file. Once the user changes the behavior with a drop-down menu, the menu shows the current state.
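Each log event thus has one of three delivery actions. The sketch below maps the `net=` values used in figure 3.11 onto such an action. The enum name is hypothetical. Treating any value other than `udp` or `tcp` as local-only is our assumption; the figure uses both `none` and `no`.

```java
import java.util.Locale;

/** Sketch: delivery action for a log event, parsed from a net= setting. */
enum Delivery {
    UDP,    // cheap but may be lost: suited to periodic state
    TCP,    // reliable: suited to rare but important events
    LOCAL;  // kept only in the node client's local log

    static Delivery fromNetSetting(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("udp")) {
            return UDP;
        }
        if (v.equals("tcp")) {
            return TCP;
        }
        return LOCAL;   // "none", "no" or anything unrecognised: keep the event local
    }
}
```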

3.3.3 Working with the monitor

Using the monitor is relatively easy. One or more testbed nodes can be selected by clicking on one of the canvases or by highlighting them in the list of available nodes. The overlay commands panel can then be used to bring the selected nodes up or down, or to request routing table information from them. From the node commands panel, it is possible to run arbitrary commands on the selected nodes, push out new code, clear caches, or request that a specific script be run. When scripting commands, it is possible to add timing information to ensure that the commands are executed at the intended time. From the log filter panel, it is possible to specify what information the node clients on the selected nodes should forward to the monitor.
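The paper does not specify Vendetta's script syntax. As a purely hypothetical illustration, a command script with timing information could list a delay in milliseconds and a command on each line. The sketch parses such a script and uses a `ScheduledExecutorService` to run each command at its offset from the moment the script is started. The line format `<offset-ms> <command>` and the class `TimedScript` are both assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Sketch: parse "<offset-ms> <command>" lines and run each command at its offset. */
final class TimedScript {
    static final class Step {
        final long offsetMillis;
        final String command;

        Step(long offsetMillis, String command) {
            this.offsetMillis = offsetMillis;
            this.command = command;
        }
    }

    private TimedScript() {}

    static List<Step> parse(List<String> lines) {
        List<Step> steps = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;   // skip blank lines and comments
            }
            String[] parts = line.split("\\s+", 2);
            steps.add(new Step(Long.parseLong(parts[0]), parts.length > 1 ? parts[1] : ""));
        }
        return steps;
    }

    /** Schedules every step relative to now; the caller shuts the scheduler down when done. */
    static ScheduledExecutorService run(List<Step> steps, Consumer<String> execute) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        for (Step s : steps) {
            scheduler.schedule(() -> execute.accept(s.command), s.offsetMillis, TimeUnit.MILLISECONDS);
        }
        return scheduler;
    }
}
```

To choreograph several nodes with this scheme, the scripts would also have to be started at a common time, for example from synchronized clocks. That coordination is outside this sketch.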

<VENDETTA>
123.123.123.123:1234
</VENDETTA>
<PING_INTERVAL>
15000
</PING_INTERVAL>
<UDP_TIMEOUT>
8000
</UDP_TIMEOUT>
<LOGPARSER>
vclient.overlays.bamboo.BambooParser
</LOGPARSER>
<LOGEVENT>
type=LE_STORED_KEYS
method=stored_keys
regexp=^DataManagerTest.stored:.*
net=none
</LOGEVENT>
<LOGEVENT>
type=LE_GET_REPLY
method=get_reply
regexp=.* INFO bamboo.dht.Dht: upcall for get (range )?req key=.*
net=tcp
</LOGEVENT>
<LOGEVENT>
type=LE_GET_RECEIVED
method=get_received
regexp=.* INFO bamboo.dht.Dht: got new recur get (range )?resp key=.*
net=tcp
</LOGEVENT>
<LOGEVENT>
type=LE_GET_ITERATIVE_QUERY
method=get_iterative_query
regexp=.* INFO bamboo.dht.Dht: sending iterative get req key=.* target .*
net=tcp
</LOGEVENT>
<LOGEVENT>
type=LE_GET_DONE
method=get_done
regexp=.* INFO bamboo.dht.Dht: iterative get req key=.* done$
net=tcp
</LOGEVENT>
<LOGEVENT>
type=LE_PUT_STARTED
method=put_started
regexp=.* INFO bamboo.dht.Dht: got putreq: .*
net=tcp
</LOGEVENT>
<LOGEVENT>
type=LE_GOT_KEY_FROM_ROOT
method=got_key_from_root
regexp=.* INFO bamboo.dmgr.DataManager: got key=.* from root .*
net=none
</LOGEVENT>
<LOGEVENT>
type=CTRL_NET_UP
method=joined_overlay
regexp=.* INFO bamboo.router.Router: Joined through gateway .*
net=tcp
</LOGEVENT>
<LOGEVENT>
type=LE_ASSIGNED_OGUID
method=assigned_oguid
regexp=.* INFO bamboo.router.Router: Bamboo node .* has guid .*
net=no
</LOGEVENT>

Figure 3.11: Node client configuration file for DHT testbed

3.3.4 Node client configuration

The configuration file for each node in the testbed is shown in figure 3.11. In our DHT testbed, we use the same configuration file for all node clients. However, different clients can be given different configuration files to provide slightly different functionality.

The beginning of the configuration file defines some basic parameters, such as the location of the monitor and timeout settings. The <LOGPARSER> tag gives the name of a local Java class (here vclient.overlays.bamboo.BambooParser) that is used to collect data from the testbed.

The rest of the configuration file defines the log events that the node client should identify and report back to the monitor. These are defined using standard regular expressions over the log messages generated by the tested application.
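The core of this parsing step is matching each output line against the configured regular expressions. It can be sketched with `java.util.regex`. The three patterns in `bambooExample` are copied from figure 3.11. The paper does not say whether Vendetta requires a whole-line match (`matches`) or a substring match (`find`); this sketch assumes a whole-line match. The class name `LogClassifier` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Sketch: classify application log lines into Vendetta-style log event types. */
final class LogClassifier {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();   // checked in insertion order

    void addEvent(String type, String regexp) {
        patterns.put(type, Pattern.compile(regexp));
    }

    /** The first event type whose expression matches the whole line, if any. */
    Optional<String> classify(String line) {
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (e.getValue().matcher(line).matches()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /** A few of the event definitions from figure 3.11. */
    static LogClassifier bambooExample() {
        LogClassifier c = new LogClassifier();
        c.addEvent("LE_STORED_KEYS", "^DataManagerTest.stored:.*");
        c.addEvent("LE_PUT_STARTED", ".* INFO bamboo.dht.Dht: got putreq: .*");
        c.addEvent("LE_GET_DONE", ".* INFO bamboo.dht.Dht: iterative get req key=.* done$");
        return c;
    }
}
```

For example, a line such as `12:00:01 INFO bamboo.dht.Dht: got putreq: key=0x3f` (an invented sample) would be classified as `LE_PUT_STARTED`.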

3.3.5 Using Vendetta with the DHT testbed

With the DHT canvas and the configuration scripts in place, we can start Vendetta with the GUI shown in figure 3.1. Using the node commands panel, we start the node client on all nodes in the testbed via an SSH command. We can then choose which nodes to include in our DHT ring by highlighting them and clicking the Start Bamboo command button in the overlay commands panel. As nodes join the DHT ring, they show up as dots in the DHT canvas.

To study the routing table of a node, we highlight that node either in the DHT ring or in the table. When we click the Request RT command button in the overlay commands panel, the canvas is updated to show the routing table, as illustrated in figure 3.3(b). We can also filter data from a set of nodes by first highlighting them and then specifying a filter in the log filter panel. The overlay commands panel lets us add nodes to or remove them from the DHT ring. The node commands panel lets us push out new configuration scripts, instruct nodes to restart the node client, and more.

Events that match our specified filter are displayed in the GUI as they occur. When something interesting is discovered, we can pause live monitoring, rewind, and replay. This makes it possible to study observed phenomena in more detail.

3.4 Discussion

In our DHT tests, we started with traditional script-based tools for basic data collection and management tasks. Although this worked quite well, it was far from user-friendly. It was also time-consuming to dig through the collected data to locate what we were interested in. We soon realized that a visualization tool would be useful. In a quick survey, we could not find a suitable tool that was easy to adapt to our testbed. We did find tools for visualization [3, 1], and we did find PlanetLab-specific scripts to help deploy code and collect data.

However, we did not find an integrated solution supporting the whole chain of application development, code deployment, data collection, and data analysis. Hence, we decided to develop Vendetta and to make it as flexible as possible, so that it can support others in the same situation.

After using Vendetta for months, it is clear to us that it greatly improves the efficiency of our experiments. Being able to easily control selected nodes while monitoring them in real time gives a better idea of what is going wrong, and where, when things break. In the initial phase of our DHT work, visualizing the overlay message paths helped us find non-trivial bugs. Adding new commands to the GUI is a matter of adding a few extra lines to the configuration file and restarting Vendetta.

Although we have mainly discussed Vendetta in the context of distributed testbeds, it can also be used for other purposes. By running the node client on the same machine as the monitor, it is possible to visually monitor local events.

The main advantage of Vendetta compared with similar frameworks is that it integrates a GUI with 3D canvases for visualizing testbed events with flexible monitoring and management tasks. Being able to rewind, fast-forward, and replay parts of an experiment is useful when trying to understand unexpected phenomena. For management, node clients can be sent command scripts that specify what to do and when. This means that we can choreograph node behavior with great precision.

However, there are of course limitations to what Vendetta can do. During our work, we have identified several uses of the framework that will require new features. Most of these, however, will reside in testbed-specific code.

3.5 Conclusion and Future Work

The Vendetta framework is very much a work in progress. Despite our positive experiences with it, we have identified features that would make it even more flexible. Examples of such features include (in no particular order):

• The possibility to control connectivity parameters between nodes in the testbed. Being able to change bandwidth, delay, jitter, error rates, and other properties of the communication path between two nodes opens new possibilities for studying more challenging usage scenarios.

• Combining the logging and command scripting features, i.e., making it possible to produce command scripts that reproduce phenomena observed in the logs. This can be useful for reproducing the same situation on a different set of nodes.

• More canvases, e.g., a dynamic topology graph. This could be useful for wireless testbeds, where the network topology can change frequently in the presence of high mobility.

We have developed a C version of the node client to ease deployment on smaller systems, e.g., a Linux-based base station such as the Linksys WRT54GL, but this code has not yet been tested in real experiments. The C version is planned to be used in a wireless testbed deployment. All code will be made available to the research community under an open-source license.

We also plan to try Vendetta on Emulab, as the specifications of Emulab make us confident that it will work smoothly without modifications.

Acknowledgment

We would like to acknowledge Peter Drugge for implementing Vendetta.


Bibliography

[1] Planetlab visualizer. http://www.cs.princeton.edu/nsg/planetlab/visualizer/, December 2006.


[3] Visualizing connection bandwidths and delays in planetlab. http://people.cs.uchicago.edu/~dinoj/vis/planetlab/, December 2006.


[5] Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu. OpenDHT: a public DHT service and its uses. SIGCOMM Comput. Commun. Rev., 35(4):73–84, 2005.

[6] Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, December 2002.


Chapter 4

Paper C: Evaluating a DHT in a heterogeneous environment


Abstract

We have implemented a Bamboo-like DHT in NS-2 to evaluate the impact of heterogeneity on a distributed hash table (DHT). The scenario we are interested in is one where users with mobile phones are full members of a DHT. Such members cause challenging connectivity patterns as well as heterogeneity in access technologies. We study DHT performance when the participating nodes have large variations in available bandwidth and uptime. The evaluations are made using both simulations and real experiments.

4.1 Introduction

The increasing mobility of, and heterogeneity among, Internet endpoints introduce new challenges for network services that need to be investigated. While the fastest access networks get faster, slower networks with transmission speeds in the kbit/s range are still around. As it is becoming more popular to use mobile terminals such as cell phones and PDAs to connect to the Internet, services must be able to cope with the limited system resources of such devices. Connectivity may not only have low bandwidth, but can also be expensive, intermittent, asymmetric, etc. New services should be able to handle this wide range of network properties to some extent.

During the last few years, a range of p2p-style distributed services using some sort of distributed hash table (DHT) have appeared. Within the networking community, several DHT designs have been proposed and evaluated, e.g., [8, 15, 13, 1, 14, 10]. However, most evaluations tend to rely on the implicit assumption of fairly powerful end nodes connected to the network through stable, high-bandwidth access networks.

Before designing new systems to cope with the changing network environment, it is valuable to understand how current designs actually perform under such conditions. We have therefore studied how Bamboo [4], a widely used DHT within the research community, performs in a mixed environment with mobile nodes.

We performed our initial studies using an NS-2 [3] implementation of a DHT following the design of Bamboo. To capture the random behavior of a real network, which is hard to model when setting up simulation scenarios, we also ran experiments on PlanetLab [7], emulating mobility and bandwidth limitations with a lightweight network emulation tool.

The main contribution is an initial understanding of how a DHT would perform if mobile phones participated as full members. Part of our contribution is also our method for evaluating an application-layer network in a heterogeneous environment using PlanetLab.

As we were able to run experiments for much longer than was possible in the simulation-based studies, we observed reproducible phenomena not seen in our simulations: after a few hours, the lookup times started to increase significantly. When investigating the cause of this phenomenon more closely, we found that it was not caused solely by node churn or by bandwidth limitations, but rather by a combination of the two.

This paper is organized as follows. First, we present related work, and then the studied scenarios are outlined in section 4.3. The measured performance of the DHT is then presented, and the paper concludes with a discussion of the results and the methodology used.

              Strong link   Weak link   Link between clusters
Downlink      10 Mb/s       384 kb/s    100 Gb/s
Uplink        10 Mb/s       64 kb/s     100 Gb/s
Delay         5 ms          115 ms      50 ms

Table 4.1: Properties of access links in simulation

4.2 Related work

It is common to use simulations to evaluate the performance of DHTs [6], although many evaluations do not take bandwidth into account; instead, large networks are evaluated by modeling only network delay. Using NS-2 to simulate DHTs is less common, although other implementations exist [17].

Evaluations of DHTs that take bandwidth and network queues into account are instead often made using emulation [10]. Emulation sacrifices the ability to study very large networks, but it does allow queue drops to be modeled. To our knowledge, emulation testbeds have not been used to study the effect of heterogeneity on DHTs. Nor are we aware of any work similar to our approach of using PlanetLab as an emulation testbed, although our approach has similarities to Flexlab [12].

4.3 Experiment setup

A limitation of our simulation approach was the complexity of the system, which limited the time scales we could study. This limitation motivated an investigation of whether results from even short simulations were generally applicable. Our real-network experiments were conducted on PlanetLab [7] with complementary emulation of connectivity and bandwidth variations.

The scenario for our experiments is a heterogeneous network with a mix of stationary computers with broadband connections and 3G-type cell phones. We call them strong and weak nodes, respectively. As weak nodes model mobile users, they join and leave the network often and unpredictably. Such behavior is called churn, and it puts stress on distributed systems like DHTs. The churn in our scenario is modeled using a Poisson process with a mean session time of 3 minutes. This is an extreme amount of churn, but because of pricing models and battery limitations, cell phones cannot be expected to participate in a DHT much longer than the time it takes them to perform the requests they want from it. With flat-rate pricing, users may be more willing to let nodes stay online for longer periods, but we have not taken that into account in our scenarios; hence the rather heavy churn.

                                      Period [s]
Neighbor ping period                  4
Leafset maintenance                   5
Local routing table maintenance       5
Global routing table maintenance      10
Data storing maintenance              10

Table 4.2: Management traffic periods in seconds
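The churn model can be illustrated with a small C sketch. It is not taken from our NS-2 code; it is a discrete-time approximation that assumes one-second time steps. Each second, an online weak node leaves with probability 1/180, which gives geometrically distributed session lengths with a mean of 180 s (3 minutes), the discrete counterpart of the exponentially distributed session times produced by a Poisson process. The function name `draw_session_length` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform pseudo-random number in [0, 1) from the C library generator. */
static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Hypothetical helper: draw one weak-node session length in whole seconds.
 * Each simulated second the node leaves with probability 1 / mean_s, so the
 * session length is geometric with mean mean_s, a discrete-time stand-in
 * for the exponential session times of a Poisson churn process. */
long draw_session_length(double mean_s)
{
    assert(mean_s >= 1.0);
    double p_leave = 1.0 / mean_s;
    long seconds = 1;
    /* Stay online for another second until the leave event fires. */
    while (uniform01() >= p_leave)
        seconds++;
    return seconds;
}
```

Averaging many calls to `draw_session_length(180.0)` should give a value close to 180 s. A finer time step would bring the distribution closer to the exponential one.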

In addition to bandwidth, delay and other network-related properties, each node in a DHT also has a number of properties that are directly related to the DHT service, such as timeouts and how often management data is sent. We used the parameters listed in table 4.2 for both our simulations and our real experiments.
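To give a feel for how much periodic maintenance the parameters in table 4.2 imply, the following hypothetical C sketch counts maintenance rounds per node. It assumes that each task fires exactly once per period with no timer jitter, which is a simplification, and it does not count individual messages, since the number of messages in a round depends on how many neighbors a node has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of the periodic maintenance tasks in table 4.2. */
struct maintenance_task {
    const char *name;
    int period_s;                    /* period in seconds */
};

static const struct maintenance_task tasks[] = {
    { "neighbor ping",                    4 },
    { "leafset maintenance",              5 },
    { "local routing table maintenance",  5 },
    { "global routing table maintenance", 10 },
    { "data storing maintenance",         10 },
};

#define NUM_TASKS (sizeof tasks / sizeof tasks[0])

/* Runs of task i during the first duration_s seconds, assuming the task
 * fires at the end of every full period and timers are not jittered. */
int runs_in_interval(size_t i, int duration_s)
{
    assert(i < NUM_TASKS && duration_s >= 0);
    return duration_s / tasks[i].period_s;
}

/* Total maintenance rounds per node over duration_s seconds. */
int rounds_in_interval(int duration_s)
{
    int total = 0;
    for (size_t i = 0; i < NUM_TASKS; i++)
        total += runs_in_interval(i, duration_s);
    return total;
}
```

With these periods, a node performs 15 + 12 + 12 + 6 + 6 = 51 maintenance rounds per minute, independently of whether it is a strong or a weak node.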

4.3.1 Simulation setup

We have performed simulations using our own NS-2 implementation of a Bamboo-like DHT. Details of the implementation and of our approach to simulation can be found in a technical report [9].

To compare with our PlanetLab experiments described later, we simulated networks of 170 nodes, as that was the number of nodes we could use on PlanetLab. Each run lasts 10 minutes, and the runs use different ratios of weak nodes, ranging from 20 to 50%. Ten strong nodes were used as bootstrap nodes, which new nodes contacted when joining the network. The nodes in the DHT were evenly distributed over a physical network modeled as three clusters of nodes connected by very high bandwidth, high delay links (figure 4.1). The clusters represent different continents, and the links connecting them model transcontinental backbone connections. In the simulations, we limited the bandwidth and delay of access links according to table 4.1. These values were chosen based on results from a previous study of a commercially available 3G service [5].
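A rough worked example of what table 4.1 means for a single packet is sketched below in C. It computes the one-way delay of one packet with empty queues as the sum of transmission and propagation delay on each hop. It relies on assumptions not stated above: packets are forwarded store-and-forward, and nodes in the same cluster meet at a single router that adds no delay of its own. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Access link parameters from table 4.1 (bit/s and seconds). */
struct access_link {
    double down_bps;
    double up_bps;
    double delay_s;
};

static const struct access_link STRONG = { 10e6, 10e6, 0.005 };
static const struct access_link WEAK   = { 384e3, 64e3, 0.115 };
static const double CORE_BPS     = 100e9;   /* link between clusters */
static const double CORE_DELAY_S = 0.050;

/* Hypothetical helper: one-way delay of a single packet with empty queues,
 * assuming store-and-forward over the sender's uplink, the inter-cluster
 * link (only when the nodes are in different clusters) and the receiver's
 * downlink. Same-cluster nodes meet at one router that adds no delay. */
double one_way_delay(const struct access_link *src, const struct access_link *dst,
                     bool different_clusters, int packet_bytes)
{
    assert(packet_bytes > 0);
    double bits = 8.0 * packet_bytes;
    double d = bits / src->up_bps + src->delay_s;       /* sender access link */
    if (different_clusters)
        d += bits / CORE_BPS + CORE_DELAY_S;             /* backbone link */
    d += bits / dst->down_bps + dst->delay_s;            /* receiver access link */
    return d;
}
```

For a 1000-byte packet between two weak nodes on different continents, this gives 0.125 s to transmit on the 64 kb/s uplink, about 0.021 s on the 384 kb/s downlink and 0.280 s of propagation delay, roughly 0.43 s in total before any queueing. Between two strong nodes in the same cluster, the same packet needs about 0.012 s.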

To limit the time needed to simulate, the overlay network is built offline. This means that we let all nodes have complete knowledge of which nodes participate in the DHT, which allows all nodes to have routing tables and leafsets that are as populated as possible from the beginning of the simulation. The only restriction is that nodes cannot optimize for network latencies, as they have not yet measured them. Starting the network in such an optimal state is of course unrealistic, but since simulations have shown that a network started in the ordinary fashion eventually reaches a similar stable state, we believe it to be an acceptable method. With this approach, Bamboo stabilizes within 80 seconds of starting. We want the stabilization time to be as short as possible because that period is filtered out later; this optimization therefore yields more useful data from the same amount of simulated time. As the time scales we can simulate are limited, this makes a significant difference.

Figure 4.1: An NS-2 network layout with three clusters, where the clusters model continents.

After the DHT has stabilized, each node performs a GET operation every 10 seconds to measure the performance of the network. For each GET, we measure whether the operation is successful and, if so, how long it took. In the simulations, each node holds about 10 keys that other nodes can request through GET operations. This is fewer keys than in the PlanetLab experiments, where we could insert more keys per node.
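The two metrics reported in the results, success ratio and lookup latency, can be computed from per-GET records as in the following C sketch. The record layout and names are hypothetical. The sketch discards GETs issued before a cutoff time, mirroring how the stabilization period is removed (80 seconds in the simulations, one hour on PlanetLab), and averages latency over successful GETs only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one GET: when it was issued, whether it
 * succeeded and, if so, how long it took. */
struct get_sample {
    double issued_at_s;   /* seconds since the start of the run */
    bool   success;
    double latency_s;     /* only meaningful when success is true */
};

struct get_summary {
    size_t issued;          /* GETs issued at or after the cutoff */
    size_t succeeded;
    double success_ratio;
    double mean_latency_s;  /* over successful GETs only */
};

/* Summarize GETs issued at or after cutoff_s, ignoring the stabilization
 * period before it. */
struct get_summary summarize_gets(const struct get_sample *s, size_t n, double cutoff_s)
{
    assert(s != NULL || n == 0);
    struct get_summary r = { 0, 0, 0.0, 0.0 };
    double latency_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].issued_at_s < cutoff_s)
            continue;                   /* still stabilizing: ignore */
        r.issued++;
        if (s[i].success) {
            r.succeeded++;
            latency_sum += s[i].latency_s;
        }
    }
    if (r.issued > 0)
        r.success_ratio = (double)r.succeeded / (double)r.issued;
    if (r.succeeded > 0)
        r.mean_latency_s = latency_sum / (double)r.succeeded;
    return r;
}
```

Excluding failed GETs from the latency average keeps the two metrics separate: a failed lookup lowers the success ratio but does not distort the latency figure with an arbitrary timeout value.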

4.3.2 PlanetLab setup

The distributed testbed PlanetLab [2] has become a popular platform for evaluating new distributed services and systems. PlanetLab is a collection of more than 700 Linux machines spread over the world, on which researchers can get accounts to run application-level experiments.

Unfortunately, it is hard to create the heterogeneous network environments we want to study with PlanetLab nodes, as users do not have the privileges needed to modify the network stack. For this purpose, we have developed a lightweight connectivity emulation library called Dtour.


Figure 4.2: Dtour design

Dtour

The Dtour design is based on our need to filter the traffic of an unmodified application in user space so as to mimic network dynamics as perceived by the application. This is achieved by implementing a layer between the application and the network stack (figure 4.2). All system calls that involve outgoing network traffic go through Dtour, where they are filtered. Dtour may drop packets, for example because of emulated bandwidth limitations or loss models.

The design of Dtour is deliberately kept simple. All functionality is implemented in a dynamically loaded library without any active threads or daemons. This means that all filtering and state updates are done when a library function is called. If a separate thread handled the filtering instead, state could be updated continuously, but that would increase the complexity of Dtour. Some operating systems make it possible to load shared libraries before the normal system libraries (on Linux, for example, through the LD_PRELOAD environment variable). The library functions in libdtour.so are the entry points into the Dtour system.

Currently, Dtour filters only outgoing traffic, so the strong nodes filter the traffic they send to weak nodes. The path from a strong to a weak node is limited to 384 kbit/s, and all outgoing traffic from a weak node goes through a 64 kbit/s bandwidth limiter. We have, however, added a static filter to the read() function that logs the amount of received traffic.
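The bandwidth limiting described above can be illustrated with a token bucket that is refilled lazily each time the filter is entered, which fits Dtour's design of doing all state updates inside library calls rather than in a separate thread. The token bucket is our choice for this sketch; the text does not specify which limiting algorithm Dtour uses, and the names `tb_init` and `tb_allow` are hypothetical. Time is passed in explicitly to keep the sketch testable; in a real filter it would come from a clock such as `clock_gettime()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-path bandwidth limiter: a token bucket measured in bits. */
struct token_bucket {
    double rate_bps;      /* e.g. 64000 for a weak uplink, 384000 towards a weak node */
    double burst_bits;    /* bucket depth */
    double tokens_bits;   /* currently available */
    double last_s;        /* time of the last refill */
};

void tb_init(struct token_bucket *tb, double rate_bps, double burst_bits, double now_s)
{
    assert(rate_bps > 0.0 && burst_bits > 0.0);
    tb->rate_bps = rate_bps;
    tb->burst_bits = burst_bits;
    tb->tokens_bits = burst_bits;   /* start with a full bucket */
    tb->last_s = now_s;
}

/* Called from the overridden send path. Returns true if a packet of len
 * bytes may be sent at time now_s, false if it should be dropped. */
bool tb_allow(struct token_bucket *tb, int len, double now_s)
{
    /* Lazy refill: credit the tokens earned since the previous call,
     * so no timer thread is needed. */
    if (now_s > tb->last_s) {
        tb->tokens_bits += (now_s - tb->last_s) * tb->rate_bps;
        if (tb->tokens_bits > tb->burst_bits)
            tb->tokens_bits = tb->burst_bits;
        tb->last_s = now_s;
    }
    double need = 8.0 * len;
    if (need > tb->tokens_bits)
        return false;                /* over the emulated rate: drop */
    tb->tokens_bits -= need;
    return true;
}
```

A weak node's uplink would be set up with a rate of 64000.0 bit/s, and traffic from a strong node towards a weak node with 384000.0 bit/s. The burst size is a free parameter of the sketch; it bounds how much traffic can pass back to back after an idle period.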

Events

When using Dtour, network dynamics are expressed as events. A typical event might be: at time t, add a path to the path set, initially marked as down. The time can be expressed either as global time or as time relative to the start of the scenario; which of the two is used to describe events is configurable at runtime. The IP addresses and port numbers can be set to 0, which matches all values.
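The wildcard rule for addresses and ports amounts to a field-by-field comparison, as in this hypothetical C sketch. The `struct path` layout (IPv4 addresses and ports in host byte order) is an assumption and not Dtour's actual data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical path description: both endpoints' IPv4 addresses and ports,
 * in host byte order. */
struct path {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* A rule field equal to 0 is a wildcard that matches any value. */
static bool field_matches(uint32_t rule, uint32_t actual)
{
    return rule == 0 || rule == actual;
}

/* True if the packet's path matches the rule on every field. */
bool path_matches(const struct path *rule, const struct path *pkt)
{
    assert(rule != NULL && pkt != NULL);
    return field_matches(rule->src_ip, pkt->src_ip) &&
           field_matches(rule->dst_ip, pkt->dst_ip) &&
           field_matches(rule->src_port, pkt->src_port) &&
           field_matches(rule->dst_port, pkt->dst_port);
}
```

For example, a rule with only the destination address set applies to all traffic towards that node, whatever its source or ports.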

Dtour can react to events in two modes. In the first mode, you provide a scenario file with network events that is loaded when the libdtour.so library is initialized. The event file is parsed, and the events are stored sorted per path to minimize lookup time when filtering.
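Keeping each path's events sorted by time lets the filter find the event currently in effect with a binary search instead of a linear scan. The following C sketch uses a minimal, hypothetical event that only sets a path up or down; the event structure and function names are simplifications for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event for one path: at time_s the path becomes up or down. */
struct path_event {
    double time_s;
    bool   up;
};

static int cmp_event_time(const void *a, const void *b)
{
    double ta = ((const struct path_event *)a)->time_s;
    double tb = ((const struct path_event *)b)->time_s;
    return (ta > tb) - (ta < tb);
}

/* Sort a path's events once, when the scenario file is loaded. */
void sort_path_events(struct path_event *ev, size_t n)
{
    qsort(ev, n, sizeof ev[0], cmp_event_time);
}

/* Binary search for the latest event with time_s <= now_s.
 * Returns NULL if no event has taken effect yet. */
const struct path_event *current_event(const struct path_event *ev, size_t n, double now_s)
{
    assert(ev != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ev[mid].time_s <= now_s)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the index of the first event later than now_s. */
    return lo == 0 ? NULL : &ev[lo - 1];
}
```

Sorting happens once at load time, so each filtered packet costs only a logarithmic search in the number of events for its path.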

The second mode of Dtour is interactive. If this mode is enabled, Dtour polls a named pipe for events, which are applied as they are read from the pipe. The events written to the pipe have the same format as in the scenario file, except that they have no timestamp. The two modes can be combined by providing a scenario file and then, later or in parallel, modifying the links interactively.
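The interactive mode relies on reading a named pipe without ever blocking the application. A minimal POSIX sketch is shown below: `open_event_fifo` creates the FIFO if necessary and opens it with `O_NONBLOCK`, and `poll_events` returns whatever event text is pending, or nothing. Both function names are hypothetical, and parsing of the event lines is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the FIFO if needed and open it for
 * non-blocking reads, so the application never waits on the pipe. */
int open_event_fifo(const char *path)
{
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Hypothetical helper: read whatever event text is pending without
 * blocking. Returns the number of bytes placed in buf (NUL-terminated),
 * or 0 if nothing is available. */
size_t poll_events(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    ssize_t n = read(fd, buf, cap - 1);
    if (n <= 0)        /* -1 with EAGAIN: no data yet; 0: no writer */
        n = 0;
    buf[n] = '\0';
    return (size_t)n;
}
```

Calling `poll_events` from the same library entry points that do the filtering keeps the single-threaded design: new events take effect the next time the application sends data.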

Function overriding

When the rule set is loaded, Dtour opens the actual system libraries using the dlopen() function to be able to reach the functions to be overridden. Dtour could in principle override any number of system calls, but it currently overrides only the functions used to send data.
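The overriding mechanism can be sketched in C as follows. Where the text above says Dtour opens the system libraries with dlopen(), this sketch instead uses the common `dlsym(RTLD_NEXT, ...)` idiom, which finds the next definition of a symbol after the current object and is typically combined with LD_PRELOAD. Only send() is wrapped, and the filter hook `dtour_filter` is a hypothetical placeholder that counts bytes and always forwards the packet.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Bytes seen by the wrapper; stands in for Dtour's per-packet logging. */
static size_t bytes_seen = 0;

/* Hypothetical filter hook: a real filter would consult the path rules and
 * could return 0 to drop the packet. This placeholder only logs the size. */
static int dtour_filter(int fd, size_t len)
{
    (void)fd;
    bytes_seen += len;
    return 1;                  /* 1 = forward, 0 = drop */
}

/* Overrides send(). Built into a shared library and loaded with
 * LD_PRELOAD, it receives every send() call of an unmodified application. */
ssize_t send(int fd, const void *buf, size_t len, int flags)
{
    static ssize_t (*real_send)(int, const void *, size_t, int) = NULL;
    if (real_send == NULL) {
        /* Look up the next definition of send, normally the one in libc. */
        real_send = (ssize_t (*)(int, const void *, size_t, int))dlsym(RTLD_NEXT, "send");
        assert(real_send != NULL);
    }
    if (!dtour_filter(fd, len))
        return (ssize_t)len;   /* report success but never transmit: a drop */
    return real_send(fd, buf, len, flags);
}
```

Such a wrapper would be built as a shared library (for example with `gcc -shared -fPIC`, plus `-ldl` on older glibc versions) and activated through LD_PRELOAD. Reporting a dropped packet as sent emulates loss well for datagram traffic; for a TCP stream it would corrupt the byte stream, which is one reason to filter at the packet level.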

We have considered overriding read(), recvfrom(), etc., but have not yet implemented this. We believe it would be harder to remain completely transparent to an application if we altered how reads are done; we would probably have to change the behavior of select() so that it handles incoming data without returning to the application.

In simulation we could add extra delay on the weak nodes' access links, but that possibility is currently unavailable on PlanetLab. While we do not make strong nodes churn, they experience a low churn rate caused by the dynamics of the Internet combined with occasional crashes of PlanetLab nodes. Because of the dynamics of PlanetLab and the rollout of PlanetLab v4, we limited the size of the DHT to about 170 nodes, even though in simulation we could handle up to 500 nodes on short time scales.

4.4 Results

After running the experiments for different ratios of weak nodes, we studied how the PlanetLab measurements compared to our earlier simulation results. As in our simulations, we cut off the start of the measurements to reduce the influence of stabilization disturbances. While 80 seconds was enough in the simulations, it seemed necessary to cut off a full hour in the PlanetLab experiments. This is because the PlanetLab nodes do not have full knowledge of each other at the beginning of the experiment and therefore require a longer stabilization time: it takes much longer to spread information about nodes through the network than to optimize routing tables.

Figure 4.3: Lookup latency and success ratio in simulation and on PlanetLab. (a) Lookup delay [s], simulation; (b) lookup delay [s], PlanetLab; (c) success ratio, simulation; (d) success ratio, PlanetLab. In all panels the x axis is the ratio of weak nodes (0 to 1).

4.4.1 Comparing simulations to PlanetLab measurements

After removing the stabilization periods from the data, we compared the lookup latencies and success ratios of the simulations and the PlanetLab experiments. Although the match is not perfect, the shapes and trends in the data indicate how well the two evaluation methods agree. The mean lookup latencies and success ratios are shown in figure 4.3. Comparing simulation results with PlanetLab measurements, we see that the latencies are close, while the success ratios are less similar.

4.4.2 PlanetLab results

When we let Bamboo run for many hours on PlanetLab, we found that system performance started to decrease after some time (figure 4.4). Performance varies greatly over time, with a period that depends on the ratio of weak nodes and the churn rate. We have investigated the cause further, but the details of the performance decrease are beyond the scope of this paper. It is, however, caused by the combination of churn and bandwidth limitations.

4.5 Discussion

The results presented in figure 4.3 indicate that the cluster model used is sufficient for studying latencies, but that the lack of cross traffic makes it too simplistic for studying success ratios. In the simulations, we use very high bandwidth on the inter-cluster links, as we did not want packet loss in the core network, only on access links. For future simulations, it would be interesting to introduce bursty losses in the core network, based on our PlanetLab measurements, to see whether that makes the observed success ratios more similar. The difficulty is that we want to study the effect of drops on access links, as these are characteristic of weak nodes; introducing drops in the core network would make the effect of the heterogeneous access links less obvious and therefore harder to analyze.
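One common way to produce bursty losses is the two-state Gilbert-Elliott model, sketched below in C. It only illustrates the idea discussed above; the paper does not prescribe a loss model, and the structure and function names are hypothetical. The channel alternates between a good and a bad state with per-packet transition probabilities, and each state has its own loss probability. In steady state, the channel is in the bad state a fraction p_good_to_bad / (p_good_to_bad + p_bad_to_good) of the time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical two-state (Gilbert-Elliott) bursty loss channel. */
struct ge_channel {
    double p_good_to_bad;  /* per-packet transition probabilities */
    double p_bad_to_good;
    double loss_good;      /* loss probability in each state */
    double loss_bad;
    bool   bad;            /* current state */
};

/* Uniform pseudo-random number in [0, 1) from the C library generator. */
static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Advance the channel by one packet: first update the state, then decide
 * whether this packet is lost in the new state. */
bool ge_packet_lost(struct ge_channel *ch)
{
    assert(ch != NULL);
    double u = uniform01();
    if (ch->bad) {
        if (u < ch->p_bad_to_good)
            ch->bad = false;
    } else {
        if (u < ch->p_good_to_bad)
            ch->bad = true;
    }
    return uniform01() < (ch->bad ? ch->loss_bad : ch->loss_good);
}
```

With p_good_to_bad = 0.01, p_bad_to_good = 0.1, no loss in the good state and total loss in the bad state, about 9% of packets are lost (0.01 / 0.11), in bursts of 10 packets on average. The parameter values here are arbitrary and would have to be fitted to measured core-network loss.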

An alternative to using PlanetLab is to use an emulation testbed like Emulab [16] to evaluate a DHT. It would, however, mean that we would have to design physical network scenarios as in simulation. If we used the same cluster model as in simulation, we could not be sure how much of the similarity between simulation and emulation results was caused by the network model. This is something we do not have to worry about when doing real experiments on PlanetLab, as the network environment is real and we only emulate the access links of the weak nodes.

It is also interesting in itself that we can get similar results using two different evaluation methods, which we believe gives the results higher credibility. We think that the phenomena we found when running longer experiments do not make our simulation results less relevant, but rather show the need to use different methods when evaluating a system. We also find it interesting that Dtour works well for studying an application-level network in heterogeneous environments.

We find it promising that Bamboo can actually handle such an extreme amount of churn and still serve a clear majority of lookups successfully. If a majority of lookups are successful, system correctness can be improved by using parallel lookups, as suggested in the evaluation of OpenDHT [11].
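A back-of-the-envelope C sketch shows why parallel lookups help once most single lookups succeed. It assumes that k parallel lookups fail independently, each succeeding with probability p, so that at least one succeeds with probability 1 - (1 - p)^k. Lookups for the same key share nodes and links, so real gains are likely to be smaller; the function name is hypothetical.

```c
#include <assert.h>

/* Probability that at least one of k parallel lookups succeeds, assuming
 * each succeeds independently with probability p (an idealization). */
double parallel_success(double p, int k)
{
    assert(p >= 0.0 && p <= 1.0 && k >= 1);
    double all_fail = 1.0;
    for (int i = 0; i < k; i++)
        all_fail *= 1.0 - p;       /* every lookup fails */
    return 1.0 - all_fail;
}
```

For example, with p = 0.7, three independent parallel lookups would succeed with probability 1 - 0.3^3 = 0.973, at the cost of three times the lookup traffic.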

Figure 4.4: Performance of the DHT over time. (a) Lookups over time for different ratios of weak nodes (0%, 30% and 50%); lookup latency (s) versus time (h).

4.6 Future work

We would like to investigate whether introducing packet loss in the core network in our simulations could make the simulated success ratios fit the PlanetLab measurements better. We are also working on understanding how lookup latencies change over time.

4.7 Conclusions

To be able to design scenarios on PlanetLab that were comparable with our simulations, we used our connectivity emulation library Dtour. It has proved essential for modeling heterogeneity on PlanetLab, not only by allowing us to limit the nodes' available bandwidth, but also by enabling per-packet logging, which allowed us to study the traffic distribution within the DHT.

We have found that our simulations of a Bamboo-like DHT produce relevant results in the case of a heterogeneous network. However, the simulations show significantly better success ratios than the PlanetLab experiments; we believe this gap can be narrowed with a better understanding of packet loss in the core network.

The fact that the DHT performs well for many hours indicates that it might be possible to have mobile phones as full members of a DHT, even though the network suffers from increased delay after a few hours.

While doing PlanetLab experiments, we found unexpected behavior that we did not see in simulation: Bamboo experienced a large increase in lookup latencies after an extended period of operation. We expected it to be caused by the very high churn rate we used, but on closer examination it did not seem to be that simple. Although the churn obviously added stress to the DHT, it was not the sole explanation. Nor were the bandwidth-limited nodes alone to blame; rather, it was the combination of churn and weak nodes. Further studies are required to understand the cause of this phenomenon in more detail.


Bibliography




[4] The Bamboo distributed hash table. Online: http://bamboo-dht.org/, 2003.

[5] Daniel Lanner. Comparison of TCP-performance in wireless 3G- and ad hoc-networks. Master's thesis, Uppsala University, 2006.

[6] Jinyang Li, Jeremy Stribling, Thomer M. Gil, Robert Morris, and M. Frans Kaashoek. Comparing the performance of distributed hash tables under churn. In Proceedings of the 3rd International Workshop on Peer-to-Peer Systems (IPTPS04), San Diego, CA, February 2004.

[7] PlanetLab: An open platform for developing, deploying and accessing planetary-scale services. http://www.planet-lab.org, 2004.


[9] Olof Rensfelt and Lars-Åke Larzon. A bandwidth study of a DHT in a heterogeneous environment. Technical Report 2007-017, Uppsala University, May 2007.




[12] Robert Ricci, Jonathon Duerig, Pramod Sanaga, Daniel Gebhardt, Mike Hibler, Kevin Atkinson, Junxing Zhang, Sneha Kasera, and Jay Lepreau. The Flexlab approach to realistic evaluation of networked systems. In Proc. of the Fourth Symposium on Networked Systems Design and Implementation (NSDI 2007), Cambridge, MA, April 2007.




[16] Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, December 2002.

[17] Stefan Zoels, Simon Schubert, Wolfgang Kellerer, and Zoran Despotovic.Hybrid DHT design for mobile environments. In AP2PC, 2006.


Recent licentiate theses from the Department of Information Technology

2007-003 Thabotharan Kathiravelu: Towards Content Distribution in Opportunistic Networks

2007-002 Jonas Boustedt: Students Working with a Large Software System: Experiences and Understandings

2007-001 Manivasakan Sabesan: Querying Mediated Web Services

2006-012 Stefan Blomkvist: User-Centred Design and Agile Development of IT Systems

2006-011 Åsa Cajander: Values and Perspectives Affecting IT Systems Development and Usability Work

2006-010 Henrik Johansson: Performance Characterization and Evaluation of Parallel PDE Solvers

2006-009 Eddie Wadbro: Topology Optimization for Acoustic Wave Propagation Problems

2006-008 Agnes Rensfelt: Nonparametric Identification of Viscoelastic Materials

2006-007 Stefan Engblom: Numerical Methods for the Chemical Master Equation

2006-006 Anna Eckerdal: Novice Students' Learning of Object-Oriented Programming

2006-005 Arvid Kauppi: A Human-Computer Interaction Approach to Train Traffic Control

2006-004 Mikael Erlandsson: Usability in Transportation – Improving the Analysis of Cognitive Work Tasks

2006-003 Therese Berg: Regular Inference for Reactive Systems

2006-002 Anders Hessel: Model-Based Test Case Selection and Generation for Real-Time Systems

2006-001 Linda Brus: Recursive Black-box Identification of Nonlinear State-space ODE Models

2005-011 Björn Holmberg: Towards Markerless Analysis of Human Motion

Department of Information Technology, Uppsala University, Sweden
