Explaining BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search
Paper by: Wesley W. Terpstra, Jussi Kangasharju, Christof Leng, Alejandro P. Buchmann
Explanation by: Kévin Redon ([email protected]berlin.de)
Seminar “Internet Routing”, Technische Universität Berlin
SS 2009 (version of 13th July 2009), under CC-BY-SA, except the graphs extracted from the paper
Abstract
Searching the Internet is usually done through popular search engines, which are based on the client-server model. To provide this service to a large number of users, they need a substantial infrastructure that is fast and fault-resistant, and thus very expensive. The peer-to-peer (P2P) model, on the other hand, offers scalability, adaptivity, and robustness while being inexpensive. Generally used for file sharing, new applications of these P2P networks are emerging. The paper BubbleStorm [BS] describes a way to build a P2P network that performs search queries exhaustively and resiliently.
1 Introduction – Searching : Collecting and Sorting
The Internet is getting bigger and bigger. For a few years now, information has not been rare anymore but overflowing instead. The difficult part has shifted from “finding some data” to “getting the right information”. Most of the time spent surfing the Internet is spent searching. The home page in the web browser is often a search engine, and bookmarks are replaced by keywords typed into this search engine. Google is the dominant one and makes tremendous profits from it, with others like Yahoo! or Live Search struggling to keep their market share. New search engines still appear, such as the recent Bing from Microsoft.

Searching for information and data over the Internet is important, but it is still a young, experimental field, and one of the hardest tasks. Not only finding some data is important, plenty of it exists, but finding and rating it is the critical point.

The best-known search engines like Google use a centralized system. Google Inc. collects and processes information on its servers. Keywords are extracted, because searching all websites at each query is not an option. The users only get results from these servers, based on what was extracted and how it matches the search query.

On the other side, a recent alternative is search in P2P networks, in a distributed manner, enabling inexpensive scalability. BubbleStorm is a new proposal in this context. It claims to be a probabilistic, but resilient and exhaustive, peer-to-peer search.

This document explains P2P search, how BubbleStorm works, and presents tests that show its reliability and resiliency.
2 Searching – from Centralized to Distributed
Searching began with the client-server model, even in the first P2P networks. Recently it has become distributed. BubbleStorm is one of these new methods.
2.1 Centralized search in Client-Server networks
Web search engines like Google, Yahoo or MSN Live Search use a centralized system. The companies hold machines which crawl the Internet and process the web sites by picking keywords in order to build a database. The user, on the client side, connects to their servers and sends search queries, which are evaluated against their database, and the result is returned to the user.

Centralized search has two weaknesses: the holder defines the search and evaluation algorithm and thus completely controls the search context and results. It also needs a substantial infrastructure to keep up with the number of users and the growth of the Internet. The client-server architecture does not scale well and is therefore expensive.
2.2 Centralized search in Peer-to-Peer networks
The peer-to-peer (P2P) paradigm is opposed to client-server in that the resources are not stored on a central server, but on the peers participating in the network. The peers establish a direct connection between them to exchange the data.

The first P2P networks such as Napster and eDonkey still used the centralized approach to search. The peers connect to a central server to publish the files they are sharing. When a peer is looking for a file, it sends the search query to the central server, which looks in the list of files of all the peers connected to it, and returns the result. The peer then connects directly to the other peer holding the data to request it.

This system therefore has the same weaknesses as a client-server system with respect to searching. The bottleneck is again the central server.
2.3 Decentralized search in Peer-to-Peer networks
Gnutella uses a decentralized approach. There are no servers anymore and the peers are directly connected to each other. To perform a search, the query is sent to the neighbours. If they do not hold the data, they forward the query to their neighbours. Performing an exhaustive search would mean flooding the entire network; this is not permitted, because the query has a maximum number of hops. The massive arrival and departure of peers made the network unstable. The first version of Gnutella had difficulties because of the routing overhead and long response times (illustration 3). In the second version, super-peers were introduced. This led to a structured topology, making it perform better (illustration 4). The super-peers are simply the more stable and faster peers. They manage the file lists of the peers connected to them. The search process is much faster and can be exhaustive. But again, these super-peers have the same weaknesses as the servers in the client-server model.

Illustration 1: client-server
Illustration 2: centralized P2P
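As a minimal sketch of the hop-limited flooding used by the first Gnutella version described in this section (the graph, hop limit and data placement are made-up example values):

    # Hedged sketch of Gnutella-style query flooding: each peer forwards the
    # query to its neighbours until the hop limit (TTL) runs out, so on a
    # large network the search is not exhaustive.
    def flood(graph, start, ttl, holders, seen=None):
        """graph: node -> neighbours; holders: nodes storing matching data."""
        if seen is None:
            seen = set()
        hits = []
        if ttl < 0 or start in seen:
            return hits
        seen.add(start)
        if start in holders:
            hits.append(start)
        for neighbour in graph[start]:
            hits += flood(graph, neighbour, ttl - 1, holders, seen)
        return hits

    # a line of 10 peers; matching data sits on peers 2 and 8
    line = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
    print(flood(line, start=0, ttl=3, holders={2, 8}))  # finds 2, misses 8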
2.4 Distributed Hash Tables
Distributed Hash Tables (DHT) act as a layer added on top of the P2P network [DHT]. They distribute the complete data list of all peers among them. The metadata, like a file name or a document title, is hashed and mapped onto the hash table. Each participating peer is responsible for a certain range of the possible hash values. The link to the data and its metadata are sent to the node responsible for its hash. When searching for data, the peer calculates the hash of the query and asks the peer responsible for this hash for the corresponding entries. This system enables an exhaustive search, and intelligent algorithms like Chord, CAN or Kademlia make it resilient. But the construction of such a DHT and the routing over it are not easy to design and implement. The other drawback is that the search query is limited to exact key values from the published metadata.
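As a rough illustration of the publish/lookup idea just described (not of the Chord, CAN or Kademlia routing), here is a toy sketch where each peer owns an equal slice of the hash space; the class and example values are hypothetical:

    # Minimal sketch of the DHT idea: metadata keys are hashed and each peer
    # is responsible for a range of the hash space; only exact-key lookups work.
    import hashlib

    HASH_SPACE = 2 ** 32

    def key_hash(key: str) -> int:
        # hash a metadata key into the [0, HASH_SPACE) identifier space
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

    class ToyDHT:
        def __init__(self, num_peers: int):
            # peer i is responsible for an equal slice of the hash space
            self.num_peers = num_peers
            self.storage = [dict() for _ in range(num_peers)]

        def responsible_peer(self, key: str) -> int:
            return key_hash(key) * self.num_peers // HASH_SPACE

        def publish(self, key: str, link: str) -> None:
            # store the link to the data at the peer responsible for the key
            self.storage[self.responsible_peer(key)].setdefault(key, []).append(link)

        def lookup(self, key: str):
            # ask the responsible peer for the entries matching this exact key
            return self.storage[self.responsible_peer(key)].get(key, [])

    dht = ToyDHT(num_peers=8)
    dht.publish("bubblestorm paper", "peer42:/papers/bubblestorm.pdf")
    print(dht.lookup("bubblestorm paper"))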
3 BubbleStorm
Now the context is set and the difficulties of searching have been illustrated. Let's see how BubbleStorm works and overcomes them.
3.1 Principles
BubbleStorm offers a new way of searching, using a P2P network. The main idea is to replicate the data through the network, creating a bubble. The search query is replicated through the network in the same way, creating another bubble. Each peer in the query bubble performs the query on the data it stores locally. The rendezvous area is where the query bubble overlaps with the corresponding data bubble. The query matches on these nodes: the searched data has been found. One important aspect is that the search query is performed locally on each node. The query can use different algorithms: full-text search, XPath, SQL-like, etc. The aim of BubbleStorm is also to define how the datum and queries are spread, so that the search becomes exhaustive with a high probability, resilient to network topology changes, and still fast and efficient.

Illustration 4: Gnutella 2
Illustration 3: DHT Chord
Illustration 3: decentralized P2P
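Since the query is evaluated locally on each peer of the query bubble, the matching logic is free. A minimal sketch of this idea, with made-up replica contents and a naive full-text match standing in for XPath or SQL-like evaluation:

    # replicas that previous data bubbles have left on this peer (made-up content)
    local_replicas = [
        {"id": 1, "text": "BubbleStorm is a peer-to-peer search system"},
        {"id": 2, "text": "Chord is a distributed hash table"},
    ]

    def evaluate_query(terms, replicas):
        # naive full-text match; XPath, SQL-like or pattern matching would fit too
        return [doc for doc in replicas
                if all(t in doc["text"].lower() for t in terms)]

    print(evaluate_query(["peer-to-peer", "search"], local_replicas))  # matches id 1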
3.2 Topology
BubbleStorm uses a random multigraph topology. Multigraphs are graphs allowing multiple edges between two nodes, as well as reflexive loop edges. The graph is random because each node connects to other, random nodes. Random multigraphs are interesting because they cope with heterogeneous networks, are resilient, and are very likely free of small cycles, which is an important aspect when spreading replicates.

The network is built the following way: an incoming peer knows some peers from a cached list. It chooses a random edge by using a random walk [RW] of length 3(1+log n), with n being the number of nodes in the network. The new peer inserts itself on this edge, between the two end peers of this edge (illustration 7). This leaves the degree (number of edges/connections attached to a node) of the original endpoints the same as before. This join step is repeated until the wished degree of the node is reached. The degree is proportional to the available bandwidth.

When a peer leaves, it reconnects its neighbour peers with each other, again leaving their degree identical. When a peer crashes (not leaving properly), the degree of its neighbours is decreased by one. As soon as the degree of a peer has decreased by two, it automatically joins a new edge. This increases its degree by two, reaching again the wished number of connections.

The join and leave processes must be serializable to prevent inconsistencies and conflicts. This is done by using TCP connections and a handshake described in the paper.
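A hedged sketch of this edge-splitting join on a toy multigraph; picking a uniformly random edge directly stands in for the random walk of length 3(1+log n), and the starting graph and desired degree are made-up values:

    # Each split removes one edge (a, b) and adds (a, new) and (new, b), so the
    # old endpoints keep their degree and the new peer gains two connections.
    import random

    edges = [(0, 1), (1, 2), (2, 0), (0, 1)]  # multigraph: parallel edges allowed
    next_id = 3

    def join(desired_degree: int) -> int:
        """Insert a new peer by splitting random edges until it has desired_degree."""
        global next_id
        new_peer = next_id
        next_id += 1
        for _ in range(desired_degree // 2):
            a, b = edges.pop(random.randrange(len(edges)))  # the chosen edge
            edges.append((a, new_peer))
            edges.append((new_peer, b))
        return new_peer

    peer = join(desired_degree=4)
    print(peer, edges)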
3.3 Bubbles
As explained, the principle of BubbleStorm is to replicate the datum and the query so as to create bubbles. This is done using bubblecasts. The search is successful if the bubble of a datum and the bubble of a query meet each other at a rendezvous node.

Let d be the size of the datum bubble and q the size of the query bubble. In a random graph, the probability for the datum and query bubbles not to meet is e^(-dq/n) (not demonstrated in the paper). If d and q are chosen so that d·q = 4n, with n the number of peers in the network, then the probability for the bubbles to have a rendezvous node is 1 - e^(-4) ≈ 0.9817. d and q can be different: the data bubble can be small if the query bubble is big. How n is determined within the P2P network is shown in the measurement section.

Illustration 5: join and leave process
Illustration 4: BubbleStorm
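A quick numerical check of this rendezvous probability, using example bubble sizes of my own choosing:

    # Probability that the data and query bubbles share at least one node,
    # using the stated no-meeting probability e^(-d*q/n).
    import math

    def rendezvous_probability(d: int, q: int, n: int) -> float:
        return 1.0 - math.exp(-d * q / n)

    n = 1_000_000
    d = q = 2000            # any choice with d*q = 4*n gives the same result
    print(rendezvous_probability(d, q, n))   # ~0.9817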
3.4 Bubblecast
Bubblecast replicates datum and queries onto peers over the network. It uses a mix of the flooding and random walk methods, enabling control over the replication. A bubblecast is defined by two parameters: the weight w and the split factor s. When a bubblecast message arrives at a peer, the following is done (a sketch follows the list):
– the peer stores the datum or performs the query; how the query is performed is up to each peer
– the weight w is decreased by 1
– if the remaining weight is not 0, it is equally distributed between s random neighbours, as new bubblecasts
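A minimal sketch of this forwarding rule on a toy graph; the graph, callback and parameter values are made up for illustration, and duplicate arrivals are ignored as described in the next paragraph:

    # Hedged sketch of bubblecast: process locally, decrease the weight, then
    # split the remaining weight as evenly as possible over `split` neighbours.
    import random

    def bubblecast(graph, node, weight, split, handle, seen=None):
        """graph: node -> list of neighbours; weight: remaining bubble size."""
        if seen is None:
            seen = set()
        if node in seen:                 # second arrival at a node: ignore it
            return
        seen.add(node)
        handle(node)                     # store the datum or evaluate the query locally
        weight -= 1
        if weight <= 0:
            return
        targets = random.sample(graph[node], min(split, len(graph[node])))
        share, extra = divmod(weight, len(targets))
        for i, neighbour in enumerate(targets):
            w = share + (1 if i < extra else 0)
            if w > 0:
                bubblecast(graph, neighbour, w, split, handle, seen)

    ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    bubblecast(ring, node=0, weight=5, split=2, handle=lambda n: print("replica at", n))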
Small loops in the random multigraph are unlikely, but large loops might exist and are hard to detect. When a bubblecast message arrives a second time at a node, it is ignored; thus these loops reduce the bubble size. Because they are rare in a random multigraph (as a property of such graphs), they only have little effect, as the results will show.

The paper uses more advanced mathematics to take advantage of the heterogeneity of the network. In the end, the developer first has to choose the certainty factor c, which defines how sure one can be that the probabilistic search is exhaustive. The second parameter to define is the ratio Rq/Rd, where Rd is the rate, in bytes/second, at which the datum are sent, and Rq the rate for the queries.
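As a rough illustration only: assuming that the certainty factor enters as d·q = c²·n (consistent with the c = 2, d·q = 4n example in section 3.3) and that the ratio Rq/Rd skews the bubble sizes so that the more frequent message type gets the smaller bubble, the sizing could look like the sketch below. These two assumptions are mine; the exact formulas are in the paper and may differ.

    import math

    def bubble_sizes(n: int, c: float, rate_q: float, rate_d: float):
        # assumption (1): d * q = c^2 * n, matching the c = 2, d*q = 4n example
        product = c * c * n
        # assumption (2): skew the sizes so the message type with the higher
        # byte rate gets the smaller bubble, balancing the replication traffic
        d = math.sqrt(product * rate_q / rate_d)
        q = product / d
        return round(d), round(q)

    # queries produce 10x more traffic than data: big data bubble, small query bubble
    print(bubble_sizes(n=1_000_000, c=2, rate_q=10.0, rate_d=1.0))  # ~ (6325, 632)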
3.5 Measurement
For the topology and the bubblecast, the number of nodes n has to be known. The measurement protocol uses 24 bytes, which are piggybacked (added) onto the keep-alive messages sent every 5 seconds.

As an analogy, measuring the size of the network is like measuring the size of a lake. It is possible to measure the size of a lake by releasing a high number of fish into it, and then counting after a certain time how many fish are in one cubic metre, under the assumption that the fish distribute themselves evenly in the lake. In this method, a single node has to do the measurement. To decentralize this process [DM] for P2P networks, all fish from a node have a common, randomly chosen strength. While spreading, the strong fish eat the weaker ones. At the end, only the strongest fish swarm survives and indicates the size of the lake. To be able to do multiple measurements, the fish also carry a release date (version number) and a source (peer IP address). The advantage of this method is that the measurement only needs O(log n) rounds.

Illustration 6: bubblecast
Illustration 7: lake measurement with fish spreading
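A toy numerical illustration of the lake analogy (not of the gossip protocol from [DM]); the lake size, fish count and sample size are made-up values:

    # Release a known number of fish, let them spread evenly, count how many
    # land in a small sampled region, and estimate the total size from density.
    import random

    true_size = 50_000     # "volume" of the lake, unknown to the measurer
    fish = 1_000_000       # number of fish released
    sample = 100           # size of the sampled region

    in_sample = sum(1 for _ in range(fish) if random.randrange(true_size) < sample)
    print(fish * sample / in_sample)   # estimate of true_size, close to 50,000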
3.6 Congestion control
Because the available bandwidth is often asymmetric, the upload rate is lower than the download rate and is thus the bottleneck. Also, because parallel traffic might exist, congestion can occur. Congestion control is therefore an important aspect of a stable and resilient P2P network.

BubbleStorm handles congestion by keeping a small queue and setting a priority on each message, except the keep-alive messages, which are vital for the network maintenance. The priority is calculated from the message weight and the message bubble size: messages with a high weight have more priority. If the priority is too small, the message is dropped. For more details, see the paper.
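A minimal sketch of this drop policy; the priority formula used here (remaining weight divided by bubble size) and the queue limits are illustrative guesses, not the paper's actual formula:

    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Message:
        kind: str        # "keepalive", "data" or "query"
        weight: int = 0  # remaining bubblecast weight (high weight = more important)
        bubble: int = 1  # total bubble size of this message

    queue, MAX_QUEUE, THRESHOLD = deque(), 8, 0.01

    def enqueue(msg: Message) -> bool:
        if msg.kind == "keepalive":            # vital for maintenance, never dropped
            queue.appendleft(msg)
            return True
        priority = msg.weight / msg.bubble     # illustrative priority only
        if priority < THRESHOLD or len(queue) >= MAX_QUEUE:
            return False                       # queue full or priority too small: drop
        queue.append(msg)
        return True

    print(enqueue(Message("query", weight=16, bubble=640)))  # True, kept
    print(enqueue(Message("query", weight=1, bubble=640)))   # False, dropped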
4 Simulation
The efficiency of BubbleStorm has been demonstrated by simulation, as described here.
4.1 Simulator
To simulate BubbleStorm, a complete simulator has been created. It runs on a single computer; the simulation has not been done on a real computer network and does not use a normal network stack, but rather a simplified TCP layer. The peers have an upload and a download rate, a last-hop latency and a position on Earth. The total message flight time is the sum of: the queue delay, the time needed to upload the message, the sender's last-hop delay, twice the time required by light between the two peers, a normally distributed random time of 5 ± 5 ms, the receiver's last-hop delay, the time needed to download the message, and TCP-related times.
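A sketch of this flight-time model with hypothetical example values; the TCP-related delays are left out for simplicity:

    # Sum of the delay components listed above (times in seconds, sizes in kB).
    import random

    LIGHT_SPEED_KM_S = 300_000.0

    def flight_time_s(size_kb, up_kbps, down_kbps, sender_hop_s, receiver_hop_s,
                      distance_km, queue_s=0.0):
        light = distance_km / LIGHT_SPEED_KM_S
        jitter = random.gauss(0.005, 0.005)          # the "5 ± 5 ms" random term
        return (queue_s + size_kb / up_kbps + sender_hop_s + 2 * light + jitter
                + receiver_hop_s + size_kb / down_kbps)

    # example: a 2 kB wiki article between two default peers 6000 km apart
    print(flight_time_s(2, 10, 100, 0.040, 0.040, 6000))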
4.2 Experiment parameters
A network of 1 million peers is simulated. By default, all peers have:
– a 10 kB/s upload capacity
– a 100 kB/s download capacity
– a last-hop delay of 40 ms
– a degree of 10
– an exponential random lifetime within 60 minutes, which does not correspond to the real-world peer lifetime, but represents a worse case
– new peers join the network following a Poisson distribution, once the target size has been reached, to compensate for the leaving peers
A wiki is simulated. The bubblecast values are:
– the split factor s is 2
– the certainty factor c is 2
– a wiki article is 2 kB of data
– articles are created every 30 user-minutes on average
– a search query is 100 bytes
– queries are sent every 5 user-minutes on average
– the message injection distribution is 80/20: 80% of the messages are sent in the first 20% of a peer's lifetime
The experiment timeline is given as a figure in the original document.
4.3 Scenarios
Scenarios have been defined to test the performance of the different aspects of the system, such as efficiency and robustness. 11 simulations are run per scenario.
Pure Churn: This is the default environment, as previously defined. The peers leave normally; they do not crash. This environment is used as a reference.
Massive Leave: It uses the Pure Churn environment, but after one minute 50% or 90% of the peers leave at the same time, without crashing. This environment tests the robustness and the recovery ability.
Churn with Crashes: Same as Pure Churn, but 10% of peers crash instead of leaving the proper way. This also tests the robustness.
Massive Crash: Same as Churn with Crashes, but this time 5%, 10% or 50% of the peers crash. This shows the recovery ability.
Heterogeneous Network: Same as Churn with Crashes, but the peers are not homogeneous anymore. The peers follow this partition:
Population (%)   Upstream (kB/s)   Downstream (kB/s)   Last hop (ms)
60               16                128                 30
25               32                256                 20
10               128               128                 1
5                1280              1280                1
This simulates a more realistic network.
5 Results
5.1 Scenarios
Pure Churn
As the graphs show, the success rate is at 98%. 99.9% of the replicates arrive on unique nodes; the missing 0.1% are due to the cycles within the topology. The congestion is very low. It takes 2 s to get the first query match, and 4.5 s to have all the results.
Massive Leave
With 50% or 90% of the peers leaving at the same time, the search success rate rises to around 1. This is because the bubblecast values are still calculated for a bigger network: the replication is high, but the number of peers is low. The values become normal again after the next measurement, and the success rate settles at 0.98 for a 50% leave and 0.96 for a 90% leave. This wrong replication parameter generates some congestion, also because the new nodes send most of their messages just after having joined the network, due to the 80/20 rule, and again because they use the too-high replication values. But after 2 to 4 minutes the system is stabilised again. This shows the robustness and the recovery ability.
Churn with Crashes
There is very little difference from Pure Churn, even though peers crash instead of leaving. This shows the robustness of the system.
Massive Crash
The massive crash causes a complete replication failure, and thus too-small bubbles. This is because the crashed nodes are not detected and act as sinks, preventing every replication from passing through them. This is also the cause of the high congestion. But the system is completely healed within 1 minute, once the holes are detected and repaired. This shows the recovery ability of the system.
Heterogeneous Network
This scenario is the most realistic, and shows that even with a heterogeneous network the search success rate is over 0.98. Moreover, the latencies are very low, because the better peers have a higher degree and process more messages, faster. The heterogeneity of the network has a very positive effect, making the first result come within 600 ± 200 ms.
5.2 Comparison
Because the topologies, structures and algorithms are very different, it is hard to compare BubbleStorm with other P2P search networks. But when compared with Gnutella (a well-known model), BubbleStorm is a bit faster, generates less traffic, and has a higher success rate, particularly for growing networks, showing the scalability of such a system.
6 Conclusion
As seen in the simulations, BubbleStorm is very resilient, being able to recover within minutes from the departure of 90% of the peers, or a crash of 50% of them. It also offers an efficient, exhaustive search system with a success probability over 0.96. It also shows itself to be fast, with a first response within 600 ± 200 ms and generally all results within 2 to 4 s. It does not use a lot of bandwidth: 10 kB/s is enough. And finally it scales well and makes good use of the heterogeneity of a network.

BubbleStorm is not a P2P network for file sharing: replicating a film over a part of the network would be very inefficient. BubbleStorm is a P2P search platform only, and might be used for articles in wikis, posts in blogs, entries and relations in social networks, or simply to indicate links or trackers to the searched data. The second reason why it can be efficient for these types of systems is that the query is not limited, and can be as powerful as needed: pattern matching, full-text search, SQL-like commands, XPath, etc.

Because the replica maintenance aspect is very application-specific, it has not been well defined in this paper, but it is discussed in another paper [REP] from the same authors. Another untreated aspect is security and possible misuse. First, because data is replicated on every node, this data could also be malicious software. Secondly, because each node performs the query locally, it could return whatever result it wants, possibly also links to malware.

But even if not everything is well defined yet, P2P search is a very new research topic, and BubbleStorm already offers good results and builds a solid base in this field.
References
[BS]: Wesley W. Terpstra, Jussi Kangasharju, Christof Leng, Alejandro P. Buchmann. BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search. In SIGCOMM '07, August 27–31, 2007, Kyoto, Japan.
[DHT]: K. H. Yang and J. M. Ho. Proof: A DHT-based Peer-to-Peer Search Engine. In Conference on Web Intelligence, pages 702–708, 2006.
[RW]: R. A. Ferreira, M. K. Ramanathan, A. Awan, A. Grama, and S. Jagannathan. Search with Probabilistic Guarantees in Unstructured Peer-to-Peer Networks. In P2P, pages 165–172, 2005.
[DM]: W. W. Terpstra, C. Leng, and A. P. Buchmann. Brief Announcement: Practical Summation via Gossip. In PODC, 2007.
[REP]: Christof Leng, Wesley W. Terpstra, Bettina Kemme. Maintaining Replicas in Unstructured P2P Systems. In ACM CoNEXT 2008, December 10–12, 2008, Madrid, Spain.