Gossiping Models - Vrije Universiteit Amsterdam...Gossiping Models Formal Analysis of Epidemic...

Gossiping Models

Formal Analysis of Epidemic Protocols

Rena Bakhshi

Gossiping Models

Copyright © 2011 by Rena BakhshiCover design by Jorg EndrullisPrinted and bound by Wohrmann Print Service, ZutphenISBN: 978-90-8570-710-3IPA Dissertation Series 2011-01

The work reported in this thesis has been carried out at the Vrije Univer-siteit Amsterdam under the auspices of the research school IPA (Institutefor Programming research and Algorithmics). The research was fundedby the Vrije Universiteit Amsterdam.

VRIJE UNIVERSITEIT

Gossiping Models

Formal Analysis of Epidemic Protocols

ACADEMISCH PROEFSCHRIFT

ter verkrijging van de graad Doctor aande Vrije Universiteit Amsterdam,

op gezag van de rector magnificusprof.dr. L.M. Bouter,

in het openbaar te verdedigenten overstaan van de promotiecommissie

van de faculteit der Exacte Wetenschappenop donderdag 13 januari 2011 om 11.45 uur

in de aula van de universiteit,De Boelelaan 1105

door

Rena Bakhshi

geboren te Baki, Azerbeidzjan

promotoren: prof.dr. W.J. Fokkinkprof.dr.ir. M.R. van Steen

Acknowledgements

It is a pleasure to thank those who made the completion of this thesis possible.First and foremost I express my gratitude to my supervisor Wan Fokkink for hisguidance and patience throughout my thesis, for sharing his ideas with me, andall the support he has provided in these four years. Moreover, I want to thankWan and Judi for letting me stay at their house in the beginning of my PhD,when there was no apartment available. I’m also very grateful to my secondpromotor Maarten van Steen for his constant encouragement and for being aninfinite source of inspirational ideas. I’m truly thankful that despite his busyschedule, he always found time for our meetings and discussions.

I thank the members of the reading committee: Roberto Baldoni, Verena Wolf,Holger Hermanns, Spyros Voulgaris, and Boudewijn Haverkort for their time,timely response and valuable comments. I thank Holger and Wan for invitingme to the Dagstuhl seminars, where I met a lot of interesting people in a cosy,friendly atmosphere.

I share the credit of my work, some of it presented in this thesis, with myco-authors Lucia Cloth, Daniela Gavidia, Ansgar Fehnker, Jorg Endrullis, StefanEndrullis, Jun Pang, Jaco van de Pol and Boudewijn Haverkort. I thank themfor the great work together and hope to continue our fruitful cooperation in thefuture.

I had the pleasure to visit NICTA in Australia for three months, and hangaround with Ansgar Fehnker, Rob van Glabbeek, Ralf Huuck, Carroll Morgan,Franck Cassez, Clemens Dubslaff, Jane Gillis, and all others on 5th and 6th floor.I really enjoyed the excellent working environment as well as all lunches, coffees,BBQs, Friday beers, and useful travel tips. I specially want to thank Ansgar, Wanand Jane for helping to organize my visit to Sydney.

In my daily work a wonderful group of colleagues created a warm and cheer-ful environment: Taolue Chen, Jorg Endrullis, Wan Fokkink, Daniela Gavidia,Fatemeh Ghassemi,Clemens Grabmayer, Willem van Hage, Helle Hansen, DimitriHendriks, Maxim Hendriks, Ariya Isihara, Jan Willem Klop, Cynthia Kop, Femkevan Raamsdonk, Anna Tordai, Stefan Vijzelaar, and Roel de Vrijer. Our lively

vii

lunch discussions on numerous subjects have been an important element in my“PhD life”. I enjoyed sharing an office with Willem, Helle, Anna, Taolue, andCynthia. I want to also thank Caroline Waij, Fenny Bosse, Ilse Thomson, ShirleyChedi, Caroline van der Oort, Femke van den Bosch and Elly Lammers for beingso helpful with formalities at the VU.

I took part in several IPA events, PhD courses at the VU, summer schools,Dagstuhl seminars, and Lorentz workshops. I enjoyed very much socializingwith the people I met during these events, including Melanie Eijberts, Alexan-dra Silva, Adam Koprowski, Lacramioara Astefanoaei, Stephanie Kemper, andPaul van Tilburg. Special thanks to Melanie for sharing with me happiness andsorrows, showing me Amsterdam, and for the friendship since we met at the PhDsuccess course.

Beyond Computer Science and the VU, I’m very thankful to my friends Rox-ana, Ulja, Jen, Fuad, Elkhan, Ali, Donnie and many others who supported me inmany respects all these years, and for being such a nice friends. Guys, you rock!

I specially thank my family Leyla, Rafik, Rasim, Nargiz, Rashid, Natasha, Igor,Marina, and my family-in-law Karin, Manfred, Stefan, Luzi, Maria, Katrin, Robby,Madlen, Bernd, for their love and support. Finally, to Jorg for being so loving andunderstanding, for sparing time to listen to my “thoughts aloud”, for his patience,for designing my wonderful thesis cover, and much more...

Contents

Contents I

1 Introduction 11.1 Requirements for Gossip Protocols . . . . . . . . . . . . . . . . . 21.2 Formal Analysis Techniques . . . . . . . . . . . . . . . . . . . . . 41.3 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . 51.4 Model-Based Analysis Techniques . . . . . . . . . . . . . . . . . . 61.5 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.6 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121.7 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . 131.8 Origins of the Chapters . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Background 172.1 A Generic Gossip Protocol . . . . . . . . . . . . . . . . . . . . . . 172.2 A Gossip Protocol for Information Dissemination . . . . . . . . . 19

3 Pairwise Exchange Model without Failures 233.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.2 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.3 Experimental Observations . . . . . . . . . . . . . . . . . . . . . . 243.4 An Analytical Model of Information Dissemination . . . . . . . . 273.5 Another Expression for Pdrop . . . . . . . . . . . . . . . . . . . . . 343.6 Optimal Size for the Exchange Buffer . . . . . . . . . . . . . . . . 363.7 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . 373.8 Time Complexity of Experiments . . . . . . . . . . . . . . . . . . 403.9 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4 Pairwise Exchange Model in Lossy Networks 454.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

I

4.2 Our Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.3 A Model of Information Spread with Lossy Channels . . . . . . . 484.4 The Probability of Dropping an Item . . . . . . . . . . . . . . . . 504.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . 554.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5 Comparison of Performance Models 595.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595.2 PRISM Model Checker . . . . . . . . . . . . . . . . . . . . . . . . 615.3 The Basic PRISM Model . . . . . . . . . . . . . . . . . . . . . . . 615.4 Simulation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 685.5 Modelling with Differential Equations . . . . . . . . . . . . . . . 695.6 Difference Equation Model . . . . . . . . . . . . . . . . . . . . . . 775.7 Fractions and Probabilities . . . . . . . . . . . . . . . . . . . . . . 795.8 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815.10 Appendix: Nondeterministic PRISM Model . . . . . . . . . . . . . 83

6 Mean-field Framework 856.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856.2 Gossiping Time Protocol . . . . . . . . . . . . . . . . . . . . . . . 876.3 Mean-field Modelling and Convergence . . . . . . . . . . . . . . . 886.4 A Mean-field Model for the Shuffle Protocol . . . . . . . . . . . . 956.5 A Mean-field Model for Basic GTP . . . . . . . . . . . . . . . . . . 1056.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7 Automating the Mean-Field for Dynamic Networks 1177.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1177.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1187.3 Modelling Framework . . . . . . . . . . . . . . . . . . . . . . . . 1197.4 Mean-field Modelling . . . . . . . . . . . . . . . . . . . . . . . . . 1237.5 Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1287.6 Framework for Dynamic Networks . . . . . . . . . . . . . . . . . 1327.7 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1377.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1397.9 Appendix: Specification of Basic GTP Protocol . . . . . . . . . . . 1407.10 Appendix: Specification for Mobile Case . . . . . . . . . . . . . . 1417.11 Appendix: Specification for Dynamic Case . . . . . . . . . . . . . 143

8 Concluding Remarks 1458.1 Analytical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 1458.2 Industrial Importance . . . . . . . . . . . . . . . . . . . . . . . . 149

9 Summary of the Thesis 151

II

9.1 Samenvatting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1519.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

Bibliography 153

III

1Introduction

The emergence of the Internet as a computing platform asks for new classes ofalgorithms that combine massive distributed processing and inherent decentral-ization. Moreover, these algorithms should be able to execute in an environmentthat is heterogeneous, changes almost continuously, and consists of millions ofnodes. Massive parallel computing on the Internet also demands a degree of self-organization; the number of devices and amount of software is simply too largeto be managed by humans.

Gossip protocols (sometimes referred to as epidemic or random-walk proto-cols) have shown to be a sensible paradigm for developing stable and reliablecommunication mechanisms that scale up to massively parallel environments. Ina gossip protocol, nodes exchange data similar to the way a contagious diseasespreads. That is, a participating peer can select, according to some probabil-ity distribution, other peers to exchange information with. For example, sensornodes collectively monitor temperature inside one building. Every node mea-sures the temperature of its area, and periodically communicates with a randomnode in its neighborhood to exchange local temperature value. A node thenaverages all received values.

Gossip protocols have originally been proposed for database replication [75],but more recently also for failure detection [174], and resource monitoring [173].They are employed in wired as well as wireless environments. In a massively par-allel setting, the gossip mechanism should be used at very high speeds, yieldinga new generation of protocols that have an unusual style of probabilistic relia-bility guarantees, regarding scalability, performance, and stability of throughput.Surveys [83, 85, 115] provide an introduction to the field.

When a large number of nodes interact in a connected environment, variousphenomena occur that are not explicable in terms of the behavior of any singleagent. It is necessary to understand these phenomena in order to keep the over-all systems both stable and efficient. Distributed algorithms and protocols thatrun steadily and reliably in small-scale settings, tend to lose those properties asnumbers of users, the size of the network and transaction processing rates all

1

2 Chapter 1 — Introduction

increase. Typical problems are disruptive overloads and routing changes, periodsof poor connectivity and throughput instability. Failures rise in frequency simplybecause the numbers of participating components are larger [175].

Gossip protocols tend to contain several design parameters that can influencethe nonfunctional properties of these protocols, for example, performance, ro-bustness, and fault tolerance. Values of these parameters are usually determinedempirically, without a proper understanding why the protocol performs well forthese values, and without any certainty that these values are close to optimalor robust choices. Thorough experimental analysis in [123] has shown that theemergent behavior of gossip protocols may vary substantially by changing only afew design parameters.

In practice, properties of gossip protocols are usually diagnosed by emulatingsuch systems on a single workstation. However, in principle, owing to their oftenrelatively simple structure, gossip protocols lend themselves very well to formalanalysis in order to predict their behavior with high confidence.

Next section gives an overview of the different types of requirements for gos-sip protocols. Thereafter, we discuss the available spectrum of analysis tech-niques for probabilistic systems, and related work for gossip protocols.

1.1 Requirements for Gossip Protocols

Requirements for gossip protocols can be divided into three classes: general,functional and nonfunctional requirements. These will be discussed in the cur-rent section. We use terminology from [123, 176].

1.1.1 General Requirements

Gossip protocols are meant to satisfy the following general requirements:

• Simplicity: The protocol is simple and easy to deploy. For example, in a wirelessnetwork, a node should be able to join the system without executing a complexprocedure (“plug-and-play”).

• Scalability: Each node continues to perform its operations at almost the samerate irrespective of the network size. For example, the local knowledge (neigh-bors list) of a node does not increase with the network size.

• Symmetry: In a large-scale network, all nodes play identical roles. Hence,there is no single point of failure. Randomization, for example, random peerselection, tends to fit into this requirement, because each node typically runsthe same algorithm.

1.1.2 Functional Requirements

Functional requirements describe properties of the outputs of a system andhow certain input is transformed into specific output. We classify several require-

1.1. Requirements for Gossip Protocols 3

ments on gossip protocols, and distinguish between global and local properties.

1. Global properties of the system:

• Connectivity: This can, for example, be expressed as a minimum numberof links between nodes whose removal will result in the partitioning of thenetwork graph. The connectivity of a graph is an important measure ofhow proper it is maintained in the presence of churn, because partitioningof a network graph creates difficulties for information dissemination.

• Convergence: One can distinguish between convergence of the system pa-rameters to some values (for example, achieving a certain accuracy in theestimates of the aggregate function values) and convergence of the systemstructure to some particular type of graph.

2. Local properties of nodes:

• Degree distribution: The degree of a node is the number of its neighbors inthe network graph. This concept is interesting because of its relationship tothe correct system operation on a graph in the presence of node churn, itseffect on patterns of epidemic spread, and its importance in the distributionof resource usage of nodes.

• Clustering coefficient: The clustering coefficient of a node expresses the ra-tio of the number of links between the node’s neighbors to the number of allpossible links between them. Intuitively, it shows how many neighbors of anode are neighbors among themselves. Analysis of this property is interest-ing because a high clustering coefficient affects information dissemination,as the number of redundant messages increases. It also affects the self-healing capacity by increasing the number of links to the node neighbor-hood from the other nodes, thus decreasing the probability of partitioning.

• Shortest path length: The shortest path length between two nodes is theminimum number of edges that must be traversed to go from one nodeto the other. The average path length is the average of all shortest pathlengths between any two nodes in a graph. The shortest and average pathlength give information on the time and communication costs to reach anode.

1.1.3 Nonfunctional Requirements

Nonfunctional requirements regard the quality (for example, performance,maintainability, fault tolerance) and economics (for example, timing, cost) ofsystem behavior. The following high-level nonfunctional requirements can beidentified for gossip protocols:

Time complexity: The number of time units it takes (at worst or on average) fora gossip protocol to “infect” every node in the network, for example, for datadelivery to all nodes, or for computation of an aggregation function output.


Message complexity: The total number of gossip messages (at worst or on aver-age) exchanged over the network during an execution.

Information dissemination: For a given time span, there is a a high probabilitythat a piece of information is shared with all processes.

Robustness: The ability of a gossip protocol to maintain correct system operationin the face of massive node crashes and node churn.

Graceful degradation: Large numbers of node or link failures in the system mayaffect its operation. However, performance, functionality and reliability ofgossip protocols should not drop rapidly as the number of failures increases.

Elasticity: Robustness of the overall system behavior in the face of largely vary-ing node capabilities in terms of memory, bandwidth and connectivity.

Self-organization: The nodes should be able to organize themselves in unpre-dictable circumstances without external interventions. For example, in gos-sip protocols a network graph forms overlays that are adaptable to networkand environmental changes.

Specifically in wireless networks, nodes communicate through error-prone ra-dio channels and typically also have limited computational capabilities. Specialdesign issues then include energy usage, mobility, transmission power, mem-ory usage and latency. Network properties such as message reliability and nodereachability may in that case influence the behavior of the protocol.

Next, we give an overview of the different approaches that can be taken toformally analyze gossip protocols.

1.2 Formal Analysis Techniques

When analyzing a system we generally aim to obtain a better understandingof, and gain further confidence in the system’s behavior, to detect possible errors,and to improve its design. A complication in the analysis of gossip protocolsis that they are meant to work in very large networks, and for ad hoc wirelessnetworks even with lossy channels. Properties of such protocols are generallydiagnosed by emulating such networks.

The formal specification of systems helps to make explicit the underlying as-sumptions (like the synchronization primitives), which tend to remain hiddenin implementations or simulation exercises. Also such a specification can be an-alyzed using (semi-)automated formal verification techniques. There is a richhistory of the use of such techniques for verifying a variety of desirable proper-ties for a wide range of systems. The efficiency of a particular formal analysistechnique depends on the system under study.

Gossip protocols in general contain several design parameters that heavilyinfluence their behavior, such as, the number of protocol cycles, message for-warding strategies, message delays, or cache usage and size. Formal analysistechniques can help in the search for optimal values of such parameters.

Rigorous formal analysis techniques for gossip protocols have so far hardlybeen applied in large-scale settings.The main scope of the thesis is to investigate

1.3. Experimental Evaluation 5

which formal analysis techniques can, in principle, be employed efficiently inthe analysis of gossip protocols, for wired as well as wireless systems. In theremainder of this section, we will provide an overview of the existing analysismethods, their use and limitations, as well as a comparison of related work onthe formal verification of gossip protocols.

Figure 1.1: Spectrum of validation

Figure 1.1 depicts the spectrum of analysis techniques, ranging from exper-imental work with a real system implementation up to rigorous mathematicalanalysis. Real system statistics and simulation techniques are based on experi-ments performed on the system and on collecting data statistics either from therunning system by monitoring it at real time or from a discrete-event simulationof the system. Usually, these approaches are used to study the behavior of a par-ticular implementation (instance) of the system. The other approaches require aformal modelling of the system. Typically, they are used to verify specific proper-ties in a more general context. Although these methods often require particularassumptions to be made, their advantage is that they can be used before a systemis being implemented, and that in principle they are not costly (in comparison tofull-scale experiments on a real system).

Of course, there are pros and cons of the use of experimental and model-based formal analysis techniques for gossip protocols. We will discuss them laterin the chapter, outlining the challenges.

1.3 Experimental Evaluation

In practice, properties of gossip protocols are often diagnosed by emulation,and by performing simulations (see, for example, [94]). Monte Carlo simulationis a widely used statistical sampling method. An implementation of a gossipprotocol is executed over and over, each time using a different set of randomvalues, drawn from some probability distribution, to make random choices inthe protocol. At the end, a Monte Carlo simulation produces distributions ofpossible outcome values. Commonly used discrete-event simulators in the area of


communication systems are ns-2 [53, 143], OPNET [153] and GloMoSim [182,183]; but often a customized simulator is built in for example Java or MATLAB.

Experimentation is the major source of gaining first insight. The reason isthat in reality performance of gossip protocols depends on many factors: charac-teristics of the network, certain distributions (capacity, node degree, etc), usagescenarios, user models, incentives, etc.

However, different simulators can produce vastly different results, even forsimple systems (see, for example, [56]). The reason is that simulators employdifferent models for the medium access control and physical layers. Results ofsimulators often say as much about the simulated system as about the particularlower-level implementation of the simulator. Also the employed random numbergenerators have an (unpredictable) impact; for instance, [5, 172] question thecredibility of this type of simulations. Moreover, different simulation analyses ofgossip protocols make different assumptions about the underlying model, whichmakes comparison of results difficult. For example, [56] and [58] both evaluatethe flooding protocol, but sending and receiving is perfectly synchronized in [56],while [58] assumes a random waiting period between sending and receiving.

Few attempts have been made to implement systems that use one of the ex-isting gossip protocols, that is, Astrolabe [173], Tribler [158], ARRG [79] andHEAP [89]. Notably, in [79], it is also shown that a gossip protocol that behaveswell in a simulated environment, may not behave well when emulated or trulyimplemented, especially in an environment where nodes can crash. The reasonis again that the simulations are based on assumption that do not always hold inrealistic environment.

1.4 Model-Based Analysis Techniques

The formal specification of a system helps to obtain not only a better (moremodular) description, but also a clear understanding and an abstract view of thesystem. Formal analysis techniques, often referred to as formal verification, aresupported by (semi-)automated tools. They can detect errors in the design thatare not so easily found using emulation or testing, and can be used to establishthe correctness of a design. The most effective way to apply formal methods isactually during the design of a system, rather than after-the-fact, as is, unfortu-nately, often the case.

Formal models need to be realistic. An experimental evaluation can help toobtain a first insight in the behavior of a system and to identify which character-istics need to be included into the model.

There are two main approaches to formal verification. The first approachinvolves a rigorous mathematical analysis of the properties of the system, us-ing results from, for instance, calculus and probability theory. Such an analysiscan be supported semi-automatically by means of MATLAB or a theorem prover.MATLAB is an easy to use mathematical tool, but introduces imprecision due to

1.4. Model-Based Analysis Techniques 7

numerical approximations. On the other hand, theorem prover requires a lot ofeffort from the user but supports general mathematical reasoning about the sys-tem. Important theorem proving tools are Isabelle/HOL [151], PVS [155] andCoq [42].

The second approach is model checking, which consists of a systematic andfully automatic exploration of the state space of the system specification. Theexplorative nature of the approach in principle requires that the state space isfinite. However, recent work also addresses symbolic model checking techniquesfor infinite-state models [162].

1.4.1 Rigorous Mathematical Analysis

Rigorous analysis techniques for gossip protocols are built on sound mathe-matical foundations and draw inspiration from the mathematical theory of epi-demics [17, 83]. These analysis techniques are in general used to verify specificproperties of a gossip protocol. Therefore, such a study is usually done on a sim-plified system model of the actual protocol: one has to decide which characteris-tics of the protocol should be studied (see Section 1.1), and which parameters ofthe protocol should be modelled in order to study these characteristics.

Gossip protocols are intrinsically probabilistic. For instance, a node may ran-domly select a “gossip” partner or a data item for the exchange with anothernode. Or it may be the case that when a node receives a message, then with someprobability p it forwards the message to all (or some) of its neighbors, while withprobability 1 − p the message is purged. A key property is that if the probabilityp is sufficiently large, and the network sufficiently dense, then the probabilityof successful information spread remains close to 1, while the number of sentmessages is relatively small compared to flooding.

Thus, the mathematical foundations underlying the modelling and verifica-tion of gossip protocols can be found in probability theory, combinatorics, mathe-matical physics, optimization theory, spectral theory, statistical mechanics, graphtheory and so on.

Mathematical analysis can be combined with simulations to validate the re-sults and understand the system behavior. A strong point of mathematical anal-ysis is that often it scales well with respect to the size of a network. However, itrequires a lot of effort, it can only be used to analyze a limited class of properties,and the assumptions that are invariably made to simplify the analysis often affectthe accuracy of the results [49]. For example, the analyses in [125, 130, 131] relyon the assumption of full knowledge of group membership, ignoring its practicalinfeasibility.

Hand-crafted Markov chains

Markov chains can be used for modelling a variety of aspects of gossip proto-cols. Markov chains allow to capture the evolution of gossip-based systems; each


state of the Markov chain describes one state of the system. However, a state ofthe Markov chain does not represent a state of the whole system, but only a stateof the system with the subset of parameters that are modelled. From one stateof the Markov chain to another, there are probabilistic transitions correspond-ing to the probabilistic evolution of the system. There are two types of Markovchains, distinguished by the transitions occurring at any time (continuous-timeMarkov chain) or only for defined steps (discrete-time Markov chain). By ana-lyzing the different possible sequences (and their probabilities), it is possible toobtain a global insight in the operation of the protocol. The Markov chain de-scribing the system evolution converges to a useful distribution over all possiblesystem states, from which interesting protocol properties can be concluded. Theinterested reader is referred to [105, 161].

The examples of the use of Markov chains for gossip and related protocols in-clude studies on gossip-based membership management [3, 34, 46, 100], gossip-based distributed aggregation [50, 51, 78, 129, 149], gossip-based informationdissemination [65, 72, 82] and network topology change [91]. Other methodsof probability theory has further been applied to analyze gossip-based informa-tion dissemination [44, 60, 61, 92, 125, 141] and gossip-based resource location[130, 131].

Deterministic Approximation

For large-scale networks, analysis can be performed by measuring a set ofnodes in a possible state over time instead of modelling every node individually.Then, the evolution of such a system can be approximated by a deterministicequation.

Epidemical models fit into this framework, for example, see [6, 17, 164, 166,179]. In a deterministic model of epidemics, individuals in the population aresplit into different groups, each of which represents a specific state of the epi-demic. The transition rates from one group to another are expressed as deriva-tives, and the model is described as a system of differential equations.

1.4.2 Model Checking

Model checking is an exhaustive state space exploration technique that isused to validate formally specified system requirements with respect to a formalsystem description [64]. Such a system is verified for a fixed configuration; so inmost cases, no general system correctness can be obtained.

Using some high-level formal modelling language, an underlying state spacecan be automatically derived, be it implicitly or symbolically. The system re-quirements are specified using some logical language, like LTL, CTL or exten-sions thereof [116]. Well-known and widely applied model-checking tools areSPIN [114], CADP [86], Uppaal [38] (for timed systems), and PRISM [113] (for

1.4. Model-Based Analysis Techniques 9

probabilistic systems). The system specification language can, for example, bebased on process algebra, automata or Petri nets.

Model checking suffers from the so-called state space explosion problem,meaning that the state space of a specified system grows exponentially withrespect to its number of components. The main challenge for model checkinglies in modelling large-scale dynamic systems. To overcome the state explosionproblem and to speed up the verification process, various state space reductiontechniques have been proposed. For instance, combinations of symbolic verifi-cation techniques with explicit state space exploration (symbolic model check-ing), verification of properties on a smaller abstract model of the system underscrutiny, possibly obtained after bisimulation reduction, parallelization of verifi-cation algorithms, partial exploration of the state space (bounded model check-ing, on-the-fly model checking, partial order reduction), and efficient state repre-sentation (BCG and SVC formats), have been proposed to make model checkingpractically feasible.

Initial model checking approaches used as underlying mathematical model afinite-state automaton, that is, a model with neither explicit time nor probabili-ties. Recently, model-checking techniques have been proposed for system modelsincluding both time and probabilities, possibly in combination with nondeter-minism. In view of the probabilistic features in gossip protocols, we focus onmodel checking of probabilistic models.

Probabilistic and Stochastic Model Checking

In probabilistic and stochastic model checking, the underlying system modelis not represented by an automaton, but instead as a stochastic process of somesort, mostly a finite-state Markov chain (discrete- or continuous-time). Often,these Markov chains are extended with state labels and transition labels, so-called action names. These Markov chains are mostly specified using some high-level formalism, like stochastic process algebra [63] or stochastic Petri nets [13,35].

Gossip protocols may require models that, in addition to pure probabilisticchoices, also allow for nondeterministic choices. That is, it is possible in a givenstate of a system to non deterministically move to another state with some prob-ability. Here, Markov decision processes can be applied [160]. The key idea to aMarkov decision process is to allow a set of probability distributions in each stateinstead of a single distribution as in Markov chains. The choice between thesedistributions is made externally and non deterministically, either by a schedulerthat decides which sequential subprocess takes the next step (as in, for example,concurrent Markov chains), or by an adversary that influences or affects the sys-tem. Probabilistic choices are internal to the process and made according to theselected distribution.

The system requirements of interest are again specified through logical ex-pressions over the paths that can be taken through the model. For that pur-


pose, the logics are extended to include a notion of time and probability. Promi-nent examples of such logics are CSL [15, 16], CSRL [104] and asCSL [14] forcontinuous-time models, and pCTL [102] for discrete-time models.

Where traditional model checking algorithms lean heavily on determiningreachability of certain states or state groups (or the nonreachability), in proba-bilistic and stochastic model checking also the time until some states are reached(or avoided) plays a major role. Furthermore, reachability of states is expressedin terms of a probability (no mutually exclusive yes or no) and a timebound.As an example, certain states might be highly probably reached for short timeperiods, but not for longer time periods. Stochastic model checking relies onalgorithms for reachability analysis, as well as on numerical algorithms for de-termining long-term and transient behavior in Markov chains. Although suchalgorithms are well understood, their implementation requires care, especially ifvery large models are to be addressed.

Stochastic and probabilistic model checking has been applied in a wide va-riety of case studies, ranging from workstation cluster availability [106] to theevaluation of power-saving methods [152] and the analysis of wireless (sensor)networks [144].

Gossip protocols with their probabilistic and asynchronous behavior fit wellto the model classes supported by the known model checking techniques. How-ever, the key to successfully verifying gossip-based systems lies in coping withtheir scale. Some of the optimization techniques to deal with large networks forgeneral modelling techniques have been adapted to probabilistic model check-ing, in particular: abstraction [55, 128, 137, 139], distribution [39, 40, 98] andMarkovian bisimulation [127].

Approximate and Statistical Probabilistic Model Checking

An alternative approach to cope with state explosion for probabilistic systemsis found in approximate probabilistic model checking. The main idea of thisapproach is to apply Monte Carlo sampling techniques [87, 101]; the resultingprobabilities are accurate only with respect to some accuracy criterion.

Approximate probabilistic model checking [98, 109] is an approximationmethod for the logic restricted to time-bounded safety properties (“positive”LTL). Monte Carlo model checking [97] is based on a randomized algorithm forprobabilistic model checking of safety properties for general LTL model checking;Monte Carlo model checking uses the optimal approximation algorithm of [68].

In so-called statistical probabilistic methods (for example, [180]), statisti-cal hypothesis testing is used instead of randomized approximation schemes.The approach of [181] describes a model-independent procedure for verifyingproperties of discrete-event systems based on Monte Carlo simulation and sta-tistical hypothesis testing. The procedure uses a refinement technique to buildstatistical tests for the satisfaction probability of CSL formulas. The statisticalmethod of [165] concentrates on model checking of black-box probabilistic sys-

1.5. Challenges 11

tems against specifications given in a sub-logic of CSL.Similar to the idea of approximation-based probabilistic model checking, [84]

combines probabilistic model checking with Monte Carlo simulations for the per-formance analysis of probabilistic broadcast protocols in a wireless network. Inparticular, this study shows results for reliability and reachability properties un-der different assumptions, such as message collision, lossy channels and unreli-able timing, and their impact on the results.

A case study in [73] presents the modelling of a sensor network using ap-proximate probabilistic model checking. Another case study [55] presents theresults of an analysis of the MAC protocol for sensor networks using approximateprobabilistic model checking. eXtended Reactive Modules (XRM) [74] have beenproposed for modelling wireless sensor networks to generate RM models suitablefor PRISM and approximate probabilistic model checking.

Approximate probabilistic model checking could serve as a good compromisebetween probabilistic model checking and simulation. They do not provide an ex-haustive search to verify a given property, and as a result they do not suffer fromthe state explosion problem that much. Still in practice they can provide ratheraccurate probabilistic estimates. But approximate probabilistic model checkingis still coming off age, and needs to be further developed.

1.5 Challenges

The formal analysis of gossip protocols is a rather unexplored research field,with many challenges and open problems. A more insightful and systematicmethodology should be developed that targets gossip protocols. The assumptionsmade for simplifying such an analysis should be restricted as much as possible,as otherwise the analysis itself becomes unrealistic.

Markov chains give a precise mathematical description, but the analysis istime-consuming and can be used only for a restricted class of properties. Itwould be worthwhile to use theorem proving tools in order to support such amathematical analysis, like in [108].

Probabilistic model checking techniques are convenient to use, as they arebased on verification algorithms. But formally modelling a gossip protocol stillrequires considerable effort, and can introduce mistakes by itself (if the modeldeviates from the actual protocol). Also the verification algorithms are very muchunder development, and probabilistic model checking is, even more than stan-dard model checking, suffering from the state explosion problem. This compli-cates the analysis of gossip protocols considerably, as they are supposed to beapplied in large-scale networks. The use of optimization techniques, like ab-straction and distributed verification, will form important ingredients for modelchecking to become practically of interest for the evaluation of gossip protocols.


1.6 Objectives

We outlined and published the challenges from Section 1.5 in [18], followingthe Lorentz Center workshop in December 2006 devoted to gossip-based com-puter networking. All these challenges and problems related to them are coveredin this thesis. In particular, we address the problem of analyzing gossip protocolsin peer-to-peer and ad hoc networks. Moreover, we aim at developing analyticalframeworks for these protocols. We now formulate the objectives (in a cursivetypeface) and summarize how they are addressed in the next chapters.

Is there a level of abstraction for modelling a gossip protocol to allow for op-

timization of the protocol parameters? Can the performance of the protocol beexplained through the model? Can the model clarify the relationship betweenthe protocol parameters? To address all these questions, we develop the formalmodel of a gossip protocol based on the pairwise interaction between two gossip-ing nodes. Although this model fulfills all the above criteria, it accurately predictsthe behavior of the protocol only for a certain set of network configurations, dueto the assumptions made for simplifying the modelling task.

Thus, the next question we address in this thesis is the following: is it possible

to revise our model relaxing the assumptions made previously? We start with ouranalytical model and extend it with a component based on statistical analysis.The resulting model scales well compared to the models produced by traditionalprobabilistic verification techniques, and accurately predicts the performance ofthe protocol in very large networks. Moreover, we are able to observe the impactof different network configurations on the protocol without relying on a pureexperimental approach.

Does it make sense to use traditional probabilistic verification techniques for such

a large-scale application? To increase the network size that can be modelled incommonly used probabilistic model checker, we use our abstract analytical modelas the basis for the formal protocol specification. Gossip protocols exhibit non-deterministic behavior. This brings us to the question of whether the resolutionof the nondeterminism can significantly affect a chosen property of the protocolunder study. In general, we are interested in exploring the impact of modellingchoices that are underspecified in a protocol. To that end, we build and com-pare models produced by different modelling frameworks, including traditionalepidemics-like equational models and probabilistic model checking.

In order to fight the state space explosion problem, analysis of a discrete system,such as a system executing a gossip protocol, can make use of a continuousapproximation. Notably, the mean-field method approximates a distribution ofnodes in the set of possible states over discrete time for the case that the numberof interacting nodes tends to infinity. We demonstrate that the mean-field methodis suitable for gossip protocols by developing a modelling framework.

The mean-field method is hardly used for communication networks because amean-field model tends to abstract away the underlying topology, and computingof the transition probabilities matrix is a time-consuming, error-prone procedure.

1.7. Organization of the Thesis 13

Therefore, we extend our mean-field framework to allow for the performance eval-

uation of dynamic gossip networks. We identify a class of stochastic systems ofnodes for which the construction of the mean-field model can be automated, anddevelop a tool that performs automatic computation depending on the user-inputspecification.

All analytical results presented in this thesis are supported by large-scaleMonte Carlo simulations using PeerSim [122], and self-implemented Java-basedand Python-based simulators.

1.7 Organization of the Thesis

The thesis is organized in eight chapters, and Figure 1.2 depicts relationsbetween them.

(0, 0) (0, 1)

(1, 0) (1, 1)

P(10|01

)P(01|10

)

P (11|10)

P (10|11)

P(0

1|1

1)

P(1

1|0

1)

P (00|00) P (01|01)

P (10|10) P (01|01)

dx

dt

0

0.05

0.1

0.15

0.2

0.25

0.3

0 200 400 600 800 1000

rep

lica

s (

% o

f n

od

es)

rounds

simulation (average)model

N → ∞

A B ⇒

(0, 0) (0, 1)

(1, 0) (1, 1)

P(10|01

)P(01|10

)

P (11|10)

P (10|11)

P(0

1|1

1)

P(1

1|0

1)

P (00|01)

P (00|00) P (01|01)

P (10|10) P (01|01)

Ch. 5Ch. 3Ch. 5

Ch. 6 & 7

Ch. 4

Figure 1.2: Pictorial representation of the thesis structure

Chapter 2: Background on Gossip Protocols. The next chapter introduces afamily of gossip protocols, analyzed in this thesis. It first surveys the applicationdomain of gossip protocols, and compares them, giving examples of commonmeasures of interest for various applications of gossip. This survey is followedby the description of the shuffle protocol, a gossip push-pull protocol for shar-ing data in a distributed network. Two properties of this protocol are specified:


how fast an observed datum is replicated through a network, and how fast theobserved datum covers the network.

Chapter 3: Pairwise Exchange Model without Failures. This chapter ana-lyzes the shuffle protocol. It introduces a novel approach of analyzing gossipprotocols, modelling the protocol behavior at an abstract level as pairwise nodeinteractions. The analysis relies on the assumption of no communication fail-ures. With this model, we compute the optimal value of the protocol parametersto obtain fast replication.

Chapter 4: Pairwise Exchange Model in Lossy Networks. This chapter ex-plores the impact of lossy communication channels on the shuffle protocol. Itchallenges the simplifying assumptions made for the formal model from the pre-vious chapter, namely uniform distribution of data in the network. We presentan extended model, one component of which is based on statistical analysis.

Chapter 5: Comparison of Performance Models. This chapter presents andcompares different models of the shuffle protocol, produced in different tradi-tional modelling frameworks. Using the pairwise interaction model, we buildseveral formal models with the probabilistic model checker PRISM, and directlywith differential equations. The chapter explores the impact of modelling choicesmade for the parts of the protocol that are underspecified.

Chapter 6: Mean-field Framework. This chapter demonstrates that the exist-ing mean-field method for very large networks (N → ∞) is suitable for gossipprotocols. The chapter introduces a mean-field framework for the evaluation ofgossip protocols, accompanied by two case studies, the shuffle protocol and thegossip clock synchronization protocol GTP. The mean-field results for the shuf-fle protocol are compared with the differential equation model developed in theprevious chapter.

Chapter 7: Automating the Mean-Field for Dynamic Networks. This chap-ter focuses on the automation of mean-field analysis, and demonstrates how anetwork with pairwise communication between nodes can be modelled and ana-lyzed. This chapter extends the results of the previous chapter with the possibilityto encode physical node positions and to model dynamic networks. It determinesa class of stochastic systems of nodes for which the construction of the mean-fieldmodel can be automated.

Finally, Chapter 8: Concluding Remarks summarizes the main results of thisthesis, and discusses future work.

A general introduction into the field is provided by Chapter 1. Backgroundinformation, including a description of the shuffle protocol used as the main case

1.8. Origins of the Chapters 15

study in this thesis, is presented in Chapter 2. Together with Chapters 1 and 2,every other chapter can be read separately, and includes its own related work,conclusions and future work. The research questions, formulated earlier, areaddressed in Chapters 3–7, respectively.

1.8 Origins of the Chapters

Chapter 1 is based on a joint invited publication with Francois Bonnet, WanFokkink, and Boudewijn Haverkort. It appeared in [18] in the aftermath of theLorentz Center “Gossip-based Computer Networking” Workshop, held in Leidenin December 2006.

In Chapter 2, a survey of gossip protocols and the shuffle protocol descriptionare composed of parts of [21, 25]. Section 2.2 of the same chapter is basedon [94].

Chapter 3 is mostly based on the journal paper [28], which is an extendedversion of the conference paper [29]. Section 3.5 appeared as [26] in the contextof a card dealing procedure.

Chapter 4 is currently submitted for publication.Chapter 5 is a composition of two papers, the journal paper [28] with Daniela

Gavidia, Wan Fokkink and Maarten van Steen, and the conference paper [25]with Ansgar Fehnker.

Chapter 6 contains the material of joint work with Lucia Cloth, Wan Fokkink,and Boudewijn Haverkort, published first as a preliminary paper [20] in the post-proceedings of the Lorentz Center “Two Decades of Probabilistic Verification –Reflections and Perspectives” Workshop in Leiden in November 2007. It later ap-peared as an extended conference version [19], and, finally, as an invited journalversion [21].

Chapter 7 appeared as a conference paper [23].Since most of the material presented in the thesis is the results of collabo-

rations, as described above, I use “we” instead of “I” throughout the thesis toacknowledge this fact. The publications that are not a part of this thesis include[24, 27, 30].

2Background on Gossip

Protocols

In this chapter, we give background on gossip protocols at an algorithmiclevel. We start with a short introduction on gossip applications, summarizingmeasures of interest for each application. We also present a background on theshuffle protocol, originally introduced in [94] for the dissemination of informa-tion in a wireless environment. This protocol is used for the case studies in thenext chapters. Gossip protocols are often used for information dissemination. Wechoose the shuffle protocol as a prime example of this application.

2.1 A Generic Gossip Protocol

In gossip protocols, the information exchange between nodes can be imple-mented as one of the following policies: only the node that initiates a gossipsends local state data to its partner (push), a node-initiator requests state datafrom its gossip partner (pull), or both nodes send their state information to eachother (push-pull). In this thesis, we mainly consider push-pull gossip protocols.

while true do

wait (∆t time units)

p← RandomPeer();prepare(s);send s to p;

sp ← receive(·);s← Update(s, sp);

while true do

sp ← receive(·);prepare(s);send s to sender(sp);

s← Update(s, sp);

(a) active thread (b) passive thread

Figure 2.1: A gossip protocol

Figure 2.1 illustrates the skeletonof a generic push-pull gossip protocol.Each node has a local state s and ex-ecutes two different threads, an activeand a passive one. The active threadperiodically initiates a state exchangewith a random peer p by sending it amessage containing the local state s,after which it waits for a response. Thepassive thread waits for a message sentby an initiator and replies to it with itslocal state. The random peer selection

17

18 Chapter 2 — Background

is based on the set of neighbours as determined by a membership protocol (e.g.,[123]).

For a pair of nodes A and B, where A is the active node and B is the passiveone, we describe the protocol from the point of view of each participating node.In particular, node A picks a neighbor B at random (method RandomPeer())after a not necessarily constant (discrete) time span of length ∆t, and initiates astate exchange (gossip) with it. It does so by sending (part of) its local state sto B, and waits for B’s response. Upon receipt of the response, node A updatesits local state (according to the method Update(s, sp)). In response to beingcontacted by A, node B sends (part of) its local state to A and updates its localstate accordingly (method Update(s, sp)).

Applications of Gossip

The method Update is protocol specific. It updates the local state of a nodebased on the previous local state, and the state information received from therandom gossip partner. In gossip-based information dissemination protocols (asin, e.g., distributed news service protocols [94, 120]), a finite list of data items(e.g., news items), called the cache, composes the local state of a node. Thegeneric operation prepare(s) in Figure 2.1 is replaced by an operation s ←RandomItems(). The method Update merges the list of old items with the listof received items. The measures of interest of these protocols include the num-ber of copies of a data item in the network after some time, and the amount oftime needed for the data to spread in the network.

In gossip-based membership management protocols, a finite set of peer ad-dresses, called the partial view, comprises the local state of a node. The methodUpdate (as in [3, 177]) creates a new state through a sample of the union of theold and the received views. The performance metrics of these protocols includea distribution of the partial view size, and the number of nodes reached in thepresence of node failures.

In probabilistic broadcasting (e.g., [184]), the state of a node is a flag thatrecords whether the node is infected. The method Update sets the state to in-fected if the received state is infected. The performance of these protocols canbe measured by the time until all nodes have been infected.

In gossip-based distributed aggregation (e.g., [121]), the state of a node isa numeric value, which can be any parameter of the environment, such as atemperature or the current load. All values at nodes contribute to an aggregatevalue, computed using some aggregation function, for instance, average, sum,etc. The method Update simply returns the result of the aggregation function.For these protocols, a general measure of interest is the convergence of results ofthe aggregation function, but other measures depend on the aggregation functionchosen.

We refer to [132] for a survey on gossip applications.

2.2. A Gossip Protocol for Information Dissemination 19

2.2 A Gossip Protocol for Information Dissemination

We use the shuffle protocol, introduced in [94], as the main case study inthe thesis. The shuffle protocol is, at heart, a simple push-pull gossip protocolwhich can be used also in wired networks. The protocol disseminates data itemsof general interest throughout a network of small devices.

The network consists of a collection of nodes, each of which contributes a lim-ited amount of storage space (which we will refer to as the node’s cache) to storedata items. Nodes executing the shuffle protocol initiate a shuffle periodically. Inorder to execute the protocol, the initiating node needs to contact a gossip part-ner and shuffle data items from its cache with the partner. Such a random partneris delivered by an underlying layer that keeps track of the neighborhood mem-bership. In a wired environment, this service could be provided by, for instance,a peer sampling service [123] running at each node. For wireless environments,the neighborhood is determined by the radio connectivity between nodes.

Items can be published by any user of the system and are propagated throughthe network. An item is a piece of information, and for each item several copiesmay exist in the network. As items are gossiped between neighboring nodes,replication may occur when a node has available storage space to keep a copy ofan item it just gossiped to a neighbor.

The behavior of the protocol is probabilistic in nature. A node chooses arandom partner to communicate with, and it exchanges a random subset of dataitems from its cache. Additionally, the shuffle protocol exhibits nondeterministicbehavior in that the order in which the nodes initiate a gossip is not determined.

2.2.1 Protocol Assumptions

All nodes have a common agreement on the frequency of gossiping. However,there is no agreement on when to gossip. In terms of storage space, we assumethat all nodes dedicate the same amount of storage space to keep items locally,and that all items are of the same size. Therefore, we say that each node has acache of fixed size c. When shuffling, each node sends a fixed number s of the citems in the cache.

2.2.2 Protocol Description

Every node maintains a finite list of data items in a cache. The shuffle protocolis executed at nodes initiating a contact and their contact partners. Both nodesexchange a random subset or the whole set of items from their local caches inone contact. Each node agrees to keep the items received from its gossip partner.Taking into account the size of a cache, a node might have to discard some itemsfrom the cache in favor of the items received during an exchange. The items tobe discarded are picked from the ones that have been sent to the peer. Since the

20 Chapter 2 — Background

while true dowait (∆t time units)B := randomPeer()sA := itemToSend(cA);send sA to B;

sB := receive(·);cA := itemKeep(cA\(sA\sB), sB\cA);

while true do

sA := receive(·);sB := itemToSend(cB);send sB to sender(sA);cB := itemKeep(cB\(sB\sA, sA\cB);

(a) initiating node (b) contacted node

Figure 2.2: The shuffle protocol algorithm

peer, in its turn, has agreed to store these items in its cache, discarding itemsdoes not lead to the loss of information in the network.

The shuffle algorithm is illustrated in Figure 2.2. The caches of nodes A andB are denoted by cA and cB, respectively. An exchange message (or buffer) sentfrom A to B is denoted as sA, and a message from B to A is denoted as sB. Thesize of the caches cA and cB is c, the size of the exchanged messages sA and sB

is s, and the total number of different items in the network is n.A node A periodically initiates a shuffle with a random neighbor B. It sends B

a message sA, containing a copy of s random items from its cache cA, and waitsfor a response from B. Upon receipt of sA from A, node B sends message sB

with s random items from its cache cB to A. Node A and node B then eliminateall received items that are already in their caches from sB and sA, respectively.A adds the remaining received items (sB\cA) to the cache. It adds them byreplacing items in its cache cA that were sent to, but not received from B. Thisis the set sA\sB, and its elements may be replaced randomly, while elements incA\(sA\sB) will be kept. This ensures that not only the total number of itemswill not exceed c, but also that the items sent from B to A do not get lost. In asimilar fashion node B adds the items in sA\cB to its cache, by replacing itemsin sB that were not in the received message sA. Figure 2.2 depicts pseudo-codefor the algorithm in the style of the skeleton in Figure 2.1. Note that the codefor the node that initiates the contact (i.e. Figure 2.2(a)) differs from the codeof the contacted node (i.e. Figure 2.2(b)).

We refer to [94] for a more detailed description. Figure 2.3 gives an exampleof two nodes A and B executing the shuffle protocol and gossiping with eachother with protocol parameters n = 8, c = 5 and s = 3.

2.2.3 Properties

We study the performance of the protocol focusing on two properties that canbe observed in large deployments: i) the number of replicas of an item in thenetwork, and ii) the coverage achieved by an item over time.

2.2. A Gossip Protocol for Information Dissemination 21

ABs \c

BAc’

Ac’

B

b

c

f

B

h

d

b

c

d

f

h

d

h

b b

c

g

b

a

A

b

c

f

g

B

d

1

3

g

e

c

b

a

A

c

b

a

A

c s s c

g

e

ccBA A B

B

s sA B

b b

c

gd

h

h

d

b

b

c

f

B

d

b

h h

B A

4

2

b

a

A

b

c

g

g

d

h

c

e

d

s \c

e

ABc \(s \s )

A A Bs \c

BA

b

c

g

c \(s \s )B B A

s \c

Figure 2.3: Illustration of the shuffle algorithm with n = 8, c = 5 and s = 3.(1) Selecting random data. (2) Exchanging data. (3) Determining which data tokeep. (4) Merging cache with received data. Black items are the ones that nodesdecide to discard.

Replication This property is defined as the number of nodes that hold a copyof an observed item d in their local cache at a given time. With every shuffle,a copy of the item can be created (or discarded), if it is in the cache of thegossiping node. Owing to the limited storage space c at each node and randomchoice of the exchanged items, the number of copies of all data items constantlyfluctuates in the network, with all items having the same chance to be replicatedor discarded. Thus, in a network with n different items, the average fraction ofnodes that hold a replica of an item converges to c

n.

Coverage This property is defined as the fraction of nodes that have held anobserved item d since it was introduced into the network until a given time. Dueto the periodic nature of the protocol, copies of an item continually move throughthe network. Over time, the fraction of nodes discovering item d grows, eventu-ally converging to 1. The speed of convergence is influenced by the parametersof the protocol: n, c and s.

3Pairwise Exchange Model

without Failures

As pointed out in the introduction, gossip protocols usually contain designparameters that influence the nonfunctional properties of these protocols suchas performance. The relationship between the values of these parameters andprotocol performance can be inferred through extensive large-scale Monte Carlosimulations for a wide range of system configurations.

In this chapter, we develop an analytical model of information disseminationfor the shuffle protocol, assuming no communication failures. We define twoproperties of the protocol to validate our model: (i) how fast an item is repli-cated through a network, and (ii) how fast the item covers the network. Withthis model we determine the optimal size of the exchange buffer to obtain fastreplication. Our results are confirmed by large-scale simulation experiments.

3.1 Introduction

We develop an analytical model of the shuffle protocol, described in the pre-vious chapter. Recall that nodes executing the protocol periodically contact eachother, and exchange data items. Concisely, a node initiates a contact with arandom neighbor, pulls a random subset of items from the contacted node, si-multaneously pushing its own random subset of items.

The main focus of this study is a thorough probabilistic analysis of informationdissemination in a large-scale network using the shuffle protocol. Our modellingframework for a gossip protocol differs from the ones presented by others in thatit does not follow the traditional modelling using the mathematical theory ofepidemics [83]. Instead, the behavior of the protocol is modelled at an abstractlevel as pairwise node interactions. When two neighboring nodes interact witheach other (gossip), they may undergo a state transition (exchange items) witha certain probability. The transition probabilities depend on the probability that

23

24 Chapter 3 — Pairwise Exchange Model without Failures

a given item in a node’s local storage has been replaced by another item after theexchange.

The analytical expressions for these probabilities are derived. One of theexpressions depends not only on the number of items, message size, and localstorage size, but also on the number of items both gossiping nodes have in com-mon, in particular, how many of such items the contacted node receives duringthe exchange. The expression is complex because it incorporates the expectedvalue of the number of such items, and a connection between the parameters isnot obvious. We also determined a close approximation that is expressed by amuch simpler formula, as well as a correction factor for this approximation al-lowing for precise error estimations. Thus we obtain a better understanding ofthe emergent behavior of the protocol and how parameter settings influence itsnonfunctional behavior.

We investigated two properties characterizing the protocol, namely, the num-ber of nodes that have ‘seen’ a given item over time (coverage), and the numberof replicas of this item in the network at a certain moment in time (replica-tion). Using the values of the transition probabilities, we determined the optimalnumber of items to exchange per gossip, for a fast convergence of coverage andreplication. All our modelling and analysis results are confirmed by large-scalesimulations, in which simulations based on our analytical models are comparedwith running the actual protocol. We believe that the developed model is an ac-curate, realistic formal model that can be used to optimally design and fine-tunea given gossip protocol. In this sense, our main contribution is demonstrating thefeasibility of a model-driven approach to developing real-world gossip protocols.

3.2 Assumptions

As mentioned earlier, we use the shuffle protocol (described in Chapter 2) forour case study. For our analysis in this chapter, we assume that:

• nodes do not join or leave the network, and do not fail;• nodes communicate over perfect, error-free channels.

The gossip exchange can then be seen as an atomic procedure. Thus, on theimplementation level, once a node initiates an exchange with another node, thispair of nodes cannot become involved in another exchange until the current ex-change is finished.

3.3 Experimental Observations

This section focuses on an important aspect that we have observed during ourextensive simulation study of the shuffle protocol: the tendency of items to repli-cate to even levels. In other words, a newly published item generates replicasin the network over time until its number of replicas reaches a level comparableto the number of replicas of other items. This comes as a consequence of the

3.3. Experimental Observations 25

random selection of items to gossip and to keep in local storage. By applyingrandom selection, no item is favored over the others resulting in a fair division ofthe storage space. That is, once the system has reached equilibrium, each itemin the network will have, on average, the same number of replicas (N ·c

n).

Figure 3.1 shows a set of experiments where the distribution of replicas forall the items in the system is tracked over several gossip rounds. In all cases,the network consists of N = 2500 nodes with a cache size of c = 100 and eachnode sends s = 50 items when it gossips. The number of different items that areinserted in the network is n = 500. Each graph shows three curves correspondingto the following initial scenarios:

1. Nodes are arranged in a 50× 50 grid. The insertion of items to be gossipedoccurs simultaneously at round 0. All nodes start with an empty cache,except for 500 randomly chosen nodes. A unique item is placed in the cacheof each of these 500 nodes. As a result, at round 0 the network contains 500different items and each of these items has a single replica.

2. Nodes are arranged in a 50× 50 grid. The insertion of items to be gossipedoccurs at randomly chosen times before round 100. At round 100, all 500items will be present in the network. However, items that were insertedearlier will have more replicas due to having been shuffled for a longer time.

3. Topology with higher density at the center. The insertion of items to be gos-siped occurs at randomly chosen times before round 100. In this topology,nodes near the center have more neighbors to gossip with.

As can be seen from the graphs, the different initial conditions result in dif-ferent replication patterns early on. However, note how as time progresses thereplication patterns become more and more similar. By round 350, it is clear thatitems have on average N ·c

n= 500 replicas, regardless of the initial conditions

of the experiment. This convergence to an equilibrium where the storage space(N · c) is evenly divided between the number of items present (n) is a result ofthe repeated execution of the protocol.

In the shuffle protocol, there is no loss of information during the gossip ex-change. We assume that the shuffle operation between two nodes is atomic andsince nodes swap items, no item can possibly disappear once inserted into thenetwork. For this reason, once the system has reached equilibrium in terms ofreplication of items and assuming that there is a path between any two nodes inthe network, the replicas will continue to be shuffled, moving from one node toanother in a random fashion. Under these conditions, it is reasonable to assumethat the probability of finding any given item in a node’s cache is the same for allitems in the networks. To be more specific, based on our observations we makethe assumption that items are uniformly distributed throughout the network andthat any item can be found in a node’s cache with probability c

n. This is the start-

ing point for our probabilistic analysis. The observation of uniform distributionis not entirely unexpected. Similar observations about uniform distribution afterrepeated shuffling have been made regarding the shuffling of decks of cards in[8, 37].


0

2

4

6

8

10

12

14

16

0 200 400 600 800 1000

nu

mb

er

of

ite

ms

number of replicas

1. grid, insert at round 02. grid, insert at random

3. dense center, insert at random

round 100

13

0

2

4

6

8

10

12

14

16

200 300 400 500 600

number of replicas



round 150

1

0

2

4

6

8

10

12

14

16

200 300 400 500 600 700 800

nu

mb

er

of

ite

ms

number of replicas



round 200

1 2

3

0

2

4

6

8

10

12

14

16

0 200 400 600

number of replicas



round 250

0

2

4

6

8

10

12

14

16

200 300 400 500 600 700 800

nu

mb

er

of

ite

ms

number of replicas



round 300

2

0

2

4

6

8

10

12

14

16

200 300 400 500 600

number of replicas



round 350

3.4. An Analytical Model of Information Dissemination 27

3.4 An Analytical Model of Information Dissemination

We analyze dissemination of a generic item d in a network in which the nodesexecute the shuffling protocol.

3.4.1 Probabilities of State Transitions

(0, 0) (0, 1)

(1, 0) (1, 1)

P(10|01

)P(01|10

)

P (11|10)

P (10|11)

P(0

1|1

1)

P(1

1|0

1)

P (00|00) P (01|01)

P (10|10) P (01|01)

Figure 3.2: Symbolic represen-tation for caches of gossipingnodes.

This section presents a model of the shuf-fle protocol that captures the presence or ab-sence of a generic item d after shuffling of twonodes A and B. There are four possible statesof the caches of A and B before the shuffle:both hold d, either A’s or B’s cache holds d,or neither cache holds d.

We use the notation P (a2b2|a1b1) for theprobability that from state a1b1 after a shufflewe get to state a2b2, with ai, bi ∈ 0, 1. Theindices a1, a2 and b1, b2 indicate the presence(if equal to 1) or the absence (if equal to 0)of a generic item d in the cache of an initiatorA and the contacted node B, respectively. Forexample, P (01|10) means that node A had dbefore the shuffle, which then moved to thecache of B, afterwards. Due to the symmetryof information exchange between nodes A and B in the shuffle protocol,

P (a2b2|a1b1) = P (b2a2|b1a1).

Figure 3.2 depicts all possible outcomes for the caches of gossiping nodes asa state transition diagram. If before the exchange A and B do not have d (a1b1 =00), then clearly after the exchange A and B still do not have d (a2b2 = 00).Otherwise, if A or B has d (a1 = 1∨ b1 = 1), the shuffle protocol guarantees thatafter the exchange A or B still has d (a2 = 1∨ b2 = 1). Therefore, the state (0, 0)has a self-transition, and no other outgoing or incoming transitions.

We now determine values for all probabilities P (a2b2|a1b1). They are ex-pressed in terms of probabilities Pselect and Pdrop. The probability Pselect expressesthe chance of an item to be selected by a node from its local cache when en-gaged in an exchange. The probability Pdrop represents a probability that an itemwhich can be overwritten (meaning it is in the exchange buffer of its node, butnot of the other node in the shuffle) is indeed overwritten by an item received byits node in the shuffle. Due to the symmetry of the protocol, these probabilitiesare the same for both initiating and contacted nodes. In Section 3.4.2, we willcalculate Pselect and Pdrop. We write P¬select for 1− Pselect and P¬drop for 1− Pdrop.

State (0, 0). Before shuffling, neither node A nor node B have d in their cache.


a2b2 = 00: neither node A nor node B have item d after a shuffle because neitherof them had it in the caches before the shuffle:

P (00|00) = 1

a2b2 ∈ 01, 10, 11: cannot occur, because none of the nodes have item d.

State (0, 1). Before shuffling, a copy of d is only in the cache of node B.a2b2 = 01: node A does not have d because node B had d but did not select it

(to send) and, thus, B did not overwrite d, i.e., the probability is

P (01|01) = P¬select

a2b2 = 10: only node A has d because node B selected d and dropped it; that is,the probability is

P (10|01) = Pselect · Pdrop

a2b2 = 11: both nodes A and B have a copy of d because node B selected d andkept it; that is,

P (11|01) = Pselect · P¬drop

a2b2 = 00: cannot occur as completely discarding d is not possible in the pro-tocol; that is, if either nodes send an item, its partner keeps this copy aswell, and if an item is not among the selected for a shuffle, the item is notreplaced by another one (see Section 2.2.2).

State (1, 0). Before shuffling, d is only in the cache of node A. Due to the sym-metry of nodes A and B, this scenario is symmetric to the previous one with

P (a2b2|10) = P (b2a2|01).

State (1, 1). Before shuffling, d is in the cache of node A as well as in the cacheof node B.a2b2 = 01: only node B has d because node A selected d and dropped it and

node B did not select d; that is,

P (01|11) = Pselect · Pdrop · P¬select

a2b2 = 10: this outcome is symmetric to the previous one:

P (10|11) = P¬select · Pselect · Pdrop

a2b2 = 11: after the shuffle both nodes A and B have d, because:∗ nodes A and B had d but both did not select it, i.e., P¬select · P¬select;∗ both nodes A and B selected d (thus, both kept it),i.e., Pselect · Pselect;


∗ node A selected d and kept it and node B did not select d: Pselect ·P¬drop ·P¬select;∗ symmetric case with the previous one: P¬select · Pselect · P¬drop.

Thus,

P (11|11) = P¬select · P¬select + Pselect · Pselect + 2 · Pselect · P¬select · P¬drop

a2b2 = 00: cannot occur, discarding of an item is not permitted by the protocol(see Section 2.2.2).

3.4.2 Probabilities of Selecting and Dropping an item

The following analysis assumes that all node caches are full (that is, the net-work is already running for a while). Moreover, we assume a uniform distribu-tion of items over the network; this assumption is supported by experiments inSection 3.3 as well as in [94, 123].

Consider nodes A and B engaged in a shuffle, and let B receive the exchangebuffer SA from A. Let k be the number of duplicates (see Figure 3.3), i.e., theitems of an intersection of the node cache CB and the exchange buffer of itsgossip partner SA (i.e., SA∩CB). Recall from Section 3.2 that CA and CB containthe same number of items for all A and B, and likewise for SA and SB; we usec and s for these values. The total number of different items in the network isdenoted as n.

The probability of selecting an item d in the cache is the number of selecteditems (i.e., s) divided by the total number of items in the cache (i.e., c):

Pselect =s

c.

Thus, the probability that an item d in the cache is not selected is:

P¬select = 1− Pselect =c− s

c.

n

SA k CB

Figure 3.3: k items in SA ∩ CB

SA

SBbs

CB

Figure 3.4: s items in SA ∩ SB


Consider Figs. 3.3 and 3.4. The shuffle protocol demands that all items inSA are kept in CB after the shuffle. This implies that: a) all items in SA\CB

will overwrite items in SB ⊆ CB , and b) all items in SA ∩ CB are kept in CB.Thus, the probability that an item from SB will be overwritten is determined bythe probability that an item from SA is in CB, but not in SB. Namely, the itemsin SB\SA provide a space in the cache for items from SA\CB. We would liketo express the probability Pdrop of a selected item d in SB\SA (or SA\SB) to beoverwritten by another item in CB (or CA). Due to symmetry, this probability isthe same for A and B; therefore, we calculate only the probability that an itemin SB\SA is dropped from CB. The expected value of this probability dependson how many duplicates a node receives from its gossip partner a:

E[Pdrop] =

s∑

k=0

(P|SA∩CB|=k

drop· P|SA∩CB|=k) if s + c 6 n

s∑

k=(s+c)−n

(P|SA∩CB|=k

drop · P|SA∩CB|=k) otherwise

where P|SA∩CB |=k is the probability of having exactly k items in SA ∩ CB, and

P|SA∩CB |=k

dropis the probability that an item in SB\SA is dropped from CB given k

duplicates in SA ∩ CB. The case distinction is because if s + c > n, then clearlythere are at least (s + c)− n items in SA ∩ CB.

From the(n

s

)possible sets SA, we compute how many have k items in com-

mon with CB. Firstly, there are(

c

k

)ways to choose k such items in CB . Secondly,

there are(n−c

s−k

)ways to choose the remaining s−k items outside CB. So in total,(

c

k

)·(n−c

s−k

)possible sets SA have k items in common with CB . Hence, under the

assumption of a uniform distribution of the data items over the caches of thenodes, b

P|SA∩CB |=k =

(c

k

)(n−c

s−k

)(n

s

) .

The expected value of P|SA∩CB |=k

dropisc:

E[P|SA∩CB |=k

drop ] =

k∑

s=0

P|SA∩SB |=s

drop· P|SA∩SB |=s if s + k 6 c

k∑

s=(s+k)−c

P|SA∩SB |=s

drop · P|SA∩SB|=s otherwise

aThe other case is presented for the sake of completeness.bHere we use a generalization of the usual definition of binomial coefficients to negative integers.

That is, for all m and l ≥ 0,(m

l

)= (−1)l

(−m+l−1

l

). See also [111].

cThe other case is presented for the sake of completeness.


where s is the number of items in SA ∩ SB (see Figure 3.4). The case distinctionis because if s + k > c (with k the number of items in SA ∩ CB), then clearlythere are at least (s + k)− c items in SA ∩ SB.

Among the s items in SB , there are s items also in SA, and thus only the s− s

items in SB\SA can be dropped from CB. P|SA∩SB|=s

dropis the probability that an

item in SB\SA is dropped from CB, given s items in SA ∩ SB:

P|SA∩SB |=s

drop =

0 if s = ss−ks−s

otherwise

P|SA∩SB |=s is the probability of having exactly s items in SA ∩ SB:

P|SA∩SB|=s =

(s

s

)(c−s

k−s

)(

c

k

) .

The intuition behind this expected value of P|SA∩SB |=s is similar to the one of

P|SA∩CB |=k. From the(

c

k

)possible sets SA, we compute how many have s items

in common with SB. That is, there are(s

s

)ways to choose s items in SB, and(

c−s

k−s

)ways to choose the remaining k − s items outside SB.

Let us assume s + c ≤ n and s + k ≤ c. Then, substituting in the expressionfor E[Pdrop] in case s + c ≤ n, and noting that in the summand k = s the factor

P|SA∩SB|=s

drop is equal to zero, we get:

E[Pdrop] =

s−1∑

k=0

(c

k

)(n−c

s−k

)(n

s

)k∑

s=0

s− k

s− s

(s

s

)(c−s

k−s

)(

c

k

)

=n− c(

n

s

)s−1∑

k=0

((n− c)− 1

(s− k)− 1

) k∑

s=0

(c−s

k−s

)(s

s

)

s− s(3.1)

The probability of keeping an item d in SB\SA⊆CB can be expressed as P¬drop =1− Pdrop.

3.4.3 Simplification of Pdrop

In order to gain a clearer insight into the emergent behavior of the gossipprotocol we make an effort to simplify the formula for the probability Pdrop of anitem in SB\SA to be dropped from CB after a shuffle. Therefore, we re-examinethe relationships between the k duplicates received from a neighbor, the s items

of the overlap SA ∩ SB , and Pdrop. Let’s estimate P|SA∩CB|=k

dropby considering

each item from SA separately, and calculating the probability that the item is aduplicate (i.e., is also in CB). The probability of an item from SA to be a duplicate(also present in CB) is c

n. In view of the uniform distribution of items over the

network, the items in a node’s cache are a random sample from the universe of n


data items; so all items in SA have the same chance to be a duplicate. Thus, theexpected number k of items in SA ∩ CB can be estimated by s · c

n. The expected

number s of items in SA ∩SB can be estimated by k · sc, because only the k items

in SA ∩ CB may end up in SA ∩ CB; sc

captures the probability that an itemfrom CB is also selected to be in SB. It follows that the probability of an item inSB\SA to be dropped from CB after the shuffle is

Pdrop =s− k

s− s=

s− s · cn

s− s · cn· s

c

=n− c

n− s.

The complementary probability of keeping an item is

P¬drop = 1− n− c

n− s=

c− s

n− s.

These estimates are valid for general s ≤ c ≤ n.Substituting the expressions for Pselect and the simplified Pdrop into the formu-

las for the transition probabilities in Figure 3.2, we obtain:

P (01|01) = P (10|10) = c−sc

P (01|11) = P (10|11) = sc

c−sc

n−cn−s

P (10|01) = P (01|10) = sc

n−cn−s

P (11|11) = 1− 2 sc

c−sc

n−cn−s

P (11|01) = P (11|10) = sc

c−sn−s

1e-200

1e-180

1e-160

1e-140

1e-120

1e-100

1e-80

1e-60

1e-40

1e-20

1

30 60 90 120

rela

tive e

rror

exchange buffer size

c=250 n=700n=1100n=1500

1e-15

1e-10

1e-05

1

1 3 5 1e-15

1e-10

1e-05

1

1 3 5 0

1e-300

1e-250

1e-200

1e-150

1e-100

1e-50

1

60 120 180

exchange buffer size

c=500 n=1400n=2200n=3000

1e-15

1e-10

1e-05

1

1 3 5

1e-15

1e-10

1e-05

1

1 3 5

Figure 3.5: The relative error of the difference of the accurate Pdrop and its ap-proximation, for c = 250 (left) and c = 500 (right) and different values of n.

In order to verify the accuracy of the proposed simplification for E[Pdrop], wecompare the simplification and the accurate formula (3.1) for different valuesof n. We plot the difference of the accurate Pdrop and the simplification, forcache sizes c = 250 and c = 500 (Figure 3.5). These figures show that thesimplification gives a close approximation of the accurate formula for Pdrop (notethe log y-scale).


3.4.4 Correction Factor

We now investigate how closely the simplified formula of Pdrop, that is n−cn−s

(here referred to as S(n, c, s)) approximates formula (3.1) (here referred to asE(n, c, s)). We compared the difference between these two formulas using aJava package based on common fractions, which provides lossless calculation[95]. We observe that the inverse of the difference of the inverse values of bothformulas, i.e.,

ec,s(n) =(E(n, c, s)−1 − S(n, c, s)−1

)−1,

exhibits a certain pattern for different values of n, c and s.For s = 1 and arbitrary values of n and c, E(n, c, 1) = n−c

n, whereas S(n, c, 1) =

n−cn−1 . This leads us to investigate the correction factor θ as in E(n, c, s) = n−c

(n−s)+θ.

For s = 1 clearly the factor θ = 1. Yet, for s > 1 the situation is more complicated.We therefore calculate the first, the second and other (forward) differencesd overn.

For s = 1, the result of the first difference of the function ec,1(n) is 1. How-ever, for s = 2 the first difference is, e.g., if c = 4:

e4,2(7)− e4,2(6) = 3.5, e4,2(8)− e4,2(7) = 4, e4,2(9)− e4,2(8) = 4.5

and so on. Thus, we observe that the second difference of ec,2(n) is 12 . By cal-

culating higher differences for s > 2, we conclude that the s-th difference of thefunction ec,s(n) is always 1

s.

Moreover, at the point n = 0 the first, . . . , s-th differences of the function ec,s

exhibit a pattern similar to the Pascal triangle [163]. That is, for d ≥ 1 the d-thdifference is

(∆d ec,s)(0) =1

s ·(s−1

d

)

(assuming(a

b

)= 0, whenever b > a). Knowing the initial difference at point

n = 0, we were able to use the Newton forward difference equation [1] to derivethe following formula for n > 0:

E[Pdrop] =n− c

(n− s) + 1γ

, (3.2)

where

γ =

s−1∑

d=0

(n

d

)

s ·(s−1

d

) =

(n

s

)

(n− s) + 1·

s−1∑

d=0

1(n−d

(s−1)−d

) (3.3)

dA forward difference of discrete function f : Z → Z is a function ∆f : Z → Z with ∆f(n) =

f(n + 1) − f(n) (cf. [1]).


In this equation the sum is finite because due to the observation that the s-thdifference is constant 1

s, all higher differences are 0.

Thus, the correction factor is θ = 1γ

. Extensive experiments with Mathemat-

ica and MATLAB indicate thatn− c

(n− s) + 1γ

and formula (3.1) indeed coincide

numerically. We can also see from Figure 3.5 that the correction factor is small.In the next section, we will prove these two formulas are equal to each other. Themethodology applied in the current section will be further discussed in Chapter 8.

3.5 Another Expression for Pdrop

There is another expression for Pdrop. According to the assumption, SA is auniform selection from CA, which, in turn, is a uniform selection from the entirepopulation of n items. SA is chosen uniformly at random from CA, and SB ischosen from CB . Using this, we revise the probabilities in Section 3.4.2. Theprobability P|SA∩SB |=s of having exactly s items in SB ∩ SA is:

P|SA∩SB |=s =

(s

s

)(n−s

s−s

)(n

s

) ,

and the probability of dropping the item is:

E[Pdrop] =s∑

s=0

(P|SA∩SB|=s

drop· P|SA∩SB |=s).

Recall the shuffle protocol. After the exchange, node A uses random items fromSA to fill its cache (CA\SA) ∪ SB up to c items. That is, exactly |(CA\SA) ∩ SB|items are randomly chosen from SA\SB. Basically, the items in SA\SB (that arealso in CA) are replaced by the items in SB\CA. Since the numbers of items inSB\SA and in SA\SB are equal, a one-to-one correspondence can be arrangedby random selection of pairs from the sets. If an item in SB\SA is also in CA, thepair item from SA\SB is kept, and dropped, otherwise. Thus, the probability ofan item in SA\SB to be dropped depends only on the probability that a pair itemin SB\SA is not in CA. In view of the uniform selection, every item in SA\SB

has a fair chance of being replaced by an item in SB\CA. Moreover, an item inSB\SA has a chance to be any item in CA\SA. Hence, the probability that theitem is not in CA is n−c

n−s.

Thus, the probability P|SA∩SB |=s

dropof an item in SA\SB to be dropped from

CA, given s items in SB ∩ SA, is:

P|SA∩SB|=s

drop =

0 if s = sn−cn−s

otherwise

3.5. Another Expression for Pdrop 35

Putting the pieces together we obtain a formula E[Pdrop] for every s ≤ c ≤ n:

E[Pdrop] =

s−1∑

s=0

(s

s

)(n−s

s−s

)(n

s

) · n− c

n− s=

n− c

(n− s)(n

s

)(

s∑

s=0

(s

s

)(n− s

s− s

)− 1

)

=e n− c

n− s·(

1− 1(n

s

))

(3.4)

Formulas (3.1) and (3.4) are exact solutions of the counting problem, consti-tuting a combinatorial proof of their equality. Formula (3.2) has been derived byextensive numerical experiments. Its equality to the formula (3.4) (and transi-tively, to (3.1)) can be proven algebraically by applying the well-known Gosperalgorithm [96, 157].

Proposition 3.1. For n > 0, s > 0 and s ≤ n, the following equality holds:

s−1∑

d=0

(n

d

)(s−1

d

) =s

n− s

((n

s

)− 1

)(3.5)

Proof. We first modify the fraction of the binomials as follows:

(n

d

)(s−1

d

) =

(n

d

)(

s−1d−1

)· 1

d(s−1−d)

=s− 1− d

d·(n

d

)(

s−1d−1

) (3.6)

Instead of computing the sum on the left hand side of (3.5), Gosper’s algo-rithm [96, 157] allows us to get a result of the sum by solving the first-orderlinear recurrence equation

qd · f(d + 1)− rd−1 · f(d) = pd (3.7)

where f(d) is an unknown rational function of d, and qd, rd, pd are polynomialsin d.

Let ad =(n

d)(s−1

d ). The summand ad in (3.5) is a hypergeometric term. Thus, we

apply Gosper’s algorithm to see if the sum on the left hand side of (3.5) can beexpressed as a hypergeometric term. That is, given the hypergeometric term ad,

to find a hypergeometric term Zs =rs−1 · f(s)

ps

· as, such that:

s−1∑

d=0

ad = Zs − Z0

eVandermonde’s identity states that∑

k

i=0

(m

j

)(n−m

k−j

)=(n

k

)holds for binomial coefficients.


The term ratioad+1

ad

=n− d

s− 1− d, also defined by Gosper as

ad+1

ad

=pd+1

pd

qd

rd

, is

rational in d as expected. To find a nonzero rational solution f(d) of the first-order recurrence (3.7), we choose qd = −(n− d), rd = −(s− d− 1), pd = 1. Theequation

−(n− d) · f(d + 1) + (s− d) · f(d) = 1

has a nonzero solution f(d) = − 1

n− s. We substitute the solution f(d), the

summand as, and the polynomials rs and ps into the expression for Zs:

Zs(3.6)=

1

n− s

(n

s

)s(

s−1s−1

) =s

n− s

(n

s

), and Z0 =

s

n− s.

Hence,

s−1∑

d=0

(n

d

)(s−1

d

) = ZS − Z0 =s

n− s

((n

s

)− 1

)

Through simple algebraic manipulations, we finally obtain the following equal-ity:

n− c

(n− s) + ss−1∑

d=0

(n

d

)(s−1

d

)

=n− c

n− s·(

1− 1(n

s

))

(3.8)

3.6 Optimal Size for the Exchange Buffer

We study the optimal value for fast convergence of replication and coveragewith respect to an item d. Since d is introduced at only one node in the network,one needs to maximize the chance that an item is duplicated. That is, the proba-bilities P (11|01) and P (11|10) should be maximized (then P (01|11) and P (10|11)are maximized as well, intuitively because for each duplicated item in a shuffle,another item must be dropped).

We give a rigorous argument for this claim in the case of a fully connectednetwork. Let α be a replication of the distinguished item at a given moment, and0 ≤ α ≤ 1. In the long run, α converges to c

n. Suppose that a shuffle takes place.

The chance that after the shuffle there is one more copy of the distinguished itemin the system is

(P (11|01) + P (11|10)) · α · (1− α) = 2 · P (11|01) · α · (1− α)


Here α · (1− α) expresses the chance that exactly one of the nodes in the shufflecontains a copy of the item. Likewise, the chance that after the shuffle a copy ofthe item has been removed from the system is

(P (01|11) + P (10|11)) · α2 = 2 · P (01|11) · α2 = 2 · n− c

c· P (11|01) · α2

So after a shuffle, the change in the number of copies of the item in the systemis on average

2 · P (11|01) · α · (1 − α)− 2 · n− c

c· P (11|01) · α2 = 2 · P (11|01) · α · (1− n

c· α)

So as long as α < cn

, maximizing P (11|01) maximizes replication of the item.

50

55

60

65

70

75

80

85

90

0 250 500 750 1000

optim

al exchange b

uffer

siz

e

items (n)

Figure 3.6: Optimal value of exchange buffer size, depending on n.

These probabilities both equal sc

c−sn−s

; we compute when the s-derivative ofthis formula is zero. This yields the equation

s2 − 2ns + nc = 0

Taking into the account that s ≤ n, the only solution of this equation is

s = n−√

n(n− c)

We conclude that this is the optimal value for s to obtain fast convergence ofreplication (see Figure 3.6). This will also be confirmed by the experiments andanalysis in the following sections.


In order to test the validity of the analytical model of information spreadbased on the shuffle protocol presented in the previous section, we follow an


experimental approach. This section presents the comparison of properties ob-served while running the shuffle protocol in a large-scale deployment with sim-ulations of the model under the same conditions. These experiments show thatthe analytical model indeed captures information spread of the shuffle protocol.Note that a simulation of the analytical model is much more efficient (in compu-tation time and memory consumption) than a simulation of the implementationof the shuffle protocol, see Section 3.8.

The experiments simulate the case where a new item d is introduced at onenode in a network, in which all caches are full and uniformly populated by n =500 items. The protocol has been implemented in an event-based simulator thattakes as input the topology of the network to determine which pairs of nodescan gossip. The experiments are performed on a network of N = 2500 nodes,arranged in a square grid topology (50×50), where each node can communicateonly with its four immediate neighbors (to the North, South, East and West). Thisconfiguration of nodes is arbitrary, except for the requirement of a large numberof nodes for the observation of emergent behavior. Our aim is to validate thecorrectness of our analytical model, not to test the endless possibilities of networkconfigurations. The model and the shuffle protocol do not make any assumptionsabout the network. The network configuration is provided by the simulationenvironment and can easily be changed into something different, e.g., anothernetwork topology. For this reason, the large grid has been chosen for testing,although other configurations could have been possible. In the experiments thatfollow, after each gossip round, the following measurements are performed:

(i) the total number of occurrences of d in the network (replication), and(ii) how many nodes in total have seen d (coverage).

Simulations with the shuffle protocol Each node in the network has a cachesize of c = 100, and sends s items when gossiping. In each round, every noderandomly selects one of its neighbors and shuffles. In order to make a fair com-parison with the simulations with the model, the nodes gossip for 1000 roundsbefore initiating the measurements of the properties. After this start up periodof 1000 rounds, items are replicated and the replicas fill the caches of all nodesfulfilling the uniform distribution requirement of the model. At round 1000, itemd is inserted into the network at a random location. From that moment on itsreplication and coverage are recorded.

Simulations with the model For the simulations with the model, n, c and sare only system parameters. Instead of maintaining a cache, each node in thenetwork maintains only a variable that represents whether it holds item d or not(state 1 or 0, respectively). Nodes update their state in pairs according to thetransition probabilities introduced before (Figure 3.2). This mimics an actualexchange of items between a pair of nodes according to the shuffle protocol.While in the protocol this results in both nodes updating the contents of their


caches, in a simulation using the analytical model updating the state of a noderefers to updating only one variable: whether the node is in possession of theitem d or not. To sum up, we use transition probabilities to update the state ofone variable. Since we do not need a startup time for the simulations with themodel, at round 0 we set the state of a random node to 1 (while all the otherhave state 0) and track the state of the nodes for the remainder of the simulation.

0

100

200

300

400

500

600

0 200 400 600 800 1000 1200 1400

nu

mb

er

of

rep

lica

s

rounds

s = 95s = 50s = 10

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000 1200 1400

co

ve

rag

e (

% o

f n

od

es)

rounds

s = 95s = 50s = 10

1e-05

0.0001

0.001

0 1 2 3 4 5 6 7 8 9 10

rounds

s=95 (diff) 1e-05

0.0001

0.001

rep

lica

tio

n d

iff

(sh

uff

le-m

od

el)

s=50 (diff) 1e-05

0.0001

0.001

s=10 (diff)

0

0.0002

0.0004

0 1 2 3 4 5 6 7 8 9 10

rounds

s=95 (diff) 0

0.0002

0.0004

0.0006

co

ve

rag

e d

iff

(sh

uff

le-m

od

el)

s=50 (diff) 0

0.0002

0.0004

s=10 (diff)

Figure 3.7: Grid 50 × 50, range 1 with n = 500, c = 100. Top: Replication (left)and coverage (right) of item d for the shuffle. Bottom: difference between theshuffle and the model for replication (left) and coverage (right).

Figure 3.7 shows the behavior of both the shuffle protocol in terms of repli-cation and coverage of d (upper row of Figure 3.7, the left and the right graphs,respectively), and the difference between the results of the protocol and the ana-lytical model (lower row of Figure 3.7), for various values of s. Each curve in thetop graphs represents the average and standard deviation calculated over 100runs. The experiments with the model calculate Pdrop using the simplified for-mula n−c

n−sdescribed in Section 3.4.3. It can be seen very clearly from the lower

graphs that the difference between the results obtained from the shuffle and themodel is very small. Note that measure unit of y-axis is the fraction of network.

We note that in all cases, the network converges to a situation in which thereare 500 copies of d, meaning that replication is 500

2500 = 0.2; this agrees with thefact that c

n= 100

500 = 0.2. Moreover, our experiments show that replication and


coverage display the fastest convergence when s = 50; this agrees with the factthat n−

√n(n− c) = 500−

√500 · 400 ≈ 50 (cf. Section 3.6).

The characteristics of the replication and coverage are influenced by thetopology of the network. It is the topology of the network that will dictate thelikelihood of any two nodes to gossip. In order to model the properties of theprotocol, it is crucial that we know the probabilities of a node in a given state (0or 1) to interact with another node in a given state. For this reason, we chooseto model the properties for a fully-connected network, where a node can gossipwith any other node in the network. This topology allows us to easily calculatethe probability of a node in state 1 to interact with a node in state 0 or to interactwith another node in state 1. Knowing this, we concentrate on applying the statetransition probabilities that we calculated in Section 3.4. The aim of this sectionis to provide an example of how the transition probabilities can be used to modelthe properties of dissemination for a specific topology.

3.8 Time Complexity of Experiments

In this section we discuss some topics/issues that arose during the develop-ment and testing of the model for the shuffle protocol.

During the experimental phase of this work, we observed a remarkable dis-parity between the time required to run an experiment with the shuffle protocolor with the model. In both cases, the experiment was the same in terms of prop-erties being measured and parameters used (cache size c, exchange buffer size s,number of different items n and network size N).

The difference in execution times can be traced back to the two differentalgorithms executed at each simulated node. From a node’s point of view, theshuffle protocol requires the selection of items to send to the gossip partner andthe selection of items to keep for the next round. The first operation can be donein linear time, O(c). The second operation requires checking if the incomingitems are already in the cache and removing entries from the cache if space isneeded. These steps can be done in O(c · log c) time. With the second operationdominating the execution time, we can estimate the time complexity for oneround of the shuffle to be O(c · log c).

Now, let us look at the time complexity of the model. Unlike the protocolimplementation, the model requires very little state, namely the value of theparameters and a variable to indicate the presence or absence of item d. Ateach round, each simulated node and its gossip partner only have to determinetheir current state (a1, b1) and transition to a new state (a2, b2) according to theappropriate transition probabilities. The transition probabilities themselves donot change during the simulation (since they depend solely on the parametersc, s and n), so they are precomputed at the initialization phase. As a result, theexecution of the model for each node has a constant time complexity, O(1).

The defining factor in the execution time of the simulations with the shuffle

3.9. Related Work 41

protocol is the size of the cache. With a cache size of 100 for all of our exper-iments, the execution times for our simulations with the shuffle protocol andthe model differed by approximately two orders of magnitude. Considering thatlarge networks are needed to clearly observe emergent behavior and that this be-havior evolves over many rounds, the value of having a model becomes evident.Being simply parameters in the model, the cache size and exchange buffer sizehave no effect on the memory requirements and execution time of the simula-tion, thus freeing computational resources to experiment with larger networksand different topologies.

3.9 Related Work

Two areas of research are most relevant to the model, presented in this chap-ter: rigorous analysis of gossip (and related) protocols, and results from math-ematical theory of epidemics [17, 69]. The results from epidemics are oftenused in the analysis of gossip protocols [83] (e.g., the traditional gossip paperby Demers et al. [75]). Thus, the scope of the overview is restricted to the mostrelevant publications from the area of (analysis of) gossip protocols.

Several works have focused on gossip-based membership management pro-tocols. Allavena et al. [3] proposed a gossip-based membership managementprotocol and analyzed the evolution of the number of links between two nodesexecuting the protocol. The states of the associated Markov chain are the numberof links between pairs of nodes. From the designed Markov chain they calculatedthe expected time until a network partition occurs. This case study also includesa model of the system under churn. A goal of that paper is to show the effect ofmixing both pull and push approaches.

Eugster et al. [82] presented a lightweight probabilistic broadcast algorithm,and analyzed the evolution of processes that gossip one message. The statesof the associated Markov chain are the number of processes that propagate onegossip message. From the designed Markov chain, the authors computed thedistribution of the gossiping nodes. Their analysis has shown that the expectednumber of rounds to propagate the message to the entire system does not dependon the out-degree of nodes. These results are based on the analysis assumptionthat the individual out-degrees are uniform. However, this simplification hasshown to be valid only for small systems (cf. [123]).

Bonnet [46] studied the evolution of the in-degree distribution of nodes exe-cuting the Cyclon protocol [177]. The states of the associated Markov chain arethe fraction of nodes with a specific in-degree distribution. From the designedMarkov chain the author determined the distribution to which the protocol con-verges.

There are a number of theoretical results on gossip protocols, targeted todistributed aggregation. In these protocols, a set of data is distributed over thenodes of a network and the nodes compute an aggregate of the data set. Kempe


et al. [129] proposed a push-only gossip-based aggregation protocol for a fullyconnected network. In this paper, the authors used Gaussian mixture modelling[76, 145]. A performance of the protocol has been measured by how quicklya data item originating with a node diffuses through a network (for uniformgossip). Each node locally maintains an aggregation vector vt,i. A state of theassociated Markov chain is the fraction of the vector node i sends to other node.From the designed Markov chain, the authors studied the convergence rate. Inaddition, the authors showed that the diffusion speed for flooding corresponds tothe mixing time of a random walk on the network. Validation of the theoreticalresults with practical experiments is left as future work.

The protocol [129] has been further tailored by Boyd et al. [50] to work on anarbitrarily connected network. In their analysis, the Markov chain is defined by aweighted random walk on the graph. Every time step, a pair of nodes (connectedby an edge) communicates with a transition probability, and sets their valuesequal to the average of their current values. A state of the associated Markovchain is a vector of values at the end of the time step. The authors consideredthe optimization of the neighbor selection probabilities for each node, to findthe fastest-mixing Markov chain (for fast convergence of the algorithm) on thegraph.

Jelasity et al. [121] proposed a push-pull solution for aggregation in largedynamic networks, supported by a performance analysis of the protocol. A stateof the system is represented by a vector, the elements of which correspond tothe values at the nodes, a target value of the protocol calculated from the vectorelements, and a measure of homogeneity characterizing the quality of local ap-proximations. The vector evolves at every step of the system according to somedistribution. In the analysis, the authors considered different strategies (e.g.,neighbor selection) to optimize the protocol implementation, and calculated theexpected values for the above mentioned protocol parameters.

Deb et al. [72] studied the adaptation of random network coding to gossipprotocols. The authors analyzed the expected time and message complexity oftwo gossip protocols for message transmission with pure push and pure pull com-munication models.

Busnel et al. [54] have studied a gossip-based membership protocol, wherenodes exchange a part of their neighborhood lists like in the shuffle protocol.The authors have shown that the system executing the protocol, converges to aconfiguration, where caches of all nodes have uniformly distributed items (links).

In [61] and its sequel [60], the speed of the convergence of push-pull gossiphas been calculated for network graphs with respect to their connectivity.

3.10 Conclusions

This chapter has demonstrated that it is possible to model a gossip protocolthrough a rigorous probabilistic analysis of the state transitions of a pair of nodes

3.10. Conclusions 43

engaged in the gossip. It has been shown, through an extensive simulation study,that the dissemination of a data item can be faithfully reproduced by the model.Having an accurate model of node interactions, the following results have beenproduced:

• After finding precise expressions for the probabilities involved in the model,we provide a simplified version of the transition probabilities. These sim-plified, yet accurate, expressions can be easily computed, allowing us tosimulate the dissemination of an item without the complexity of executingthe actual shuffle protocol. These simulations use very little state (onlysome parameters and variables, as opposed to maintaining a cache) andcan be executed in a fraction of the time required to run the protocol.

• The model reveals relationships between the parameters of the system.Armed with this knowledge, we successfully optimized one of the parame-ters (the size of the exchange buffer) to obtain the fastest convergence ofthe observed properties.

While gossip protocols are easy to understand, even for a simple push-pull pro-tocol, the interactions between nodes are unexpectedly complex. Understandingthese interactions provides insight into the mechanics behind the emergent be-havior observed in gossip protocols. We believe that understanding the mechan-ics of gossip is the key to optimizing (and even shaping) the emergent propertiesthat make gossip appealing as communication paradigm for distributed systems.

Next Steps

In this chapter, we have considered a simple push-pull gossip protocol thatwas originally proposed for wireless environments. After successfully analyzingthis protocol at an abstract level (pairwise node interactions) independent of net-work topology or the characteristics of the environment itself, we are interestedin the following possibilities for future study:

• We would like to investigate how our transition probabilities can be appliedto model the properties of the protocol. As a first step, we can model theproperties for a fully-connected network.

• In order to have a more realistic representation of a wireless setting, wewould like to incorporate the presence of lossy links in our network. Thecharacteristics of the lossy links would model an underlying MAC layer. Inother words, we would like to relax the assumption of atomic transactionsfor the shuffle protocol. This would result in new possible transitions in thestate transition diagram (Figure 3.2) due to loss of messages.

We will cover both items in the next chapters.

4Pairwise Exchange Model in

Lossy Networks

Previous studies of gossip-based information dissemination protocols, includ-ing the one presented in Chapter 3, have relied on the assumption that nodesinteract with each other through perfect, lossless communication channels. Inthis chapter, we drop this simplifying assumption and explore the impact of lossychannels on a gossip protocol with a push-pull information exchange. To analyzethe protocol in the presence of message loss, we introduce into the formal modela component based on statistical data from Monte-Carlo simulations. We success-fully validate the obtained model against simulations. As opposed to the modelsproduced by traditional probabilistic verification methods, our analytical modelscales well with the network size, and captures the performance of the proto-col in very large networks. Our results show that the presence of message loss,coupled with specific topology configurations, significantly impact the expectedbehavior of the protocol.

4.1 Introduction

We now consider ad hoc networks, where nodes are continually communicat-ing with each other for information exchange over an unreliable medium. On theone hand, gossip protocols are considerably more robust compared to determin-istic protocols due to randomness and the distributed nature, also in the presenceof failures and data loss. On the other hand, since the work of Demers et al. [75],gossip protocols often follow bidirectional data exchange (push-pull) for betterperformance. As observed in [79], the performance of these protocols is usuallyevaluated under the assumption of a perfect, error-free communication medium.Thus, data exchange operations between nodes are previously considered to beatomic operations, e.g., [28, 52, 78, 94, 120, 140, 167, 177]. That is, if a nodeinitiates a gossip with another node, both nodes base their local decisions uponeach other’s data, and in some protocols even guarantee the preservation of each

45

46 Chapter 4 — Pairwise Exchange Model in Lossy Networks

other’s data. In practice, however, implementing data exchange as an atomic op-eration is hard to achieve assuming communication over unreliable media [170].

This chapter investigates the impact of a realistic environment with lossy com-munication channels on a push-pull-based gossip protocol. We take the analyticalmodel of the shuffle protocol introduced in Chapter 3 as the starting point of ourstudy. This model is revised under the assumption that the shuffle operation isnot atomic. The messages sent by a node to its gossip partner get lost with acertain probability, due to the unreliability of the communication channels.

We present the analytical model of a gossip protocol in the presence of mes-sage loss. An exchange of items between nodes is modelled as a state transition.We consider every state transition and compute the corresponding probabilities.

Like in the previous pairwise model, an important part of state transition ma-trix is the probability Pdrop that an observed item in a node’s local storage hasbeen replaced by another item after the gossip. Since it is not feasible to coverall possible network topologies in one formula, the computation of this proba-bility is based on the statistical data that can be obtained from the Monte Carlosimulations of the protocol. To be more precise, we deconstruct the expressionfor Pdrop into its main components, and identify the ones that depend on specificscenario configurations, i.e., message loss and topology combinations. Such acomponent is the probability of an observed item found in one of the caches tobe also found in the other. Instead of finding analytical expression for every spe-cific configuration, which would result in the set of infinitely many expressions,we use statistical data sampling to obtain the value for this probability, therebyisolating the modelling framework from the scenario-specific components.

Furthermore, we present in more details the following discoveries:

• Introducing message losses in the network model affects the emergent be-havior of the protocol in a specific manner. With the introduction of lossycommunication channels, the correlation between cache contents of neigh-boring nodes increases, and the degree of the correlation depends on thenetwork topology.

• The smaller the radio range of nodes, the stronger the effect of message losson the emergent behavior of the protocol. For fully connected networks,message loss does not have an impact on the distribution of data over thenetwork, which remains uniform (as observed when there are no losses).

Related Work

So far, there has not been much work that studied the performance of gossipprotocols in the presence of message loss. There is previous work on a simplegossip-based membership protocol with nonatomic protocol actions in the pres-ence of message loss [100]. To overcome the problem that a message from apeer can get lost, the authors proposed a push-based gossip protocol, whereinonly the node initiating a gossip sends a message to its random peer, and imme-diately removes the sent data from its cache. Moreover, a node sends only two

4.2. Our Assumptions 47

ids (e.g., IP addresses and ports) of itself and a random one from its local cache.In the shuffle protocol, however, each gossiping pair exchanges a random subsetof the data items. The authors proved the correctness of this simple push-basedprotocol modelling it by graph transformations, similar to the approach in [67].The protocol performance is analyzed assuming up to 1% of the message loss.The protocol properties analyzed include the expected number of neighbours anode has and the uniformity of the neighborhood lists. The paper has presentedpure theoretical findings and did not include experimental evaluation.

Another work [84] combined model checking and Monte Carlo simulationsfor the performance analysis of a very simple probabilistic broadcast in the pres-ence of message loss. The scope and the framework of that paper differs fromours; it aims at studying how different modelling choices impact the performanceof the probabilistic system. The authors simply compare the results of modellingfor a small network of nine nodes (producing the state space of the model) tothe results of Monte Carlo simulations for a network of 1000 nodes.

A survey on consensus-based distributed algorithms [77] summarizes an ob-servation regarding the effects of intermittent or lossy links for probabilistic con-sensus algorithms. The authors conclude that the topology directly affects theconvergence rate, and in the connected network, convergence of consensus al-gorithms is not affected by lossy or intermittent links, and convergence speedsdegrade gracefully. The paper suggests that the same observation is applicable togossip protocols, but no analysis is carried out to support the claim.

4.2 Our Assumptions

We consider the shuffle protocol for our case study, and refer to Chapter 2 forthe description. In contrast to the previous model, we now assume that commu-nication channels are lossy. That is, every message sent has a positive probabilityto be lost due to a disturbance of the communication medium, for example, in awireless network. Moreover, message losses occur with the same probability andare independent. We do not assume that the shuffle procedure is atomic. Whena node has sent a message, it does not know whether the message is deliveredsuccessfully.

A B

(a)

A B

(b)

A B

(c)

Figure 4.1: Scenarios of communication with lossy channels: (a) loss of therequest message, (b) loss of the reply message, (c) communication without loss.

There are three general cases in pairwise communication with respect to mes-


sage delivery, as depicted in Figure 4.1: (a) node A initiates a gossip with B bysending a message, but the message is lost, (b) node A successfully initiates agossip with B, but a message from B is lost on its way to A, and (c) a gossipingpair successfully receives messages from each other.

4.3 A Model of Information Spread with Lossy Channels

We analyze the spread of a data item d in a network of the nodes executingthe shuffle protocol. In this analysis, we assume that the shuffle operation isnonatomic and communication channels are prone to losses.

We consider our analytical model of the shuffle protocol from Chapter 3, thatcaptures the presence or absence of a generic item d before and after the shufflebetween two nodes A and B. The model has four possible states of the caches ofA and B before and after the shuffle: when both hold d, either of the caches holdsthe item d, and neither cache holds d. These correspond to the states (0, 0), (0, 1),(1, 0), and (1, 1) in the state transition diagram, shown as Figure 4.2. Unlike inChapter 3, in our current model, the state (0, 0) is no longer isolated, since thereis a possibility to remove the only copy of d from the cache of the node B, in thescenario shown in Figure 4.1(b).

(0, 0) (0, 1)

(1, 0) (1, 1)

P(10|01

)P(01|10

)

P (11|10)

P (10|11)

P(0

1|1

1)

P(1

1|0

1)

P (00|01)

P (00|00) P (01|01)

P (10|10) P (01|01)

Figure 4.2: State diagram forcaches of gossiping nodes.

Transitions from one state to another arelabelled by the respective transition probabil-ities P (a2b2|a1b1), where a1b1 is the state be-fore a shuffle, and a2b2 is the state after theshuffle, with ai, bi ∈ 0, 1. The bits a1, a2

and b1, b2 indicate the presence (if equal to 1)or the absence (if equal to 0) of a generic itemd in the cache of an initiator A and the con-tacted node B, respectively. Note that thesetransition probabilities differ from the onescalculated for the pairwise interaction modelin Chapter 3.

We express all transition probabilitiesP (a2b2|a1b1) of the state diagram in terms ofthree probabilities: 1) the probability Pselect

of an item to be selected by a node from itscache for an exchange, 2) the probability Pdrop that an item is replaced by an-other one, received by its node in the shuffle, and 3) the probability Ploss thata message is not delivered to a node due to channel loss. We write P¬select for1− Pselect, P¬drop for 1− Pdrop, and P¬loss for 1− Ploss .

State (0, 0). Before shuffling, neither node A nor node B have d in their cache.a2b2 = 00: neither node A nor node B have item d after a shuffle because neither

of them had it in the caches before the shuffle: P (00|00) = 1.a2b2 ∈ 01, 10, 11: cannot occur, because none of the nodes have item d.

4.3. A Model of Information Spread with Lossy Channels 49

State (1, 0). Before shuffling, d is only in the cache of node A.a2b2 = 01: this outcome occurs when both request from A and reply of B are

successful and node A selects and drops d; that is, the probability is

P (01|10) = (P¬loss )2 · Pselect · Pdrop

a2b2 = 10: node B does not have d because: (a) A did not select d or (b) Aselected d, but the request message got lost on the way to B.

P (10|10) = P¬select + Ploss · Pselect

a2b2 = 11: both nodes A and B have a copy of d because: (a) either both nodesreceived the gossip messages and A selected d and kept it, or (b) A selectedd and the reply message from B got lost; that is,

P (11|10) = (P¬loss )2 · Pselect · P¬drop + P¬loss · Ploss · Pselect

a2b2 = 00: cannot occur as A would only drop d if it received a reply, whichimplies that B would keep d.

State (0, 1). Before shuffling, a copy of d is only in the cache of node B.a2b2 = 01: only B has d because the message from A got lost or, B received the

message, and: (a) it did not select d or (b) B selected d, kept it, and replygot lost; i.e. the probability is

P (01|01) = P¬loss(P¬select + Ploss · Pselect · P¬drop) + Ploss

a2b2 = 10: both request and reply messages were successfully delivered and Bselected and dropped d, which amount to the probability

P (10|01) = (P¬loss )2 · Pselect · Pdrop

a2b2 = 11: both messages were delivered successfully, and B selected and keptd; that is, P (11|01) = (P¬loss)

2 · Pselect · P¬drop.a2b2 = 00: neither of nodes have d because B selected and dropped d, but node

A did not receive the reply message. P (00|01) = P¬loss ·Ploss ·Pselect ·Pdrop.

State (1, 1). Before shuffling, d is in the cache of node A as well as in the cacheof node B.a2b2 = 01: only node B has d because both messages were successfully received,

A selected and dropped d while B did not select d.

P (01|11) = (P¬loss)2 · Pselect · Pdrop · P¬select

a2b2 = 10: only node A has d because node B selected d, dropped it, and nodeA did not select d and either (a) did not receive a message from B, or (b)received the message from B: P (10|11) = P¬loss · P¬select · Pselect · Pdrop.

a2b2 = 11: after the shuffle both nodes A and B have d, because:a) the message from A did not arrive at B, i.e. Ploss ;b) the message from A successfully received by B, but the reply message got

lost, and:


⋆ A did not select d while B (i) did not select d as well or (ii) selected andkept d. P¬loss · P¬select · Ploss · (P¬select + Pselect · P¬drop)

⋆ node A selected d: P¬loss · Pselect · Ploss

c) both nodes received each other messages, and:⋆ A did not select d while B (i) also did not select it or (ii) selected and

kept it: (P¬loss)2 · P¬select · (P¬select + Pselect · P¬drop)

⋆ (i) A selected and kept d while B did not select d or (ii) both nodesselected d. (P¬loss )

2 · Pselect · (P¬select · P¬drop + Pselect)Hence, P (11|11) = 1− (2− Ploss · (3 − Ploss)) · Pselect · P¬select · Pdrop.

a2b2 = 00: discarding of an item by both nodes cannot occur.

Hence, we obtain the resulting expressions for the transition probabilities inFigure 4.2, summarized in Figure 4.3.

P (00|00) = 1

P (10|01) = (P¬loss )

2 · Pselect · Pdrop

P (11|01) = (P¬loss )

2 · Pselect · P¬drop

P (00|01) = P¬loss · Ploss · Pselect · Pdrop

P (10|10) = 1 − P¬loss · Pselect

P (01|10) = (P¬loss )

2 · Pselect · Pdrop

P (01|11) = (P¬loss )

2 · Pselect · P¬select · Pdrop

P (10|11) = P¬loss · P

¬select · Pselect · Pdrop

P (01|01) = P¬loss(P¬select + Ploss · Pselect · P

¬drop) + Ploss

P (11|10) = Pselect · P¬loss · (1 − P¬loss · Pdrop)

P (11|11) = 1 − (2 − Ploss · (3 − Ploss )) · Pselect · P¬select · Pdrop

Figure 4.3: Revised transition probabilities

Again, our analytical model of the shuffle protocol uses the probabilitiesPselect, Pdrop and Ploss as building blocks. While Ploss is an input parameter thatexpresses the reliability of the communication channels, Pselect and Pdrop are de-rived based on the behavior of the protocol. Pselect describes the random selectionof items from the cache; an item has s

cchance of being selected, regardless of

message loss. The calculation of Pdrop is complex, and we discuss it in the follow-ing section.

4.4 The Probability of Dropping an Item

In this section we analyze how message loss affects one of the building blocksof our model: Pdrop. In our earlier work (assuming perfect communication, seeChapter 3), we had already found an expression for Pdrop, relying on the assump-tion of uniform distribution of items. Here, we revisit Pdrop and derive a generalformula without making any assumptions about the distribution of items. Lateron, we will explore how message loss affects the distribution of items, and willdetermine that it is the coupling of message loss and the topology of the networkthat affects Pdrop. Having isolated the component of Pdrop that is affected by themessage loss/topology coupling, we will propose the use of statistical data tocalculate that specific part of the Pdrop expression.

4.4. The Probability of Dropping an Item 51

When a node selects an item to be sent to a neighbor, there is a probabilityPdrop that the item will be dropped from the node’s cache after the gossip ex-change. The selected item may be dropped only when there is a need to createspace for an item received from a gossip partner. Therefore, the probability Pdrop

depends on the relationship between a) the new items received from the gossippartner (for which the node needs space, the shaded area in Figure 4.4(b), re-ferred to as A1), and b) the items that the node has selected to send to the gossippartner and is allowed to discard (the shaded area in Figure 4.4(c), referred to asA2). In order to find an expression for Pdrop, we need to calculate the probabilityof an item being in A1 and the probability of an item being in A2, which we willdenote as P (A1) and P (A2), respectively.

CA

CB

c

c = CA ∩ CB

(a)

CB

SB

SA

A1 = SA \ CB

(b)

SA

CB

SB

A2 = SB \ SA

(c)

Figure 4.4: a) Items in common for caches of both nodes A and B; b) Itemssent by A that are not in the cache of B; c) Items sent by B that are not in theexchange buffer of A.

Given two nodes, A and B, that engage in a gossip exchange, we can repre-sent their caches as sets CA and CB, and the sets of items they exchange witheach other by SA and SB, respectively. The sets CA and CB may have commonelements (items), see Figure 4.4(a). We define Pinx as the probability of an itemfound in one of the caches to be also found in the other, i.e., Pr(d∈CB | d∈CA)for any item d, one of n items in the network. Knowing Pinx, we can formulateP (A1) as Pselect ·(1− Pinx) and P (A2) as Pselect ·(1− Pselect · Pinx). The probabilityPdrop can then be expressed as:

Pdrop =P (A1)

P (A2)=

1− Pinx

1− Pselect · Pinx

(4.1)

In order to calculate a value for the probability of dropping an item, Pdrop, it isnecessary to have a value for the probability Pinx. For the case of CA and CB

being random samples of a population of n items, that is, when the system isin the stable state, we can analytically deduce that Pinx = c

n. In other words,

under the assumption that items are uniformly distributed over the network, wecan find an expression for Pinx. In our earlier work modelling the shuffle pro-tocol (i.e.,Section 5.5 under the assumption of perfect communication channelswe discovered that repeated execution of the protocol results in a uniform dis-tribution of items. In the next section, we look at the effect that lossy channelshave on the distribution of items.


4.4.1 Uniform Distribution of Items: Does it Still Hold?

As already mentioned in Section 4.3, the calculation of Pdrop in the presenceof message loss requires us to speculate about the contents of the gossip partner’scache. Assuming that the items in the caches of both gossip partners are randomsamples of the totality of items in the network, we can easily estimate Pinx ascn

and, therefore, have an analytical expression for Pdrop ≈ n−cn−s

. In this section,we verify whether the uniform distribution of items, observed under no messageloss, is still a valid assumption.

The Importance of Randomness

In order to understand the effect of message loss on the theoretical level,consider one node in the network, executing the shuffle protocol, with a neigh-borhood of b nodes.

CA CBc

c = CA ∩ CB

A B

−−−−−−−−−−→CA CBc ∪ SA

c = CA ∩ CB

Figure 4.5: Correlation of the caches increases due to message loss

We examine the scenario, depicted in Figure 4.1(b), where a reply messagefrom a node to the initiating node is lost due to channel failure. When two nodesA and B gossip their local items to each other, the probability that the messagefrom B to A is lost, is P¬loss · Ploss .

If the message of B is not delivered to A, items c ∪ SA will be commonto cache CA and cache CB after the shuffle since B, unaware of the failure,purges the sent items SB, and A, which received no reply, keeps its SA items(see Figure 4.5). Over time neighboring nodes have a growing intersection ofitems in their cache. If a node has a small neighborhood, the random samplebecomes more biased towards the collection of items left in the neighborhood.Limited by the access to the storage space of the neighborhood, a set of neighborsmaintains only a subset of all items. For the fully connected graph, since it is theentire collection of the items every node has access to, the uniform distributionassumption remains. A larger neighborhood reduces the probability of repeatedcommunication between two nodes, i.e. (1

b)i to contact each other i times, and

due to the increased communication range, there is a faster exchange betweendistant areas of the network.

Experimental Observations

To support our theoretical discovery that message loss affects the uniformdistribution of items observed under no message loss, we conduct simulation

4.4. The Probability of Dropping an Item 53

experiments. The setup of the experiments is the following: each simulationexperiment has a startup period of 1000 rounds during which N = 2500 nodesgossip n = 500 different items under no message loss. We use three differentnetwork topologies: a fully connected network, a square grid with each internalnode having 4 neighbors and a network where every node has 4 randomly chosenneighbors (outdegree 4). By the end of the startup period, the items have beenreplicated, achieving uniform distribution. That is, every item has a probabilitycn

= 100500 = 0.2 of being present in a given node’s cache. After the startup period

has finished, the communication channel is set up to fail with a probability Ploss .Under this new scenario, we record, for a given item x, the fraction of shufflesbetween two nodes that both have x, over the total number of shuffles whereat least one of the nodes has x. With this information, we calculate, per round,the probabilities P (11), P (10) and P (01), which represent the probability thata gossip exchange involves both nodes having item x, only the initiator havingitem x, or only the gossip partner having item x, respectively.

Figure 4.6 shows the average values of P (11), P (10) and P (01) over 1000rounds for different values of Ploss using the three topologies mentioned ear-lier. As expected, for the case of no message loss (Ploss = 0) the probabilitiessuggest a uniform distribution of items in the network. We can analytically de-duce their expected value, which matches the experimental results, with P (11)as c

n· c

n= 0.04 and P (10) = P (01) as c

n· (1 − c

n) = 0.16. However, as the

probability of message loss increases, we observe some changes in P (11), P (10)and P (01). For the case of the fully connected network, the probabilities remainstable, suggesting that the uniform distribution of items remains unaffected bymessage loss. With the other topologies, however, as message loss increases thenumber of gossip interactions between nodes that have item x also rises. Sinceitem x is representative of all items in the network, the graphs suggest that, as aconsequence of message loss, gossip partners have more items in common thanthey would have if items were uniformly distributed. In other words, due to mes-sage loss, a node is more likely to have items in common with a neighbor thanwith another random node in the network. Hence, the topology of the under-lying network now plays a role in defining the probability of dropping an itemafter a gossip exchange, Pdrop.

4.4.2 Calculating Pdrop Based on Statistical Data

By now, we have established that the occurrence of message loss affects thedistribution of items in a network of gossiping nodes. While a network operatingunder no message loss converges to a uniform distribution of items (i.e., eachnode holds a random sample of items), when message loss occurs, the likelihoodof a node having the same items as its gossip partners increases. That is, thesample held by a node is highly correlated to the samples held by the node’sneighbors. As a result, under message loss, replicas of an item are more likely tobe found within the same neighborhood. It follows that the structure of neigh-


0

0.05

0.1

0.15

0.2

0.25

0.3

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Ploss

P(01)

fully connectedgrid range 1outdegree 4

(a)

0

0.05

0.1

0.15

0.2

0.25

0.3

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Ploss

P(10)


(b)

0

0.05

0.1

0.15

0.2

0.25

0.3

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Ploss

P(11)


(c)

Figure 4.6: Probability that a gossip exchange occurs between a) a node thatdoes not have item x and a node that has the item, b) a node that has item x andone that does not, and c) two nodes have item x, for different topologies.

borhoods plays an important role in determining the distribution of items, which,in turn, determines the probability of an item found in a node’s cache to be alsofound in the gossip partner’s cache, Pinx. Our calculation of Pdrop depends onfinding a value for Pinx.

For the case of a fully connected topology, where a node’s neighborhood isthe whole network, we can analytically derive Pinx. Having established earlier inthis section that uniform distribution of items is maintained in a fully connectednetwork (regardless of message loss), it follows that every node has the sameprobability c

nof having a given item. Therefore, for a node that has item x, the

chance of its gossip partner also having item x is cn

. That is, Pinx is on averagecn

. For other topologies, the analysis will undoubtedly be more complex. We optfor obtaining Pinx from statistical data collected from experiments. With Pinx,we can proceed to calculate Pdrop, obtaining the final building block of the modelneeded for validation.


We can calculate Pinx from the probabilities P (11), P (10) and P (01) mea-sured experimentally in the previous section:

Pinx =P (11)

P (10) + P (11)

The left graph of Figure 4.7 shows the values for Pinx calculated for each ex-periment using a different Ploss . Based on these values, we compute Pdrop using(4.1), as seen in right graph of Figure 4.7. As expected, the calculated val-ues show that, in the face of message loss, different topologies yield differentprobabilities of dropping an item, with Pdrop dropping more harshly in the moreclustered topologies, which suffer more from neighboring nodes having similaritems as message loss increases.

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Pin

x

Ploss


0.68 0.7

0.72 0.74 0.76 0.78

0.8 0.82 0.84 0.86 0.88

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Pd

rop

Ploss


Figure 4.7: Probabilities Pinx (left) and Pdrop (right), for different topologies.


The experiments in this section simulate the case where a new item d is in-troduced by one node in the network in which all caches are full and uniformlypopulated by n = 500 items. Each experiment takes as input the topology of thenetwork to determine which pairs of nodes can gossip. In all cases, we use a net-work of N = 2500 nodes arranged in either of the three topologies mentioned inthe previous section (a fully connected network, a square grid or a graph whereevery node has 4 randomly chosen neighbors). In the experiments that follow,after each gossip round, we measure the total number of occurrences of d in thenetwork (replication), and how many nodes in total have seen d (coverage).

Simulations with the shuffle protocol Each node in the network has a cachesize of c = 100, and sends s = 50 items when gossiping. In each round, everynode randomly selects one of its neighbors and shuffles. In order to make a faircomparison with the simulations with the model, we let the nodes gossip for1000 rounds with Ploss = 0, to ensure that none of the n = 500 initial items is


0

100

200

300

400

500

600

0 500 1000 1500 2000

nu

mb

er

of

rep

lica

s

rounds

Ploss = 0.2Ploss = 0.4Ploss = 0.6

0

500

1000

1500

2000

2500

0 500 1000 1500 2000

nu

mb

er

of

no

de

s r

ea

ch

ed

rounds


20 40 60 80

100

0 500 1000 1500 2000

rounds

Ploss=0.6 (diff)std. dev.

20 40 60 80

100

replic

ation d

iff (s

huffle

-model)

Ploss=0.4 (diff)std. dev

20 40 60 80

100


100

200

300

400

0 500 1000 1500 2000

rounds


100

200

300

400

covera

ge d

iff (s

huffle

-model)


100

200

300

400


Figure 4.8: Grid, range 1: Top: Replication (left) and coverage (right) of itemd for the shuffle. Bottom: difference between the shuffle and the model forreplication (left) and coverage (right).

lost in the startup period before initiating the measurements of the properties.After this startup period of 1000 rounds, items are replicated and the replicas fillthe caches of all nodes. At round 1000, Ploss is set to the desired value and itemd is inserted into the network at a random location. From that moment on, wetrack its replication and coverage.

Simulations with the model For the simulations with the model, n, c and sare system parameters set to 500, 100 and 50, respectively. Instead of maintain-ing a cache, each node in the network only maintains a variable that representswhether it holds item d or not (state 1 or 0, respectively). Nodes update theirstate in pairs according to the transition probabilities introduced before, see Fig-ure 4.2. This mimics an actual exchange of items between a pair of nodes ac-cording to the shuffle protocol. While in the protocol this results in both nodesupdating the contents of their caches, in a simulation using the analytical modelupdating the state of a node refers to updating only one variable: whether thenode is in possession of item d or not. To sum up, we use transition probabilities


0

100

200

300

400

500

600

0 200 400 600 800 1000

nu

mb

er

of

rep

lica

s

rounds


0

500

1000

1500

2000

2500

0 200 400 600 800 1000

nu

mb

er

of

no

de

s r

ea

ch

ed

rounds


20 40 60 80

100

0 200 400 600 800 1000

rounds


20 40 60 80

100

replic

ation d

iff (s

huffle

-model)


20 40 60 80

100


100 200 300 400 500

0 200 400 600 800 1000

rounds


100 200 300 400 500

covera

ge d

iff (s

huffle

-model)


100 200 300 400 500


Figure 4.9: Topology with outdegree 4: Top: Replication (left) and coverage(right) of item d for the shuffle. Bottom: difference between the shuffle and themodel for replication (left) and coverage (right).

to update the state of one variable. Since we do not need a startup time for thesimulations with the model, at round 0 we set the state of a random node to 1(while all the others have state 0) and track the state of the nodes for the re-mainder of the simulation. Pdrop is calculated using equation (4.1), as describedin Section 4.4.2.

Figs 4.8 and 4.9 show the behavior of the shuffle protocol (top row) and howit compares to the analytical model (bottom row) in terms of replication (left)and coverage (right) of d, for different values of Ploss . For each value of Ploss , 100simulation runs (for both the shuffle protocol and its model) were executed. Dueto message loss, it is possible for item d to disappear after being introduced intothe network (usually in the first few rounds). As Ploss increases, this situationis more likely. In the graphs, we only take into account the successful runs, i.e.,where the item did not disappear, but spread.

The top rows of Figs 4.8 and 4.9 show the average and standard deviation ofthe successful runs with the shuffle protocol. We compare this data with the av-erage of the successful runs of the model and present the difference (in absolutevalue) in the bottom rows of Figs. 4.8 and 4.9. We include the standard devia-


tion for the shuffle in the bottom row for comparison. It can be observed veryclearly that the difference between the results obtained from the shuffle and themodel fall well within the standard deviation of the shuffle results, confirmingthe ability of the model in reproducing the average behavior of the protocol.

We note that in all successful runs (despite message loss), the network con-verges to a situation in which there are roughly 500 copies of d, indicating thatafter repeated execution of the protocol d receives a fair share of the storagespace in the network; 2500 · 100 cache slots divided between 500 items. Also, asexpected, due to random gossiping item d is eventually seen by all nodes in thenetwork, when coverage reaches 100%.

4.6 Conclusions

In this chapter, we have presented the analysis of a gossip-based informationdissemination protocol in the presence of transient communication failures. Forour analytical framework, we introduced a hybrid method to compute Pdrop thatcombines both rigorous modelling of the protocol and statistical data samplingfrom large-scale Monte Carlo simulations for different network topologies. Tobe more precise, we derive analytical expressions for components of our frame-work that are invariant with respect to the scenarios we want to model. Havingdecomposed our model into its basic components, we identify the ones that areaffected by specific scenario configurations (in this case, message loss and topol-ogy combinations). Finding analytical expressions for these components (in thiswork, only Pinx) would require modelling each specific scenario configuration,reducing the applicability of the framework to those very specific configurations.Instead, we isolate the framework from the scenario-specific component and usestatistical data sampling to obtain a value for it.

We opt for this combined approach since it allows us to build a frameworkthat captures the behavior of the gossip protocol without requiring us to incor-porate scenario-specific elements into the model. The challenge in this approachlies in being able to decompose the model into its minimal components, in sucha way that the ones which are dependent on particular scenarios (for which ex-pressions that encompass all scenarios cannot be derived) can be isolated andcomputed separately. In effect, we strive for a golden mean between high-levelmodels such as for epidemics showing only the emergent behavior and the low-level models of the protocol that depend on particular implementation settings.

With respect to gossip-based dissemination, our study revealed that the uni-form distribution of data has been a valid simplifying assumption for networkswith perfect, error-free communication media. However, for networks with lossycommunication channels, this assumption is valid only if every node can gossipwith any other node in the network, but not for other types of network graphs.

5Comparison of Performance

Models

In Chapter 3, we developed an analytical model of information disseminationfor the shuffle protocol. With this model we now analyze how fast an item isreplicated through a network, and how fast the item covers the network. Wedetermine formulas that capture the dissemination of an item in a fully connectednetwork.

We also build several formal models using the probabilistic model checkerPRISM. Despite of the well-known state space explosion problem, we were ableto analyze a network of up to 15 nodes. In addition, we implement two typesof equations in MATLAB, a discrete one and its continuous alternative, as well asthe algorithm itself in the peer-to-peer network simulator PeerSim.

By comparing different modelling frameworks, we further explore the im-pact of modelling choices, such as different scheduling policies and the notionof rounds. The evaluation of distributed protocols, especially gossip protocols, isdifficult and a comparison of different evaluation techniques is greatly desired,since the evaluation techniques vary a lot and are based on different assumptions.The comparison of different results allowed us to discover hidden assumptions,which clarifies the interpretation of the obtained results.

5.1 Introduction

With increasing use of distributed algorithms in applications such as dis-tributed data bases or wireless sensor networks comes the need for methods toanalyze their behavior before deployment in a specific application. Most alterna-tives to testing the distributed algorithm in a testbed rely to some extent on ananalysis of a model of the system; be it analysis by simulation or be it analysis bymodel checking.

We compare different models used to analyze a distributed algorithm for shar-ing data in a distributed network. The algorithm examined in this chapter is the

59

60 Chapter 5 — Comparison of Performance Models

shuffle algorithm, at the heart of the shuffle protocol, described in Chapter 2.So far, we analyzed the spread of a distinguished data item, in a distributed

storage space maintained by the algorithm. An exchange of data between nodesoccurs pairwise; a node selects one node to communicate. If none of the twonodes has the data item, neither will have it after they exchange data. If eitheror both possess the data item, at least one of the nodes will possess it after theexchange. The probability with which nodes possess the item after communica-tion depends on the number of different data items, exchange message size andlocal storage size. Moreover, this probability depends on whether both or onlyone of the nodes possess the item before communication.

We now focus on two properties of the protocol: i) replication, i.e. the numberof nodes that hold the data item currently, and ii) coverage, i.e. the number ofnodes that have seen the data item in the past. We adopt the notion of rounds,often used in the evaluation of gossip protocols (e.g. [94, 123, 177]), although itis not a part of these protocols. In each round every node initiates one exchangeof data. This means that it might also be selected by another node as partnerin an information exchange. Note that the order in which nodes initiate dataexchange is arbitrary. This raises a natural question whether the order in whichnodes initiate data exchange can affect the coverage property.

We build formal models for the probabilistic model checker PRISM, imple-menting different scheduling policies and notion of rounds. The first PRISMmodel is nondeterministic and covers all possible schedulers, giving best- andworst-case probabilities. The other PRISM models employ a random scheduler,a round-robin scheduler, an “eager” scheduler that gives preference to nodesthat possess the item, and finally a “reluctant” scheduler that gives preference tonodes without the item. The latter two PRISM models are used to determine ifimplementable schedulers exist that get close to the best- and worst-case bounds.All these models are based on transition probabilities from Chapter 3.

A distributed network often lacks the ability to enforce that all nodes progressat an even pace. However, it is not uncommon (especially in an implementationfor simulation) to assume that communication occurs in rounds throughout thenetwork. Therefore, we additionally investigate the effect of omitting the notionof round by assuming that all nodes progress with an even chance.

We then derive analytical expressions for replication and coverage as differ-ential equations also based on the transition probabilities from Chapter 3, usedas the rates with which nodes exchange data items. These results are comparedwith the other results of this chapter. To that end, we implemented two equationsof coverage in MATLAB – a discrete version as a difference equation and its con-tinuous alternative in the form of a differential equation. Moreover, we compareresults from an implementation of the shuffle protocol running on a peer-to-peernetwork simulator PeerSim [122]. Even for small networks, all models produceddifferent results, although these results are based on rigorous formalizations ofthe algorithm (in the case of PeerSim, on an actual implementation). The com-parison of the results of different formalisms allowed us to discover hidden as-

5.2. PRISM Model Checker 61

sumptions in these alternative modelling frameworks, which helps to clarify theinterpretation of the obtained results.

The network size in each model (meaning PRISM model, equation and imple-mentation) is restricted to the least upper bound (supremum) that the respectivemodelling framework could handle. Despite the well-known state space explo-sion problem, we were able to specify a network of 10 nodes (for a round-basedPRISM model) and of 15 nodes (for a PRISM model without rounds) with PRISM.However, this is rather restrictive when compared to the equational expressionswith a few thousand nodes. The results for the shuffle algorithm using the dif-ferential and difference equations were obtained under the assumption that thenetwork is completely connected. The simulation and the PRISM models use thesame assumption, even though both would work without this assumption.

5.2 PRISM Model Checker

For formal modelling and analysis of the shuffle protocol we first use theprobabilistic model checker PRISM (version 3.2), developed at the Universityof Oxford [112]. PRISM supports three types of probabilistic models: Markov

decision processes (MDP), discrete-time Markov chains (DTMC), and continuous-

time Markov chains (CTMC). In this chapter, we consider MDPs and DTMCs formodelling.

The PRISM input language is a state-based language for describing probabilis-tic state transition systems. In a PRISM model, a system is regarded as a set of“communicating” modules. Each module consists of a set of guarded commands.A command is composed of the guard (a predicate on the state variables) and aprobabilistic update relation (the probabilistic assignment to variables). Modulescan synchronize on shared variables or on common actions labels.

Once specified, the PRISM model checker constructs an internal representa-tion of the system model – a Markov-style transition system for the compositionof specified modules – which can then be analyzed exhaustively relative to aspecified property using a suite of numerical algorithms. The specification lan-guage is PCTL, a probabilistic extension of CTL, expressive enough to describemany performance style properties. PRISM computes the probabilities of satis-faction for DTMCs, and the best- and worst-case probabilities of satisfaction forMDPs. The maximal and minimal probabilities for an MDP refer to the worst-and best-case resolutions of the nondeterministic choice.

5.3 The Basic PRISM Model

All PRISM models described in this section share a similar structure. A nodeinitiates an exchange of data items with a random partner exactly once perround, but may be contacted not at all, once, or multiple times. In addition tothe random partner selection, the shuffle protocol exhibits nondeterministic be-


havior in that the order in which the nodes initiate a gossip is not determined. InChapter 3, the performance of the protocol is measured according to the spreadof the data item d in the network. Thus, it is sufficient to use a boolean to indicatethe presence (as true) or the absence (as false) of d in the cache of node.

For each node i in the network we introduce one global boolean variable ni

(whether the node currently possesses data item d) and one module nodei. Thismodule has a local variable senti, which records whether a node has initiatedan exchange in the current round. A node i initiating the contact with a node j ismodelled as a local transition of module nodei. This transition selects the nodej with probability 1/(N − 1), where N is the number of nodes in the network. Itthen sets the variable senti to true, and updates the state variables ni and nj,using the probabilities of pairwise interaction model, as calculated in Section 3.4.Thus, the entire communication process, including the exchange of data items, ismodelled by the probability that the participating nodes acquire the distinguisheddata item d. Figure 5.11 (in the appendix) contains an example PRISM model for3 nodes. Defining the variables ni and nj as global enables the initiating nodeto update the state atomically, removes the need for synchronizing transitionsbetween modules, and thus simplifies the PRISM model considerably.

Once all modules have initiated a gossip in a round, they perform one syn-chronous transition on the labelled action done to mark the end of the round.This transition will reset the local variables senti back to false. Overall, eachround in the PRISM model has N + 1 transitions: for each of the N node onetransition to specify the initiated contact, and one transition to mark the end ofthe round.

5.3.1 Coverage Property in PRISM

The coverage property, analyzed in Section 5.5.3, is expressed as the fractionof nodes that have “seen” item d in the past. This property cannot be expresseddirectly in PRISM; this tool allows for probabilistic specifications, i.e. it allows tocompute the probability that any given node has “seen” a node in the past. Sincethe network is completely connected and all nodes are symmetric, it suffices tocompute the probability for one node. The probability that node i had data itemd in the past k rounds is expressed as follows:a

P=?[(true) U<=k*(N+1) (ni)] (5.1)

Note that (N + 1) is the number of steps in a round.We now describe different PRISM models for networks of several nodes. All

models assume that the communication channels are reliable, provided by anunderlying layer that maintains the neighborhood information.

aFor MDP models it is necessary to specify whether to compute the upper or lower bound prob-ability. PRISM provides the PCTL operators Pmax=? and Pmin=? for this purpose. For simplicity weonly use P=? throughout this chapter.

5.3. The Basic PRISM Model 63

5.3.2 All Possible Schedules

The basic PRISM model as described above does not specify in which orderthe nodes initiate communication. We first consider the nondeterministic PRISMmodel which allows all schedulers for communication. The nondeterministicPRISM model of the algorithm is specified as an MDP. PRISM then allows tocompute the minimum (worst-case) and maximum (best-case) probabilities of anode to see item d over all possible schedulers, until round k.

1 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 20 21 round 1 round 2

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Pro

babili

ty

MDP max

MDP min

Worst case t=9

Worst case t=10

Figure 5.1: Worst- and best-case bounds for MDP model for the initial tworounds, and schedules realizing the worst-case bounds at t = 9 and t = 10

Example 5.1. We consider a network with N = 10 nodes, cache size c = 100,

number of different data items n = 500, and message size s = 50. It is assumed

that one node, node 1, has the item d initially. Figure 5.1 depicts the maximal and

minimal coverage in detail for the first two rounds, i.e. we are checking

P=?[(true)U<=t (ni)]

rather than (5.1), for a node i other than node 1.

The results show that during the first two rounds the worst- and best-case prob-

abilities diverge. Note, that this is the worst-case probability for any distinguished


node, and not the average probability over all nodes. For example, right before the

second last step in round 1 (t = 8), the worst-case probability of node i will still be

0. This assumes that the other eight are scheduled first, then, at t = 9, node 1, and

finally, at t = 10, node i. All other nodes will have a positive probability at t = 8,

since they initiated communication before. The average coverage for these nodes

at this point is therefore strictly greater than 0, while the worst-case probability of

node i is still 0. There exists no schedule that realizes the lower bound of 0 for all

nodes simultaneously.

Furthermore, the worst-case bound in Figure 5.1 is not realized by a single worst-

case schedule, but by a combination of schedules. The previous paragraph discussed

the worst-case schedule for node i leading up to the second last step in round 1. The

worst-case probability for node i at the next step (t = 10), when all nodes but one

initiated communication, is realized by a schedule in which node i is the first toinitiate communication, followed by the other nodes, and then followed by the node

1 at t = 10 as the last node in this round. Figure 5.1 depicts the results for the

two different schedules in comparison to the worst-case bound for the MDP PRISM

model. Note also that in the second round, when these schedules are repeated in

round-robin fashion, neither of them realizes the worst-case bound.

Obviously, similar observations hold for the best-case bound, namely that itis the best-case probability for any individual node, which will not be realized byany single schedule. To that end, we presented the details for the steps within around, but in the remainder of this chapter we consider only the probabilities atthe end of each round.

5.3.3 Fair Random Scheduler

Rather than exploring all nondeterministic choices in PRISM model, we nowinstantiate them with uniform distribution. In PRISM, this is simply achievedby modelling the system as a DTMC (rather than as a MDP), which effectivelyimplements a fair random scheduler. On the level of PRISM modules, it is doneby using keyword probabilistic instead of nondeterministic.

In Figure 5.2, we can see that the curve of DTMC PRISM model representsthe average results and, as such, lies roughly in the middle of the area betweenthe upper and the lower bounds obtained from the MDP PRISM model.

5.3.4 Round-Robin Scheduler

The following PRISM model implements a round-robin scheduler to select thenext node that initiates communication. The nodes are arranged from node 1 tonode N , and node i + 1 can initiate communication only when node i has doneso. An exception to this rule is node 1 which can initiate the first communicationin each round.

The round-robin scheduler does not reflect a scheduler that would be imple-mented by an actual network, since it requires a central scheduler with a global


0

0.2

0.4

0.6

0.8

1

0 5 10 15 20 25 30

Pro

ba

bili

ty

Rounds

Round-robinMDP maxMDP min

DTMCDTMC wo rounds

Figure 5.2: Coverage for MDP and fair DTMC PRISM model.

view of the network. This would defeat the purpose of the distributed databasethat executes a distributed shuffling algorithm. However, a round-robin sched-uler would be suitable for any simulator that checks and selects the next nodein each round in the same order. In the context of the shuffle protocol, a round-robin scheduler is nothing but a loop iterating over an array of node IDs. Analysisof models that use a round-robin scheduler enables to estimate the range of cov-erage which simulators employing a deterministic round-robin scheduler maygive.

Example 5.2. As before, we consider a network with ten nodes, cache size c =100, number of different data items n = 500, and the size of the message s = 50.

Node 1 has the distinguished data item d initially. Figure 5.2 shows the results

for the nondeterministic, the fair random, and the round-robin scheduler. For all

these PRISM models the probability that a node has “seen” the item d in the past

will converge towards 1. Also as to be expected, the coverage for the fair random

scheduler and the round-robin scheduler lie in-between the upper and lower bounds

obtained from the nondeterministic PRISM model.

For the round-robin scheduler, coverage depends on when node 1 and node i are

scheduled. Selecting nodes in a predetermined order destroys the symmetry between

the nodes. The case when node 1 is scheduled first yields a higher coverage for the


other nodes compared to the round-robin scheduler that schedules this node last.

Also, if node i precedes in the round-robin order node 1, it has a higher coverage

than when it succeeds it.

Another interesting observation is that the average coverage over all round-robin schedulers is not equal to the result of coverage of the fair scheduler. Theaverage coverage for round-robin schedulers is slightly improved with respect toa fair random scheduler.

5.3.5 PRISM Model without Rounds

All previous PRISM models assumed that communication is organized inrounds, and that each node initiates communication exactly once in each round.This assumption is motivated by the assumption that all nodes progress at ap-proximately the same speed. However, round-based PRISM models do requireeither that all nodes have global knowledge of whether the other nodes initiatedexchange in the current round, or a strict underlying time synchronization al-gorithm. In general, this cannot be assumed for the shuffle protocol. It is notonly difficult to accomplish in large distributed networks, it would also defeatthe purpose of a distributed system.

As an alternative we built a PRISM model without rounds. It selects thenext nodes independent of how often they have been selected before, with auniform distribution over all nodes. The model is similar to the fair DTMC PRISMmodel. However, the model neither needs variable senti to record whether ithas initiated communication in this round, nor the synchronizing transition tomark the end of a round.

Figure 5.2 also shows the result for this PRISM model. To compare it with theround-based PRISM models we defined a “round” to be N steps of communicat-ing nodes. It is one step less than for the other PRISM models, since the modelomits the transition at the end of a round. The results show that the coveragein a PRISM model without rounds is smaller. However, they are clearly not onlywithin the bounds given by the MDP, but also within the range of the round-robinschedulers. Even with a pure random scheduler, without rounds, it is guaranteedthat the coverage will converge to 1.

Note that this model without rounds does not capture more intricate timingbehavior such as clock drift or jitterb. To capture this would require to keep localtiming information. While possible, it does, even for networks of very moderatesize, quickly result in state spaces that are larger than the model checker canhandle.

bA clock jitter is a deviation of the locally measured time of the event with respect to the perfectglobal time.


5.3.6 Unfair Schedulers

The previous results of this section show that the coverage for the fair ran-dom and the round-robin schedulers are not close to the best- and worst-caseprobabilities of the MDP PRISM model. It was already discussed in subsection5.3.2 that the worst- and best-case bounds are for individual nodes only, and thata single probabilistic scheduler that realizes these bounds for all nodes simulta-neously may not exist. These bounds provide conservative outer bounds on thecoverage. This raises the question how conservative these bounds are.

To address this question, we studied the behavior of unfair schedulers thathave a global knowledge of the network. These include implementable sched-ulers, and thus provide an inner approximation on the bounds on the coverage.

The first class are “eager” schedulers that give precedence to nodes that holdthe item. Nodes that do not hold the item are blocked from initiating communica-tion as long as there is a node with an item that has not initiated communication,yet. The other class are “reluctant” schedulers that give precedence to nodes thatdo not have d in their cache. Both resulting PRISM models remain nondeter-ministic. Thus, the upper and lower bounds for these classes will constrain thecoverage for any implementable scheduler in this class. The lower bound for the“eager” schedulers thus provides an inner approximation on the upper bound for

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 5 10 15 20 25 30

Pro

ba

bili

ty

Rounds

MDP max

Eager max

Eager min

MDP min

Reluctant max

Reluctant min

0.73

0.735

0.74

0.745

7

0.9275

0.9325

20

Figure 5.3: Coverage for unfair scheduler models, and results for MDP (gray) forcomparison. Inset figures show the maximal distance between the results.


MDP PRISM model, while the upper bound for the “reluctant” schedulers givesan inner approximation on the lower bound. Note, that there might exist otherimplementable schedulers that improve these bounds.

As shown in Figure 5.3, the respective upper and lower bounds for the “eager”schedulers are numerically close to each other. Even though they do not coincide,they cannot be distinguished in this figure. The same holds for the bounds onthe “reluctant” schedulers. Figure 5.3 also shows that the MDP bounds and theinner approximation almost coincide as well. The difference is less than 0.005in probability. This suggests that although the outer bounds might be somewhatconservative, there exist implementable schedulers that approach these boundsclosely.

5.4 Simulation Model

We compare the result for coverage from the PRISM models with coverageobserved while running the shuffle protocol in a simulated environment. Wedeploy the shuffle protocol in a round-based fashion, using the event-based sim-ulator PeerSim [122]. The round-based mechanism of PeerSim allows collectingthe results of shuffling from cache at the end of each round. The experimentssimulate the case when a new item is introduced at one node in a network, whileall caches are full and uniformly populated with 500 items. As we know by now(see Section 3.3), it can be achieved by letting the nodes gossip for 1000 roundsbefore inserting the distinguished data item.

At the start, the simulator creates a randomly ordered list of nodes, providedby a configuration file. Along with the network size and the number of items,other parameters such as network topology and exchange buffer size are alsoprovided as input to the simulator. The network topology determines whichpairs of nodes can gossip. The experiments are performed on a fully connectednetwork of ten nodes. At the beginning of every gossip round, a scheduler inthe PeerSim simulator randomly shuffles an order of the nodes in the list. (Thiseffectively prevents the emergence of round-robin schedules.) Within a round ofthe protocol simulation, all nodes proceed in a lock-step fashion, executing theprotocol in the order decided by the scheduler.

For the experiments we assumed a cache size of c = 100. In every round eachnode randomly selects one of its neighbors and exchanges s = 50 items duringthe interaction. After each gossip round, we measure how many nodes in totalhave seen the distinguished item d at the end of each round.

Note that the simulator measures only whether a node holds d at the endof the round, and ignores if the node acquired the item but lost it later in thesame round. The same holds for the equations that will be discussed later inthis chapter. In order to ensure that the results are comparable with the PRISMresults, we use in the remainder of the chapter, instead of (5.1), the following

5.5. Modelling with Differential Equations 69

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20 25 30

Pro

ba

bili

ty

Rounds

Simulations (avg with std)MDP maxMDP min

DTMC

Figure 5.4: Comparison of results for PRISM models (gray), and with the resultsof simulation with standard deviation.

PCTL property

P=?[(true) U<=k*(N+1) (ni & sent)] (5.2)

The boolean sent is the conjunction of the variables senti, and takes the valuetrue at the end of the round.

The simulation results in Figure 5.4 are the average of 1000 independentruns. They show that the coverage obtained by simulation is within the MDPbounds and close to the DTMC results. Given the uncertainty inherent withinsimulation results, there is a close match with the PRISM model. They also showthat the MDP bounds are within the standard deviation of the simulation results.

5.5 Modelling with Differential Equations

We exploit the analytical model of information dissemination (from Chap-ter 3) to perform a mathematical analysis of replication and coverage with regardto the shuffle protocol. For the particular case of a network with full connectiv-ity, we can find explicit expressions for the dissemination of a generic item d interms of the probabilities presented in Section 3.4. We construct two differentialequations that capture replication and coverage of item d from a round-based


perspective. The advantage of this approach is that we can determine the long-term behavior of the system as a function of the parameters.

The characteristics of the replication and coverage are influenced by thetopology of the network. It is the topology of the network that will dictate thelikelihood of any two nodes to gossip. In order to model the properties of theprotocol, it is crucial that we know the probabilities of a node in a given state (0or 1) to interact with another node in a given state. For this reason, we chooseto model the properties for a fully-connected network, where a node can gossipwith any other node in the network. This topology allows us to easily calculatethe probability of a node in state 1 to interact with a node in state 0 or to interactwith another node in state 1. Knowing this, we concentrate on applying the statetransition probabilities that we calculated in Section 3.4. The aim of this sectionis to provide an example of how the transition probabilities can be used to modelthe properties of dissemination for a specific topology.

5.5.1 Replication

One node introduces a new item d into the network at time t = 0, by placingd into its cache. From that moment on, d is replicated as a consequence ofgossiping among nodes.

Let x(t) represent the percentage of nodes in the network that have d in theircache at time t, where each gossip round takes one time unit. The variation inx per time unit dx

dtcan be derived based on the probability that an item d will

replicate or disappear after an exchange between two nodes, where at least oneof the nodes has d in its cache:

dx

dt= [P (11|10) + P (11|01)] · (1− x) · x− [P (10|11) + P (01|11)] · x · x (5.3)

The first term represents duplication of d when a node that has d in its cacheinitiates the shuffle and contacts a node that does not have the item. The secondterm represents the opposite situation, when a node that does not have the itemd initiates a shuffle with a node that has d. The third and fourth term in theequation represent the cases where both gossiping nodes have d in their cache,and after the exchange only one copy of d remains. Substituting P (11|10) =P (11|01) = s

cc−sn−s

and P (10|11) = P (01|11) = sc

n−cn−s

c−sc

, we obtain

dx

dt= 2 · s

c· c− s

n− s· x · (1− n

c· x)

The solution of this equation, taking into account that x(0) = 1N

because onlyone of the N nodes in the network holds d initially, is

x(t) =eαt

(N − nc) + n

ceαt

(5.4)


where α denotes 2 sc

c−sn−s

.

By imposing stationarity, i.e. dxdt

= 0, we find the stationary solution cn

.Hence, this calculation confirms the observation in Section 2.2.3 that the net-work converges to a situation in which replication of d is c

n.

500 items

0

0.05

0.1

0.15

0.2

0.25

0.3

0 200 400 600 800 1000

fraction o

f nodes

rounds


1000 items

0

0.05

0.1

0.15

0.2

0.25

0.3

0 200 400 600 800 1000

rounds


2000 items

0

0.05

0.1

0.15

0.2

0.25

0.3

0 200 400 600 800 1000

fraction o

f nodes

rounds


Figure 5.5: Percentage of nodes in the network with a replica of item d in theircache, for N = 2500, c = 100, s = 50, and n = 500, n = 1000 or n = 2000.

We evaluate the accuracy of x(t) as a representation of the fraction of nodescarrying a replica of d, by running a series of experiments where N = 2500 nodesexecute the shuffle protocol, and their caches are monitored for the presence ofd. Unlike the experiments in Section 3.7, we assume full connectivity; that is,for each node, all other nodes are within reach. After 1000 rounds, where itemsare disseminated and replicated, a new item d is inserted at a random node, attime t = 0. We track the number of replicas of d for the next 1000 rounds. Theexperiment is repeated 100 times and the results are averaged. These simulationresults (average and standard deviation) for the protocol, together with x(t), arepresented in Figure 5.5. This figure shows that both curves have the same initial


increase in replicas after d has been inserted, and in all cases the steady statereaches precisely the expected value c

npredicted from the stationary solution.

Using the expression for replication, (5.4), we calculate the optimal size ofthe exchange buffer. That is, we repeat the calculation from Section 3.6, but nowagainst x(t), to determine which size of the exchange buffer yields the fastestconvergence to the steady-state for both replication and coverage. That is, wesearch for the s that maximizes the value of x(t). We first compute the derivativeof x(t) with respect to s (z(t, s)), and then derive the value of s that maximizesx(t), by taking z(·, m) = ∂x

∂s|m = 0:

z(t, s) =∂x

∂s=

2ekt(cN − n)(cn + s(−2n + s))t

(cN + (−1 + ekt) n)2(n− s)2

,

where k = 2 sc

c−sn−s

. Let z(t, s) = 0. For t > 0, c · n = s(2n − s). Solving this

equation we get s = n±√

n(n− c). Taking into the account that s ≤ n, the only

solution is s = n −√

n(n− c). This also coincides with the optimal exchangebuffer size found in Section 3.6.

5.5.2 Coverage

We use the term coverage to denote the percentage of nodes in the networkthat have seen item d from the moment it was introduced into the network. Lety(t) represent the coverage of d at time t. The variation in coverage per timeunit, dy

dt, is determined by the fraction of nodes that have not seen d, 1 − y, and

that interact with nodes that have d in their cache, x. Let∗∈ 0, 1, then:

dy

dt=

1

2· (P (1∗|01) · P (∗1|∗1) · (1 − y) · x (5.5)

+ P (∗1|10) · P (1∗|1∗) · x · (1− y))

where P (1∗|01) =

1∑

i=0

P (1i|01) and P (∗1|10) =

1∑

i=0

P (i1|10)

The first term of the equation represents increased coverage due to nodes dis-covering d after interacting with nodes that have d in their cache. This can occurwhen a node initiates the exchange (P (1∗|01)), or when the node is contacted(P (∗1|10)). Each of these scenarios has a 50% chance of occurring. The secondpart of these terms represents the case when a node discovers and does not giveaway its copy of d to another node within the same round. This is because cov-erage is measured only at the end of a gossip round, meaning that a node thatsees item d for the first time, and drops it in the same round, is considered notto have seen item d yet.c Since nodes shuffle, on average, twice per round (once

cThe reason for this is that in PeerSim a node in the shuffle protocol has an opportunity to readfrom the lower-level cache only once every round.


when they initiate the shuffle and again if they are contacted by a neighbor), thiscould occur under two scenarios:

i) the node acquired d by initiating an exchange with a node that had d (withprobability P (1∗|01)) and next lost its copy of d when shuffling with a nodethat contacted it, or

ii) the node was first contacted by a node that sent a copy of d (P (∗1|10)) andnext initiated a shuffle and gave away its copy of d.

The value of the probability P (∗1|∗1) can be calculated from the followingcases:a) the gossip partner of the node does not have d, and:

i) the state of two nodes does not change after the gossip (with probabilityP (01|01) · (1− x)), and

ii) the gossip partner obtains a copy of d after the gossip (P (11|01) · (1−x));or

b) the gossip partner of the node has d, and:i) two nodes have the same state after the exchange (P (11|11) · x), and

ii) the gossip partner loses its copy of d after the gossip (P (01|11) · x).Hence,

P (∗1|∗1) = (P (01|01) + P (11|01))·(1− x) + (P (11|11) + P (01|11))·x= P¬select + Pselect ·

(P¬drop ·(1− x) + Pselect ·x + P¬drop ·P¬select

)

Due to the symmetry of both gossiping nodes, P (∗1|∗1) = P (1∗|1∗).Substituting these probabilities into (5.5), we obtain

dy

dt=

s

c·(

1− s

c+

s

c· c− s

n− s

(2− s

c

)+

(s

c− c− s

n− s

)· x)· (1 − y) · x

The solution of this equation, taking into account that y(0) = 1N

, is

y(t) = 1− (N − 1) ·Nβ−1((

N − n

c

)+

n

c· eαt

)−β

· e−λ

where λ denotessc·( s

c− c−s

n−s)α·n

c

(1N− eαt

(N−nc)+ n

ceαt

), and β denotes

nc·κ+ s

c·( s

c− c−s

n−s )α·( n

c)2 ,

wherein κ is sc·(1− s

c+ s

c· c−s

n−s

(2− s

c

)).

By imposing stationarity dy

dt= 0, we find the stationary solution 1, meaning

that eventually all nodes will see d. In order to evaluate how closely y(t) modelscoverage, we use the traces from the simulations executed for Section 5.5.1. Atevery round, the nodes that carry a replica of d are identified, and a record ofthe nodes that have seen d since it was published is kept. Figure 5.6 presents thecoverage measured for three sets of experiments, each set with a different valuefor n. As n increases, a newly inserted item requires more time to cover thewhole network. This is due to having more competition from other items to cre-ate replicas in the limited space available, as was previously shown in Figure 5.5.


500 items

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000

fraction o

f nodes

rounds


1000 items

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000

rounds


2000 items

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000

fraction o

f nodes

rounds


Figure 5.6: Percentage of nodes in the network that have already seen a replicaof item d, for N = 2500, c = 100, s = 50, and n = 500, n = 1000 or n = 2000.

However, as predicted by the stationary solution, in all cases the coverage even-tually reaches 1. As shown in Figure 5.6, the solution y(t) models the behaviorobserved in simulations in case of n = 2000 and n = 1000, falling nicely withinthe standard deviation of the simulation results. However, for smaller n (i.e.n = 500), the solution y(t) becomes less accurate, converging slower than thesimulation results. Next, we describe a more accurate, but also more compli-cated version of the coverage model for arbitrary number of contacts betweennodes.

5.5.3 Coverage Model Revisited

The coverage model in Section 5.5.2 has an implicit assumption that a nodeshuffles two times per round (once when it initiates the gossip and once when itis contacted by another node). Although this is true on average, what actually


happens is that a node initiates a shuffle once per round (with some randomneighbor), but can be contacted an arbitrary number of times (≤ N − 1). InFigure 5.7, it is depicted for k ≥ 1 which percentage of nodes, on average, shufflek times in a given round. The number of times a node shuffles in a round hasan impact on coverage. In this subsection, we revisit our coverage model to takeinto account that there is a probability distribution for the number of times anode is contacted.

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0 2 4 6 8 10

fraction o

f nodes

number of shuffles

percentage of nodes

Figure 5.7: Distribution of nodes according to the number of shuffles they exe-cute per round, for a fully connected network of N = 2500 nodes.

We compute the probability that a node is contacted i times by other nodesin the network:

C(i) =

(N − 1

i

)(1

N − 1

)i (N − 2

N − 1

)(N−1)−i

, i ≤ N − 1

Namely, there are(N−1

i

)ways to choose i nodes in a network of N − 1 nodes

that contact the given node. Those i nodes contact the given node with proba-bility 1

N−1 each, and the remaining (N − 1) − i nodes do not contact the given

node with probability N−2N−1 .

A node may discover d during any of its shuffles. However, coverage is onlymeasured at the end of a gossip round, meaning that a node that sees item d forthe first time and drops it in the same round is considered not to have seen itemd at all.d Consequently, coverage increases only under the following conditions:

i) a node that has not seen item d obtains a copy of d during one of the contacts,and

ii) by the end of the round (after i exchanges), it still holds on to a copy of d.

dAgain, the reason for this is that in PeerSim a node in the shuffle protocol has an opportunity toread from the lower-level cache only once every round.


In order to model coverage, we need to express the probabilities Pget that a nodethat does not have a copy of d gets this copy in a shuffle, and Plose that a nodethat has a copy of d loses this copy in a shuffle.

Pget = P (1∗|0∗) = x·P (1∗|01) = x·(P (10|01) + P (11|01))

P¬lose = P (1∗|1∗) = x·P (1∗|11) + (1 − x)·P (1∗|10) (5.6)

= x·(P (10|11) + P (11|11)) + (1− x)·(P (10|10) + P (11|10))

and

P¬get = 1− Pget and Plose = 1− P¬lose

The increase in coverage from one round to the next can be modelled by iden-tifying all the possible cases where a node that has previously not seen item ddiscovers it by the end of the round. Let Φi express the probability that a nodethat does not hold d, does hold d after performing i shuffles. We have

Φi =

i−1∑

m=0

(1− Φm) · Pget · (P¬lose)(i−m)−1, i ≥ 0 (5.7)

where Φ0 = 0. Namely, the expression

(1− Φm) · Pget · (P¬lose)(i−m)−1

captures the case where a node does not have the item at the end of m < ishuffles (with probability 1 − Φm), then discovers it in the mth shuffle (withprobability Pget), and does not lose it in the remaining shuffles (with probability(P¬lose)

(i−m)−1).In a given round, only the fraction 1− y of nodes in the network that did not

yet see item d can contribute to an increase in coverage. Such a node is contactedi ≥ 0 times in the round with probability C(i), and then performs i + 1 shufflesin total. With probability Φi+1 the node will hold item d at the end of the round.Thus, coverage can be modelled by the equation

dy

dt= (1− y) ·

k∑

i=0

C(i) ·Φi+1 (5.8)

where k is the maximum number of times that a node is contacted in a round,i.e., k = N − 1. As before, y(0) = 1

N, as initially only one of the N nodes holds d.

For a network of 2500 nodes with full connectivity, the probability of a nodebeing contacted more than four times in one round is negligible (less than 1%).We therefore use the aforementioned coverage model with a limit of k = 4 toestimate the coverage and compare with our simulation traces. The results canbe seen in Figure 5.8.

Unlike the results from Figure 5.6, the current model not only falls into thestandard deviation of the shuffle simulation results, but also closely reproducesthe curve of average values in all three cases (n = 500, n = 1000 and n = 2000).

5.6. Difference Equation Model 77

500 items

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000

fraction o

f nodes

rounds


1000 items

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000

rounds


2000 items

0

0.2

0.4

0.6

0.8

1

0 200 400 600 800 1000

fraction o

f nodes

rounds


Figure 5.8: Percentage of nodes in the network that have already seen a replicaof item d, for N = 2500, c = 100, s = 50, and n = 500, n = 1000 or n = 2000.

5.6 Difference Equation Model

We compared the model-checking and simulation results with the results ob-tained in Section 5.5. Modelling coverage as differential equation assumes thatthe fraction of nodes that held the item d in the past is continually changing,while it maintains at the same time the discrete notion of a round in equation(5.8) by including the number of contacts for each round (see (5.7)). Since theright-hand side of equation (5.8) is actually round-based, where a round takesone unit of time, we expressed coverage alternatively as a difference equationthat includes this assumption explicitly. Differential equations (5.8) and (5.3)


hence become

y(t+1) = y(t) + (1− y(t)) ·(

k∑

i=0

C(i) ·Φ(i + 1)

)

x(t+1) = x(t) + (P (11|10)+P (11|01))(1− x(t))x(t)

− (P (10|11)+P (01|11)) x2(t)

A problem with the difference equation is that it uses for all contacts the proba-bilities at time t, while the probabilities do change as nodes initiate contacts, asillustrated in Figure 5.1. Since these probabilities increase over time, we shouldexpect that this equational model underestimates the coverage.

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20 25 30

Pro

ba

bili

ty

Rounds

MDP maxDTMC

MDP minDifferential EquationDifference Equation

Figure 5.9: Comparison of results for PRISM models (gray) and with the resultsfor the equational models.

Example 5.3. We compared the model-checking results with the results obtained

in [28]. We implemented both equational models in MATLAB. The solution of the

differential equation was obtained by the numerical integration using the explicit

Runge-Kutta (4, 5) formula (the Dormand-Prince pair) [88]. For this example, we

considered a network with ten nodes, cache size c = 100, number of different data

items n = 500, and the size of the message s = 50. As shown in Figure 5.9, the

5.7. Fractions and Probabilities 79

coverage for the difference and differential equations roughly coincide. They are

lower than the result for the fair DTMC PRISM model, and they are mostly within

the MDP bounds. One potential cause for this observation was mentioned in the

previous paragraph, namely that the difference equation does not account for the

fact that the probabilities change during a round. In addition both equations treat to

some extent fractions as probabilities and probabilities as fractions. In the following

subsection we investigate which effects this might have.

5.7 Fractions and Probabilities

In equational models, the initial condition is expressed as a fraction of net-work that holds the observed item. Interpreting this fraction as a probability andvice versa is only applicable for models with a sufficiently large number of nodes.But even for large models these notions are fundamentally different. To illustratethis, it is best to consider the initial state of the equational model of the shuffleprotocol. All models assume that initially exactly one node holds item d. Thismeans that the fraction of nodes that hold the item initially is 1

N. Having one or

more nodes with d in their cache initially guarantees that the item will spreadthrough the network and that the coverage will eventually converge to 1. If incontrast each node would hold d with probability 1/N , it would mean that withprobability (1 − 1/N)N no node would hold the data item initially. In that case,the coverage would remain 0 forever. Even if the network size increases towardinfinity, we have that with probability

limN→∞

(1− 1

N

)N

=1

e

the coverage will remain constant 0, where e is the Euler constant. Thus, theoverall coverage will converge to 1− 1

e.

Regarding the initial state, all previous models assume consistently that afraction of 1

Nholds the item. However, in later rounds both equational models

treat fraction as probabilities and vice versa, which should result in a similar,maybe less pronounced, effect.

To investigate this influence we conducted another set of experiments. Weconsidered networks with 5, 10 and 15 nodes, where initially 1/5th of all nodeshold item d. Since the differential equation model has the most fuzzy notionof round, we compared it with the probabilistic PRISM model without roundspresented in Section 5.3.5.

Figure 5.10 depicts the results of the comparison. For the differential equationmodel we see that the network size has almost no impact. Only equations (5.6)and (5.7) incorporate the network size N . It seems that the increasing number ofpotential contacts in a round does not significantly increase the probability that anode acquires or loses an item. For the PRISM model we observe that, while theparameters s, n and c are the same for all models, and while all models assume


0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 5 10 15 20 25 30

Pro

ba

bili

ty

Rounds

DTMC, N=5DTMC, N=10DTMC, N=15

ODE, N=5ODE, N=10ODE, N=15

0.445

0.455

0.465

4

Figure 5.10: Comparison of the PRISM results with equational models, using thesame initial fraction.

that initially 1/5 of all nodes hold the data item d, it matters whether the networkhas 5, 10, or 15 nodes. The network with 5 nodes realizes the highest coverage,then the network with 10, and finally 15 nodes. As the number of nodes inthe PRISM model increases, and as probabilities behave more like fractions, thecoverage converges towards the results for the differential equation. This resultagrees with the theoretical result on the limiting behavior of the probabilisticsystems, such as epidemics (cf. [70, 136]).

5.8 Related Work

Several works have focused on model checking of gossip protocols, and stud-ied scheduling in the the context of gossip protocols.

Fehnker and Gao [84] investigated the impact of modelling choices on theanalysis of message dissemination in flooding and probabilistic broadcast proto-cols. In particular, the authors explored the effect of message collisions, lossychannels, and random delay in the gossip frequency of nodes in addition to anondeterministic execution order. For a comparison, the protocols were mod-elled in PRISM and implemented in MATLAB for Monte Carlo simulations.

Kwiatkowska et al. [138] presented the case study of a gossip peer sampling

5.9. Conclusion 81

service [123] with PRISM. The authors considered two properties of the protocol,the longest path between nodes and the time for the network graph to becomeconnected. To clarify the discrepancy between values of the best- and worst-casebehaviors, the authors studied the problem of scheduling. Due to the state spaceexplosion problem, their analysis is restricted to networks with three and fournodes.

In [46], a gossip protocol has been modelled as a DTMC, and implementedin PeerSim. The author analyzed the evolution of the indegree distribution ofnodes under two different schedulers. The first one is a random scheduler, inwhich each node has the same probability to perform a shuffle step. In the secondscheduler, one round ends when all nodes have initiated exactly one interaction.This study has shown how schedulers may affect the complexity of the modeland produce different results.

Differential equations have been previously used, for example in [99, 133,147], to model probabilistic protocols for large-scale distributed systems. More-over, they have been widely used in modelling epidemics (see Section 1.4.1).

5.9 Conclusion

This chapter considered a distributed algorithm for sharing data in a dis-tributed network to explore the different choices for PRISM models. It thencompared the PRISM results with the result of alternative approaches; two equa-tional models and a network simulator. While all models are based on a rigorousformalization of the algorithm – or, in the case of the simulator, on an actualimplementation – the results did not match, except the models from Section 5.5showing large-scale behavior. A careful analysis helped to find limitations ofPRISM, and uncovered hidden assumptions in the alternative modelling frame-works.

The PRISM models were designed with the state explosion problem in mind.We abstracted away the part of the algorithm that deals with pairwise commu-nication, and modelled it as an atomic probabilistic transition. This abstractionwas made possible by the analysis in Chapter 3 of transition probabilities of theshuffle protocol. The PRISM model covered only which nodes initiate in what or-der contact with other nodes to exchange information. This abstraction allowedus to model networks with up to 15 nodes, and instances of model checking tookup to 10 minutes on an Intel Core Duo CPU, 1.2 GHz, with 1 GB of RAM.

Under the assumption of full connectivity, we modelled the properties of thedissemination of a generic item with transition probabilities. Each property is ex-pressed as a formula which is shown to display the same behavior as the averagebehavior of the protocol.

This work uses PRISM to study the different scheduling policies, and to de-termine best- and worst-case bounds. However, the specification logic PCTL didnot let us define the exact equivalent of coverage. PCTL provides the capability


to study state probabilities, while coverage is best defined as an expected valueover CTL state formulae. Fortunately, this limitation was only relevant for MDPmodels; since all nodes symmetric and due to the assumption that the networkis completely connected, in the DTMC PRISM models PCTL alternative turns intothe exact equivalent of the coverage property.

Another limitation of PRISM and PCTL is that it does not allow to query stateprobabilities after k iterations. These state probabilities could have been usedto compare also other measures, such as replication. PRISM provides the transi-tion matrices to compute replication in, for example, MATLAB, but unfortunately,these sparse matrices were too large to manipulate. While this demonstrates theefficiency of the MTBDD encoding used in PRISM [112], it would be an evenmore useful tool if it could give state probabilities after k steps, if only for DTMCs.

A contributing factor to the difference between the PRISM and equationalmodels is that the latter treat fractions as probabilities and vice versa. We wereable to illustrate that this kind of models are more accurate the larger the numberof nodes is. PRISM distinguishes between probabilities and fractions inherently,since it captures fractions explicitly in the state of the model.

While we were able to specify networks with up to 15 nodes in PRISM, itshould be noted that the alternative models can be used for a few thousandnodes in equational models. The PRISM analysis of a relatively small networkproved useful nevertheless, since it helped to uncover simplifying assumptionsthat allow these models to deal with much larger networks, and thus with theinterpretation of the obtained results.

It would be interesting to extend presented results with equational models,assuming other network topologies than a fully-connected network. It is moredifficult to represent models of a non-uniform network because of limitationsof the PRISM specification language. For some networks, like a static torus, aself-implemented script (e.g., in Python) can assist one in generation of a PRISMspecification.

5.10. Appendix: Nondeterministic PRISM Model 83

5.10 Appendix: Nondeterministic PRISM Model

nondeterministic

const int c = 100;

const int s = 50;

const int n = 500;

const int N = 3;

global n1: bool init true;

global n2, n3: bool init false;

formula p0000 = 1;

formula p0101 = (c-s)/c;

formula p1001 = (s/c)*(n-c)/(n-s);

formula p1101 = (s/c)*(c-s)/(n-s);

formula p0110 = (s/c)*(n-c)/(n-s);

formula p1010 = (c-s)/c;

formula p1110 = (s/c)*(c-s)/(n-s);

formula p0111 = (s/c)*((c-s)/c)*((n-c)/(n-s));

formula p1011 = (s/c)*((c-s)/c)*((n-c)/(n-s));

formula p1111 = 1-2*(s/c)*((c-s)/c)*((n-c)/(n-s));

module node1

sent1: bool init false;

[] !sent1 & !n1 -> 1/(2)*(!n2?p0000:0): (n1’=false) & (n2’=false) & (sent1’=true)

+ 1/(2)*( n2?p0101:0): (n1’=false) & (n2’=true) & (sent1’=true)

+ 1/(2)*( n2?p1001:0): (n1’=true) & (n2’=false) & (sent1’=true)

+ 1/(2)*( n2?p1101:0): (n1’=true) & (n2’=true) & (sent1’=true)

+ 1/(2)*(!n3?p0000:0): (n1’=false) & (n3’=false) & (sent1’=true)



+ 1/(2)*( n3?p1101:0): (n1’=true) & (n3’=true) & (sent1’=true);

[] !sent1 & n1 -> 1/(2)*(!n2?p0110:0): (n1’=false) & (n2’=true) & (sent1’=true)

+ 1/(2)*(!n2?p1010:0): (n1’=true) & (n2’=false) & (sent1’=true)

+ 1/(2)*(!n2?p1110:0): (n1’=true) & (n2’=true) & (sent1’=true)



+ 1/(2)*( n2?p1111:0): (n1’=true) & (n2’=true) & (sent1’=true)

+ 1/(2)*(!n3?p0110:0): (n1’=false) & (n3’=true) & (sent1’=true)

+ 1/(2)*(!n3?p1010:0): (n1’=true) & (n3’=false) & (sent1’=true)

+ 1/(2)*(!n3?p1110:0): (n1’=true) & (n3’=true) & (sent1’=true)



+ 1/(2)*( n3?p1111:0): (n1’=true) & (n3’=true) & (sent1’=true);

[done] sent1 -> sent1’=false;

endmodule

module node2 = node1 [sent1=sent2,sent2=sent1,sent3=sent3,n1=n2,n2=n1,n3=n3] endmodule

module node3 = node1 [sent1=sent3,sent2=sent1,sent3=sent2,n1=n3,n2=n1,n3=n2] endmodule

Figure 5.11: Nondeterministic PRISM model of the shuffle algorithm for 3 nodes.

6Mean-field Framework

We employ mean-field approximation for an analytical evaluation of gossipprotocols. Nodes in the network are represented by small identical stochasticprocesses. Joining all nodes would result in an enormous stochastic process. Ifthe number of nodes goes to infinity, however, mean-field analysis allows us toreplace this intractably large stochastic process by a small deterministic process.This process approximates the behavior of very large gossip networks, and canbe evaluated using simple matrix-vector multiplications.

Traditionally, the mean-field method has been applied to the protocols withunidirectional communication. This chapter introduces a mean-field frameworkextended for the evaluation of push-pull protocols. We describe the necessarymean-field theory, and devise a simple analytical model for gossip-based infor-mation dissemination as an illustrative example. This example demonstrates themean-field theory for unidirectional communication models (called push-basedor pull-based in terms of gossip protocols). We further show how pairwise nodeinteraction can be modelled in the mean-field framework in the case of the shuf-fle protocol with the help of the state transition model from Chapter 3. Moreover,we present an analysis of a basic version of the gossip-based clock synchroniza-tion protocol GTP.

6.1 Introduction

We consider large-scale gossip networks where a large number of nodes inter-act. As mentioned before, the study of the emergent behavior of gossip protocolsdemands the consideration of large-scale networks [123]. Clearly, the large sizeof the network poses a serious challenge on modelling and analysis of a systemof interacting nodes, even if the behavior of a single node is relatively simple.As we have seen in the previous chapter, the analysis of gossip protocols withstandard verification tools is hard – it is, for example, beyond the capabilities ofcurrent probabilistic model-checking tools, as it is shown in the previous chapter.Moreover, when a large number of nodes interact in a connected environment,

85

86 Chapter 6 — Mean-field Framework

various phenomena emerge that cannot be explained in terms of the behavior ofa single node. We are therefore interested in going from a detailed local modelat node level to an abstract global model of the system.

In this work, we resort to so-called mean-field analysis. The stochastic processrepresenting the modelled system converges to a deterministic process if the sizeof the network tends to infinity (see, e.g., [48]). This convergence result providesan approximation for networks consisting of large numbers of interacting nodesthat are symmetrically defined. The system is modelled on an abstract level asthe distribution of nodes in the set of possible states; the mean-field method thenallows us to observe the evolution of this distribution over time.

The notion of “mean-field” is often used in the literature, with different mean-ings. The mean-field concept was first introduced in statistical mechanics (e.g.,[36]). It has been used in the context of Markov chain models of systems likeplasma and dense gases where the strength of the interaction between particlesis inversely proportional to the size of the system. A particle is seen as under acollective force generated by the other particles in a continuous time and spacesetting. Subsequently, the mean-field approximation enjoyed the attention of theneural networks community (e.g., [154]), and was used in the area of chemicalreaction networks (e.g., [135, 146]). Furthermore, a number of deterministiccontinuous models including mean-field approximation have been considered inthe theory of epidemics; for more details see, e.g., [6, 7, 148, 166].

In the area of communication networks, mean-field approximation has beenapplied in various forms to a variety of case studies, e.g., HTTP flows [11], TCPconnections [9, 10, 12, 171], bandwidth sharing [134], transportation networks[2], swarm robotic systems [142], reputation determination [48], queueing net-works [71, 126, 178] and Internet congestion control [124].

We show that the mean-field method is well-suited for a performance eval-uation of gossip protocols. Applying the method to these protocols is a naturalchoice, since gossip protocols are meant to operate in very large networks, con-sisting of interacting nodes that are symmetrically defined. In this chapter, wepresent a modelling framework for gossip protocols with pull, push and push-pulla communication models.

Related Work

In the context of gossip protocols, mean-field modelling has been consideredfor the analysis of the age of gossip [59], epidemic routing [185]. Several pre-vious papers on gossip protocols have used a notion of mean-value and infinitelimit for the asymptotic analysis of algebraic gossip [72, 169] and gossip-basedmembership protocols [46, 47].

Harwood and Ohrimenko [103] have proposed a model to analyse the effectof churn in a gossip network for the case of probabilistic broadcast. The authors

athat is, both nodes gossiping with each other share their local data

6.2. Gossiping Time Protocol 87

have derived equations of a message throughput for different message buffersize, message arrival rate, and number of message sources. The presented modelrelies on exponential distribution for the rate of nodes arrivals and departures.

Stojanovic et al. [169] analyzed and compared delay performance of networkcoding and cooperative diversity in a single-hop wireless network. The authorsperformed an asymptotic analysis (for the number of nodes N → ∞) of theexpected delay associated with the broadcasting of a file consisting of a certainnumber of packets.

We refer an interested reader to a survey on clock synchronization, for ex-ample, [156], and recent work on gossip-based clock synchronization protocol[90]. Our case study, the GTP protocol, assumes the presence of at least one clocksource. In the absence of external clock sources, there is a recent protocol, de-veloped in [32]. In [108], the combination of the Uppaal model checker and theproof assistant Isabelle has been used for analysis of a wireless synchronizationprotocol, developed by the company Chess.

6.2 Gossiping Time Protocol

Protocols based on epidemic and gossip concepts have found various practicalapplications [132], including nontraditional gossip applications [66], such asgossip-based clock synchronization. The Gossiping Time Protocol (GTP) [118,119] is a self-managing gossip time synchronization protocol for peer-to-peernetworks.

The protocol operates in a network of nodes, each equipped with a localclock, and assumes the presence of at least one node with accurate and robusttime in the network. Time is disseminated throughout the network by lettingnodes periodically gossip their clock samples. That is, each node periodically se-lects (initiates a gossip with) a random peer from the network to exchange timeinformation with. The initiating period is determined by a value of the gossipdelay parameter, which is the current delay between subsequent gossip interac-tions. The nodes subsequently exchange their local settings such that afterwardsthe node with the worse-quality time has adopted the higher-quality time of theother node. The protocol assumes a presence of the peer-sampling service [123],which allows a node to contact a uniformly randomly selected alive node.

In basic GTP, the quality of the time sample at a node is based on the distancefrom the time source to the node (hop count metric), that is, the number ofnodes on the synchronization path from the node to the time source. The timesource has hop count equal to 0. Completely unsynchronized nodes have a hopcount ∞. A gossiping node rejects the time sample if the hop count of its gossippartner is not smaller than its own. Furthermore, a node adopts a time sample ifit has not been synchronized for a long time. That is, if the difference betweenthe last update and the current time is larger than a timeout period, then a nodeaccepts a time sample even though it may degrade its time quality (with respect


to the hop count metric). Concisely, if the node decides to accept the sample, itsynchronizes its clock, and updates the values of the local variables. Namely, thenode records the value of current time as the time of the last clock update, andsets the hop count to the value of the gossip partner hop count incremented byone. The GTP protocol parameters described above are stored as the followinglocal variables, according to [118]: a gossip delay as GOSSIPING DELAY, a timeof the last clock update as LAST UPDATE, a hop count as TS DISTANCE, a timeoutas STANDALONE PERIOD .

Alternatively, each node may decide to adapt the rate at which it initiates atimestamp exchange (gossip frequency) based on its local settings. For instance,the better synchronized the node is, the lower the gossip frequency it may as-sume. In doing so, the gossip frequency gradually decreases when the networkis synchronized and stable. Note that dynamic gossip frequency is beyond basicGTP, and is a simple optimization idea for the gradual version of GTP. The moti-vation behind it is that even though the timestamp exchange can be implementedto be inexpensive, it still consumes resources.

We show how a mean-field framework can be applied to a nontraditionalapplication of gossip protocols, on the example of the basic GTP. That is, nodesexecute basic GTP based on an immediate clock adjustment model, and changegossip frequencies depending on a gossip delay. For the original protocol and itsdesign details, we refer to [118, 119].

6.3 Mean-field Modelling and Convergence

This section introduces the theory needed to apply mean-field results to gossipprotocols. We stay close to the presentation in [48], but change notation whenappropriate and simplify matters if possible in the gossip context.

6.3.1 Modelling and State Space

A discrete-time Markov chain (DTMC) is a stochastic process Y (t) | t ∈ Nthat takes values in a countable state space S. A DTMC obeys the Markov prop-erty, that is, the next state j ∈ S is independent of the past, given the presentstate il ∈ S:

PrY (t + 1) = j | Y (0) = i1, . . . , Y (t) = it = PrY (t + 1) = j | Y (t) = it.

We consider a system of N ∈ N interacting objects that are identically defined.The object with index n ∈ 1, . . . , N is represented by the discrete-time stochas-tic process

XNn (t) | t ∈ N

which takes values in the set S = 0, . . . , K − 1 where K = |S| is the number ofdifferent states.

6.3. Mean-field Modelling and Convergence 89

Example 6.1. In a gossip network, a node is represented by an interacting object.

As a running example we consider a simple information dissemination protocol. A

piece of information, e.g., the current time, is forwarded through the net. A node

can be in one of two states: either it already has the information (state 0) or it is

not yet informed (state 1). Hence, the state space for a node is S = 0, 1 with

|S| = K = 2. Let m0 be a fraction of informed nodes, and pN (m0) the probability

of moving from state 1 to state 0. Figure 6.1 shows a graphical representation of the

state-transition diagram describing such a node; the possible transitions and their

probabilities will be explained later in this section.

0 1

pN (m0)

1 1− pN (m0)

Figure 6.1: A single node transition

The complete system is composed of the N objects and is, consequently, alsodescribed by a discrete-time stochastic process:

Y N (t) =(XN

1 (t), . . . , XNN (t)

).

Its state space is SN which has |S|N = KN elements. For the mean-field conver-gence result we assume that we cannot distinguish objects that are in the samestate. It then suffices to keep track of the fraction of objects in each state. Thesefractions are collected in another stochastic process

MN (t) = (M0(t), . . . , MK−1(t))

called the occupancy measure. Its elements are defined as

MNi (t) =

1

N

N∑

n=1

1XNn (t) = i, i ∈ S,

where 1XNn (t) = i is equal to 1 if XN

n (t) = i, and 0 otherwise. Its state spaceSN

M ⊂ RK has

∣∣SNM

∣∣ =(

K + N − 1

K − 1

)

elements (the number of ways to distribute N objects over the K states they canbe in). One state from this state space is denoted

m = (m0, m1, . . . , mK−1) ∈ SNM ,

where mi is the fraction of nodes in state i.


Example 6.2. For the information dissemination example, the state space of the

occupancy measure is

SNM =

(k

N, 1− k

N

) ∣∣∣ k ∈ 0, . . . , N

.

Its size is∣∣SN

M

∣∣ =

(2 + N − 1

2− 1

)= N + 1.

Normally, the choice of the occupancy measure depends on the local param-eters of a node, which, in its turn, depends on the application of gossip.

6.3.2 Local Transition Probabilities

The evolution of the system of interacting objects is described by the local

transition probabilities of each object. The next state of any object not onlydepends on the current state of the object but also on the current occupancymeasure m:

PNi,j(m) = PrXN

n (t + 1) = j | XNn (t) = i, MN(t) = m, i, j ∈ S, m ∈ SN

M (6.1)

These probabilities are the same for all objects. They are gathered into the tran-sition probability matrix PN (m). These local transition probabilities determinethe unique transition probability matrix for the global system Y N (t), which is aDTMC because its next state (= occupancy measure) depends only on the currentstate.

In contrast to Example 6.1, the exchange of information between nodes canbe bidirectional as, for instance, in case of push-pull based gossip protocols. Inthese gossip protocols, both interacting nodes update their local states dependingon both of their current states.

Formally, two communicating objects (n1, n2) undergo a state transition ac-cording to

PN(i1,j1),(i2,j2)

(m) = PrXNn1

(t + 1) = j1, XNn2

(t + 1) = j2 |XN

n1(t) = i1, X

Nn2

(t) = i2, MN (t) = m (6.2)

where i1, i2, j1, j2 ∈ S, and m ∈ SNM . Analogous observations regarding pair-

wise interaction of objects in the mean-field models have been made in [41].In the pairwise interaction model, each transition probability PN

(i1,j1),(i2,j2)(m)

takes into account current occupancy measures Mi1(t) and Mi2(t) of both statesi1 and i2. Thus, at every time step t the value of the probability PN

(i1,j1),(i2,j2)(m)

is recomputed based on the current value of the occupancy measure Mi2(t).In the next sections, we model pairwise node interaction and pairwise state

update of the nodes. Using the pairwise interaction model (6.2) in the mean-field framework, we analyze the push-pull information dissemination protocol in


Section 6.4 and a basic version of GTP in Section 6.5. In this section, however, wekeep to the pull model of a node, introduced in Example 6.1, and the transitionequation (6.1).

Example 6.3. A node can only move from being uninformed (state 1) to being

informed (state 0). Afterwards it stays in state 0 forever, that is, it never forgets.

Suppose that in each time step a node A initiates a gossip interaction with proba-

bility g. It randomly chooses a partner node B among the N − 1 other nodes. If

B is already informed and A is not, A moves to state 0, so that we model a simple

pull protocol. Note that m0 is the fraction of informed nodes in the system and

m1 = 1 −m0 the fraction of uninformed nodes. The total probability for moving

from state 1 to state 0 equals

pN (m0) = PN1,0((m0, m1)) = g · m0 ·N

N − 1.

Here, m0 ·N is the number of informed nodes and m0 ·N/(N −1) is the probability

that a node chooses an informed node out of the N − 1 possible nodes (it does not

pick itself) as gossip partner. The complete probability matrix is then given by

PN ((m0, m1)) =

(1 0

pN(m0) 1− pN (m0)

).

For the global system, the probability to move from m0 informed nodes to m′0

informed nodes, for m′0 ≥ m0, equals

(m1 ·N

(m′0 −m0) ·N

)(pN (m0)

)(m′

0−m0)N (1− pN(m0)

)m′

1N,

where m1 = 1 −m0, m′1 = 1 −m′

0. This binomial expression is composed of the

number of possibilities to choose exactly the “missing” (m′0 −m0) ·N objects out of

the m1 ·N uninformed nodes; these then all have to take the transition to state 0,and all other m′

1 ·N nodes remain in state 1.

Consider now the occupancy measure MN (t) of the system at a given timet ∈ N. Recall that MN (t) is a random variable. For a given initial occupancymeasure m

N0 , there are two ways to determine the distribution of MN (t): first,

we can calculate the transient distribution analytically at time t, requiring tvector-matrix multiplications with a vector of size |SN

M |. Second, we can employdiscrete-event simulation to estimate the distribution. Often only discrete-eventsimulation is possible since, for large N , the size of the state space makes theanalytical computation of the transient probabilities practically infeasible. Buteven discrete-event simulation of this large DTMC is expensive.

Suppose the number of local states of nodes is s. Then the time complexityper step in the mean-field method is s2, i.e. the cost of matrix-vector multi-plication. For our simple example, the time complexity is only 2 · 2 = 4 per


time step. Already a single Monte Carlo simulation of the system with only 100nodes is much more expensive. Moreover, for a reasonable confidence intervalone needs to repeat the simulations for at least 100 times.b For more complexsystems, in general, to observe the emergent behavior may require several thou-sands of nodes and thousands of runs of the simulations, whereas the mean-fieldmethod requires only one run (independent of the number of nodes), providingthe limiting behavior of the system. Roughly speaking, the complexity is s2 forthe mean-field method compared to f(N) · r of the Monte Carlo simulations,where f(N) is a cost for simulating one run, which is at least linear with respectto the number of nodes N , and r is the number of runs. Note that f(N) canbe large; for example, the Monte Carlo simulations of the shuffle protocol for asingle pairwise communication costs c · log c (see Section 3.8 for details), wherec is the local storage size of a node.

00.20.40.60.81

0 0.01 0.02 0.03 0.04 0.05

Pr

MN 0

(10)≤

m0

m0

N=100, DTMCsimulationsimulationN=10000, DTMC andsimulationmean eldN=1000, DTMC andFigure 6.2: Distribution of MN

0 (10) for g = 0.1 and MN (0) = (0.01, 0.99)

Example 6.4. Figure 6.2 shows the analytically computed distribution of the frac-

tion of informed nodes at time t = 10, for gossip probability g = 0.1 and initial

occupancy measure

MN (0) = (0.01, 0.99).

The analytical results in the figure are obtained using mean-field analysis (the ver-

tical line at 0.025), and by using matrix-vector multiplications with a vector size

bIn general, it is difficult to estimate how many times Monte Carlo simulations have to be re-peated to obtain a reasonable confidence interval, especially if there are rare events with impact onthe system behavior.


|SNM |, as described above. Note that the distribution is “more deterministic” for

larger N .

We also simulated this simple dissemination protocol in a round-based fashion

similar to simulations in PeerSim [122]. In one round, which equals one time

step, each uninformed node gossips with probability g and picks a random peer.

If this peer is already informed, the number of informed nodes for the next round

is increased by one. Using 10000 independent runs for each curve, the resulting

distributions for MN0 (10) are also shown in Figure 6.2.

6.3.3 Convergence Result

At this point, the so-called mean-field convergence result applies. It capturesthe limiting behavior of the complete system if the number of objects N goes toinfinity and so provides an approximation for the occupancy measure for largeN . The requirement is that for all local states i, j ∈ S, all m ∈ R

K and N →∞:

PNi,j(m) converges uniformlyc in m to some Pi,j(m), which is a continuous

function of m.

If this requirement is satisfied, the occupancy measure converges almost surelyto a deterministic limit. This means that in case N → ∞, for each local state ithe fraction MN

i (t) of objects in state i at time t is known with probability one.

Theorem 6.1 (cf. [81]). Fix the initial occupancy measure to be identical for all

N ∈ N:

MN(0) = µ(0).

Define the limit of the local probability matrix:

P (m) = limN→∞

PN (m), m ∈ RK .

Define the deterministic process

µ(t + 1) = µ(t) · P (µ(t)).

Then for any t ∈ N,

limN→∞

MN (t) = µ(t), with probability 1,

that is, µ(t) is the deterministic limit occupancy measure for N →∞.

For large N we can now approximate the stochastic process for the occupancymeasure by this deterministic process.

cA sequence fN of real valued functions converges uniformly with limit f if for every ε > 0 thereexists a natural number n such that for all x and all N ≥ n we have |fN (x) − f(x)| < ε.


Example 6.5. The limit of the probability to move from state 1 to state 0 is

p(m0) = limN→∞

g · m0 ·NN − 1

= g ·m0,

which is continuous in m0. The requirement for the application of the mean-field

convergence result is thus satisfied. If we set

µ(0) = (0.01, 0.99)

and g = 0.1, the deterministic limit for time t = 10 is

µ(10) = (0.0256, 0.9744),

which is computed by ten matrix-vector multiplications. It is indicated by the vertical

line for m0 in Figure 6.2.

6.3.4 A Methodology for the Mean-field Analysis of Gossip

Protocols

We summarize how mean-field analysis can be used for the performance eval-uation of gossip protocols. Our methodology consists of the following steps:Step 1 – System description The specification of a system helps to obtain not

only a better (more modular) description, but also a clear understandingand an abstract view of the system. In general, it is hard to give a full speci-fication of a system or protocol under study. Such a study is usually done ona simplified system model of the actual protocol: one has to decide whichcharacteristics of the protocol should be studied, and which parameters ofthe protocol should be modelled in order to study these characteristics. Inorder to simplify the system model, assumptions should be made. Theseassumptions should be supported by experimental study.

Step 2 – Identification of local states and transitions This step requires iden-tification of the set S of local states of a node. The states should reflectall relevant situations a node can be in. Transitions between local statesusually occur because of gossip interactions.

Step 3 – Transition probabilities The (local) transition probabilities depend onthe global state of the gossip network model. The probabilities have to beinvestigated thoroughly. A node might also behave intrinsically in a prob-abilistic way. At the end of this step stands a directive of how to computethe transition probability matrix depending on the current global state.

Step 4 – Mean-field convergence requirements Only if the local transition prob-abilities converge appropriately for N → ∞, can we successfully apply themean-field convergence theorem.

6.4. A Mean-field Model for the Shuffle Protocol 95

Step 5 – Mean-field limit Finally, we can compute the mean-field limit for ourmodel using Theorem 7.1. With the obtained results we can test and com-pare different designs.

6.4 A Mean-field Model for the Shuffle Protocol

We apply mean-field theory to the shuffle protocol, using the pairwise inter-action model of Chapter 3. The specification of the shuffle protocol in Chapter 2and [94] corresponds to Step 1 in our mean-field framework. Steps 2 and 3 arecovered by the following subsections (6.4.1–6.4.3). Finally, Section 6.4.4 coversSteps 4 and 5.

We demonstrate how the analytical model, presented in Chapter 2 for pair-wise node interaction, can be used for our mean-field framework.

6.4.1 State Space

We use the abstraction of the shuffle protocol, the pairwise interaction model,and analyze the spread of a distinguished item in the network. The state of anode in the shuffle protocol is defined by a pair (g, d). The first component gdenotes the gossip delay. It defines the speed with which nodes communicatewith each other. When the gossip delay reaches zero a node initiates a shuffle.The second element d is a two-valued integer (bit) indicating whether a nodepossesses the item.

The state space of a node is

S = 0, . . . , Gmax × 0, 1,

and the size of the state space is

|S| = (Gmax + 1) · 2,

where Gmax is the maximal gossip delay. Note that we consider a discrete timemodel, that is, the system proceeds in discrete time steps t ∈ N.


The behavior of a node is based on its state and the current occupancy mea-sure m. The expression (g, d | g > 0) denotes any state where g > 0 and d ischosen arbitrarily from the set 0, 1. The expression m(g,d|g≤g′) is an examplefor the abbreviation of a sum of occupancy fractions, calculated as:

m(g,d|g≤g′) =

g′∑

g=0

m(g,0) +

g′∑

g=0

m(g,1).


The transition probability

PN(g,0|g>0),(g−1,0)(m)

denotes the transition probability from the state (g, 0) to state (g − 1, 0) for anyg > 0.

Figure 6.3 depicts the two-dimensional diagram for a single node transition.According to our model in Section 6.4.1, each possible state of the node is definedby its gossip delay and by the presence of the item at the node. The x-axisin Figure 6.3 corresponds to the value of the gossip delay g, and the y-axis tothe value of d. For all transitions represented by the arrow, we calculate theirrespective state transition probabilities.

Active Nodes

Periodically, a node A initiates the interaction with a random peer B. Theperiod is governed by the gossip delay g, and the length is equal to Gmax. Thus,whenever g of a node A reaches 0, the node becomes active and initiates theinteraction with a peer B, selected at random among N − 1 nodes (excluding Aitself).

According to the shuffle protocol, two gossiping nodes cannot become in-volved in another interaction until the current contact is finished. We addressthis assumption by introducing the notion of a collision. Namely, once activenode A randomly selected B as its gossip partner, the probability that no colli-sion occurs, nocN (m), ensures that other active nodes select neither A nor B.

· · ·

· · ·

. . .

...

d

1

0

gGmax10

Figure 6.3: The state transitions of a node


Thus, the probability that no collision occurs is equal to:

nocN (m) =

(N − 3

N − 1

)m(0,d)·N−1

(6.3)

That is, all active nodes m(0,d) ·N excluding A choose one of the remaining N−3nodes as their gossip partner, excluding A, B and the node itself.

We calculate the transition probabilities

PN(0,d),(Gmax,d′)(m)

for all combinations of d, d′ ∈ 0, 1. An active node is in one of two possiblestates:(a) the node holds the item (i.e. state (0, 0)), or(b) the node does not have the item (i.e. state (0, 1)).In either case, the node will either not have the item after the shuffle or obtainthe item during the exchange. That is, the new state after the interaction is either(Gmax, 0) or (Gmax, 1).

If an active node does not hold the item before the contact, the value of dremains 0 in the following three cases:(1) collision occurred while talking to a passive node,(2) the chosen peer is active itself, or(3) independent of the peer state, a node did not obtain the item after the con-

tact (by the probabilities P (00|00) and P (01|01)).

PN(0,0),(Gmax,0)(m) =

1∑

i=0

m(g′,i|g′>0) ·NN − 1

· (1− nocN (m))

+

1∑

i=0

m(0,i) ·N − 1 + i

N − 1(6.4)

+

1∑

i=0

m(g′,i|g′>0) ·NN − 1

· P (0i|0i) · nocN (m)

First, an active node selects a passive node with probability

m(g′,i|g′>0) ·N/(N − 1),

but the interaction fails due to a collision that occurs with probability 1−nocN (m).Second, the active node chooses another active node with probability

(m(0,0|g′>0) · (N − 1) + m(0,1|g′>0) ·N)/(N − 1),

and thus, the contact fails. Third, the active node chooses a passive node withprobability

m(g′,i|g′>0) ·N/(N − 1)


and successfully communicates with probability nocN (m). The node does notget the item with probability P (00|00), if the peer did not have the item as well,and with P (01|01), if the peer had the item.

The item is obtained after the interaction, if a passive peer holds the itembefore the interaction, according to the probabilities P (11|01) and P (10|01):

PN(0,0),(Gmax,1)(m) =

m(g′,1|g′>0) ·NN − 1

·1∑

j=0

P (1j|01) · nocN (m) (6.5)

That is, an active node chooses a passive peer that had the item with probability

m(g′,1|g′>0) ·N/(N − 1),

and after the communication without collision nocN (m), the active node obtainsthe item with probability P (11|01) + P (10|01).

If the active node holds the item before interaction, the value of d stays as 1,because(1) a collision occurred while talking to passive nodes,(2) the chosen peer is an active node, or(3) the item stays in the local cache, independent of whether the peer possesses

the item before or after the shuffle, determined by the probabilities P (10|10),P (11|10), P (10|11) and P (11|11).

Thus, the probability of this transition is

PN(0,1),(Gmax,1)(m) =

1∑

i=0

m(g′,i|g>0) ·NN − 1

· (1− nocN (m))

+

1∑

i=0

m(0,i) ·NN − 1

(6.6)

+

1∑

i=0

m(g′,i|g′>0) ·N − i

N − 1·

1∑

j=0

P (1j|1i) · nocN (m)

However, the item is given away by an active node after the exchange with prob-ability

PN(0,1),(Gmax,0)(m) =

1∑

i=0

m(g′,i|g′>0) ·NN − 1

· P (01|1i) · nocN (m) (6.7)

since, either(1) the active node chooses a passive peer that did not have the item with the

probability m(g′,0|g′>0) ·N/(N − 1), successfully communicated with proba-bility nocN (m), and passed the item to the peer with probability P (01|10);or,

(2) the active node chooses a passive peer that had the item with probabilitym(g′,1|g′>0)·N/(N−1), successfully communicated with probability nocN (m),and dropped its copy of the item with probability P (01|11).


Passive Nodes

A passive node, that is, a node with g > 0, can be contacted by an activepeer, resulting in an update of its state. In all cases, the gossip delay is decreasedby one, counting down to the next gossip initiation. Similar to active nodes, wedistinguish four main transitions, depending on whether the item is at the nodebefore or after gossiping.

The probability that a passive node does not obtain the item after the gossipis equal to the following sum:

PN(g,0|g>0),(g−1,0)(m) =

1∑

i=0

m(0,i) ·NN − 1

· (1 − nocN (m))

+

(1−

1∑

i=0

m(0,i) ·NN − 1

)(6.8)

+

1∑

i=0

m(0,i) ·NN − 1

· P (i0|i0) · nocN (m)

The sum consists of the three components:(1) the probability m(0,i) ·N/(N − 1) that the node is chosen by an active node,

but a collision occurred with probability 1− nocN (m);(2) the probability that a node is not chosen as a peer by any active node;(3) the probability m(0,i) · N/(N − 1) to be chosen by an active node, and af-

ter successful communication (with probability nocN (m)), not to obtain theitem with probability P (00|00), if the active node did not have the item, andwith P (10|10), otherwise.

A passive node obtains the item with probability

PN(g,0|g>0),(g−1,1)(m) =

m(0,1) ·NN − 1

·1∑

j=0

P (j1|10) · nocN (m) (6.9)

that is, the probability that the node is chosen by an active peer that holds theitem m(0,1) ·N/(N − 1), successfully communicates according to the probabilitynocN (m), and the node obtains the item with probability P (01|10) + P (11|10).

The remaining two probabilities cover the case when a passive node has theitem before the interaction. It can either keep the item or give it away as theresult of an interaction with an active peer with the following probabilities.

The probability that a passive node keeps the item after the interaction iscomputed as the sum

PN(g,1|g>0),(g−1,1)(m) =

1∑

i=0

m(0,i) ·NN − 1

· (1− nocN (m))


+

(1−

1∑

i=0

m(0,i) ·NN − 1

)(6.10)

+

1∑

i=0

m(0,i) ·NN − 1

·1∑

j=0

P (j1|i1) · nocN (m)

First, the node is chosen by an active node with probability m(0,i) · N/(N − 1),but a collision occurred during the interaction with probability 1−nocN (m). Thesecond summand expresses the probability that the passive node is not selectedby any active node as a peer. Third, after the successful communication with anactive peer with probability

m(0,i) ·N/(N − 1) · nocN (m),

the passive node keeps the item with probability P (01|i1) + P (11|i1), where iindicates whether the active peer has the item or not.

The probability that a node gives away its copy of the item is calculated as:

PN(g,1|g>0),(g−1,0)(m) =

1∑

i=0

m(0,i) ·NN − 1

· P (10|i1) · nocN (m) (6.11)

Namely, after the successful communication with an active peer with probability

m(0,i) ·N/(N − 1) · nocN (m),

the passive node drops its copy of the item with probability P (10|01), if the peerdid not have the item, and P (10|11), otherwise.

6.4.3 Coverage Property of the Protocol

The model, described in Section 6.4.1-6.4.2, is sufficient to perform the anal-ysis of the shuffle protocol based on the replication property. Recall that replica-tion expresses the fraction of nodes that possess a copy of the data item at a giventime. Other properties of the shuffle protocol include coverage, i.e., the fractionof nodes that have seen the item within the given period.

In order to analyze the performance of the protocol according to coverage,we need to extend the state space with an additional parameter, o ∈ 0, 1. Theparameter o = 0, if a node has not held the item previously, and o = 1, otherwise.Once the item is acquired in one of the exchanges, o stays equal to 1. Thus, thestate of a node is a triple (g, d, o). The state space of the extended model is

S = 0, . . . , Gmax × 0, 1 × 0, 1

with the size |S| = (Gmax + 1) · 4.The transition probabilities for the replication model have to be adapted ac-

cording to the new state space. Clearly, if a node holds the item (d = 1), the node


becomes informed about the item (o = 1). Moreover, once a node has obtainedthe item, it cannot ‘forget’ even after losing the copy during one of the exchanges:

PN(g,d,1|g>0),(g−1,d′,0)(m) = PN

(0,d,1),(Gmax,d′,0)(m) = 0

Thus, several transition probabilities are computed in a similar way. The transi-tion probabilities for active nodes from (6.4), (6.7), and (6.6) remain:

PN(0,0,o),(Gmax,0,o)(m) = PN

(0,0),(Gmax,0)(m), o ∈ 0, 1PN

(0,1,1),(Gmax,0,1)(m) = PN(0,1),(Gmax,0)(m)

PN(0,1,1),(Gmax,1,1)(m) = PN

(0,1),(Gmax,1)(m)

All four probabilities cover the transitions, where a value of the parameter odoes not change. Likewise, the corresponding transition probabilities for passivenodes from (6.8), (6.11), and (6.10) remain:

PN(g,0,o|g>0),(g−1,0,o)(m) = PN

(g,0|g>0),(g−1,0)(m), o ∈ 0, 1PN

(g,1,1|g>0),(g−1,1,1)(m) = PN(g,1|g>0),(g−1,1)(m)

PN(g,1,1|g>0),(g−1,0,1)(m) = PN

(g,1|g>0),(g−1,0)(m)

We distinguish the transitions in which a node acquires the item for the first time,and in which a node that held the item previously obtains it again. Although wedifferentiate these cases, the transition probabilities are equal to the probabilitythat a node obtains a copy of the item of (6.9) and (6.5):

PN(g,0,0|g>0),(g−1,1,1)(m) = PN

(g,0,1|g>0),(g−1,1,1)(m) = PN(g,0|g>0),(g−1,1)(m)

PN(0,0,0),(Gmax,1,1)(m) = PN

(0,0,1),(Gmax,1,1)(m) = PN(0,0),(Gmax,1)(m)

Note that in all probabilities the occupancy measure m(g,0) for any g is nowtreated as the sum m(g,0,0) + m(g,0,1), and m(g,1) for any g simply becomesm(g,1,1). The same applies to the probability that no collision occurs:

nocN (m) =

(N − 3

N − 1

)(m(0,0,0)+m(0,0,1)+m(0,1,1))·N−1

.

That is, given two nodes interacting with each other, all other active nodes choosepeers different from the gossiping pair.

6.4.4 Mean-field Limits

In the limit, the value of the probability nocN (m) for the replication model is

noc(m) = limN→∞

nocN (m) = limN→∞

(N − 3

N − 1

)m(0,d)·N−1

= e−2·m(0,d) ,


where m(0,d) denotes the sum m(0,0) + m(0,1), and for the coverage model is:

noc(m) = limN→∞


(N − 3

N − 1

)m(0,d,o)·N−1

= e−2·m(0,d,o) ,

where

m(0,d,o) = m(0,0,0) + m(0,0,1) + m(0,1,1).

We note that in the transition probabilities for active and passive nodes calculatedabove, factors like N/(N − 1) converge to 1 if N →∞.

6.4.5 Comparison with Previous Results

We compare the results of our mean-field model with the results of replicationand coverage observed while running the actual protocol in a large-scale deploy-ment. In case of the shuffle protocol, we simulate the network of 2500 nodes ona single workstation, using a Java-based implementation. Each node has localstorage size c = 100, and exchange message size s = 50. All nodes are randomlydivided into 10 different groups with the different values of gossip delay. Theperiod between two consecutive contact initiations is fixed and set to 10. Thenodes from the group with g = 0 are active nodes that initiate simultaneously anexchange with the peers chosen uniformly at random. If a node contacts two gos-siping nodes, the interaction between all three nodes fails. Initially a new itemis introduced at one node in a network, in which n = 500 items are uniformlydistributed over the local storage of all nodes. After each step, we measure thetotal number of copies of the distinguished item in the network, and the numberof nodes that have seen this item. Each simulation curve in Figure 6.4 representsthe average calculated over 500 runs.

Relating our mean-field model to these settings, we set the fraction of nodesthat possess the item initially to 1/2500. The maximum gossip delay Gmax = 9.All nodes have a gossip delay uniformly distributed between 0 and Gmax. Eachdiscrete time step, we record the values of the occupancy measure m.

Figure 6.4(a) shows the evolution of the number of copies of the distinguisheditem in the network over time. For the mean-field model we have multiplied thefraction of nodes with d = 1 by 2500 to compare with the simulation curve. Bothcurves of simulation and mean-field model are quite close, with a small differencebetween steps 650 and 1300. Nevertheless, the mean-field curve falls nicelywithin the standard deviation of the simulation results. Both curves approachthe same stable state c

n= 500 items after 1300 steps.

Figure 6.4(b) shows the evolution of the total number of nodes that have heldthe distinguished item at some moment in time. In the mean-field model, we sumthe fraction of nodes that are in state (g, d, 1) for all g and d. Like replication,in the beginning both curves grow at the same rate for around 650 steps, after


0

100

200

300

400

500

0 500 1000 1500 2000 2500 3000

num

ber

of re

plic

as

time

mean-fieldsimulation

(a) Replication

0

0.2

0.4

0.6

0.8

1

0 500 1000 1500 2000 2500 3000

fraction o

f nodes

time


(b) Coverage

Figure 6.4: Comparison with simulation results for a network of 2500 nodes.

which there is a slight difference that accumulates over time. Still, the mean-field curve falls within the standard deviation of the simulation results, and bothcurves reach the stable state 1 by 1500 steps.

The difference between the simulation and mean-field curves in the networkof 2500 nodes arises from the fact that the distribution of the items slightly fluc-tuates around the average 500 (average deviation is 20). As a consequence theuniform distribution assumed in our model is not perfect in practice. However,when considering larger networks (i.e. 25000 nodes) the simulations come veryclose to the behavior predicted by the mean-field analysis; see Figure 6.5.

In Chapter 5, the pairwise interaction model is used for the differential equa-tions for replication and coverage, presented in Section 5.5. Thus, we compareour mean-field model with the differential equations for both replication (see

0

100

200

300

400

500

0 500 1000 1500 2000 2500 3000

num

ber

of re

plic

as

time


(a) Replication

0

0.2

0.4

0.6

0.8

1

0 500 1000 1500 2000 2500 3000

fraction o

f nodes

time


(b) Coverage

Figure 6.5: Comparison with simulation results for a network of 25000 nodes.


Figure 6.6(a)) and coverage (see Figure 6.6(b)). Due to the deterministic natureof both models, we are interested to see how different the results are. We im-plemented the differential equations in MATLAB; the solutions of the differentialequations are obtained by numerical integration. The number of possible con-tacts by a node per round is limited to 4. To take into account the fact that theequational models disregard the possibility of collisions, we increase the maxi-mal gossip delay for the mean-field analysis to Gmax = 200. Intuitively, thereare N/Gmax nodes that have the same gossip delay g, and nodes with g = 0 talksimultaneously. By increasing Gmax and initially distributing all nodes uniformlyat random into states (g, d, o) with different g, fewer nodes become active at thesame time, which leads to more sequential initiation of a gossip by each node. ForGmax = 200 and N = 2500, 2500/200 = 12.5 nodes have the same g compared to2500/10 = 250 nodes, thus having a smaller chance that an active node contactsanother active node or a communicating pair than in the latter case. Formally,since the equational models do not include the notion of collisions, we have tomaximize the probability of no collisions noc(m) in the mean-field model. Maxi-mizing the value of the probability of no collision e−2·m(0,d,o) implies minimizingm(0,d,o) since e > 1, which can be achieved by increasing Gmax. The maximalgossip delay Gmax = 200 keeps the probability of no collisions in the mean-fieldmodel fairly small, i.e. 1/e25, while preserving the relatively small size of thestate space, i.e. 201× 2× 2 = 804.

We assume that one round of the differential equation corresponds to 200steps in the mean-field model, because in case of the equational models, everynode has to initiate a gossip once per round.

Figure 6.6(a) shows the comparison of both models for replication, in terms ofthe evolution of the fraction of nodes that have seen the distinguished item overtime. The differential equation curve overlaps nicely with the mean-field curve.Figure 6.6(b) presents the comparison of two models for the coverage property.

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 50 100 150 200 250 300

fraction o

f nodes

rounds

mean-fieldODE

(a) Replication

0

0.2

0.4

0.6

0.8

1

0 50 100 150 200 250 300

fraction o

f nodes

rounds

ODEmean-field

(b) Coverage

Figure 6.6: Comparison of the deterministic models.

6.5. A Mean-field Model for Basic GTP 105

Both curves are very close to each other, with a tiny difference starting afteraround 60 rounds; both curves settle to the same stable state after 80 rounds.The difference is explained by the assumed value for the number of maximumcontacts by a node per round, and a small collision probability 1−1/e25 comparedto 0 of the differential equation.

6.5 A Mean-field Model for Basic GTP

A detailed description of basic GTP can be found in Section 6.2 and in [118,119]. This corresponds to Step 1 in our methodology. Steps 2 and 3 are ac-complished in the following three subsections (6.5.1–6.5.2). Section 6.5.3 cor-responds to Steps 4 and 5. We compare the obtained mean-field model againstthe emulation results from [118] in Section 6.5.4. We perform an analysis ofproperties of basic GTP in Section 6.5.5, based on the mean-field model.

6.5.1 State Space

The state of a node in a basic GTP network is given by a triple (g, l, h). The firstcomponent g denotes the gossip delay, which defines the frequency of initiatinga contact. When the value of g becomes equal to zero, a node initiates a gossipinteraction. The second component l represents a counter for the last update. InGTP, the time of the last clock update is stored and, if necessary, compared tothe current time. If the difference exceeds the standalone period, an update isenforced. We replace it with a counter which is set to the length of the standaloneperiod at every update. Reaching zero, a clock update is enforced at the nextinteraction. Finally, h is the number of hops the timing information has travelledfrom the time source.

Let Gmax be the maximal gossip delay, and let L be the standalone period.We introduce H to be the maximal hop count. A node in a state with h = H hasa hop count of at least H . A node with h = ∞ is said to be unsynchronized. Thestate space of a single node then is

S = 0, . . . , Gmax × 0, . . . , L × 0, . . . , H,∞,which is of size |S| = (Gmax + 1)(L + 1)(H + 2).

Gossip Delay

Though basic GTP has a fixed gossip delay, we design the model in such away that it allows for the gossip delay to vary, depending on the hop count of anode. We assume that there is a function

G : 0, . . . , H,∞ 7→ 0, . . . , Gmaxthat gives the gossip delay G(h) for any hop count h. As described in Section 6.2,the varying gossip delay is an optimization feature of gradual GTP.



The behavior of a single node is determined by its state and the current oc-cupancy measure m. In the sequel, we again use a kind of pattern matchingnotation: for example, (g, l, h | g > 0) denotes any state where g > 0 whilel and h are chosen arbitrarily from their respective value sets. The expressionm(g,l,h|h<H) is an example for the abbreviation of a sum of occupancy fractions,defined by

m(g,l,h|h<H) =∑

g

∑

l

∑

h<H

m(g,l,h).

Time Sources

We begin with the description of the behavior of a time source, that is, a nodein a state with h = 0. Time sources never update their clock, hence, component lhas no meaning and we always set it to L. If the gossip delay is larger than zero(g > 0), we just decrement it by one. If it is equal to zero, the gossip delay isreset to G(0).

PN(g,l,0|g>0),(g−1,L,0)(m) = 1

PN(0,l,0),(G(0),L,0)(m) = 1.

As one can see, time sources act independently of their environment (as depictedin Figure 6.7).

Active Nodes

If the gossip delay g of a node A is equal to zero, it becomes active andinitiates a gossip interaction with a peer B randomly chosen from the remainingN − 1 nodes. In this interaction, the clock of A might get updated. In GTP, aninteraction is discarded if during its course there has been another interactionleading to an update of the clock. In the model we require that for each nodeonly one interaction can be active, otherwise we say that there is a collision.An update can take place only if the interaction prevails, that is, no collisionoccurs. After A has chosen a suitable peer B, the probability nocN (m) thatthere is no collision is given by the probability that all other active nodes selectpeers different from A and B. The probability that a node chooses neither Anor B (given that it does not try to interact with itself) is (N − 3)/(N − 1). We

(0, L, 0) (1, L, 0) · · · (G(0), L, 0)111

1

Figure 6.7: Transitions of time sources.


consequently have

nocN (m) =

(N − 3

N − 1

)m(0,l,h)·N−1

.

We further have to distinguish between nodes with an enforced update (l = 0)and those without (l 6= 0). If an update is enforced, the clock will be updatedas long as the peer is synchronized, having a hop count h′ different from ∞. Ifthe update is optional, the clock value is only changed if this does not increasethe hop count, that is, if h′ < h. In either case, the new state after a successfulupdate is (G(h′ + 1), L, h′ + 1). The probability to select a passive peer with hopcount h′ is

m(g′,l′,h′|g>0) ·N/(N − 1)

and so the probability of a successful update is

PN(0,0,h|h>0),(G(h′),L,h′+1)(m) =

m(g′,l′,h′|g′>0) ·NN − 1

· nocN (m), ∀h′ <∞,

PN(0,l,h|l>0,h>0),(G(h′),L,h′+1)(m) =

m(g′,l′,h′|g′>0) ·NN − 1

· nocN (m), ∀h′ < h.

Clearly, the successful update takes place if no collision occurred during the in-teraction (with the probability nocN (m)). If h =∞ and the active node choosesa passive peer with hop count H−1 or H , the probability of the successful updateis:

PN(0,l,∞),(G(H),L,H)(m) =

1∑

i=0

m(g′,l′,H−i |g′>0) ·NN − 1

· nocN (m),

since the interaction occurred with a synchronized peer.The case remains where the interaction does not lead to an update of the

clock. This can happen if(1) a collision occurs when gossiping with a passive node,(2) an active node is selected as peer, also leading to a collision, or(3) the interaction has prevailed but the peer cannot provide a suitable hop

count; that is, the peer is an unsynchronized node if l = 0, and h′ ≥ h ifl > 0.

We again distinguish enforced and optional updates and get the probabilities:

PN(0,0,h|h>0),(G(h),0,h)(m) =

m(g′,l′,h′|g′>0) ·NN − 1

· (1− nocN (m))

+m(0,l′,h′) ·N

N − 1+

m(g′,l′,∞|g′>0) ·NN − 1

· nocN (m),

PN(0,l,h|l>0,h>0),(G(h),l−1,h)(m) =

m(g′,l′,h′|g′>0) ·NN − 1

· (1− nocN (m))

+m(0,l′,h′) ·N

N − 1+

m(g′,l′,h′|g′>0,h′≥h) ·NN − 1

· nocN (m).


We treat the case where an active node with hop count H contacts anothernode with hop count H as an unsuccessful update, since in our model H sub-sumes all hop counts h ≥ H of the synchronized nodes of the actual protocol.The calculation of the probability of the successful update in this case requiresknowledge of the actual distribution of nodes with these hop counts, which wedo not know a priori.

Passive Nodes

A passive node with g > 0 has to be contacted by an active peer with hopcount h′ to be able to update its hop count to h′ + 1. This happens with prob-ability m(0,l,h′) · N/(N − 1). The gossip delay is decreased by one in all cases,shortening the time until the next gossip initiation. Following the same line ofargumentation as for active nodes, the probabilities for successful interactionsare

PN(g,0,h|g>0,h>0),(G(h′+1),L,h′+1)(m) =

m(0,l′,h′) ·NN − 1

· nocN (m), ∀h′ <∞,

PN(g,l,h|g>0,l>0,h>0),(G(h′+1),L,h′+1)(m) =

m(0,l′,h′) ·NN − 1

· nocN (m), ∀h′ < h.

That is, the probability of being selected by an active peer with hop count h′ ism(0,l′,h′) · N/(N − 1), and the probability that no collision occurred during theinteraction is nocN (m).

If h = ∞ and the passive node is chosen by an active peer with hop countH − 1 or H , the probability of the successful update is:

PN(g,l,∞|g>0),(G(H),L,H)(m) =

1∑

i=0

m(0,l′,H−i) ·NN − 1

· nocN (m),

because the passive node is contacted by a synchronized peer.The probability of not updating the clock is again composed of three terms:

(1) the probability of having a collision during the interaction with an activenode,

(2) the probability of not being chosen as a peer by any active node, and(3) the probability of having an interaction with an active peer that does not

have a suitable hop count: either an unsychnronised node if l = 0, or withhop count h′ ≥ h if l > 0.

Hence,

PN(g,0,h|g>0,h>0),(g−1,0,h)(m) =

m(0,l′,h′) ·NN − 1

· (1 − nocN (m))

+

(1− m(0,l′,h′) ·N

N − 1

)+

m(0,l′,∞) ·NN − 1

· nocN (m),

PN(g,l,h|g>0,l>0,h>0),(g−1,l−1,h)(m) =

m(0,l′,h′) ·NN − 1

· (1 − nocN (m))


+

(1− m(0,l′,h′) ·N

N − 1

)+

m(0,l′,h′|h′≥h) ·NN − 1

· nocN (m).

Again, we treat the case where a passive node with hop count H is chosen by anactive peer with hop count H as an unsuccessful update, since in our model Hsubsumes all hop counts h ≥ H of the synchronized nodes of the actual protocol.Like in the case of active nodes, the calculation of the probability of the successfulupdate in this special case requires knowledge of the actual distribution of nodeswith these hop counts, which again we do not know a priori.

6.5.3 Mean-field Limits

The probability nocN (m) of having no collision converges for N →∞:

noc(m) = limN→∞


(N − 3

N − 1

)m(0,l,h)·N−1

= e−2·m(0,l,h) .

For all the local transition probabilities the number of nodes N only appears inthe factor N/(N − 1), which has limit 1 for N → ∞. The limit probabilities arethus easily obtained by removing the factor N/(N − 1) from the expressions andby replacing nocN (m) by the above limit noc(m).

6.5.4 Comparison with Emulation Results

In [118], emulation is used to explore how GTP behaves in practice. For basicGTP a network of 1500 nodes was emulated on a single workstation, using localobject passing implementation for communication. One node is a time source,having hop count zero, all other nodes are not synchronized. The gossip delay isfixed and independent of the state of a node and set to 25 seconds. The maximumstandalone period is also set to 25 seconds.

Fitting our model to this scenario, we set the fraction of nodes being a timesource to 1/1500. We assume that one step in the model corresponds to onesecond in the emulation. This slightly overestimates the duration of a gossip in-teraction, which is reported to be in the sub-second range. The maximum gossipdelay is Gmax = 25 seconds, and since it is fixed we have G(h) = Gmax for anyhop count h. The maximum delay between two updates is L = 25 seconds. Themaximum hop count is chosen to be H = 15. A single node thus can assume26 · 26 · 17 = 11492 states. The time source fraction starts off with g = 12, thatis, it initiates a gossip interaction for the first time after 12 seconds. The unsyn-chronized nodes have remaining gossip delays uniformly distributed between 0and Gmax.

Figure 6.8(a) shows the evolution of the number of nodes that are aware ofthe time source over time. A node becomes aware of the time source existencewhen its hop count changes to a finite value. For the mean-field model we havemultiplied the fraction of nodes with a hop count smaller than ∞ by 1500 to


0

200

400

600

800

1000

1200

1400

0 200 400 600 800 1000 1200

num

ber

ofnodes

t

mean-fieldemulation

(a) Discovering time source existence

0

2

4

6

8

10

12

0 800 1600 2400 3200 4000 4800

aver

age

hop

count

t

(b) Avg. hop count for different number of time sources

Figure 6.8: Comparison with emulation results (data taken from Figs. 6.1(a),6.5(b) in [118])

obtain the depicted curve. The curves of emulation and analytical model proceedclose to each other, both approaching 1500 after about 200 seconds, that is, afterabout 8 gossip cycles.

In Figure 6.8(b) we compare the evolution of the average hop count when thenumber of time sources is multiplied by 10 and 100, respectively. For the mean-field model, the average hop count is computed for synchronized nodes only, thusbeing a kind of underestimation as long as not all nodes are aware of the timesource existence. At the beginning, the average hop count increases faster in themean-field, however, mean-field model and emulation settle to similar values.The change in the average hop count depends logarithmically on the number oftime sources. Moreover, having the higher number of time sources result in thenonzero elements in the occupancy measure m, have values closer to each other.

Figure 6.9 finally shows the histogram of hop counts after the protocol stabi-lizes. For the mean-field curve we have taken the distribution at time t = 600,

0

50

100

150

200

250

300

350

0 5 10 15 20

num

ber

ofnodes

hop count

mean fieldemulation

Figure 6.9: Distribution of hop count (emulation results from Figure 6.6 in[118])


neglecting the fact that there are minor oscillations because of time source gossip.Taking this into account, and the fact that we are considering a fully connectednetwork in the mean-field model in contrast with a special peer sampling ser-vice in the emulations, there is a close match between the emulation and themean-field result.

Figures 6.8 and 6.9 document that the presented mean-field model capturesthe main features of basic GTP. The evolution of the hop counts in the consideredscenario is quite precisely represented. In the following we concentrate on themean-field model. We show further measures that were not considered in theemulation experiments in [118] and also evaluate the influence of varying thegossip delay.

6.5.5 More Properties of Basic GTP

We stick to the scenario of the previous section. Following Figure 6.9 wedepict in Figure 6.10(a) the evolution of the hop count distribution over the first10 minutes. Each curve corresponds to the fraction of nodes that have a hopcount of at most a given value. The distance between two curves corresponds tothe fraction of nodes that have exactly a given hop count, as shown for a hopcount of seven.

At the beginning, almost all nodes are unsynchronized, which results in thearea to the left of the graph. Gradually, the nodes acquire hop counts smallerthan∞. Since this change originates from the time source, hop counts differentfrom∞ are relatively small at first. Over time, the hop count distribution settlesto a quasi stable state, with – on average – higher hop counts than at the momentthe network got fully synchronized.

Figure 6.10(a) suggests that the hop count distribution reaches a stable state.This is not completely true, since there is a periodic disturbance whenever the

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 100 200 300 400 500 600

fractionofnodes

t

hop count = 7

(a) Distribution of hop count

0

0.0005

0.001

0.0015

0.002

0.0025

0.003

0.0035

0.004

300 350 400 450 500

fractionofnodes

t

hop count = 0hop count = 1

hop count = 2

(b) Periodicity of hop count probabilities

Figure 6.10: Hop counts for constant gossip delay (25 seconds)


time source fraction gossips every 25 seconds. Figure 6.10(b) shows the fractionof nodes having hop count zero, one, or two for the time interval from 300 to500 seconds. The fraction of time sources (hop count 0) is constant since nonode with a hop count larger than zero can ever become a time source. Thefraction of nodes having hop count one oscillates: with each time source gossip,the fraction increases abruptly, decreasing gradually afterwards. Also for hopcount two, there is still a visible periodicity. The change is already very small, forhigher hop counts the periodicity effect wears off completely.

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0 100 200 300 400 500 600

per

node

t

interactionscollisionschangesupgrades

Figure 6.11: Interaction activity with gossip delay Gmin = Gmax = 25

Figure 6.11 depicts the interaction activity per node over the first 10 minutes.The nature of the mean-field model makes it necessary to state interactions per

node. Mapping back to the original scenario, multiplying the indicated valueswith 1500 results in the total number per second. Interactions are gossip attemptsby nodes which have reached hop count zero. The number stays constant due tothe fact that we have a constant gossip delay of 25 seconds. Only when the timesource fraction gossips, there is a small spike in the curve. A collision occurs ifthere is more than one attempt of a gossip interaction with a node. The numberof collisions is also constant, being a function of the number of interactions. Achange means that a node adjusts its clock and hop count to a different value.The hop count might be larger than before if the update has been been forced bythe last-update flag. At the beginning, changes are rare, since most interactionstake place between unsynchronized nodes, not leading to any updates. In thesynchronization phase, the number of changes increases and finally settles downto a stable level. Because of enforced updates there are always changes to beexpected. With upgrades we denote changes that actually lead to a better hopcount. Their number is of course smaller than the number of changes, settlingdown to a positive number as well: as long as there are “downgrading” changesbecause of enforced updates, there will also be upgrading interactions in thesequel.

Later in the chapter, we compare this graph to the mean-field results with the


0

0.2

0.4

0.6

0.8

1

600 700 800 900 1000 1100 1200

fractionofnodes

t

Figure 6.12: Distribution of hop countafter time source fails for constant gos-sip delay (25 seconds).

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

10 20 30 40 50 60 70

fractionaw

areoftimesource

t

1 second

2 seconds

3 seconds

4 seconds

5 seconds

6 seconds

Figure 6.13: Synchronization speed fordifferent static gossip delays

other values of the gossip delays Gmin and Gmax in order to show the impactof different values on the protocol performance. Thus, the scale of the y-axis ischosen to be the same for the convenient comparison of these results.

Finally we show the behavior of the network when the time source fails. Wedo this by starting with a single time source, and removing it after 10 minutes(time t = 600). That is, by redistributing the fraction of nodes being a time source(i.e. with hop count 0) to the other states with hop counts > 0. Figure 6.12 showswhat happens to the hop count distribution in the following 10 minutes. In thisfigure, like in Figure 6.8(a), each curve represents the fraction of nodes thathave at most a certain hop count. The highest curve on the graph correspondsto the fraction of nodes with hop count at most H = 15, i.e., the fraction ofsynchronized nodes. As could be expected, nodes with a low hop count die outover time, leaving all nodes at the chosen maximum hop count of H = 15.

6.5.6 Different Static Gossip Delays

With the next graphs we want to clarify the influence of the gossip delay onthe performance of the complete network. In general, one expects that a highergossip delay leads to a lower synchronization speed. On the other hand, it alsoimplies less communication. The question we asked ourselves was: can a shortgossip delay lead to a slower synchronization than a longer gossip delay becauseof too many interactions and, subsequently, collisions?

Figure 6.13 shows the speed of synchronization for gossip delays betweenone and six seconds. In general, synchronization slows down with increasinggossip delay. But if nodes gossip every other second, that is, if the gossip delayis one, synchronization proceeds slower than for a delay of two, three, four orfive seconds. In this case, collisions impede a fast dissemination of the timinginformation through the network.


1e-05

0.0001

0.001

0.01

0.1

1

0 50 100 150 200

per

node

t

1 second2 seconds6 seconds

(a) Upgrading interactions

0

1

2

3

4

5

6

0 50 100 150 200

averagehopcount

t

1 second2 seconds6 seconds

(b) Average hop count

Figure 6.14: Varying the static gossip delay

Figures 6.14(a) and 6.14(b) further substantiate this insight. In 6.14(a), theupper set of curves depicts the total number of initiated interactions, and thelower set shows the number of interactions really leading to an upgrade. Eventhough with a gossip delay of one second the total number of interactions ishighest, the number of upgrades is lower than for a gossip delay of two seconds.Figure 6.14(b) documents that also the average hop count for a gossip delay ofone is higher than for a gossip delay of two.

6.5.7 Dynamic Adaptation of Gossip Delay

Our model allows us to dynamically adapt the gossip delay to the state of thesystem, that is, to its current hop count, via the function G(h). Gradual GTP alsooffers this possibility, thereby taking more parameters (not only the hop count)into account. In line with the description in [118] we want a node to gossipmore often if its time has “bad quality”. For our model this translates to a highhop count.

For this purpose we introduce a minimal gossip delay Gmin. The gossip delayof a node is then set to

G(h) =

Gmin, h =∞,Gmax − ⌊ h

H+1⌋ · (Gmax −Gmin), 0 ≤ h ≤ H.

Unsynchronized nodes initiate a gossip as often as possible while a time sourcedoes so as seldom as possible. For all other hop counts, the gossip delay spreadslinearly between Gmin and Gmax. Note that numerically a hop count of ∞ istreated like H + 1.

We compare three cases: the scenario considered so far with Gmin = Gmax =25 seconds (see Figure 6.11), a small range of gossip delays with Gmin = 15 sec-onds and Gmax = 35 seconds, and a large range with Gmin = 5 seconds andGmax = 45 seconds.


0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 100 200 300 400 500 6000

1

2

3

4

5

6

7

8

9

10

frac

tion

not

awar

eof

tim

esou

rce

aver

age

hop

count

t

Gmin = Gmax = 25

Gmin = 15, Gmax = 35

Gmin = 5, Gmax = 45

Figure 6.15: Synchronization speed and average hop count.

Figure 6.15 depicts both the synchronization speed (left set of curves) and theaverage hop count (right set of curves) in the first 10 minutes. Since nodes witha high hop count have more chances to upgrade if the range of the gossip delay isenlarged, the synchronization speed is increased as opposed to the static gossipversion. But this higher gossiping frequency of nodes with a high hop count alsoleads to a slightly increased average hop count.

Figures 6.16(a) and 6.16(b) compare the interaction activity of two scenarios.While for a static gossip delay the number of interactions is independent of thestate of the system (as shown in Figure 6.11), it highly depends on the stateif the gossip delay is computed dynamically. In the beginning, when most ofthe nodes are unsynchronized, there are many interactions, leading to fastersynchronization. In the long run, the activity pattern settles to similar valuesfor all three scenarios, with a slightly increasing number of interactions with

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0 100 200 300 400 500 600

per

node

t


(a) Gmin = 15, Gmax = 35

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0 100 200 300 400 500 600

per

node

t


(b) Gmin = 5, Gmax = 45

Figure 6.16: Interaction activity.


increasing range of gossip delay.

6.6 Conclusion

We have demonstrated that mean-field analysis is suitable for gossip proto-cols. The following premises enable mean-field analysis:

• there is a very large number of identically behaving nodes (symmetry prop-erty, Section 1.1);

• there are no central servers or global resources;• the behavior of a single node can be described in a local way;• the number of states of a single node is small in comparison to the number

of nodes;• transient measures (“at time t”) are of interest.

Extensions of the theory presented here would also allow for the incorporation ofa global memory, the failure or entering/leaving of nodes [48], the employmentof continuous-time models, and steady-state measures [45]. However, the mean-field approach does not allow for the evaluation of a centrally managed networkor the separate modelling of one single node.

In this chapter we considered two applications of gossip protocols: along withthe presentation of the necessary theory, we developed a simple information dis-semination model. The suitability of the mean-field approximation method wasshown by comparing the results obtained by analytically solving the resultingDTMC, by computing the mean-field limit, and by simulating the system. Themean-field method is applied to the bidirectional (push-pull) communicationmodel for the case of an application of gossip-based information dissemination,the shuffle protocol.

As a larger case study for a gossip protocol we derived a mean-field modelfor basic GTP. It includes the hop count metric and the constant gossip delay, andalso takes into account enforced updates due to the expiration of the standaloneperiod. We validated the fit of the mean-field model, matching it to emulationresults taken from [118]. Then we used the mean-field model to derive a varietyof interesting measures, also considering dynamically adjusted gossip delays.

With the conducted experiments we have shown how to successfully derive alarge variety of useful measures for basic GTP using a mean-field model. Whileemulation like in [118] requires the availability of suitable hardware and runsin real-time (20 minutes for most shown measures), the evaluation of the mean-field model for a given parameter setting is done in a couple of minutes.

7Automating the Mean-Field

for Dynamic Networks

In the previous chapter, we have investigated an abstraction method, calledmean-field method, for the performance evaluation of gossip protocols. Althoughthe protocols fit well the analytical framework we developed, the calculation ofthe transition matrix is a cumbersome and error-prone procedure. The naturalsolution is to automate the process. Furthermore, while the mean-field analysisis well-established in epidemics and for chemical reaction systems, it is rarelyused for communication networks because a mean-field model tends to abstractaway the underlying topology.

To represent topological information, however, we extend the mean-fieldanalysis with concepts of classes of states and multiplicities in the state tran-sitions. At the abstraction level of classes we define the network topology bymeans of connectivity between nodes. This enables us to encode physical nodepositions and model dynamic networks by allowing nodes to change their classmembership whenever they make a local state transition. Based on these exten-sions, we derive and implement algorithms for automating a mean-field basedperformance evaluation.

7.1 Introduction

We consider dynamic networks with pairwise communication between nodes.The mean-field method allows us to evaluate systems with very large numbers ofnodes, that is, systems of a size where traditional performance evaluation meth-ods fall short. However, as we have already seen, the calculation of a transitionprobability matrix is often a cumbersome and error-prone procedure. In addi-tion, the mean-field framework does not offer the possibility to model dynamicnetworks or topological information.

Thus, we now focus on the automation of mean-field analysis, and demon-strate how a network with pairwise communication between nodes can be mod-

117

118 Chapter 7 — Automating the Mean-Field for Dynamic Networks

elled and analyzed. For this we explore the mean-field method in two directions.(1) We introduce a notion of classes (of states) of nodes that can represent

group membership or topological information. By changing their states, weallow nodes to change their class membership. This enables us to modelmobile networks, where nodes are moving around.

(2) We introduce multiplicities in the state transitions such that nodes can becreated and removed. This extension is important for modelling of dynamicnetworks with node departures and arrivals.

To this end, we identify a class of stochastic systems of nodes for which the con-struction of the mean-field model can be automated. Using these results, wehave automated the mean-field method, which allows us to model large dynamicnetworks. Given as user input the probabilities of communication between dif-ferent classes of nodes, and an initial distribution of nodes, a new tool calculatesthe evolution of the system in discrete time. Apart from that, the tool allows themean-field method to be used by a broad community, ranging from the nonexpertoutside of academia to the expert in mean-field theory. The tool is suitable for thenonexpert since it releases the user from the burden of the details of mean-fieldtheory. For the expert, the tool is of interest since carrying out mean-field analysisrequires the cumbersome, error-prone computation of probabilities that describethe complete behavior of the nodes. Moreover, the tool can easily be extended toinclude other aspects of mean-field analysis (some of which we mention in thelast section of the chapter). However, it is important to stress is that this workis not about building the tool, but aims at developing the mean-field frameworkfor dynamic gossip networks.

7.2 Related Work

The idea of different classes of nodes for mean-field approximation emergedfrom the mean-field model of GTP, presented in Chapter 6. The model distin-guishes classes of nodes with respect to communication behavior. Moreover,nodes adapt their communication behavior depending on their state, and bychanging the state, nodes can change their class membership.

The multiple classes approach has also been used by [4, 33, 59, 185]. Inthese works, the authors give a mean-field approximation in the form of differ-ential equations, whereas we observe the evolution of the system in discrete timebased on matrix-vector multiplication. We focus on the automation of the gener-alized mean-field method for dynamic gossip networks. The main advantage ofthis automated framework over the equation-based solutions is that the frame-work allows to change a format of states of nodes (a tuple) without the need tomanually derive a corresponding set of equations.

A mean-field related technique has been developed in [62, 107, 110], sup-ported by a tool in [168]: an ODE-approximation for a specific class of continuous-time process algebra models (PEPA). In contrast to their approach, we employ adiscrete-time model. Whether to choose continuous or discrete time of course

7.3. Modelling Framework 119

depends on the system to be modelled. Moreover, the mean-field modelling ofthis chapter allows for time-dependent events and discrete time changes of nodestates and network topology, cf. Section 7.5.2. These are crucial ingredients ofcommunication networks, but fall outside of the capabilities of an ODE-basedanalysis.

The tool automates mean-field based performance evaluation for dynamicgossip networks, an indispensable step to facilitate the mean-field analysis in thearea of communication systems.

7.3 Modelling Framework

As setup for the subsequent sections, we describe the class of models of com-munication networks for which we automate the mean-field method. We con-sider large networks of nodes interacting in a peer-to-peer or gossip fashion. Thenodes are characterized by discrete-time stochastic processes

On(τ) | τ ∈ N, n ∈ N,

with N = 1, . . . , N, where N is the network size, n is the node identity, and τis the (global) discrete time. In other words, the nodes interact in a round-basedfashion. The values of On(τ) are taken from a finite sample set Ω, called (local)

state space.

internal connectivity 0.2

internal connectivity 0.4internal connectivity 0.9

connectivity 0.1connectivity 0.01

connectivity 0.05

A

B

C

Figure 7.1: An example of a clustered network.

We classify local states s ∈ Ω of nodes into one or more different groups,called ‘classes’. A class C is a collection of states of nodes, C ⊆ Ω. The classes ofnodes can represent group membership, topological information, a combinationof several aspects, etc. We impose no restriction on the formation of these classes,e.g., states are permitted to be in multiple classes. We have chosen for ‘classes


of local states’ over ‘classes of nodes’ since it enables the modelling of mobilenetworks with a changing topology; nodes changing their local states may changetheir class memberships. An example topology based on three classes is shownin Figure 7.1.

Example 7.1. Figure 7.1 depicts a network consisting of three clusters (classes)

A, B and C. Each cluster in the network has a uniform link probability between

any member of the cluster, and there is a specified probability of a communication

between the clusters A, B and C. Namely, nodes of cluster A interact with nodes of

cluster B with probability 0.01; however, nodes of cluster A interact with each other

with probability 0.9, and nodes of cluster B with probability 0.2.

user user

base station

user user

base station

movementcommunication

Figure 7.2: An example of a small mobile network.

s1 s2 s3

s4

C1 C2 C3

C5

C4

communication

Figure 7.3: Modelling mobile networks using classes of states.

Example 7.2. Figure 7.2 illustrates an example of a small mobile network. The

network consists of three classes of nodes, two users and one base station. One of

the users is mobile, and moves around. The base station can communicate with both

users at all times, but the communication between the users depends on the distance

between them; in the lower half of Figure 7.2 interaction between the users directly

is not possible.

7.3. Modelling Framework 121

This scenario is encoded in the framework by means of states and classes, as

shown in Figure 7.3. For the mobile user, we distinguish two states s2 and s3,

representing its position. Two other states, s1 and s4 represent the non-mobile user

and the base station, respectively. All states are divided into different classes, as

shown in Figure 7.3. The station in class C4 communicates with both users C5 =s1, s2, s3, irrespective of the position of the mobile user. The mobile user can

undergo the state transitions s2 → s3 and s3 → s2, that express the user movement.

By moving from s2 to s3, the mobile user changes its class from C2 to C3, and vice

versa. The stationary user in state s1 ∈ C1 cannot communicate with the mobile

user, when the latter is in the state s3 ∈ C3. However, this communication is possible

if the mobile user is in state s2 ∈ C2.

We introduce the following terminology. A node is called active if it initiates acontact at the current time step, and passive, otherwise. Moreover, a node is idle

if it is passive, and is not currently contacted by any node. On the contrary, anode is nonidle if it is active, or a node attempts to contact it.

Definition 7.1. Let S be a set. We use Distr(S) to denote the distributions over

S, that is, the set of functions µ : S 7→ [0, 1] such that∑

s∈S

µ(s) = 1. Moreover,

by Distr≤1(S) we denote the subdistributions over S, that is, the set of functions

µ : S 7→ [0, 1] such that∑

s∈S

µ(s) ≤ 1.

The behavior of the network is defined by the following four functions (spec-ified by the user):

• contacts: the network topology,• κ: state transition for successful communication,• φ: state transition for unsuccessful communication, and• ǫ: state transition for idle nodes.

The function contacts defines the topology of the network. The nodes periodi-cally interact with each other; at each time step every node picks a communica-tion partner. The choice of the partner is governed by a probability distribution:

contacts : Ω→ Distr≤1(P6=∅(Ω)) ,

which determines the probability that a certain node contacts different classes ofnodes. More precisely, for any state s ∈ Ω, contacts(s) is a subdistribution

µ : P6=∅(Ω) 7→ [0, 1]

meaning that a node in state s contacts with probability µ(C) a node from classC 6= ∅. Then the partner is chosen uniformly at random from all nodes that arein a state in this class C.


The probability of a node in state s ∈ Ω being active is:

Pact (s) =∑

C⊆Ω, C 6=∅

contacts(s)(C)

Then 1− Pact(s) is the probability of a node in state s to be passive.We explain this distribution in the following example.

Example 7.3. Assume that Ω = s1, s2, s3, and consider a network with 10 nodesin state s1, 10 in state s2, and 20 in state s3. Let the contact distribution for s1 be

defined as:

contacts(s1) = s2, s3 7→ 0.3, s1 7→ 0.5

and we tacitly assume C 7→ 0 for all other classes C ⊆ Ω.

This means that a node in state s1 contacts with probability 0.3 a node in state s2

or s3 (uniformly at random), with 0.5 another node in state s1, and is idle (not

initiating communication) with the remaining probability 0.2. Hence,

Pact(s1) = 0.8.

Assume that a node in state s1 decides to contact the class s2, s3. The choice of

the communication partner inside this class is uniformly at random, that is, everynode among the 30 nodes in state s2 or s3 has an equal probability of being picked.

Since there are 10 nodes in state s2 and 20 nodes in state s3, with probability 13 the

choice will fall on a node from s2, and with 23 on s3.

Remark. For simplicity, we will refer to “a node in state s” as “a node in s”.

When a node in s (attempts to) contact a node in t, then depending on thesuccess of the contact and the state t of the peer, both nodes stochastically changetheir states in a manner that depends on s and t. By moving from one state toanother, the nodes may change their class membership, which can be used toencode dynamic networks.

Since nodes pick their communication partner randomly, it can happen thatmultiple nodes choose the same partner simultaneously. Moreover, the chosentarget can itself be active. In this case, the result of the interactions depends ontheir sequential order. In this work, we handle the cases when a node is involvedin multiple interaction attempts as collisions. That is, a node has a collision eitherif (a) it is contacted at least twice, or (b) it is active and contacted at least once.A communication fails whenever there is a collision either in the source or inthe target. In other words, considering the pairwise communication betweenthe nodes as a directed graph, then every isolated edge, i.e., strongly connectedcomponent with one edge, is a communication success, and edges that are notisolated amount to communication failures. Such collision models are used, forexample, in the setting of wireless networks [117].

7.4. Mean-field Modelling 123

In the framework, we distinguish between the state transition of idle nodesand nodes with a collision. The function κ defines the state transition of nodesfor the case of successful communication,

κ :Ω×Ω→ Distr(Ω× Ω).

If a node in s (successfully) interacts with a node in t, then κ(s, t) defines thedistribution of the next states of both nodes, that is, κ(s, t)(s′, t′) is the probabilityof s′ and t′ being the successor states of s and t, respectively.

The function ǫ describes the state transition for idle nodes,

ǫ : Ω→ Distr(Ω).

ǫ(s)(t) is the probability of an idle node in s to progress to the next state t.The function φ describes the state transition for nodes whose communication

fails due to a collision,φ :Ω→ Distr(Ω),

That is, if a node in s is either the source or the target of a failed communication,then φ(s)(s′) is the probability that the node moves to the state s′.

7.4 Mean-field Modelling

In this section, we describe the theory behind the tool, and illustrate thepresented framework for two case studies.

7.4.1 Introduction and Notations

The format of the states of Ω, be it an integer, a tuple or another data struc-ture, is irrelevant to the computations presented in this section. Therefore, wesimply identify states by their indices (with respect to a fixed total order), i.e.,0, . . . , l − 1, where l is the number of different states.

Normally, the complete system is composed of the N nodes as a discrete-timestochastic process ZN (τ) = (O1(τ), . . . , ON (τ)). The size of this state space is|Ω|N = lN . However, in case of the mean-field approximation, nodes that are inthe same state are indistinguishable. Thus, we can describe the overall state attime τ with a vector

∆(τ) = (∆0(τ), . . . , ∆l−1(τ)),

where an entry ∆i(τ) is the fraction of nodes in state i. This discrete-time vec-tor is called the occupancy measure, and the size of the state space ΩN

∆ of this

“occupancy process” is(l+N−1

l−1

). That is, there are

(l + N − 1

N

)=

(l + N − 1

l − 1

)


ways to combine N (not necessarily distinct) nodes in l states. The evolution ofthe system is then described by the local transition probabilities of each node:

MN∆ (t, s) = PrON

n (τ + 1) = t | ONn (τ) = s, ∆(τ) = ∆

In other words, the next state t ∈ Ω of a node depends on its current states ∈ Ω and on the current occupancy measure ∆. These transition probabilitiesform the transition probability matrix MN

∆(τ).

7.4.2 Transition Probabilities and Matrix Calculation

We now describe how the state transition probability matrix MN∆(τ) can be

computed based on user-input probabilities, presented above. In order to com-pute the matrix, we assume ∆(τ) to be the current state distribution, and simplywrite∆.

We first compute the probability that a given node in s contacts any node int. Recall that a node in s ∈ Ω picks a communication partner from class C ⊆ Ωwith probability contacts(s)(C). It does so uniformly from the nodes that arecurrently in one of the states of C. For every pair of states s, t ∈ Ω and currentdistribution ∆, the probability that a node in s contacts a node in t is:

contact(s, t) =∑

C⊆Ω, t∈C

contacts(s)(C) · ∆t

∆C

where ∆C =∑

u∈C ∆u for any set of states C ⊆ Ω. Given the current distribu-tion ∆, the expected fraction of nodes that contact the state s ∈ Ω:

ξ(s) =∑

t∈Ω

∆t · contact(t, s)

Example 7.4. Consider the occupancy measure ∆ = (∆s, ∆t) = (0.4, 0.6). Let

contact(t, s) = 0.5 and contact(s, s) = 1. Then the number of nodes to contact s is

ξ(s) = 0.6 · 0.5 + 0.4 · 1 = 0.7,

that is 70% of all nodes in the network will contact nodes in s.

For a node in s ∈ Ω, the probability of not being contacted equals

Pc0 (s) =

(1− 1

N ·∆s

)ξ(s)·N

, (7.1)

if ∆s 6= 0, and Pc0 (s) = 0, otherwise.a Here, ξ(s) ·N is the expected number of

nodes that contact s, N ·∆s is the total amount of nodes in s, and 1− 1

N ·∆s

is

the probability that nodes that contact s do not choose a given node in s.

aThis guarantees that the probability PcolInc(s), defined later, is equal to 1 for ∆s = 0, that is,contact attempts amount to collisions.


Similarly, the probability of a node in s to be contacted once equals

Pc1 (s) =

(ξ(s) ·N

1

)· 1

N ·∆s

·(

1− 1

N ·∆s

)ξ(s)·N−1

=ξ(s)

∆s

·(

1− 1

N ·∆s

)ξ(s)·N−1

, (7.2)

if ∆s 6= 0, and Pc1 (s) = 0, otherwise.b Here,1

N ·∆s

is the probability that a

given node in s is contacted by nodes that contact s. Thus, Pc1 (s) expresses thatonly one of ξ(s)·N nodes contacts a given node in s and the remaining ξ(s)·N−1nodes that contacted the class s did not choose the node.

The probability that a node in s ∈ Ω is idle, i.e., neither initiating contact norbeing contacted, is:

Pidle(s) = (1 − Pact (s)) ·Pc0 (s)

There are two probabilities of communication failure. Firstly, we have to takeinto account the active nodes that are contacted themselves and the non-activenodes that are contacted more than once. The sum of both renormalized to thenonidle nodes yields the probability that a nonidle node in s has a collision:

PcolNonIdle(s) =(Pact (s) · (1− Pc0 (s))

+ (1− Pact(s)) · (1− Pc0 (s)− Pc1 (s)))/(1− Pidle(s)),

if Pidle(s) 6= 1, and PcolNonIdle(s) = 0, otherwise.Secondly, the probability that a communication with a target in s ∈ Ω failed

due to a collision at the target is:

PcolInc(s) = 1− Pc1 (s) · (1− Pact(s)) ·∆s ·Nξ(s) ·N = 1− Pc1 (s) · (1 − Pact (s)) ·

∆s

ξ(s),

if ξ(s) 6= 0, and PcolInc(s) = 1, otherwise. We briefly explain the derivation ofthis formula. The probability of a node in s to be contacted without collision in sis Pc1 (s) · (1−Pact(s)), that is, the probability of being contacted once multipliedwith the probability of being passive. Then Pc1 (s) · (1 − Pact (s)) · ∆s · N is theexpected number of nodes in s that are contacted without collision in s. As thecommunication is pairwise, this number coincides with the number of nodes thatcontact a node in s without collision in the target. Dividing this number by theexpected number of nodes contacting s, ξ(s) ·N , we obtain the probability thata contact with target s does not have a collision in the target.

bAgain, this guarantees that the probability PcolInc(s), defined later, is equal to 1 for ∆s = 0,that is, contact attempts amount to collisions.


s′

c

s

ab

node has a collision

peer has a collision

Figure 7.4: The three causes of communication failure.

Using these probabilities of communication failure, we can derive the proba-bilities of successful communication. The probability of a node in s ∈ Ω to talkto a node in t ∈ Ω, without a collision at t, is

PtalkOkTgt(s, t) = contact(s, t) · (1− PcolInc(t))

Then, the probability of a successful contact, i.e., there is no collision at the targetnor at the source, equals

PtalkOk (s, t) = PtalkOkTgt(s, t) · Pc0 (s).

The three causes for communication failures are displayed in Figure 7.4:(a) a nonidle node in s has a collision,(b) a node in s′ is contacted and the source of the contact has a collision, and(c) the target in s′ of an active node has a collision.

Finally, we describe the calculation of the transition probability matrix MN∆ .

Let MN∆ (t, s) be the entry in the matrix at the crossing of the row t and the

column s. For all nodes in states s, t ∈ Ω,

MN∆ (t, s) = Pidle(s) · ǫ(s)(t)+∑

s′ ∈Ω, t′ ∈Ω

PtalkOk (s, s′) · κ(s, s′)(t, t′)

+∑

s′ ∈Ω, t′ ∈Ω

PtalkOk (s′, s) · κ(s′, s)(t′, t) · ∆s′

∆s

(7.3)

+ (1− Pidle(s)) ·PcolNonIdle(s) · φ(s)(t)

+∑

s′ ∈Ω

(PtalkOkTgt(s′, s)− PtalkOk (s′, s)) · φ(s)(t) · ∆s′

∆s

+∑

s′ ∈Ω

(contact(s, s′)− PtalkOkTgt(s, s′)) · Pc0 (s) · φ(s)(t)

Note that we take ∆s′/∆s = 0, if ∆s = 0. The matrix consists of the six sum-mands that express the corresponding probabilities of:


(i) the state transition from s to t, if a node in s is idle;(ii)–(iii) the state transitions after the successful contact;(iv)–(vi) cover the cases (a)–(c) of interaction failure depicted in Figure 7.4,

namely,

(iv) the state transition from s to t, if node in s is nonidle and acollision occurs;

(v) the state transition of nodes in s contacted by nodes in s′, if acollision occurs at the source nodes in s′; and

(vi) the state transition of active nodes in s whenever a collisionoccurs at the target nodes in s′.

7.4.3 Mean-field Convergence

The mean-field approximation captures the asymptotic behavior of the com-plete system if the network size N tends to infinity. Namely, the occupancy mea-sure ∆ for large networks converges to a deterministic limit, if the followingis satisfied: for all local states s, t ∈ Ω, all ∆ ∈ ΩN

∆ and assuming the initialdistribution ∆(0),

if N →∞, MN∆ (t, s) converges uniformly in ∆ to M∆(t, s), which is a

continuous function of ∆.

If N tends to infinity, for each local state i the fraction ∆i(τ) of nodes in state iat time τ converges to a deterministic limit. This requirement is clearly satisfiedfor the matrix MN

∆ , since the dependence on the network size N vanishes in thelimit for all transition probabilities. Notably, when N →∞, (7.1) and (7.2) taketheir limit in forms

Pc0 (s)→ e−ξ(s)/∆s , Pc1 (s)→ ξ(s)

∆s

· e−ξ(s)/∆s

if ∆s 6= 0, and 0 otherwise.

Theorem 7.1 (cf. [81]). Fix the initial occupancy measure: ∆(0) = δ(0); define

the limit of the local probability matrix:

M∆ = limN→∞

MN∆ , ∆ ∈ ΩN

∆ .

Define the deterministic discrete time process

δ(τ + 1) = Mδ(τ) · δ(τ).

Then for ∀τ ∈ N,

limN→∞

∆(τ) = δ(τ), with probability 1.

In other words, δ(τ) is the deterministic limit occupancy measure at time τ for

N →∞.


In Section 7.6, we show that the convergence theorem holds even for the moregeneral setting of dynamic networks.

7.5 Cases

This section presents two case studies, an analysis of a clock synchronizationprotocol GTP and of a viral spread in a mobile network. Revisiting the clocksynchronization study serves two purposes; first, the comparison highlights theadvantages of an automated approach, and second, it verifies the results of au-tomation against the previously obtained results of Chapter 6 and [118].

7.5.1 Case Study on Clock Synchronization

The first case study illustrates that the tool can cope with large dense ma-trices. Moreover, it shows a nontrivial encoding of the states of nodes for themean-field analysis. Using the model of the clock synchronization protocol GTPfrom Section 6.5, we compare the results obtained by the tool with the results ofthe protocol emulation from [118].

According to the protocol, the nodes are equipped with local clocks. At leastone node (time source) has the accurate time. Nodes periodically interact withrandomly chosen peers, gossiping their clock samples, and update the clocksdepending on the quality of the sample, measured according to the distanceto the time source on a synchronization path. In Section 6.5, the state of anode is defined as a triple (g, l, h) of local parameters. The delay g defines aperiod of interaction of the node. The counter l defines a period of the enforcedsynchronization. The hop count h shows the distance to the time source on thesynchronization path, and is a metric of the sample quality. A node with h = ∞is unsynchronized. The state space of a node is

Ω = 0, . . . , Gmax × 0, . . . , L × 0, . . . , H,∞

of the size |Ω| = (Gmax+1)(L+1)(H+2). The occupancy measure is the fractionof nodes in state (g, l, h).

For the mean-field analysis, we consider the following values of Gmax = 25,L = 25, H = 15. The initial distribution of the time sources ∆(12,L,0) is 1

1500 in

Figure 7.5 and 11500 , 10

1500 , 1001500 in Figure 7.6. All other nodes are unsynchronized

and have remaining gossip delays uniformly distributed between 0 and Gmax.We compare the mean-field results with the experimental results from [118].

The latter results were obtained by emulating a network of 1500 nodes thatexecute GTP, on a single workstation. We assume one step in the GTP model tobe one second in the emulations.

Figures 7.5 and 7.6 show the results of two experiments: the evolution ofthe hop count distribution over the first 600 seconds, and the evolution of theaverage hop count for the different number of time sources, respectively.

7.5. Cases 129

0 200 400 6000.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

hop ≤ 5

hop ≤ 6

hop ≤ 7

mean-field

emulations

time

fract

ion

ofnodes

Figure 7.5: Distribution of the hop count (automated mean-field results) andfraction of the synchronized nodes for the network size 1500 (emulation results)in GTP

In Figure 7.5, each mean-field curve shows the fraction of nodes that have atmost a certain hop count. All but one node are initially unsynchronized. By com-municating to the time source, first some of the nodes obtain small hop counts.The hop count distribution then gradually increases, and finally converges to-wards the stable state. The local maxima in the hop count distribution before thefinal stable state is due to the forced update, governed by the parameter l. Nodesthat have not synchronized for 25 seconds are forced to update their hop countif they interact with a synchronized node. Thus, as more nodes become synchro-nized, the forced updates become more frequent. For the small hop counts suchas 1, this update leads to the increase of the hop count. For more details on theeffect of the forced updates, we refer to Sections 6.5.5.

The highest curve corresponds to the fraction of nodes with the hop countat most H = 15, that is, the fraction of synchronized nodes. The curve of theemulation results shows the fraction of nodes that are synchronized, and thus,corresponds to the mean-field curve of at most hop count 15.

Figure 7.6 shows the average hop count for 1, 10 and 100 time sources. Therespective curves for the mean-field (gray) and emulation (black) results settle tosimilar values. As one can observe, the average hop count decreases with largernumber of time sources.

7.5.2 Case Study on Mobile Networks

The purpose of this case study is to show how the framework can be used toencode mobility of nodes. We consider the spread of a (fictive) virus betweenthree villages A, B and C.


0 600 1200 1800 2400 3000 3600 4200 48000123456789

101112

time

hop

count

(avg)

Figure 7.6: Average hop count for different number of time sources in GTP (pro-duced by automated mean-field framework)

Every person (node) can be either healthy or infected. Additionally, everyperson has a parameter resistance with a value in 0, . . . , 20 (initially, 0). Forthe infected population, the resistance to the virus is increased by 1 at everystep. When it reaches the maximum value, infected people recover and remainhealthy afterwards. Note that immunity is a requirement of mean-field conver-gence result for epidemics (cf. [81]); however, this condition is not necessary forthe framework. Whenever an infected person successfully communicates witha healthy person with resistance lower than 20, the healthy person becomes in-fected. The connectivity among the villages is defined as follows:

contacts(A) = A 7→ 0.1, B 7→ 0.005contacts(B) = B 7→ 0.1, A, C 7→ 0.005contacts(C) = C 7→ 0.1, B 7→ 0.005

The mobility is introduced by a “ship” S that commutes in-between the vil-lages A and C. The ship “moves” according to the parameter position, with avalue in S0, . . . , S5 (see Figure 7.7). If the ship is in position S0 or S3, it indi-cates that the ship is docking at village A or C, respectively. The ship positionsS1 and S5 represent the same location, but indicate the movement in oppositedirections; likewise for positions S2 and S4.

The connectivity of the ship thus depends on its position. Let Sp denote a shipin position p, and define:

contacts(Sp) =

A 7→ 1, if p = 0,

C 7→ 1, if p = 3,

, otherwise.

7.5. Cases 131

Figure 7.7: Commuting ship

People that get on and off the ship are represented as follows. If a personfrom the ship interacts with a person from a village, then they exchange theirhealth/infection and resistance properties. Thus we simulate that one personleaves the ship, while the communication peer enters the ship.

0 10 20 30 40 50 60 70 80 90 1000.0

0.1

0.2

0.3

village Avillage Bvillage Cship S

time

infe

cted

nodes

fract

ion

of

Figure 7.8: Infection spread, mean-field results.

Figure 7.8 shows the evolution of the infected people over time at the differ-ent classes of people. All villages have an equal number of inhabitants, and theship can carry up to half of the size of any village. Initially, 50% of the popula-tion of village C is infected; the remaining population is healthy. For village C,the corresponding curve starts with the highest value. Over time, the infectionspreads to the next village A, since it is connected to C via the ship S. The infec-tion spreads fast on the ship, due to its small capacity, increasing the probabilityof infecting people in the village A. At the same time, occasional travellers be-tween the villages make it possible for the virus to spread onto villages B andC. At some point, infected people with a value of resistance 20 recover from thevirus. Hence, the fraction of infected people decreases and approaches zero after90 steps.


Comparison with Monte Carlo Simulations

We compare the mean-field results with Monte Carlo simulations; Figure 7.9show simulation results of the infection spread with 100 people travelling by theship, and 200 people in each of the three villages. Every line in the figure is theaverage of 1000 simulation runs (each 100 time steps). Note the accuracy (i.e.,resemblance between Figure 7.8 and Figure 7.9) of the mean-field approximation(Figure 7.8) even for the system of moderate size.

0 10 20 30 40 50 60 70 80 90 1000.0

0.1

0.2

0.3

village Avillage Bvillage Cship S

time

infe

cted

nodes

fract

ion

of

Figure 7.9: Monte Carlo simulations with 100 people on the ship and 200 peopleper village

7.6 Framework for Dynamic Networks

So far, the number of nodes in the network has been invariant over time. Thissection extends this to the dynamic creation and removal of nodes.

7.6.1 Introduction and Notations

We generalize the semantics of the occupancy measure ∆. Up to now, ∆s hasbeen the fraction of nodes in state s, and consequently

∑

s∈Ω

∆s = 1.

We relax this requirement, and henceforth interpret ∆s as the number of nodesin state s relative to the initial distribution ∆(0).

We use NS to denote the multisets over S, that is, the set functions

S 7→ N.

Accordingly, we generalize the functions κ, ǫ and φ to include, along with proba-bilities, “multiplicity”:

κ : Ω× Ω→ Distr(NΩ), (7.4)

7.6. Framework for Dynamic Networks 133

such that for all states s, t ∈ Ω, the domain of κ(s, t) is finite. For every pair ofcommunicating nodes there is only a finite number of possible outcomes. Here,the finiteness ensures the Lipschitz condition in Lemma 7.1 (below).

If two nodes in state s and state t communicate successfully, then κ(s, t)(µ)is the probability that the communicating nodes in s and t will transform intoa multiset of nodes in the states defined by µ, that is, for every state u ∈ Ω weobtain µ(u) nodes in u.

Example 7.5. κ(s, t)(s 7→ 3, u 7→ 1) = 0.5 means that with 50% probability a

successful communication of nodes in s and t will result in 3 nodes in s, and 1 in u.

Note that, for the multisets, we suppress entries with multiplicity 0, e.g., t 7→ 0.

Similar to κ, we also generalize the following distributions:

ǫ : Ω→ Distr(NΩ) (7.5)

φ : Ω→ Distr(NΩ) (7.6)

Again, we require that for every s ∈ Ω, the distributions ǫ(s) and φ(s) have afinite domain.

Example 7.6. ǫ(s)(s 7→ 1, t 7→ 2) = 0.3 means that with 30% probability an idle

node in s will transform into one node in s, and two nodes in t.

Since the evolution of a very large system can be expressed by the equation ofδ(τ) (as described in Section 7.4.3), the expected number of states after a statetransition can be obtained using the corresponding multiplicity.

We generalize the calculation of the transition matrix MN∆ . The computa-

tion of contact , Pidle , PtalkOk , PcolNonIdle and PtalkOkTgt remain as described inSection 7.4. For all nodes in states s, t ∈ Ω we compute MN

∆ (t, s) as shown inFigure 7.10. We assume ∆s′/∆s = 0, if ∆s = 0.

The crucial difference with the matrix (7.3) from Section 7.4 is that alongwith the probabilities, we take into account the corresponding multiplicities tocalculate the expected number of nodes in state t after the state transition. Forexample,

∑

µ ∈NΩ

ǫ(s)(µ) · µ(t)

expresses the expected number of nodes in t after an idle transition of a node ins.

The second difference with the matrix (7.3) in Section 7.4 is the following.For successful communication of two nodes, the function κ, defined in (7.4), ex-presses the resulting distribution without distinguishing which of the two nodesprogresses to which state. This simplifies the matrix in so far that, in comparisonwith (7.3), the third summand is dropped. In Figure 7.10, the second summand‘takes care’ of the whole state transition.


MN∆ (t, s) = Pidle(s) ·

∑

µ ∈NΩ

ǫ(s)(µ) · µ(t)

+∑

s′ ∈Ω

PtalkOk (s, s′) ·∑

µ ∈NΩ

κ(s, s′)(µ) · µ(t)

+ (1− Pidle(s)) · PcolNonIdle(s) ·∑

µ ∈NΩ

φ(s)(µ) · µ(t)

+∑

s′ ∈Ω

(PtalkOkTgt(s′, s)− PtalkOk (s′, s)) ·

∑

µ ∈NΩ

φ(s)(µ) · µ(t)

· ∆s′

∆s

+∑

s′ ∈Ω

(contact(s, s′)− PtalkOkTgt(s, s′)) · Pc0 (s) ·

∑

µ ∈NΩ

φ(s)(µ) · µ(t)

Figure 7.10: Matrix computation for dynamic networks

7.6.2 Mean-field Convergence Revisited

To ensure that Theorem 7.1 holds for the generalized setting we show that allrequired conditions are valid.

Lemma 7.1. The following conditions hold for MN∆ :

C1. For all time steps τ , MN∆(τ) is independent of the network size N in the limit.

C2. The number of possible state transitions per time step is bounded.C3. For all time steps τ , MN

∆(τ) satisfies the Lipschitz condition.

Proof. Condition C1 is satisfied for MN∆ , since all local state transition functions

are independent of the network size N , and for the contact probabilities Pc0 (s)and Pc1 (s) the dependence vanishes in the limit, as shown in Section 7.4.3.

Conditions C2 and C3 hold, since there are only finitely many pairs of states,each of which has finitely many possible transitions per time step; see (7.4),(7.5), (7.6) where the domain of the distribution functions is required to befinite. For condition C3 note that the maximum multiplicity is bounded for alllocal transitions and this yields a global bound since there are only finitely manypossible transitions.

Hence, Theorem 7.1 holds for the generalized setting. That is, for the initialdistribution ∆(0) = δ(0) and the limit

M∆ = limN→∞

MN∆ and δ(τ + 1) = Mδ(τ) · δ(τ),

the occupancy measure converges towards the mean-field limit:

limN→∞

∆(τ) = δ(τ), with probability 1.

7.6. Framework for Dynamic Networks 135

7.6.3 Case Study on Dynamic Networks

The current example illustrates the dynamic creation and removal of nodes.We consider the scenario inspired by the well-known Lotka-Volterra equations[150] which model a biological system. Two species, predator A and prey B,inhabit environment E . They periodically interact with each other, which resultsin the following population dynamics.

Prey feeds from the environment, that is, contacts(B) = E 7→ β, and theirpopulation grows whenever they eat:

κ(B, E) = (α, (2,B), (1, E)), (1− α, (1,B), (1, E)).

Predators hunt prey: contacts(A) = B, E 7→ 1; they may find prey B ornothing E . In case of a successful hunt, the population of the predators growswhile the prey is eaten:

κ(A,B) = (λ, (2,A), (0,B), (1− λ, (1,A), (0,B).

For a nonsuccessful hunt

κ(A, E) = (1− ν, (1,A), (1, E)), (ν, (0,A), (1, E)),

that is, the predators die naturally. Likewise, idle predators may starve to death,or a predator dies as the result of a fight (collision) between the two:

ǫ(A)(A) = φ(A)(A) = (1− ν, 1).

The environment stays invariant ǫ(E)(E) = φ(E)(E) = (1, 1). Likewise we definefor prey: ǫ(B)(B) = φ(B)(B) = (1, 1). All other transition or contact probabilitiesare 0.

0 200 400 600 800 10000.0

0.1

0.2

0.3

0.4

0.5

preypredators

time

fract

ion

ofnodes

Figure 7.11: Predator and Prey; mean-field results

Figure 7.11 shows the results of mean-field analysis using the following pa-rameters: α = 0.04, β = 0.5, λ = 0.5, and ν = 0.05, and initial distribution∆E = 0.8, ∆B = 0.14, and ∆A = 0.06. We can clearly see that the predator-prey system undergoes a simple harmonic motion, with the population sizes ofpredator and prey fluctuating.


Comparison with Monte Carlo simulations

We compare the mean-field results with Monte Carlo simulations. Figures 7.12and 7.13 show simulation results of the predator-prey system with 100 predatorsand 300 prey, and 500 predators and 1500 prey, respectively. Although the mean-field approximation is precise only for large populations, we have a surprisinglyclose match already for the simulation with only 100 predators. For the systemwith 500 predators, the match is nearly perfect and the standard deviation stayssmall over the whole period of 1000 steps.

0 200 400 600 800 10000.0

0.1

0.2

0.3

0.4

0.5

preypredators

time

fract

ion

ofnodes

Figure 7.12: Monte Carlo simulations with 100 predators and 300 prey

0 200 400 600 800 10000.0

0.1

0.2

0.3

0.4

0.5

preypredators

time

fract

ion

ofnodes

Figure 7.13: Monte Carlo simulations with 500 predators and 1500 prey

When dealing with simulations, there is of course always a small probabilitythat one of the species will become extinct. The graph in Figure 7.14 depicts theprobability that one of the species will become extinct within a period of 1000steps in dependence of the number of predators (the number of prey is 3-timesthis number). To give an impression: for 100 predators the probability of extinc-tion is 0.12, for 200, 300, and 400 predators it decreases to 0.006, 0.00028, and0.00004, respectively. For 500 predators we did not experience a single extinctionin over 25000 runs (that is, 25 · 106 time steps in total). Consequently, for 100predators or more, the probability of extinction is fairly small. We further discussthe issue of extinction in the mean-field modelling at the end of the chapter.

The graphs of both Figures 7.12 and 7.13 are the average of 1000 simulationruns (each 1000 time steps). For the graph with 100 predators we have filteredout the runs where one of the species became extinct (approximately 12% of the

7.7. Tool Support 137

0 100 200 300 400 5000.0

0.2

0.4

0.6

0.8

1.0

probability of extinction

predators

pro

bability

Figure 7.14: Probability of extinction within a period of 1000 time steps

runs, see Figure 7.14). For the graph with 500 predators no filtering was needed(we did not encounter any extinctions).

7.7 Tool Support

We implemented a tool allowing the user to specify a system specification witha network topology, defined by the distribution contacts and the initial distribu-tion ∆(0), and communication behavior in the form of the transition functionsκ, ǫ, and φ. The tool then performs an automatic mean-field analysis using thealgorithm, which proceeds in two steps:

(i) calculation of matrix MN∆ , according to (7.3) or Figure 7.10, and finding

the limit Mδ ;(ii) matrix-vector multiplications Mδ · δ, according to Theorem 7.1.

In order to avoid inhibitions towards a new tool-dependent specification lan-guage we decided to use existing programming languages for the system specifi-cation. Users can either use Scala [159] to specify the system in a functional pro-gramming style, or, alternatively, Java. Beside the fact that users might alreadybe familiar with these languages, there are some important advantages over tool-dependent languages. First, we have the full power of the platform-independentScala/Java library at our disposal for writing the specification. Second, the writ-ing process can be immensely accelerated by specialized development environ-ments (such as Eclipse, IntelliJ).

Figure 7.15 shows a complete specification of a simple gossip protocol. Theclass Clock defines a node type that is parameterized by a hop count rangingfrom 0 to 5. The hop count is updated when a node talks to another one withsmaller hop count, as defined in talk. It corresponds to the transition function κof the theoretical framework.

In the specification, the hop count remains unchanged for idle nodes and incase of collisions. That is, if a node in state s is idle or has a collision, then anode will stay in s, ǫ(s)(s) = 1 and φ(s)(s) = 1.

Using contacts that corresponds to the distribution contacts in the mod-elling, we define that a node contacts a random partner from all with probability


object SimpleNetDef extends NetDef val MAX HOP = 5

case class Clock(hop: Int) extends Node talk = case b: Clock =>

Array( 1 -> Clock(hop min (b.hop+1)),

1 -> b ) idle = () => Array(1 -> Clock(hop))

collision = idle

contacts = () => Array( 0.1/MAX_HOP*hop -> all )

val all = ContactClass(

for (h <- (0 to MAX HOP)) yield Clock(h): _*)

initial ( 0.1 -> Clock(0), 0.9 -> Clock(MAX_HOP) )

Figure 7.15: A specification of a simple gossip protocol

0.1/5 · hop. The initial distribution ∆(0) of the occupancy measure is expressedby the statement initial of the specification.

To handle specifications with a very large number of node states within areasonable time period, we implemented different performance optimizations.For the mean-field analysis, we need a fast mapping from nodes to indices inthe vector, which represents the occupancy measure. Thus, the tool determinesa local state space of nodes in a first phase. From the local state space the toolextracts the range for each node parameter (e.g., hop), and derives a function,mapping states to unique IDs from the set N. For a better performance, the toolextends the original specification with this function, and recompiles the sourcecode. As an additional optimization, by replacing object creation with look uptables, the tool instantiates each state only once to reduce memory consumption.

Mean-field analysis is perfect for a parallel computation. Its central ingredi-ent, matrix multiplication, can be easily distributed over multiple CPUs. The toolsupports computation on multi-core computers. We leave, however, distributedexecution on a cluster of computers as future work.

For the GTP case study (see Section 7.5.1), the matrix is dense (contains nozeros) and has a size of 11492 × 11492. For this example, the tool needs 4.3seconds per discrete time step (on a Core 2 Duo with 2.66GHz PC). Surprisingly,the throughput of the memory (RAM) channel turns out to be the bottleneck forparallel execution on more than two cores. However, on an 8 core machine wereached 200% speedup of the performance of 3 cores, while more cores did notyield further acceleration.

7.8. Conclusions 139

In this chapter so far, we presented three case studies, all supported by thetoolc.

7.8 Conclusions

We presented an automated mean-field method for large-scale systems withpairwise communication between nodes. To model dynamic systems, we intro-duced the notion of classes of states in our framework, and allow nodes to movebetween the classes by changing their states. We reported the results of three casestudies, performed by the tool; each of the case studies illustrates one aspect ofour framework.

We have demonstrated that it is possible to model mobility in our mean-fieldframework. In general, for the mean-field analysis the state space has to beknown in advance. To this end, the mean-field tool first computes the closure ofreachable states. Moreover, our framework allows for creation and removal ofnodes, and we have illustrated this aspect by modelling a dynamic system in thestyle of Lotka-Volterra equations. However, for modelling biological systems, themean-field analysis as well the Lotka-Volterra equations assume that the numberof individuals tends to infinity, and hence, cannot capture the phenomena that acertain population becomes instinct. In other words, in the long term run (timeτ →∞), when extinction becomes probable, the mean-field approximation doesnot give realistic results.

Regarding the future of the tool, we will experiment with different represen-tations of the matrix, to distribute the computations over a cluster of multiplePCs, and to extend the user-interface for more convenient input of distributionsand probabilities.

The mean-field framework presented in this chapter can be extended as fol-lows. First, the communication between the classes can be governed by a certaindistribution, e.g., the Poisson distribution (instead of discrete uniform distribu-tion). In the current framework, arbitrary distributions can be approximated byassigning nodes to different classes. Second, we plan to extend the method toallow multiple communications per node. Since the result depends on the orderof communications, we need to introduce a notion of schedulers or oracles.

Finally, the mean-field method for discrete-time models without a notion ofclasses in [41, 48] allows for the incorporation of a global memory (history of theoccupancy measure). The tool currently only supports models without memory,and the extension could be integrated in the future.

cSource code of the tool and all experimental data are available at [22].


7.9 Appendix: Specification of Basic GTP Protocol

object ClockNetDef extends NetDef val MAX_HOP = 16; val MAX_DELAY = 25; val MAX_TIMEOUT = 25

val allContacts = Array(

1.0 -> ContactClass( for (g <- (0 to MAX_DELAY);

t <- (0 to MAX_TIMEOUT);

h <- (0 to MAX_HOP))

yield Clock(g, t, h) : _*)

)

case class Clock(delay: Int, timeout: Int, hop: Int) extends Node talk = case b: Clock => Array(

1 -> update(this,b),

1 -> update(b,this)) idle = () => Array(

1 -> Clock(ndelay(delay), 0 max (timeout - 1), hop))

collision = idle

contacts = () => if (delay == 0) allContacts else Array()

def update(a: Clock, b: Clock): Clock = if(a.hop == 0) return Clock(ndelay(a.delay), a.timeout, a.hop);

if(a.timeout == 0) if (b.hop == MAX_HOP)

return Clock(ndelay(a.delay), a.timeout, a.hop);

return Clock(ndelay(a.delay), MAX_TIMEOUT, b.hop + 1);

else var timeout = if (b.hop + 1 < a.hop) MAX_TIMEOUT

else (a.timeout - 1)

return Clock(ndelay(a.delay), timeout, a.hop min (b.hop + 1));

def ndelay(delay: Int): Int = if (delay == 0) MAX_DELAY

else delay - 1

val SOURCE = 1.0 / 1500.0;

val NOSOURCE = (1.0 - SOURCE) / MAX_DELAY

initial (

( List(SOURCE -> Clock(12, MAX_TIMEOUT, 0)) :::

((0 to 11) ++ (13 to MAX_DELAY)).mapdelay => (NOSOURCE -> Clock(delay, MAX_TIMEOUT, MAX_HOP))

.toList ) : _*)

steps (600)

7.10. Appendix: Specification for Mobile Case 141

evaluationDiagram (

new Diagram.Config(7.5, 5, new Diagram.Range(0, 1), 200, 0.1),

(0 to MAX_HOP).mapfilterHop => nodeFilter case node: Clock => node.hop < filterHop

.toArray)

distributionDiagram (

new Diagram.Config(7.5, 5,

new Diagram.Range(0, 350d/1500), 5, 350d/1500/7),

(0 to MAX_HOP).mapfilterHop => nodeFilter case node: Clock => node.hop == filterHop

.toArray)

7.10 Appendix: Specification for Mobile Case

object VirusNetDef extends NetDef def getContacts (villages: Int*) = ContactClass(

for (village <- villages; ill <- List(false, true); h <- (0 to 20))

yield Village(village, ill, h) )

trait Crowd def vilNr: Int; def ill: Boolean; def heal: Int

case class Village (vilNr: Int, ill: Boolean,

heal: Int) extends Node with Crowd contacts = () =>

val neighbours =

((vilNr-1 max 0) to (vilNr+1 min 2)).filter(_ != vilNr)

Array(0.005 -> getContacts(neighbours: _*),

0.1 -> getContacts(vilNr))

talk = case vil: Crowd => Array(

1 -> Village(vilNr, (ill || vil.ill) && heal < 20, heal

1 -> Village(vil.vilNr, (ill || vil.ill) && vil.heal < 20, vil.heal)

)

def nHealing = Math.min(heal + (if (ill) 1 else 0), 20)

idle = () => Array( 1 -> Village(vilNr, ill && heal < 20, nHealing) )

collision = idle

case class Ship (pos: Int, ill: Boolean,

heal: Int) extends Node with Crowd def vilNr = 0

contacts = () => pos match case 0 => Array(1 -> getContacts(0))


case 3 => Array(1 -> getContacts(2))

case _ => Array()

talk = case vil: Crowd => Array(

1 -> Ship(nPos(pos), vil.ill, vil.heal),

1 -> Village(vil.vilNr, ill, heal)

)

def nPos(pos: Int) = if (pos == 5) 0 else pos + 1

idle = () => Array(1 -> Ship(nPos(pos), ill, heal))

collision = idle

var ship: Double = 1.0 / (2 * 2 + 3)

var vd: Double = (1.0 - ship) / (2 + 1)

initial( (

(0 to 2).toList.map(vilNr => vd -> new Village(vilNr, false, 0)) ++

List(0.5 * vd -> Village(2, false, 0)) ++

List(0.5 * vd -> Village(2, true, 0)) ++

List(ship -> Ship(0, false, 0))

): _*)

steps(100)

evaluationDiagram(

new Diagram.Config(10, 5, new Diagram.Range(0, 1), 10, 0.1),

Array(nodeFilter("red",

case node: Ship => node.infected;

case _ => false ),nodeFilter("" ,

case node: Village => node.villageNr == 0

&& node.infected;

case _ => false ),nodeFilter("dotted",

case node: Village => node.villageNr == 1

&& node.infected;

case _ => false ),nodeFilter("dashed",

case node: Village => node.villageNr == MAX_VILLAGE

&& node.infected;

case _ => false )))

7.11. Appendix: Specification for Dynamic Case 143

7.11 Appendix: Specification for Dynamic Case

object PredatorNetDef extends NetDef case class Water() extends Node

case class Prey() extends Node talk = case b: Water => Array(1.04 -> Prey(), 1 -> Water())

case _ => Array() contacts = () => Array(0.5 -> ContactClass(Water()))

case class Predator() extends Node talk = case b: Prey => Array(1.5 -> Predator(), 0 -> Prey())

case b: Water => Array(0.95 -> Predator(), 1 -> Water())

case _ => Array()

idle = () => Array(0.95 -> Predator())

collision = idle

contacts = () => Array(1 -> ContactClass(Prey(), Water()))

initial ( 0.8 -> Water(), 0.14 -> Prey(), 0.06 -> Predator() )

steps (1000)

evaluationDiagram (

new Diagram.Config(7.5, 5, new Diagram.Range(0, 1), 200, 0.1),

Array( nodeFilter case node : Water => true

case _ => false ,nodeFilter case node : Prey => true

case _ => false ,nodeFilter case node : Predator => true

case _ => false )

)

8Concluding Remarks

Large-scale applications running on thousands of nodes are popular nowa-days. These systems are often diagnosed by performing simulations to discoverinsight in the relation between design parameters and observed behavior. Suchexperimental results provide essential data on system behavior, and can aid inunderstanding the emergent behavior of the system. However, the infinite spaceof the parameter settings for probabilistic systems is often too large to be ex-plored experimentally. Moreover, it is very difficult to predict what the effects ofcertain design decisions are.

This thesis focused on gossip protocols, and explored both traditional andnon-traditional ways for analyzing them. This chapter discusses the observationsmade during my work on the analytical frameworks, presented in the thesis.

8.1 Analytical Models

Traditional analysis methods for computer systems such as model checkingwith its exhaustive state space search, fail to cope with large gossip networks. Al-though quite useful for studying certain behavior of small-sized networks, thesemethods do not scale well for gossip protocols. On the other hand, standardmethods for modelling of epidemics scale well, but abstract away the details of aprotocol, showing only simple emergent behavior of the system.

A challenge is to develop analytical models that capture (part of) the behaviorof a system, and then subsequently optimize design parameters, at the right levelof abstraction.

8.1.1 Pairwise Exchange Model

Gossip networks consist of thousand nodes interacting with each other. I be-lieve that modelling a gossip protocol at the level of local interactions, as demon-strated in Chapter 3 and developed further, is a good approach to analyze suchprotocols. On the one hand, this analytical model includes the mechanics of

145

146 Chapter 8 — Concluding Remarks

communication between nodes on the level of the protocol details. On the otherhand, such a model allows for studying the impact of system parameters on theperformance of the protocol, and can be used to optimally design and fine-tuneit.

Moreover, the pairwise exchange model can be integrated with traditionalanalysis techniques, for instance, model checking and differential equations, asit is shown in Chapter 5. In case of model checking, the size of the networkanalyzed cannot be large, but it is a significant improvement over straightforwardmodels. Additionally, such a combined model captures intricacy of a protocol.The pairwise exchange model combined with differential equations results in aMarkov Chain-style model, which enjoys scalability, at the same time bringingthe mechanics of a protocol into the model.

Figure 8.1: State transition diagram

As mentioned in Chapter 3, the pairwise exchange model is at a high abstrac-tion level, but it can be extended to model the spread of spam (for example,introduced for the shuffle protocol in [93]) in gossip networks. To show ‘the fla-vor’ of the extended model, one can simply add to Figure 3.2 one more state 2,indicating that an observed item is corrupted (see Figure 8.1). This ‘naıve’ modelhas nine states, and all state transitions and their corresponding probabilitiescan be calculated according to the description in [93]. It could be interesting toautomate this framework in the future.

The pairwise exchange model in Chapter 3 assumes a simplified Medium Ac-

8.1. Analytical Models 147

cess Control (MAC) layer with no message losses or collisions. In Chapter 4, themodel is revised under the assumption of lossy communication channels. Thepairwise exchange model can be further combined with a model of MAC layerembedding access mechanism in, for example, IEEE 802.11e ad hoc networks[43, 80].

The well-known Bianchi model [43] assumes that nodes always have a mes-sage in the queue ready for transmission, referred to as saturated or saturationcondition. That model is an analytical evaluation of the saturation throughput.Since in gossip protocols nodes periodically gossip with each other, the Bianchimodel is not suitable for these protocols. Its extension, the Engelstad et al. model[80], covers also a nonsaturated condition, computing the throughput for nodesfrom different access categories. Both models are Markov Chains.

Integrating the pairwise exchange model with one of them can be in principledone. To that end, however, one needs to consider, for example, one of thefollowing configurations:

• densely populated networks (resulting in communication interference be-tween nodes);

• mobile networks, where the travelling time is larger than the transmissionrange;

• the gossip protocol is not the only application running at the nodes;• a probabilistic broadcast protocol with a reply mechanism (bidirectional

interaction);• a network where nodes continually synchronize with each other.

If nodes communicate as in the shuffle protocol, the probability that a significantnumber of nodes gossip simultaneously is very small (see equation (6.3)). Thisleads to a low communication load, and message collisions can be neglected.Thus, in the current settings the integrated model is simply an overkill.

The pairwise exchange model abstracts away from the underlying networktopology. As a first step, it has been used to model the properties for a fully-connected network in Chapter 5 and 6. It would be interesting to investigatehow the pairwise exchange model can be applied to analyze properties of theshuffle protocol for other network topologies, and to apply this framework toanother gossip protocol.

8.1.2 Statistical Sampling Method

The interplay between traditional analysis techniques and abstraction is of-ten suitable for gossip protocols. However, despite the natural simplicity of theprotocols, they exhibit a complex probabilistic behavior which is different for dif-ferent configurations (that is, network topology and the probability of messageloss). This thesis introduced a hybrid method to compute a part of an analyticalexpression with statistical data sampling from large-scale experiments.

The calculation of the correction factor in Section 3.4.4 can be seen as atraditional curve fitting method, and a somewhat similar approach is taken in

148 Chapter 8 — Concluding Remarks

Chapter 4. In this chapter, the calculation of the probability Pdrop combines bothrigorous modelling of the message exchange in the shuffle protocol and statisti-cal data sampling from large-scale Monte Carlo simulations for different networktopologies. This approach allows for building a framework that captures the be-havior of the gossip protocol without incorporation of scenario-specific elementsinto the analytical model. It is a challenge, however, to decompose the modelinto its basic components in order to separate configuration-dependent ones andcompute them separately.

On the contrary, traditional analytical methods would allow only for mod-elling each specific scenario configuration, reducing the applicability of the frame-work to those very specific configurations. Thus, I see the statistical samplingmethod as a golden mean between high-level models such as for epidemics show-ing only the emergent behavior, and the low-level models of the protocol thatdepend on particular implementation settings.

8.1.3 Mean-field Framework

The use of deterministic models like differential equations has been extremelypopular for modelling of epidemics since 1920s. However, one of them, mean-field analysis, originating from statistical mechanics, has only been applied re-cently for the effective analysis of very large communication systems, includinggossip protocols as demonstrated in Chapters 6 and 7.

With the mean-field framework developed in this thesis, one can choose spe-cific local parameters for the performance analysis of a gossip protocol. Chap-ter 6 provided step-by-step guidelines for a manual analysis. The frameworkrequires the calculation of a transition probability matrix, which is often a te-dious and error-prone procedure. Therefore, the mean-field framework has beenautomated and extended for dynamic networks in Chapter 7.

Mean-field models assume that the network size tends to infinity, and pro-vides an asymptotic analysis of the system. It also predicts the behavior of thesystem before the steady state with a reasonable accuracy (Sections 6.4.5 and6.5.4). I believe that mean-field method and gossip protocols constitute a perfectsymbiosis.

However, some questions with respect to these models remain open. Themost important issue is related to the fact that the state space of a node has to befinite, which is quite tricky (if possible at all) to achieve for real numbers that areused in gossip-based aggregation protocols. For certain protocols this problemcan be dealt with an abstraction.

Furthermore, it would be interesting to investigate mean-field analysis for al-ternative stochastic models for the nodes, for example, by moving to the continuous-time context [45] or by introducing nondeterminism using Markov decision pro-cesses [160]. The automated mean-field framework can be extended to allow forcalculation of the accuracy of the mean-field results using the second and higherorder moments [31, 57].

8.2. Industrial Importance 149

8.2 Industrial Importance

I think that the future of gossip protocols is bright. Along with the advances incomputer science and robotics, autonomous robots become more and more com-pact, and at the same time more ‘intelligent’. Already today autonomous robotsshow impressive skills in playing football, and swarms of autonomous robotshelp cleaning the recent oil spill in the Mexican Gulf. The spectrum of applica-tion of autonomous robots extends rapidly. A long-term vision in bioinformaticsis to develop nanorobots than can be injected in the human bloodstream to re-pair damaged organs and cells from inside our bodies. Although this vision willtake decades to materialize, the general trend is obvious: large swarms of small,autonomous robots will cooperate to reach a certain goal. In such a setting,communication is lossy and the single units are prone to failure. Here, gossipprotocols are a natural design choice.

Besides robotics, technologies for renewable energy can also use gossip pro-tocols. For example,

• sensors embedded in wind turbines, gossiping the wind direction and ag-gregating this information to decide on the optimal angle and direction;

• sensors embedded in solar panels, gossiping and making mutual decisionson the optimal angle of the mirrors.

Gossip protocols with their built-in redundancy offer the perfect solution todeal with changing topologies, node failures and message corruption. Therefore,we need to put more work in the development of analytical frameworks for gossipprotocols.

9Summary of the Thesis

9.1 Samenvatting

Met het toenemend aantal roddelprotocollen is er steeds meer behoefte aaneen inzichtelijke en systematische analyse hiervan. Echter, de formele analysevan dergelijke protocollen is nog steeds een tamelijk onontgonnen onderzoeks-gebied, met vele uitdagingen en open problemen. Dit proefschrift richt zich opde analyse van roddelprotocollen in peer-to-peer en ad-hoc netwerken, en on-twikkelt verscheidene modelleer-raamwerken die schaalbaar zijn ten opzichtevan de grootte van het netwerk.

Dit proefschrift bestaat uit drie delen. Het eerste deel evalueert een roddel-gebaseerd disseminatieprotocol, op basis van een systeemmodel met perfectezowel als imperfecte communicatiekanalen. Het analytische model verheldert derelatie tussen de parameters van een roddelprotocol, en maakt het kiezen vanoptimale waardes voor deze parameters mogelijk.

Het model wordt in het tweede deel van het proefschrift gebruikt om tebepalen hoe snel data zich voortplant in een netwerk. Disseminatie wordt gemod-elleerd met behulp van de probablistische model-checker PRISM, als ook doormiddel van differentiaalvergelijking. Dit deel onderzoekt verder de impact vanverschillende schedulers op de prestaties van een protocol.

In het laatste deel wordt een zogenaamde mean-field methode uit de statis-tische mechanica aangepast voor en toegepast op roddelprotocollen. Een klassevan stochastische systemen wordt geıdentificeerd waarvoor de constructie vaneen mean-field model kan worden geautomatiseerd, en een tool wordt gepre-senteerd tezamen met drie verschillende toepassingen (op uniforme, mobiele endynamische netwerken).

9.2 Summary

With the growing number of gossip protocols, there is an increasing demandfor their analysis in an insightful and systematic way. However, the formal anal-

151

152 Chapter 9 — Summary of the Thesis

ysis of these protocols is still a rather unexplored research field, with many chal-lenges and open problems. This thesis focuses on the analysis of gossip protocolsin peer-to-peer and ad hoc networks, and develops various modeling frameworksthat are scalable with respect to the network size.

The thesis consists of three main parts. The first part evaluates a gossip-based dissemination protocol, based on a system model with perfect and lossycommunication channels. The analytical model clarifies the relationship betweenthe protocol parameters and allows for their optimization.

The model is used in the second part of the thesis to show how fast a data itemis disseminated through a network. To this end, dissemination is modeled usingthe probabilistic model checker PRISM, and by means of differential equations.This part also explores the impact of different scheduling policies on the protocolperformance.

In the last part, a mean-field framework originating from statistical mechanicsis adapted and applied to gossip protocols. A class of stochastic systems is iden-tified for which the construction of the mean-field model can be automated, anda tool is presented which is applied to three different case studies (on uniform,mobile and dynamic networks).

Bibliography

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions

with Formulas, Graphs, and Mathematical Tables. Dover, New York, 9thedition, 1972.

[2] L. Afanassieva, S. Popov, and G. Fayolle. Models for transporation net-works. J. of Math. Sciences, 84(3):1092–1103, 1997.

[3] A. Allavena, A. Demers, and J. E. Hopcroft. Correctness of a gossip basedmembership protocol. In Proc. Symp. on Principles of Distributed Comput-

ing (PODC), pages 292–301. ACM Press, 2005.

[4] E. Altman, P. Nain, and J.-C. Bermond. Distributed storage managementof evolving files in delay tolerant ad hoc networks. In Proc. Conf. of Com-

puter and Communications Societies (INFOCOM), pages 1431–1439. IEEE,2009.

[5] T. Andel and A. Yasinsac. On the credibility of MANET simulations. Com-

puter, 37(7):48–54, 2006.

[6] H. Andersson and T. Britton. Stochastic epidemic models and their statistical

analysis, volume 151 of Lecture Notes in Statistics. Springer-Verlag, 2000.

[7] J. P. Aparicio, M. A. Natiello, and H. G. Solari. The quasi-deterministiclimit of population dynamics. Presented at Int. ISAAC Congress, July 25-30, 2005. Preprint, 2007.

[8] S. Assaf, P. Diaconis, and K. Soundararajan. A rule of thumb for riffleshuffling. Technical report, Stanford University, 2008.

[9] F. Baccelli, A. Chaintreau, D. De Vleeschauwer, and D. McDonald. A mean-field analysis of short lived interacting TCP flows. ACM SIGMETRICS Per-

formance Evaluation Review, 32(1):343–354, 2004.

153

154 BIBLIOGRAPHY

[10] F. Baccelli, A. Chaintreau, D. De Vleeschauwer, and D. McDonald. HTTPturbulence. AMS Networks and Heterogeneous Media, 1:1–40, 2006.

[11] F. Baccelli, D. McDonald, and M. Lelarge. Metastable regimes for multi-plexed TCP flows. In Allerton Conf. on Communication, Control, and Com-

puting, volume 42, pages 1005–1011. Curran Associates, Inc., 2004.

[12] F. Baccelli, D. McDonald, and J. Reynier. A mean-field model for mul-tiple TCP connections through a buffer implementing RED. PerformanceEvaluation, 49(1-4):77–97, 2002.

[13] N. Bahi-Jaber and D. Pontier. Modeling transmission of directly transmit-ted infectious diseases using colored stochastic Petri nets. Mathematical

Biosciences, 185(1):1–13, 2003.

[14] C. Baier, L. Cloth, B. R. Haverkort, M. Kuntz, and M. Siegle. Model check-ing markov chains with actions and state labels. IEEE Transactions Soft-

ware Eng., 33(4):209–224, 2007.

[15] C. Baier, B. R. Haverkort, H. Hermanns, and J.-P. Katoen. Model checkingcontinuous-time Markov chains by transient analysis. In Proc. Conf. on

Computer Aided Verification (CAV), volume 1855 of LNCS, pages 358–372.Springer, 2000.

[16] C. Baier, B. R. Haverkort, H. Hermanns, and J.-P. Katoen. Model-checkingalgorithms for continuous-time Markov chains. IEEE Transactions Software

Eng., 29(6):524–541, 2003.

[17] N. T. Bailey. Mathematical Theory of Infectious Diseases and Its Applications.Griffin, London, 2nd edition, 1975.

[18] R. Bakhshi, F. Bonnet, W. Fokkink, and B. Haverkort. Formal analysis tech-niques for gossiping protocols. ACM SIGOPS Oper. Syst. Rev., 41(5):28–36,2007.

[19] R. Bakhshi, L. Cloth, W. Fokkink, and B. Haverkort. Mean-field analysisfor the evaluation of gossip protocols. In Proc. Conf. on Quantitative Eval-

uation of SysTems (QEST), pages 247–256. IEEE Computer Society, 2009.

[20] R. Bakhshi, L. Cloth, W. Fokkink, and B. R. Haverkort. Mean-field analy-sis for the evaluation of gossip protocols. ACM SIGMETRICS Performance

Evaluation Review, 36(3):31–39, 2008.

[21] R. Bakhshi, L. Cloth, W. Fokkink, and B. R. Haverkort. Mean-field frame-work for performance evaluation of push-pull gossip protocols. Perfor-

mance Evaluation, 2010. In Press.

BIBLIOGRAPHY 155

[22] R. Bakhshi, J. Endrullis, and S. Endrullis. Tool for automated mean-fieldframework. http://www.few.vu.nl/~rbakhshi/alg/mean.tar.gz.

[23] R. Bakhshi, J. Endrullis, S. Endrullis, W. Fokkink, and B. Haverkort. Au-tomating the mean-field method for large dynamic gossip networks. InProc. Conf. on Quantitative Evaluation of SysTems (QEST), pages 241–350.IEEE Computer Society, 2010.

[24] R. Bakhshi, J. Endrullis, W. Fokkink, and J. Pang. Brief announcement:asynchronous bounded expected delay networks. In Proc. Symp. on Prin-

ciples of Distributed Computing (PODC), pages 392–393. ACM, 2010.

[25] R. Bakhshi and A. Fehnker. On the impact of modelling choices for dis-tributed information spread. A comparative study. In Proc. Conf. on Quan-titative Evaluation of SysTems (QEST), pages 41–50. IEEE Computer Soci-ety, 2009.

[26] R. Bakhshi and W. Fokkink. How probable is it to discard an ace of hearts?In Liber Amicorum for Roel de Vrijer. Letters and essays dedicated to Roel deVrijer on the occasion of his 60th birthday, pages 1–9. Lulu, 2009.

[27] R. Bakhshi, W. Fokkink, J. Pang, and J. van de Pol. Leader Election inAnonymous Rings: Franklin Goes Probabilistic. In Proc. Conf. on Theoret-ical Computer Science (TCS), volume 273 of IFIP, pages 57–72. Springer,2008.

[28] R. Bakhshi, D. Gavidia, W. Fokkink, and M. van Steen. An analytical modelof information dissemination for a gossip-based protocol. Computer Net-works, 53(13):2288–2303, 2009.

[29] R. Bakhshi, D. Gavidia, W. Fokkink, and M. van Steen. An analytical modelof information dissemination for a gossip-based protocol. In Proc. Conf. on

Distributed Computing and Networking (ICDCN), volume 5408 of LNCS,pages 230–242. Springer, 2009.

[30] R. Bakhshi and D. Gurov. Verification of Peer-to-Peer Algorithms: a CaseStudy. In Proc. Workshop on Methods and Tools for Coordinating Concur-

rent, Distributed and Mobile Systems (MTCoord), volume 181 of ENTCS,pages 35–47. Elsevier, June 2007.

[31] K. P. Balanda and H. L. MacGillivray. Kurtosis: A critical review. The

American Statistician, 42(2):111–119, 1988.

[32] R. Baldoni, A. Corsaro, L. Querzoni, S. Scipioni, and S. T. Piergiovanni.Coupling-based internal clock synchronization for large-scale dynamic dis-tributed systems. IEEE Transactions Parallel Distrib. Syst., 21(5):607–619,2010.

http://www.few.vu.nl/~rbakhshi/alg/mean.tar.gz

156 BIBLIOGRAPHY

[33] N. Banerjee, M. D. Corner, D. Towsley, and B. N. Levine. Relays, basestations, and meshes: enhancing mobile networks with infrastructure. InProc. Conf. on Mobile computing and networking (MobiCom), pages 81–91,2008.

[34] Z. Bar-Yossef, R. Friedman, and G. Kliot. RaWMS: random walk basedlightweight membership service for wireless ad hoc network. In Proc.

Symp. on Mobile Ad Hoc Networking and Computing (MobiHoc), pages238–249. ACM Press, 2006.

[35] F. Bause and P. Kritzinger. Stochastic Petri Nets: An Introduction to the The-

ory. Advanced Studies in Computer Science. Vieweg Verlagsgesellschaft,1996.

[36] R. J. Baxter. Exactly Solved Models in Statistical Mechanics. Academic Press,1982.

[37] D. Bayer and P. Diaconis. Trailing the dovetail shuffle to its lair. Annals of

Applied Probability, 2(2):294–313, 1992.

[38] G. Behrmann, A. David, and K. G. Larsen. A tutorial on Uppaal. In Formal

Methods for the Design of Real-Time Systems: Proc. 4th Int. School on Formal

Methods for the Design of Comput., Commun. and Software Syst. (SFM-RT2004), number 3185 in LNCS, pages 200–236. Springer, 2004.

[39] A. Bell and B. R. Haverkort. Sequential and distributed model checkingof Petri nets. Journal on Software Tools for Technology Transfer (STTT),7(1):43–60, 2005.

[40] A. Bell and B. R. Haverkort. Distributed disk-based algorithms for modelchecking very large Markov chains. Formal Methods in System Design,29(2):177–196, 2006.

[41] M. Benaım and J.-Y. Le Boudec. A Class Of Mean Field Interaction Mod-els for Computer and Communication Systems. Performance Evaluation,65(11-12):823–838, 2008.

[42] Y. Bertot and P. Casteran. Interactive Theorem Proving and Program De-

velopment Coq’Art: The Calculus of Inductive Constructions, volume XXV ofTexts in Theoretical Computer Science: An EATCS Series. Springer, 2004.

[43] G. Bianchi. Performance analysis of the ieee 802.11 distributed coor-dination function. IEEE Journal on Selected Areas in Communications,18(3):535–547, 2000.

[44] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bi-modal multicast. ACM Transactions on Comput. Syst., 17(2):41–88, 1999.

BIBLIOGRAPHY 157

[45] A. Bobbio, M. Gribaudo, and M. Telek. Analysis of large scale interactingsystems by mean field method. In Proc. Conf. on Quantitative Evaluation

of SysTems (QEST), pages 215–224. IEEE, 2008.

[46] F. Bonnet. Performance analysis of Cyclon, an inexpensive membershipmanagement for unstructured P2P overlays. Master thesis, ENS CachanBretagne, University of Rennes, IRISA, 2006.

[47] E. Bortnikov, M. Gurevich, I. Keidar, G. Kliot, and A. Shraer. Brahms:Byzantine resilient random membership sampling. Computer Networks,53(13):2340–2359, 2009.

[48] J.-Y. L. Boudec, D. McDonald, and J. Mundinger. A generic mean fieldconvergence result for systems of interacting objects. In Proc. Conf. on

Quantitative Evaluation of SysTems (QEST), pages 3–18. IEEE, 2007.

[49] J. Bowen and M. Hinchey. Ten commandments of formal methods ...tenyears later. Computer, 39(1):40–48, 2006.

[50] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Gossip algorithms: design,analysis and applications. In Proc. Conf. of Computer and Communications

Societies (INFOCOM), volume 3, pages 1653–1664. IEEE Press, 2005.

[51] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Mixing times for randomwalks on geometric random graphs. In Proc. 7th Workshop on Algorithm

Eng. and Experiments and 2nd Workshop on Analytic Algorithmics and Com-binatorics (ALENEX/ANALCO 2005), pages 240–249. SIAM, 2005.

[52] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algo-rithms. IEEE/ACM Transactions on Networking (TON), 14(SI):2508–2530,2006.

[53] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang,S. McCanne, K. Varadhan, Y. Xu, and H. Yu. Advances in network simula-tion. Computer, 33(5):59–67, 2000.

[54] Y. Busnel, R. Beraldi, and R. Baldoni. A formal characterization of uniformpeer sampling based on view shuffling. In Proc. Conf. on Parallel and

Distributed Computing, Applications and Technologies (PDCAT), pages 360–365, 2009. Extended version to appear in JPDC.

[55] M. Cadilhac, T. Herault, R. Lassaigne, S. Peyronnet, and S. Tixeuil. Eval-uating complex MAC protocols for sensor networks with APMC. In Proc.

Workshop on Automated Verification of Critical Syst (AVoCS’06), volume185 of ENTCS, pages 33–46. Elsevier, 2007.

[56] R. Cardell-Oliver. Why flooding is unreliable in multi-hop, wireless net-works. Technical Report UWA-CSSE-04-001, University of Western Aus-tralia, 2004.

158 BIBLIOGRAPHY

[57] G. Casella and R. L. Berger. Statistical Inference. Pacific Grove: Duxbury,2nd edition, 2002.

[58] D. Cavin, Y. Sasson, and A. Schiper. On the accuracy of MANET simula-tors. In Proc. 2nd ACM Int. Workshop on Principles of Mobile Computing

(POMC’02), pages 38–43. ACM Press, 2002.

[59] A. Chaintreau, J.-Y. Le Boudec, and N. Ristanovic. The age of gossip:spatial mean field regime. In Proc. Conf. on Measurement and Modeling ofComputer Systems (SIGMETRICS), pages 109–120. ACM, 2009.

[60] F. Chierichetti, S. Lattanzi, and A. Panconesi. Almost tight bounds forrumour spreading with conductance. In Proc. ACM Symp. on Theory of

Computing (STOC), pages 399–408. ACM Press, 2010.

[61] F. Chierichetti, S. Lattanzi, and A. Panconesi. Rumour spreading andgraph conductance. In Proc. ACM-SIAM Symp. on Discrete algorithm

(SODA), pages 1657–1663. SIAM, 2010.

[62] F. Ciocchetta, A. Degasperi, J. Hillston, and M. Calder. Some investigationsconcerning the CTMC and the ODE model derived from Bio-PEPA. InProc. Workshop on From Biology to Concurrency and Back (FBTC), volume229(1) of ENTCS, pages 145–163. Elsevier, 2009.

[63] A. Clark, S. Gilmore, J. Hillston, and M. Tribastone. Stochastic processalgebra. In Formal Methods for the Performance Evaluation: Proc. 7th

Int. School on Formal Methods for the Design of Comput., Commun. and

Software Syst. (SFM-PE 2007), number 4486 in LNCS, pages 132–179.Springer, 2007.

[64] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press,2000.

[65] C. Cooper, M. Dyer, and A. Handley. The flip markov chain and a random-izing P2P protocol. In Proc. Symp. on Principles of Distributed Computing

(PODC), pages 141–150. ACM, 2009.

[66] P. Costa, V. Gramoli, M. Jelasity, G. P. Jesi, E. Le Merrer, A. Montresor, andL. Querzoni. Exploring the interdisciplinary connections of gossip-basedsystems. ACM SIGOPS Oper. Syst. Rev., 41(5):51–60, 2007.

[67] P. Crouzen, J. van de Pol, and A. Rensink. Applying formal methods to gos-siping networks with mCRL and GROOVE. ACM SIGMETRICS Performance

Evaluation Review, 36(3):7–16, 2008.

[68] P. Dagum, R. Karp, M. Luby, and S. Ross. An optimal algorithm for MonteCarlo estimation. SIAM J. Comput., 29(5):1484–1496, 2000.

BIBLIOGRAPHY 159

[69] D. J. Daley and J. Gani. Epidemic Modelling: An Introduction. CambridgeUniversity Press, Cambridge, UK, 1999.

[70] R. Darling and J. Norris. Differential equation approximations for Markovchains. Probab. Surveys, 5:37–79, 2008.

[71] D. Dawson, J. Tang, and Y. Zhao. Balancing queues by mean field interac-tion. Queueing Syst. Theory & Appl., 49(3–4):335–361, 2005.

[72] S. Deb, M. Medard, and C. Choute. Algebraic gossip: a network codingapproach to optimal multiple rumor mongering. IEEE/ACM Transactions

on Networking (TON), 14(SI):2486–2507, 2006.

[73] A. Demaille, S. Peyronnet, and T. Herault. Probabilistic verification ofsensor networks. In Proc. Conf. on Comput. Sci., Research, Innovation and

Vision for the Future (RIVF), pages 45–54. IEEE Computer Society, 2006.

[74] A. Demaille, S. Peyronnet, and B. Sigoure. Modeling of sensor networksusing XRM. In Proc. Symp. on Leveraging Applications of Formal Methods,

Verification and Validation (ISoLA), pages 271–276. IEEE, 2006.

[75] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Stur-gis, D. Swinehart, and D. Terry. Epidemic algorithms for replicateddatabase maintenance. In Proc. Symp. on Principles of Distributed Com-

puting (PODC), pages 1–12. ACM Press, 1987.

[76] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood fromincomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39:1–38,1977.

[77] A. Dimakis, S. Kar, J. Moura, M. Rabbat, and A. Scaglione.Gossip algorithms for distributed signal processing. Tech-nical Report arXiv:1003.5309, CoRR, 2010. Available athttp://arxiv.org/abs/1003.5309.

[78] A. G. Dimakis, A. D. Sarwate, and M. J. Wainwright. Geographic gossip:efficient aggregation for sensor networks. In Proc. 5th Int. Conf. on Inform.

processing in sensor networks (IPSN’06), pages 69–76. ACM Press, 2006.

[79] N. Drost, E. Ogston, R. van Nieuwpoort, and H. Bal. ARRG: Real-worldgossiping. In Proc. Symp. on High Performance Distributed Computing

(HPDC), pages 147–158. ACM, 2007.

[80] P. E. Engelstad and O. N. Østerbø. Non-saturation and saturation analy-sis of IEEE 802.11e EDCA with starvation prediction. In Proc. Symp. on

Modeling, analysis and simulation of wireless and mobile systems (MSWiM),pages 224–233. ACM, 2005.

160 BIBLIOGRAPHY

[81] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Con-

vergence. Wiley, 1986.

[82] P. Eugster, R. Guerraoui, S. Handurukande, A.-M. Kermarrec, andP. Kouznetsov. Lightweight probabilistic broadcast. ACM Transactions on

Comput. Syst., 21(4):341–374, 2003.

[83] P. T. Eugster, R. Guerraoui, A.-M. Kermarrec, and L. Massoulie. Fromepidemics to distributed computing. Computer, 37(5):60–67, 2004.

[84] A. Fehnker and P. Gao. Formal verification and simulation for performanceanalysis for probabilistic broadcast protocols. In Proc. 5th Conf. on Ad-Hoc,

Mobile, and Wireless Networks (ADHOC-NOW’06), volume 4104 of LNCS,pages 128–141. Springer, 2006.

[85] U. Feige, D. Peleg, P. Raghavan, and E. Upfal. Randomized broadcast innetworks. In SIGAL Int. Symp. on Algorithms, volume 450 of LNCS, pages128–137. Springer, 1990.

[86] J. Fernandez, H. Garavel, A. Kerbrat, L. Mounier, R. Mateescu, andM. Sighireanu. CADP – a protocol validation and verification toolbox.In Proc. Conf. on Computer Aided Verification (CAV), volume 1102 of LNCS,pages 437–440. Springer, 1996.

[87] G. S. Fishman. Monte Carlo: Concepts, Algorithms, and Applications.Springer Series in Operations Research. Springer, 1996.

[88] G. Forsythe, M. Malcolm, and C. Moler. Computer Methods for Mathemati-

cal Computations. Prentice-Hall, 1977, New Jersey.

[89] D. Frey, R. Guerraoui, A.-M. Kermarrec, M. Monod, and V. Quema.Stretching gossip with live streaming. In Proc. Conf. on Dependable Sys-

tems and Networks (DSN), pages 259–264. IEEE, 2009.

[90] R. Friedman, A.-M. Kermarrec, H. Miranda, and L. Rodrigues. Gossip-based dissemination. In B. Garbinato, H. Miranda, and L. Rodrigues,editors, Middleware for Network Eccentric and Mobile Applications, pages169–190. Springer, 2009.

[91] A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topologyon the spread of epidemics. In Proc. Conf. of Computer and Communica-

tions Societies (INFOCOM), volume 2, pages 1455–1466. IEEE ComputerSociety, 2005.

[92] A. J. Ganesh, A.-M. Kermarrec, and L. Massoulie. SCAMP: Peer-to-peerlightweight membership service for large-scale group communication. InProc. Workshop on Networked Group Communication (NGC), volume 2233of LNCS, pages 44–55. Springer, 2001.

BIBLIOGRAPHY 161

[93] D. Gavidia, G. P. Jesi, C. Gamage, and M. van Steen. Canning spam in gos-sip wireless networks. In Proc. Symp. on Modeling, analysis and simulation

of wireless and mobile systems (MSWiM), pages 30–37. IEEE, 2007.

[94] D. Gavidia, S. Voulgaris, and M. van Steen. A gossip-based distributednews service for wireless mesh networks. In Proc. Conf. on Wireless On

demand Network Syst. and Services (WONS), pages 59–67. IEEE ComputerSociety, 2006.

[95] M. Gilleland. Working with fractions in Java. Available athttp://www.merriampark.com/fractions.htm, 2002.

[96] R. W. Gosper. Decision procedure for indefinite hypergeometric summa-tion. Proc. Nat. Acad. Sci. USA, 75(1):40–42, 1978.

[97] R. Grosu and S. A. Smolka. Monte carlo model checking. In Proc. Conf. on

Tools and Algorithms for the Construction and Analysis of Systems (TACAS),volume 3440 of LNCS, pages 271–286. Springer, 2005.

[98] G. Guirado, T. Herault, R. Lassaigne, and S. Peyronnet. Distribution, ap-proximation and probabilistic model checking. ENTCS, 135(2):19–30,2006.

[99] I. Gupta, M. Nagda, and C. F. Devaraj. The design of novel distributedprotocols from differential equations. Distributed Computing, 20(2):95–114, 2007.

[100] M. Gurevich and I. Keidar. Correctness of gossip-based membership un-der message loss. In Proc. Symp. on Principles of Distributed Computing

(PODC), pages 151–160. ACM, 2009.

[101] J. M. Hammersley and K. W. Morton. Poor man’s Monte Carlo. J. Royal

Statistical Soc. Series B (Methodological), 16(1):23–38, 1954.

[102] H. Hansson and B. Jonsson. A logic for reasoning about time and reliabil-ity. Formal Aspects of Computing, 6(5):512–535, 1994.

[103] A. Harwood and O. Ohrimenko. Mean field models of message through-put in dynamic peer-to-peer systems. Technical Report arXiv:0705.2065,CoRR, 2007. Available at http://arxiv.org/abs/0705.2065.

[104] B. R. Haverkort. Are stochastic process algebras good for performance anddependability evaluation? In Proc. Workshop on Process Algebra and Perfor-

mance Modeling (PAPM), volume 1853 of LNCS, pages 501–510. Springer,2000.

[105] B. R. Haverkort. Markovian models for performance and dependabilityevaluation. In European Educ. Forum: School on Formal Methods and Per-

formance Analysis, volume 2090 of LNCS, pages 38–83. Springer, 2002.

http://www.merriampark.com/fractions.htm

162 BIBLIOGRAPHY

[106] B. R. Haverkort, H. Hermanns, and J.-P. Katoen. On the use of modelchecking techniques for dependability evaluation. In Proc. Symp. on Re-

liable Distributed Syst. (SRDS), pages 228–237. IEEE Computer Society,2000.

[107] R. A. Hayden and J. T. Bradley. A fluid analysis framework for a markovianprocess algebra. Theor. Comput. Sci., 411(22-24):2260–2297, 2010.

[108] F. Heidarian, J. Schmaltz, and F. Vaandrager. Analysis of a clock-synchronization protocol for wireless sensor networks. In Proc. Symp. on

Formal Methods (FM), volume 5850 of LNCS, pages 516–531. Springer,2009.

[109] T. Herault, R. Lassaigne, F. Magniette, and S. Peyronnet. Approximateprobabilistic model checking. In Proc. 5th Int. Conf. on Verification, Model

Checking and Abstract Interpretation (VMCAI’04), volume 2937 of LNCS,pages 73–84. Springer, 2004.

[110] J. Hillston. Fluid flow approximation of PEPA models. In Proc. Conf. onQuantitative Evaluation of SysTems (QEST), pages 33–43. IEEE, 2005.

[111] P. Hilton, D. Holton, and J. Pedersen. Mathematical Reflections. Springer-Verlag, New York, 1997.

[112] A. Hinton, M. Kwiatkowska, G. Norman, and D. Parker. PRISM: A tool forautomatic verification of probabilistic systems. In Proc. Conf. on Tools and

Algorithms for the Construction and Analysis of Systems (TACAS), volume3920 of LNCS, pages 441–444. Springer, 2006.

[113] A. Hinton, M. Z. Kwiatkowska, G. Norman, and D. Parker. PRISM: A toolfor automatic verification of probabilistic systems. In Proc. Conf. on Tools

and Algorithms for the Construction and Analysis of Systems (TACAS), vol-ume 3920 of LNCS, pages 441–444. Springer, 2006.

[114] G. Holzmann. The Spin Model Checker, Primer and Reference Manual.Addison-Wesley, 2003.

[115] J. Hromkovic, R. Klasing, B. Monien, and R. Peine. Dissemination of in-formation in interconnection networks (Broadcasting & Gossiping). Com-

binatorial Network Theory, pages 125–212, 1996.

[116] M. Huth and M. Ryan. Logic in Computer Science: Modelling and reasoning

about systems. Cambridge University Press, Cambridge, UK, 2004.

[117] IEEE 802.11 WG. Part 11: Wireless LAN Medium Access Control (MAC) and

Physical Layer (PHY) specification. IEEE, 1999.

BIBLIOGRAPHY 163

[118] K. Iwanicki. Gossip-based dissemination of time. Master’s thesis, WarsawUniversity and Vrije Universiteit Amsterdam, 2005.

[119] K. Iwanicki, M. van Steen, and S. Voulgaris. Gossip-based clock syn-chronization for large decentralized systems. In Proc. Workshop on Self-

Managed Networks, Systems and Services, volume 3996 of LNCS, pages28–42. Springer, 2006.

[120] M. Jelasity, W. Kowalczyk, and M. van Steen. Newscast computing. Tech-nical Report IR-CS-006, Vrije Universiteit Amsterdam, 2003.

[121] M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation inlarge dynamic networks. ACM Transactions on Comput. Syst., 23(3):219–252, 2005.

[122] M. Jelasity, A. Montresor, G. P. Jesi, and S. Voulgaris. PeerSim: A peer-to-peer simulator. http://peersim.sourceforge.net/.

[123] M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, and M. van Steen.Gossip-based peer sampling. ACM Transactions on Comput. Syst., 25(3):8,2007.

[124] W. Kang, F. Kelly, N. Lee, and R. Williams. Fluid and Brownian approxima-tions for an internet congestion control model. In Proc. Conf. on Decision

and Control, volume 4, pages 3938–3943, 2004.

[125] R. Karp, C. Schindelhauer, S. Shenker, and B. Vocking. Randomized rumorspreading. In Proc. IEEE Symp. on Foundations of Computer Science (FOCS),pages 565–574. IEEE Computer Society, 2000.

[126] F. Karpelevich, E. Pechersky, and Y. Suhov. Dobrushin’s approach to queue-ing network theory. J. of Applied Mathematics and Stochastic Analysis,9(4):373–397, 1996.

[127] J.-P. Katoen, T. Kemna, I. Zapreev, and D. Jansen. Bisimulation minimi-sation mostly speeds up probabilistic model checking. In Proc. Conf. on

Tools and Algorithms for the Construction and Analysis of Systems (TACAS),volume 4424 of LNCS, pages 76–92. Springer, 2007.

[128] J.-P. Katoen, D. Klink, M. Leucker, and V. Wolf. Three-valued abstractionfor continuous-time Markov chains. In Proc. Conf. on Computer Aided Ver-

ification (CAV), volume 4590 of LNCS, pages 311–324. Springer, 2007.

[129] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggre-gate information. In Proc. IEEE Symp. on Foundations of Computer Science

(FOCS), pages 482–491. IEEE Computer Society, 2003.

164 BIBLIOGRAPHY

[130] D. Kempe, J. Kleinberg, and A. Demers. Spatial gossip and resource lo-cation protocols. In Proc. ACM Symp. on Theory of Computing (STOC),pages 163–172. ACM Press, 2001.

[131] D. Kempe and J. M. Kleinberg. Protocols and impossibility results forgossip-based communication mechanisms. In Proc. IEEE Symp. on Founda-

tions of Computer Science (FOCS), pages 471–480. IEEE Computer Society,2002.

[132] A.-M. Kermarrec and M. van Steen. Gossiping in distributed systems. ACM

SIGOPS Oper. Syst. Rev., 41(5):2–7, 2007.

[133] S. Y. Ko, I. Gupta, and Y. Jo. A new class of nature-inspired algorithmsfor self-adaptive peer-to-peer computing. ACM Trans. Auton. Adapt. Syst.,3(3):1–34, 2008.

[134] S. Kumar and L. Massoulie. Integrating streaming and file-transfer inter-net traffic: fluid and diffusion approximations. Queueing Syst. Theory &

Appl., 55(4):195–205, 2007.

[135] T. G. Kurtz. The relationship between stochastic and deterministic modelsfor chemical reactions. J. Chem. Phys., 57:2976–2978, 1972.

[136] T. G. Kurtz. Approximation of population processes, volume 36 of CBMS-

NSF Regional Conference Series in Applied Mathematics. SIAM, 1981.

[137] M. Kwiatkowska, G. Norman, and D. Parker. Game-based abstraction formarkov decision processes. In Proc. Conf. on Quantitative Evaluation of

SysTems (QEST), pages 157–166. IEEE Computer Society, 2006.

[138] M. Kwiatkowska, G. Norman, and D. Parker. Analysis of a gossip protocolin PRISM. ACM SIGMETRICS Performance Evaluation Review, 36(3):17–22,2008.

[139] S. Laplante, R. Lassaigne, F. Magniez, S. Peyronnet, and M. de Rouge-mont. Probabilistic abstraction for model checking: An approach basedon property testing. In Proc. IEEE Symp. on Logic in Comput. Sci. (LICS

2002), pages 30–39. IEEE Computer Society, 2002.

[140] M.-J. Lin and K. Marzullo. Directional gossip: Gossip in a wide areanetwork. In Proc. European Dependable Computing Conf. (EDCC), volume1667 of LNCS, pages 364–379. Springer, 1999.

[141] J. Luo, P. T. Eugster, and J.-P. Hubaux. Route driven gossip: Probabilisticreliable multicast in ad hoc networks. In Proc. Conf. of Computer and

Communications Societies (INFOCOM), volume 3, pages 2229–2239. IEEEComputer Society, 2003.

BIBLIOGRAPHY 165

[142] A. Martinoli, K. Easton, and W. Agassounon. Modeling Swarm RoboticSystems: A Case Study in Collaborative Distributed Manipulation. Int.

Journal of Robotics Research, 23(4):415–436, 2004.

[143] S. McCanne and S. Floyd. ns Network Simulator. Available athttp://www.isi.edu/nsnam/ns/.

[144] A. McIver and A. Fehnker. Formal techniques for the analysis of wirelessnetworks. In Proc. Symp. on Leveraging Applications of Formal Methods,Verification and Validation (ISoLA), 2006.

[145] G. J. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons,2000.

[146] T. Meyer and C. Tschudin. Chemical networking protocols. In Proc. ACM

Workshop on Hot Topics in Networks (HotNets), 2009.

[147] M. Mitzenmacher. The power of two choices in randomized load balanc-ing. IEEE Transactions Parallel Distrib. Syst., 12(10):1094–1104, 2001.

[148] D. Mollison, editor. Epidemic Models: Their Structure and Relation to Data.Cambridge University Press, 1995.

[149] D. Mosk-Aoyama and D. Shah. Computing separable functions via gossip.In Proc. Symp. on Principles of Distributed Computing (PODC), pages 113–122. ACM Press, 2006.

[150] J. Murray. Mathematical Biology, volume I: An Introduction. Springer-Verlag, 2003.

[151] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL: A Proof Assistant

for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002.

[152] G. Norman, D. Parker, M. Z. Kwiatkowska, S. K. Shukla, and R. Gupta. Us-ing probabilistic model checking for dynamic power management. Formal

Aspects of Computing, 17(2):160–176, 2005.

[153] OPNET Technologies. OPNET modeler. http://www.opnet.com/.

[154] M. Opper and D. Saad, editors. Advanced Mean Field Methods: Theory and

Practice. MIT Press, 2001.

[155] S. Owre, S. Rajan, J. M. Rushby, N. Shankar, and M. K. Srivas. PVS:Combining specification, proof checking, and model checking. In Proc.

Conf. on Computer Aided Verification (CAV), volume 1102 of LNCS, pages411–414. Springer, 1996.

http://www.isi.edu/nsnam/ns/

http://www.opnet.com/

166 BIBLIOGRAPHY

[156] O. Ozkasap, M. Caglar, E. Cem, E. Ahi, and E. Iskender. Clock synchroniza-tion for wireless sensor networks: a survey. Ad Hoc Networks, 3(3):281–323, 2005.

[157] M. Petkovsek, H. S. Wilf, and D. Zeilberger. A=B. AK Peters, 1996.

[158] J. Pouwelse, P. Garbacki, J. Wang, A. Bakker, J. Yang, A. Iosup, D. Epema,M.Reinders, M. van Steen, and H. Sips. Tribler: A social-based peer-to-peer system. Concurrency and Computation: Practice and Experience, 19:1–11, 2007.

[159] Programming language Scala. http://www.scala-lang.org/.

[160] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic

Programming. John Wiley & Sons, Inc., New York, NY, USA, 1994.

[161] D. Randall. Rapidly mixing Markov chains with applications in computerscience and physics. Computing in Sci. and Eng., 8(2):30–41, 2006.

[162] A. Remke, B. R. Haverkort, and L. Cloth. Model checking infinite-stateMarkov chains. In Proc. Conf. on Tools and Algorithms for the Construction

and Analysis of Systems (TACAS), volume 3440 of LNCS, pages 237–252.Springer, 2005.

[163] R.Graham, D. Knuth, and O. Potashnik. Concrete Mathematics. Addison-Wesley, 2nd edition, 1994.

[164] L. Sattenspiel. The geographic spread of infectious diseases. Princeton Se-ries in Theoretical and Computational Biology. Princeton University Press,2009.

[165] K. Sen, M. Viswanathan, and G. Agha. Statistical model checking of black-box probabilistic systems. In Proc. Conf. on Computer Aided Verification

(CAV), volume 3114 of LNCS, pages 202–215. Springer, 2004.

[166] H. G. Solari and M. A. Natiello. Poisson approximation to density depen-dent stochastic processes: A numerical implementation and test. In Proc.

Workshop on Dynamical Systems from Number Theory to Probability-II, vol-ume 6 of Math. Modelling, pages 79–94. Vaxjo Univ. Press, 2003.

[167] A. Stavrou, D. Rubenstein, and S. Sahu. A lightweight, robust P2P systemto handle flash crowds. In Proc. Conf. on Network Protocols (ICNP), pages226–235. IEEE Computer Society, 2002.

[168] A. Stefanek, R. Hayden, and J. T. Bradley. A new tool for the perfor-mance analysis of massively parallel computer systems. In Proc. Workshop

on Quantitative Aspects of Programming Languages (QAPL), volume 28 ofEPTCS, pages 159–181. Elsevier, 2010.

BIBLIOGRAPHY 167

[169] I. Stojanovic, M. Sharif, and D. Starobinski. Data dissemination in wirelessbroadcast channels: Network coding or cooperation. In Proc. Conf. on

Information Sciences and Systems (CISS), pages 265–270. IEEE, 2007.

[170] A. Tanenbaum and M. van Steen. Distributed Systems: Principles and

Paradigms. Prentice Hall, 2nd edition, 2007.

[171] P. Tinnakornsrisuphap and A. Makowski. Limit behavior of ECN/REDgateways under a large number of TCP flows. In Proc. Conf. of Com-puter and Communications Societies (INFOCOM), volume 2, pages 873–883. IEEE, 2003.

[172] M. Umlauft and P. Reichl. Experiences with the ns-2 network simulator -explicitly setting seeds considered harmful. In Proc. of Wireless Telecom-munications Symposium (WTS), pages 1–5, 2007.

[173] R. van Renesse, K. P. Birman, and W. Vogels. Astrolabe: A robust andscalable technology for distributed system monitoring, management, anddata mining. ACM Transactions on Comput. Syst., 21(2):164–206, 2003.

[174] R. van Renesse, Y. Minsky, and M. Hayden. A gossip-style failure detectionservice. In Proc. Conf. on Distributed Syst. Platforms and Open Distributed

Processing (Middleware), pages 55–70. Springer, 1998.

[175] W. Vogels, R. van Renesse, and K. Birman. The power of epidemics: ro-bust communication for large-scale distributed systems. ACM SIGCOMM

Comput. Commun. Review, 33(1):131–135, 2003.

[176] S. Voulgaris. Epidemic-Based Self-Organization in Peer-to-Peer Systems.Doctoral thesis, Vrije Universiteit Amsterdam, 2006.

[177] S. Voulgaris, D. Gavidia, and M. van Steen. Cyclon: Inexpensive member-ship management for unstructured P2P overlays. Journal of Network and

Systems Management, 13(2):197–217, 2005.

[178] N. Vvedenskaya and Y. Suhov. Dobrushin’s mean-field approximation fora queue with dynamic routing. Markov Proc. Rel. Fields, 3(4):493–526,1997.

[179] S. Weber, V. Veeraraghavan, A. Kini, and N. Singhal. Analysis of gossipperformance with copulas. In Proc. Conf. on Information Sciences and Sys-

tems (CISS), pages 1212–1217, 2006.

[180] H. Younes, M. Kwiatkowska, G. Norman, and D. Parker. Numerical vs. sta-tistical probabilistic model checking. Journal on Software Tools for Tech-

nology Transfer (STTT), 8(3):216–228, 2006.

168 BIBLIOGRAPHY

[181] H. L. S. Younes and R. G. Simmons. Probabilistic verification of discreteevent systems using acceptance sampling. In Proc. Conf. on Computer Aided

Verification (CAV), volume 2404 of LNCS, pages 223–235. Springer, 2002.

[182] X. Zeng, R. Bagrodia, and M. Gerla. GloMoSim. Available athttp://pcl.cs.ucla.edu/projects/glomosim/.

[183] X. Zeng, R. Bagrodia, and M. Gerla. GloMoSim: a library for parallelsimulation of large-scale wireless networks. In Proc. Workshop on Paralleland Distributed Simulation (PADS), pages 154–161. IEEE, 1998.

[184] Q. Zhang and D. Agrawal. Dynamic probabilistic broadcasting in MANETs.J. of Parallel and Distributed Computing, 65(2):220–233, 2005.

[185] X. Zhang, G. Neglia, J. Kurose, and D. Towsley. Performance modeling ofepidemic routing. Computer Networks, 51(10):2867–2891, 2007.

http://pcl.cs.ucla.edu/projects/glomosim/

Titles in the IPA Dissertation Series since 2006

E. Dolstra. The Purely Functional Soft-

ware Deployment Model. Faculty ofScience, UU. 2006-01

R.J. Corin. Analysis Models for Secu-

rity Protocols. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2006-02

P.R.A. Verbaan. The Computational

Complexity of Evolving Systems. Fac-ulty of Science, UU. 2006-03

K.L. Man and R.R.H. Schiffelers.Formal Specification and Analysis of

Hybrid Systems. Faculty of Mathemat-ics and Computer Science and Fac-ulty of Mechanical Engineering, TU/e.2006-04

M. Kyas. Verifying OCL Specifications

of UML Models: Tool Support and Com-

positionality. Faculty of Mathematicsand Natural Sciences, UL. 2006-05

M. Hendriks. Model Checking Timed

Automata - Techniques and Applica-

tions. Faculty of Science, Mathematicsand Computer Science, RU. 2006-06

J. Ketema. Bohm-Like Trees for

Rewriting. Faculty of Sciences, De-partment of Computer Science, VUA.2006-07

C.-B. Breunesse. On JML: topics in

tool-assisted verification of JML pro-

grams. Faculty of Science, Mathemat-ics and Computer Science, RU. 2006-08

B. Markvoort. Towards Hybrid Molec-

ular Simulations. Faculty of Biomedi-cal Engineering, TU/e. 2006-09

S.G.R. Nijssen. Mining Structured

Data. Faculty of Mathematics andNatural Sciences, UL. 2006-10

G. Russello. Separation and Adap-

tation of Concerns in a Shared Data

Space. Faculty of Mathematics andComputer Science, TU/e. 2006-11

L. Cheung. Reconciling Nondetermin-

istic and Probabilistic Choices. Facultyof Science, Mathematics and Com-puter Science, RU. 2006-12

B. Badban. Verification techniques forExtensions of Equality Logic. Facultyof Sciences, Department of ComputerScience, VUA. 2006-13

A.J. Mooij. Constructive formal meth-

ods and protocol standardization. Fac-ulty of Mathematics and ComputerScience, TU/e. 2006-14

T. Krilavicius. Hybrid Techniques for

Hybrid Systems. Faculty of Electri-cal Engineering, Mathematics & Com-puter Science, UT. 2006-15

M.E. Warnier. Language Based Secu-

rity for Java and JML. Faculty of Sci-ence, Mathematics and Computer Sci-ence, RU. 2006-16

V. Sundramoorthy. At Home In Ser-

vice Discovery. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2006-17

B. Gebremichael. Expressivity of

Timed Automata Models. Faculty ofScience, Mathematics and ComputerScience, RU. 2006-18

L.C.M. van Gool. Formalising Inter-

face Specifications. Faculty of Mathe-matics and Computer Science, TU/e.2006-19

C.J.F. Cremers. Scyther - Semantics

and Verification of Security Protocols.

Faculty of Mathematics and ComputerScience, TU/e. 2006-20

J.V. Guillen Scholten. Mobile Chan-

nels for Exogenous Coordination of Dis-

tributed Systems: Semantics, Imple-

mentation and Composition. Facultyof Mathematics and Natural Sciences,UL. 2006-21

H.A. de Jong. Flexible Heterogeneous

Software Systems. Faculty of NaturalSciences, Mathematics, and ComputerScience, UvA. 2007-01

N.K. Kavaldjiev. A run-time reconfig-

urable Network-on-Chip for streaming

DSP applications. Faculty of Electri-cal Engineering, Mathematics & Com-puter Science, UT. 2007-02

M. van Veelen. Considerations on

Modeling for Early Detection of Ab-

normalities in Locally Autonomous Dis-

tributed Systems. Faculty of Mathe-matics and Computing Sciences, RUG.2007-03

T.D. Vu. Semantics and Applications of

Process and Program Algebra. Facultyof Natural Sciences, Mathematics, andComputer Science, UvA. 2007-04

L. Brandan Briones. Theories for

Model-based Testing: Real-time and

Coverage. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2007-05

I. Loeb. Natural Deduction: Sharing

by Presentation. Faculty of Science,Mathematics and Computer Science,RU. 2007-06

M.W.A. Streppel. Multifunctional Ge-

ometric Data Structures. Faculty ofMathematics and Computer Science,TU/e. 2007-07

N. Trcka. Silent Steps in Transition

Systems and Markov Chains. Faculty ofMathematics and Computer Science,TU/e. 2007-08

R. Brinkman. Searching in encrypted

data. Faculty of Electrical Engineer-ing, Mathematics & Computer Sci-ence, UT. 2007-09

A. van Weelden. Putting types to good

use. Faculty of Science, Mathematicsand Computer Science, RU. 2007-10

J.A.R. Noppen. Imperfect Informa-

tion in Software Development Pro-

cesses. Faculty of Electrical Engineer-ing, Mathematics & Computer Sci-ence, UT. 2007-11

R. Boumen. Integration and Test

plans for Complex Manufacturing Sys-

tems. Faculty of Mechanical Engineer-ing, TU/e. 2007-12

A.J. Wijs. What to do Next?: Analysingand Optimising System Behaviour in

Time. Faculty of Sciences, Departmentof Computer Science, VUA. 2007-13

C.F.J. Lange. Assessing and Improv-

ing the Quality of Modeling: A Series

of Empirical Studies about the UML.Faculty of Mathematics and ComputerScience, TU/e. 2007-14

T. van der Storm. Component-based Configuration, Integration and

Delivery. Faculty of Natural Sci-ences, Mathematics, and ComputerScience,UvA. 2007-15

B.S. Graaf. Model-Driven Evolution of

Software Architectures. Faculty of Elec-trical Engineering, Mathematics, andComputer Science, TUD. 2007-16

A.H.J. Mathijssen. Logical Calculi for

Reasoning with Binding. Faculty of

Mathematics and Computer Science,TU/e. 2007-17

D. Jarnikov. QoS framework for Video

Streaming in Home Networks. Facultyof Mathematics and Computer Sci-ence, TU/e. 2007-18

M. A. Abam. New Data Structures and

Algorithms for Mobile Data. Faculty ofMathematics and Computer Science,TU/e. 2007-19

W. Pieters. La Volonte Machinale: Un-

derstanding the Electronic Voting Con-

troversy. Faculty of Science, Math-ematics and Computer Science, RU.2008-01

A.L. de Groot. Practical Automa-

ton Proofs in PVS. Faculty of Science,Mathematics and Computer Science,RU. 2008-02

M. Bruntink. Renovation of Id-

iomatic Crosscutting Concerns in Em-bedded Systems. Faculty of ElectricalEngineering, Mathematics, and Com-puter Science, TUD. 2008-03

A.M. Marin. An Integrated System

to Manage Crosscutting Concerns in

Source Code. Faculty of ElectricalEngineering, Mathematics, and Com-puter Science, TUD. 2008-04

N.C.W.M. Braspenning. Model-basedIntegration and Testing of High-tech

Multi-disciplinary Systems. Faculty ofMechanical Engineering, TU/e. 2008-05

M. Bravenboer. Exercises in Free Syn-

tax: Syntax Definition, Parsing, and

Assimilation of Language Conglomer-

ates. Faculty of Science, UU. 2008-06

M. Torabi Dashti. Keeping Fairness

Alive: Design and Formal Verification

of Optimistic Fair Exchange Protocols.Faculty of Sciences, Department ofComputer Science, VUA. 2008-07

I.S.M. de Jong. Integration and Test

Strategies for Complex Manufacturing

Machines. Faculty of Mechanical Engi-neering, TU/e. 2008-08

I. Hasuo. Tracing Anonymity with

Coalgebras. Faculty of Science, Math-ematics and Computer Science, RU.2008-09

L.G.W.A. Cleophas. Tree Algorithms:

Two Taxonomies and a Toolkit. Fac-ulty of Mathematics and ComputerScience, TU/e. 2008-10

I.S. Zapreev. Model Checking Markov

Chains: Techniques and Tools. Facultyof Electrical Engineering, Mathemat-ics & Computer Science, UT. 2008-11

M. Farshi. A Theoretical and Exper-imental Study of Geometric Networks.Faculty of Mathematics and ComputerScience, TU/e. 2008-12

G. Gulesir. Evolvable Behavior Speci-

fications Using Context-Sensitive Wild-

cards. Faculty of Electrical Engineer-ing, Mathematics & Computer Sci-ence, UT. 2008-13

F.D. Garcia. Formal and Computa-

tional Cryptography: Protocols, Hashes

and Commitments. Faculty of Science,Mathematics and Computer Science,RU. 2008-14

P. E. A. Durr. Resource-based Verifi-

cation for Robust Composition of As-

pects. Faculty of Electrical Engineer-ing, Mathematics & Computer Sci-ence, UT. 2008-15

E.M. Bortnik. Formal Methods in Sup-

port of SMC Design. Faculty of Me-chanical Engineering, TU/e. 2008-16

R.H. Mak. Design and Performance

Analysis of Data-Independent Stream

Processing Systems. Faculty of Math-ematics and Computer Science, TU/e.2008-17

M. van der Horst. Scalable Block Pro-

cessing Algorithms. Faculty of Mathe-matics and Computer Science, TU/e.2008-18

C.M. Gray. Algorithms for Fat Ob-

jects: Decompositions and Applications.Faculty of Mathematics and ComputerScience, TU/e. 2008-19

J.R. Calame. Testing Reactive Systems

with Data - Enumerative Methods and

Constraint Solving. Faculty of Electri-cal Engineering, Mathematics & Com-puter Science, UT. 2008-20

E. Mumford. Drawing Graphs for

Cartographic Applications. Faculty ofMathematics and Computer Science,TU/e. 2008-21

E.H. de Graaf. Mining Semi-

structured Data, Theoretical and Exper-

imental Aspects of Pattern Evaluation.Faculty of Mathematics and NaturalSciences, UL. 2008-22

R. Brijder. Models of Natural Compu-

tation: Gene Assembly and MembraneSystems. Faculty of Mathematics andNatural Sciences, UL. 2008-23

A. Koprowski. Termination of Rewrit-

ing and Its Certification. Faculty ofMathematics and Computer Science,TU/e. 2008-24

U. Khadim. Process Algebras for Hy-

brid Systems: Comparison and Devel-

opment. Faculty of Mathematics andComputer Science, TU/e. 2008-25

J. Markovski. Real and Stochastic

Time in Process Algebras for Perfor-

mance Evaluation. Faculty of Mathe-matics and Computer Science, TU/e.2008-26

H. Kastenberg. Graph-Based Software

Specification and Verification. Facultyof Electrical Engineering, Mathemat-ics & Computer Science, UT. 2008-27

I.R. Buhan. Cryptographic Keys

from Noisy Data Theory and Applica-

tions. Faculty of Electrical Engineer-ing, Mathematics & Computer Sci-ence, UT. 2008-28

R.S. Marin-Perianu. Wireless SensorNetworks in Motion: Clustering Algo-

rithms for Service Discovery and Pro-

visioning. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2008-29

M.H.G. Verhoef. Modeling and Vali-

dating Distributed Embedded Real-Time

Control Systems. Faculty of Science,Mathematics and Computer Science,RU. 2009-01

M. de Mol. Reasoning about Func-

tional Programs: Sparkle, a proof as-

sistant for Clean. Faculty of Science,Mathematics and Computer Science,RU. 2009-02

M. Lormans. Managing Requirements

Evolution. Faculty of Electrical Engi-neering, Mathematics, and ComputerScience, TUD. 2009-03

M.P.W.J. van Osch. Automated

Model-based Testing of Hybrid Systems.Faculty of Mathematics and ComputerScience, TU/e. 2009-04

H. Sozer. Architecting Fault-Tolerant

Software Systems. Faculty of Electri-

cal Engineering, Mathematics & Com-puter Science, UT. 2009-05

M.J. van Weerdenburg. Efficient

Rewriting Techniques. Faculty of Math-ematics and Computer Science, TU/e.2009-06

H.H. Hansen. Coalgebraic Modelling:

Applications in Automata Theory and

Modal Logic. Faculty of Sciences, De-partment of Computer Science, VUA.2009-07

A. Mesbah. Analysis and Testing of

Ajax-based Single-page Web Applica-

tions. Faculty of Electrical Engineer-ing, Mathematics, and Computer Sci-ence, TUD. 2009-08

A.L. Rodriguez Yakushev. Towards

Getting Generic Programming Ready

for Prime Time. Faculty of Science,UU. 2009-9

K.R. Olmos Joffre. Strategies for Con-text Sensitive Program Transformation.Faculty of Science, UU. 2009-10

J.A.G.M. van den Berg. Reasoning

about Java programs in PVS using JML.Faculty of Science, Mathematics andComputer Science, RU. 2009-11

M.G. Khatib. MEMS-Based Stor-

age Devices. Integration in Energy-

Constrained Mobile Systems. Faculty ofElectrical Engineering, Mathematics &Computer Science, UT. 2009-12

S.G.M. Cornelissen. Evaluating Dy-

namic Analysis Techniques for Program

Comprehension. Faculty of ElectricalEngineering, Mathematics, and Com-puter Science, TUD. 2009-13

D. Bolzoni. Revisiting Anomaly-

based Network Intrusion Detection Sys-

tems. Faculty of Electrical Engineer-

ing, Mathematics & Computer Sci-ence, UT. 2009-14

H.L. Jonker. Security Matters: Pri-

vacy in Voting and Fairness in Digital

Exchange. Faculty of Mathematics andComputer Science, TU/e. 2009-15

M.R. Czenko. TuLiP - Reshaping Trust

Management. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2009-16

T. Chen. Clocks, Dice and Processes.Faculty of Sciences, Department ofComputer Science, VUA. 2009-17

C. Kaliszyk. Correctness and Avail-

ability: Building Computer Algebra on

top of Proof Assistants and making

Proof Assistants available over the Web.Faculty of Science, Mathematics andComputer Science, RU. 2009-18

R.S.S. O’Connor. Incompleteness &

Completeness: Formalizing Logic andAnalysis in Type Theory. Faculty of Sci-ence, Mathematics and Computer Sci-ence, RU. 2009-19

B. Ploeger. Improved Verification

Methods for Concurrent Systems. Fac-ulty of Mathematics and ComputerScience, TU/e. 2009-20

T. Han. Diagnosis, Synthesis and Anal-

ysis of Probabilistic Models. Faculty ofElectrical Engineering, Mathematics &Computer Science, UT. 2009-21

R. Li. Mixed-Integer Evolution Strate-

gies for Parameter Optimization and

Their Applications to Medical Image

Analysis. Faculty of Mathematics andNatural Sciences, UL. 2009-22

J.H.P. Kwisthout. The Computational

Complexity of Probabilistic Networks.Faculty of Science, UU. 2009-23

T.K. Cocx. Algorithmic Tools for Data-

Oriented Law Enforcement. Facultyof Mathematics and Natural Sciences,UL. 2009-24

A.I. Baars. Embedded Compilers. Fac-ulty of Science, UU. 2009-25

M.A.C. Dekker. Flexible Access Con-

trol for Dynamic Collaborative Envi-

ronments. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2009-26

J.F.J. Laros. Metrics and Visualisation

for Crime Analysis and Genomics. Fac-ulty of Mathematics and Natural Sci-ences, UL. 2009-27

C.J. Boogerd. Focusing Automatic

Code Inspections. Faculty of ElectricalEngineering, Mathematics, and Com-puter Science, TUD. 2010-01

M.R. Neuhaußer. Model Checking

Nondeterministic and Randomly Timed

Systems. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2010-02

J. Endrullis. Termination and Pro-

ductivity. Faculty of Sciences, De-partment of Computer Science, VUA.2010-03

T. Staijen. Graph-Based Specification

and Verification for Aspect-Oriented

Languages. Faculty of Electrical En-gineering, Mathematics & ComputerScience, UT. 2010-04

Y. Wang. Epistemic Modelling and Pro-

tocol Dynamics. Faculty of Science,UvA. 2010-05

J.K. Berendsen. Abstraction, Prices

and Probability in Model Checking

Timed Automata. Faculty of Science,Mathematics and Computer Science,RU. 2010-06

A. Nugroho. The Effects of UML Mod-

eling on the Quality of Software. Fac-ulty of Mathematics and Natural Sci-ences, UL. 2010-07

A. Silva. Kleene Coalgebra. Faculty ofScience, Mathematics and ComputerScience, RU. 2010-08

J.S. de Bruin. Service-Oriented Discov-

ery of Knowledge - Foundations, Imple-

mentations and Applications. Facultyof Mathematics and Natural Sciences,UL. 2010-09

D. Costa. Formal Models for Com-

ponent Connectors. Faculty of Sci-ences, Department of Computer Sci-ence, VUA. 2010-10

M.M. Jaghoori. Time at Your Ser-

vice: Schedulability Analysis of Real-

Time and Distributed Services. Facultyof Mathematics and Natural Sciences,UL. 2010-11

R. Bakhshi. Gossiping Models: Formal

Analysis of Epidemic Protocols. Facultyof Sciences, Department of ComputerScience, VUA. 2011-01

Gossiping Models - Vrije Universiteit Amsterdam...Gossiping Models Formal Analysis of Epidemic...

Documents

Transcript of Gossiping Models - Vrije Universiteit Amsterdam...Gossiping Models Formal Analysis of Epidemic...