Distributed Algorithms EpidemicProtocols:...

Distributed AlgorithmsEpidemic Protocols: Beyond dissemination

Alberto Montresor

Università di Trento

2018/10/28

Acknowledgments: S. Voulgaris

This work is licensed under a Creative CommonsAttribution-ShareAlike 4.0 International License.

references

M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen.The peer sampling service: Experimental evaluation ofunstructured gossip-based implementations.In Proc. of Middleware 2004, volume 3231 of LNCS, pages 79–98,Toronto, Canada, 2004. Springer.http://www.disi.unitn.it/~montreso/ds/papers/Newscast.pdf.

M. Jelasity, A. Montresor, and O. Babaoglu.Gossip-based aggregation in large dynamic networks.ACM Trans. Comput. Syst., 23(1):219–252, Aug. 2005.http://www.cs.unibo.it/bison/publications/aggregation-tocs.pdf.

M. Jelasity, A. Montresor, and O. Babaoglu.T-Man: Gossip-based fast overlay topology construction.Elsevier Computer Networks, 2009.http://dit.unitn.it/~montreso/pubs/papers/comnet09.pdf.

M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, andM. van Steen.Gossip-based peer sampling.ACM Trans. Comput. Syst., 25(3), 2007.http://www.disi.unitn.it/~montreso/ds/papers/PeerSampling.pdf.

G. P. Jesi, D. Hales, and M. van Steen.Identifying malicious peers before it’s too late: A decentralizedsecure peer sampling service.In Proc. of the First International Conference on Self-Adaptive andSelf-Organizing Systems (SASO’07), pages 237–246, Boston, MA,2007. IEEE Computer Society.http://www.cs.vu.nl/~steen/papers/2007a.saso.pdf.

J. Sacha and A. Montresor.Identifying frequent items in distributed data sets.Computing, 95(4):289–307, 2013.http://www.disi.unitn.it/~montreso/pubs/papers/computing13.pdf.

S. Voulgaris, D. Gavidia, and M. van Steen.Cyclon: Inexpensive membership management for unstructured p2poverlays.J. Network Syst. Manage., 13(2), 2005.ftp://ftp.cs.vu.nl/pub/steen/papers/jnsm.05.pdf.

http://creativecommons.org/licenses/by-sa/4.0/

http://creativecommons.org/licenses/by-sa/4.0/

http://www.disi.unitn.it/~montreso/ds/papers/Newscast.pdf


http://www.cs.unibo.it/bison/publications/aggregation-tocs.pdf


http://dit.unitn.it/~montreso/pubs/papers/comnet09.pdf

http://www.disi.unitn.it/~montreso/ds/papers/PeerSampling.pdf


http://www.cs.vu.nl/~steen/papers/2007a.saso.pdf

http://www.disi.unitn.it/~montreso/pubs/papers/computing13.pdf


ftp://ftp.cs.vu.nl/pub/steen/papers/jnsm.05.pdf

Table of contents

1 Peer SamplingIntroductionnewscastCyclonSecurity

2 Topology constructionT-ManEvaluationT-Chord

3 AggregationAggregation protocolsAnalytical resultsEvaluationMost frequent items

Peer Sampling Introduction

System model

A dynamic collection of distributed nodes that want to participatein a common epidemic protocol

Node may join / leaveNode may crash at any time

Communication:To communicate with node q, node p must know its addressPotentially high levels of message omissions

Alberto Montresor (UniTN) DS - Epidemics 2 2018/10/28 1 / 88


Motivation

In the original paper

Nodes have full view of thenetworkEach node periodically“gossips” with a random node,out of the whole set

Why it was reasonable

(Almost) static networkRelatively small size

Currently

Nodes have a partial view ofthe networkThe partial view is dynamic,reflecting nodes joining/leavingEach node periodically gossipswith a random node, out of itspartial view

What you get

Unlimited scalabilityCapable to deal with churn



Service specification

Peer sampling service

Input: the distributed collection of nodesOutput: a method getPeer() that returns a peer as a result ofan independent uniform random sampling among thecollection

The idea

Nodes gossip with their neighbors about... other neighbors!Old nodes are removed / new nodes are insertedRandom shuffling of views



Service architecture

Each node has a view containing C neighborsEach node periodically contacts a neighborThey exchange their views

The neighbor descriptor of node p contains

The address needed to communicate with pTimestamp information about the age of the descriptorAdditional information that may be needed by upper layers



A generic algorithm

Generic protocol executed by p:upon initialization do

view ← descriptor(s) of nodes already in the system

repeat every ∆ time unitsProcess q ← selectNeighbor(view) % Select a random neighborm← prepareRequest(view , q)send 〈request,m, p〉 to q

upon receive 〈request,m, q〉 dom′ ← prepareReply(view , q)send 〈reply,m′, p〉 to qview ← merge(view ,m, q)

upon receive 〈reply,m, q〉 doview ← merge(view ,m, q)


A generic algorithm

Generic protocol executed by p:upon initialization do

view ← descriptor(s) of nodes already in the system

repeat every ∆ time unitsProcess q ← selectNeighbor(view) % Select a random neighborm← prepareRequest(view , q)send 〈request,m, p〉 to q



2019-09-16

DS - Epidemics 2Peer Sampling

IntroductionA generic algorithm

• selectNeighbor(): select one of the neighbor contained in the view

• prepareRequest(view , q), prepareReply(view , q): returns a subset of thedescriptors contained in the local view; may add other descriptors(e.g. its own)

• merge(view ,m, q): returns a subset of the descriptors contained in theunion of the local view and the received view

Peer Sampling newscast

newscast

selectNeighbor()Select one node at random from the local view

prepareRequest(view , q), prepareReply(view , q)Returns the entire view + a local descriptor with a fresh timestamp

merge(view ,m, q)Returns the C freshest descriptors (w.r.t. timestamp) from theunion of local view and message

Bibliography

M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sam-pling service: Experimental evaluation of unstructured gossip-based implemen-tations.In Proc. of Middleware 2004, volume 3231 of LNCS, pages 79–98, Toronto,Canada, 2004. Springer.http://www.disi.unitn.it/~montreso/ds/papers/Newscast.pdf




newscast



newscast

ID &Address

Timestamp

91216

ID &Address

Timestamp

71014



newscast

ID &Address

Timestamp

91216

ID &Address

Timestamp

71014

1. Pick random peer from my view



newscast

ID &Address

Timestamp

91216

ID &Address

Timestamp

71014

1. Pick random peer from my view2. Send each other view + own fresh link



newscast

ID &Address

Timestamp

91216

ID &Address

Timestamp

71014

91216

20




newscast

ID &Address

Timestamp

91216

ID &Address

Timestamp

71014

91216

20


20

71014



newscast

ID &Address

Timestamp

91216

ID &Address

Timestamp

71014

91216

20


3. Keep c freshest links (remove own info, duplicates)

20

71014



newscast

ID &Address

Timestamp

141620

ID &Address

Timestamp

141620


3. Keep c freshest link (remove own info, duplicates)


Peer Sampling Cyclon

Cyclon

selectNeighbor():Select the oldest descriptor in the view

prepareRequest(view , q):Remove t− 1 random neighbors from the local viewReturn the t− 1 descriptors plus a fresh local one

prepareReply(view , q):Remove and return t fresh neighbors from the local view

merge(view ,m, q):merge the local view and the messageremove p, keep freshes in case of duplicatesre-insert entries sent to q if space permits

Bibliography

S. Voulgaris, D. Gavidia, and M. van Steen. Cyclon: Inexpensive membershipmanagement for unstructured p2p overlays.J. Network Syst. Manage., 13(2), 2005.ftp://ftp.cs.vu.nl/pub/steen/papers/jnsm.05.pdf


ftp://ftp.cs.vu.nl/pub/steen/papers/jnsm.05.pdf


Cyclon

ID &Address

Timestamp

9

412

ID &Address

Timestamp

14

710



Cyclon

ID &Address

Timestamp

9

412

ID &Address

Timestamp

14

710

1. Pick oldest neighbor from my view



Cyclon

ID &Address

Timestamp

9

412

ID &Address

Timestamp

14

710

1. Pick oldest neighbor from my view2. Exchange some neighbors



Cyclon

ID &Address

Timestamp

9

12

ID &Address

Timestamp

14

710


20



Cyclon

ID &Address

Timestamp

9

12

ID &Address

Timestamp

147 10


20



Cyclon

ID &Address

Timestamp

9

12

ID &Address

Timestamp

14710


20



Experimental evaluation

Simulation parameters

N = 100, 000 nodesC = 20 neighbors

We are measuring

Clustering coefficientAverage path lengthDegree distributionRobustness to catastrophic failuresSelf-cleaning



Brief recap

Clustering coefficient

The clustering coefficient of graph G is the average over the localclustering coefficient of all nodes in the graph

CC(G) =1

N

∑i∈V

2|Ei||Vi|(|Vi| − 1)

For epidemic diffusion:high clustering coefficient means more redundant messages



Clustering coefficient! High clustering is bad for:" Flooding: It results in many redundant messages" Self-healing: Strongly connected cluster ! weakly connected to the rest of the network


Clustering coefficient! High clustering is bad for:" Flooding: It results in many redundant messages" Self-healing: Strongly connected cluster ! weakly connected to the rest of the network

2019-09-16


CyclonClustering coefficient

High clustering is bad for:

• Flooding: It results in many redundant messages

• Self-healing: Strongly connected cluster ⇒ weakly connected to therest of the network


Brief recap

Average path length

The average path length of a connected graph G = (V,E) is definedas:

`(G) =1

|V |(|V | − 1)

∑i,j∈V,i 6=j

d(i, j)

For epidemic diffusion:high average path length means a longer number of rounds toreach everybody else



Average path length! Indication of the time and cost to flood the network


Average path length! Indication of the time and cost to flood the network

2019-09-16


CyclonAverage path length

Indication of the time and cost to flood the network


Degree distribution

Affects:

! Robustness (shows weakly connected nodes)

! Load balancing

! Way epidemics spread


Degree distribution

Affects:

! Robustness (shows weakly connected nodes)

! Load balancing

! Way epidemics spread

2019-09-16


CyclonDegree distribution

Affects:

• Robustness (shows weakly connected nodes)

• Load balancing

• Way epidemics spread


Robustness



Self-cleaning behavior


Peer Sampling Security

Security

Hub attack

Input: a set of colluding,malicious nodes

How: they only gossip theirIDs

Output:all views become “polluted”non-malicious nodes arecut-off from each othermalicious nodes may leavethe network, leaving thenetwork disconnected withno way to recover



Secure peer sampling

Algorithm

Maintain multiple independent views in each nodeDuring a gossip exchange measure similarity of exchangedviewsWith probability equal to proportion of identical nodes in twoviews, reject the gossip and blacklist the nodeApply an aging policy to the black listWhen supplying a random peer, select the current “best” view

Bibliography

G. P. Jesi, D. Hales, and M. van Steen. Identifying malicious peers before it’stoo late: A decentralized secure peer sampling service.In Proc. of the First International Conference on Self-Adaptive and Self-OrganizingSystems (SASO’07), pages 237–246, Boston, MA, 2007. IEEE Computer Society.

http://www.cs.vu.nl/~steen/papers/2007a.saso.pdfAlberto Montresor (UniTN) DS - Epidemics 2 2018/10/28 19 / 88

http://www.cs.vu.nl/~steen/papers/2007a.saso.pdf


Secure peer sampling

1000 nodes20 malicious nodes


Topology construction

Introduction

Topology bootstrap

Informal definition: building an overlay topology from the ground upas quickly and efficiently as possible

Do not confuse with node bootstrapPlacing a single node in the right place in the topology

Do not confuse with topology maintenanceStabilization (routing table cleanup)Can be used later to optimize the topology


Topology construction

State of the art

Current DHT protocols do not support bootstrappingThey assume an already formed network

Even supposing the network exists, they work “unless a tremendousnumber of nodes joins the system” [Chord]

We could envision a “join orchestration approach”, but it wouldrequire a linear time

Bibliography

M. Jelasity, A. Montresor, and O. Babaoglu. T-Man: Gossip-based fast overlaytopology construction.Elsevier Computer Networks, 2009.http://dit.unitn.it/~montreso/pubs/papers/comnet09.pdf


http://dit.unitn.it/~montreso/pubs/papers/comnet09.pdf

Topology construction T-Man

T-Man

T-man is a generic protocol for topology formation

Topologies are expressed through ranking functions:“which are my preferred neighbors?”

ExamplesRings, tori, trees, DHTs, etc.Distributed sortingSemantic proximityLatency proximityEtc.

after 2 cycles after 3 cycles after 4 cycles after 7 cycles

Figure 3: Illustration of constructing a torus over 50 ! 50 = 2500 nodes, starting from a uniform random graph with initial views containing 20random entries and the parameter values m = 20,! = 10, K = 4.

To be able to conduct controlled experiments withT-M!" on di!erent ranking methods, we first select agraph instead of a ranking method, and subsequently“reverse-engineer” an appropriate ranking method fromthis graph by defining the ranking to be the orderingconsistent with the minimal path length from the basenode in the selected graph. We will call this selectedgraph the ranking graph, to emphasize its direct rela-tionship with the ranking method.

Note that the target graph is defined by parameter K,so the target graph is identical to the ranking graph onlyif the ranking graph is K-regular. However, for conve-nience, in this section we will not rely on K becausewe either focus on the dynamics of convergence (as op-posed to convergence time), which is independent of K,or we study the discovery of neighbors in the rankinggraph directly.

In order to focus on the e!ects of parameters, in thissection we assume a greatly simplified system modelwhere the protocol is initiated at the same time at allnodes, where there are no failures, and where mes-sages are delivered instantly. While these assumptionsare clearly unrealistic, in Section 6 we show throughevent-based simulations that the protocol is extremelyrobust to failures, asynchrony and message delays evenin more realistic settings.

5.1. Analogy with the Anti-Entropy Epidemic Protocol

In Section 3 we used an (unspecified) disseminationapproach to define the overlay construction problem.Here we would like to elaborate on this idea further.Indeed, the anti-entropy epidemic protocol, one imple-mentation of such a dissemination approach, can beseen as a special case of T-M!", where the message sizem is unlimited (i.e., m " N such that every possible nodedescriptor can be sent in a single message) and peer se-lection is uniform random from the entire network. Inthis case, independent of the ranking method, all node

descriptors that are present in the initial views will bedisseminated to all nodes. Furthermore, it is known thatfull convergence is reached in less than logarithmic timein expectation [25].

For this reason, the anti-entropy epidemic protocol isimportant also as a base case protocol when evaluatingthe performance of T-M!", where the goal is to achievesimilar convergence speed to anti-entropy, but with theconstraint that communication is limited to exchanginga constant amount of information in each round. Dueto the communication constraint, performance will nolonger be independent of the ranking method.

5.2. Parameter Setting for Symmetric Target Graphs

We define a symmetric target graph to be one whereall nodes are interchangeable. In other words, all nodeshave identical roles from a topological point of view.Such graphs are very common in the literature of over-lay networks. The behavior of T-M!" is more easily un-derstood on symmetric graphs, because focusing on atypical (average) node gives a good characterization ofthe entire system.

We will focus on two ranking graphs, both undi-rected: the ring and a k-out random graph, where krandom out-links are assigned to all nodes and subse-quently the directionality of the links is dropped. Wechoose these two graphs to study two extreme cases forthe network diameter. The diameter (longest minimalpath) of the ring is O(N) while that of the random graphis O(log N) with high probability.

Let us examine the di!erences between realistic pa-rameter settings and the anti-entropy epidemic dissemi-nation scenario described above. First, assume that themessage size m is a small constant rather than being un-limited. In this case, the random peer selection algo-rithm is no longer appropriate: if a node i contacts peerj that ranks low with i as the base node, then i cannotexpect to learn new useful links from j because now

6



The T-Man algorithm

Protocol executed by p:upon initialization do

view ← a collection of random descriptors

repeat every ∆ time unitsProcess q ← selectNeighbor(view) % Select a neighborm← prepareRequest(view , q)send 〈request,m, p〉 to q





Node descriptors and ranking functions

Node descriptor

IP address + timestamp, as in newscast/cyclonOne or more attributes of the node, used for ranking:

The ID of a node in a DHT; orA semantic description of the node; orThe geographic location of the node

Ranking function

A ranking function rank(U, p) ranks a collection U of nodesbased on p’s order of preference.Alternatively, a ranking function rank(U, p, r) returns r nodesfrom U that are the most preferred by p.



Ranking functions

The ranking function may be based on a distance over a spaceSpace: set of possible attribute valuesDistance: a metric d(x, y) over the space

Given a collection U of nodesrank them in increasing distance order, orreturn the r closest node based on the notion of distance

Example (Line/ring)Space: [0, 1)

Distance over the line/ringd(a, b) = |a− b|d(a, b) = min{|a−b|, 1−|a−b|}

Example (Grid or torus –Manhattan Distance)

Space: [0, 1) · [0, 1)

Distance:d(a, b) = |ax˘bx|+ |ay˘by|



How to express the ranking function

Generic line/ring: returns

the r nodes closest to p w.r.t.the distance function

the r/2 nodes “≥ p”, closest top w.r.t. the distance function

Sorted line/ring: returns

the r/2 nodes “≥ p”, closest top w.r.t. the distance functionthe r/2 nodes “≤ p”, closest top w.r.t. the distance function



The T-Man algorithm

selectNeighbor()

Select a random node from rank(viewp, p, r), i.e. from the top-rnodes in p’s local view as ranked by p

prepareRequest(view , q), prepareReply(view , q):Returns rank(viewp, q, r), i.e. the top-r nodes in p’s local view asranked by q

merge(view ,m, q):merge the local view and the message



T-Man: How it works

p

q

0 2m-1

2m-10



T-Man: The movie!

http://www.disi.unitn.it/~montreso/ds/handouts/ring-fingers-20.gif


http://www.disi.unitn.it/~montreso/ds/handouts/ring-fingers-20.gif


How to start and stop

In the previous animation

Nodes starts simultaneously

Convergence is measured globally

In reality

We must start the protocol at all nodesBroadcasting, using the random topology obtained through a peersampling services

We must stop the protocolLocal condition: when a node does not observe any view change fora predefined period, it stopsIdle: number of cycles without variations


Topology construction Evaluation

Scalability, with different starting protocols

5

10

15

20

25

30

210 211 212 213 214 215 216 217 218

Con

verg

ence

Tim

e (s

)

Network Size

Anti-Entropy (Push)Anti-Entropy (Push-Pull)

FloodingSynchronous start



Bootstrap protocol: Starting and stopping

0

20

40

60

80

100

0 5 10 15 20 25 30

activ

e no

des

(%)

Time (s)

size=210

size=213

size=216



Stopping condition: different values for idle

10

20

30

40

50

2 3 4 5 6 7 8

Tim

e (s

)

δidle (s)

size = 216

size = 213

size = 210



Communication costs

2

4

6

8

10

12

14

16

18

2 3 4 5 6 7 8

Mes

sage

s

Idle cycles

size = 216

size = 213

size = 210



Parameter tuning: message size

10

20

30

40

50

60

5 10 15 20 25 30

Ter

min

atio

n T

ime

(cyc

les)

Message Size



Robustness to failures

98

98.5

99

99.5

100

0 0.002 0.004 0.006 0.008 0.01

Tar

get L

inks

Fou

nd (

%)

Node failures per node per second

size=216

size=213

size=210



Robustness to message losses

98.6

98.8

99

99.2

99.4

99.6

99.8

100

0 5 10 15 20

Tar

get L

inks

Fou

nd (

%)

Message loss (%)

size = 210

size = 213

size = 216


Topology construction T-Chord

T-Chord

The general idea:Node descriptor contains node ID in a [0..2t) spaceNodes are sorted over the ring defined by IDsFinal output is the Chord ringAs by-product, many other nodes are discovered

Example:t = 32, N = 214, r = 20



T-Chord

Run T-Man until the ring is formed

Use the closest neighbors to select successors

Use the entire view to fill the finger tableFor each node p:For each i = 0 . . .m− 1 (m number of bits)search in the local view the closest node q s.t.:

q ∈ [(p+ 2i) mod 2m, (p+ 2i+1 − 1) mod 2m]



T-Chord-Prox

Run T-Man until the ring is formed

Use the closest neighbors to select successors

Use the entire view to fill the finger tableFor each node p:For each i = 0 . . .m− 1 (m number of bits)search in the local view the closest node (in terms of latency) q s.t.:

q ∈ [(p+ 2i) mod 2m, (p+ 2i+1 − 1) mod 2m]



Routing through T-Chord

4

4.5

5

5.5

6

6.5

7

7.5

8

8.5

9

210 211 212 213 214 215 216 217 218

Hop

s

Network Size

T-ChordChord



Robustness

0

2

4

6

8

10

12

14

0 10 20 30 40 50

Routing failu

res (

%)

Crashed nodes (%)

Chord (crash)T-Chord (crash)T-Chord (churn)

T-Chord-Prox (churn)


Aggregation

Aggregation

Aggregation

The collective name of a set of functions that provide statistical infor-mation about a system

Useful in large-scale distributed systems:The average load of nodes in a computing gridThe sum of free space in a distributed storageThe total number of nodes in a P2P system

Wanted: a decentralized, robust, proactive solution

Bibliography

M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in largedynamic networks.ACM Trans. Comput. Syst., 23(1):219–252, Aug. 2005.http://www.cs.unibo.it/bison/publications/aggregation-tocs.pdf



Aggregation

A generic aggregation algorithm

Protocol executed by p:upon initialization do

state ← some initial state

repeat every ∆ time unitsProcess q ← getPeer() % Peer sampling servicesp ← prepareRequest(state, q)send 〈request, sp, p〉 to q

upon receive 〈request, sq, q〉 dosp ← prepareReply(state, sq, q)send 〈reply, sp, p〉 to qstate ← mergeRequest(state, sq, q)

upon receive 〈reply, sq, q〉 dostate ← mergeReply(state, sq, q)


Aggregation Aggregation protocols

Average aggregation

Local state maintained by nodes:initialized with a real number representing the quantity to beaveragedcontains the current estimate of the average

Function getPeer():Based on peer sampling serviceReturns a random peer from the entire collection of nodes

Function prepareRequest(state, q), prepareReply(state, sq, q):Returns state

Function mergeRequest(state, sq, q), mergeReply(state, sq, q):Returns state+sq

2



Example

16

10

4

36

2

8



Example

16

6

4

36

6

8

(10+2)/2=6



Example

16

6

4

36

6

8



Example

10

6

10

36

6

8

(16+4)/2=10



Example

10

6

10

36

6

8



Example

10

7

10

36

6

7

(6+8)/2=7



Average aggregation

The exchange of messages is not atomic, so pairwise averaging is betterdescribed in this way:

Function prepareRequest(state, q)

Returns state

Function prepareReply(state, sq, q):Returns state − state+sq

2

Function mergeRequest(state, sq, q):

Returns state −(state − state+sq

2

)=

state+sq2

Function mergeReply(state, d, q):Returns state + d



Convergence

If the graph is connected, each node converges to the average ofthe original values

After each exchange:Average does not changeVariance is reduced

Different from load balancing due to lack of constraints



Convergence

0.0001

0.001

0.01

0.1

1

10

100

1000

10000

100000

0 5 10 15 20 25 30

Est

imat

ed A

vera

ge

Cycles

ExperimentsMaximumMinimum

0.990.995

11.005

1.011.015

1.021.025

22 24 26 28 30



Other problems

Average merge(a, b) returns (a+ b)/2

Geometric mean merge(a, b) returns√ab

Min/max merge(a, b) returns min(a, b) or max(a, b)

Sum Average · Count

Product Geometric · Count

Variance compute a2 − a2



Counting

Counting protocol

Init: one node starts with 1, all the others with 0

Expected average: 1/N

Problem: how to select that “one node”?Concurrent instances of the counting protocolEach instance is lead by a different nodeMessages are tagged with a unique identifierNodes participate in all instancesEach node acts as leader with probability P = c/NE


Aggregation Analytical results

Questions

How fast is convergence on different topologies?Analytical results for fully connected graphs: exponentialconvergenceRandom, newscast topologies: similar results

Can we solve other problemsCounting, sum, variance estimation

What are the effects of failuresLink failures: not criticalCrashes/message omissions can ruin convergenceBut we have a solution for that



Analytical results

Consider a “centralized translation” of the aggregation protocolN values to be aggregated are stored in vector AAn infinite number of rounds is executedAt each round, the average operation is run on N pairs of values

Aggregationrepeat forever

for i← 1 to N do(p, q)← getPair()A[p], A[q]← (A[p] +A[q])/2



Some definitions

We measure the speed of convergence by measuring populationmean and variance at round i:

σ2i =

1

N

N∑k=1

(Ai[k]− µi)2 where µi =1

N

N∑k=1

Ai[k]

Elementary variance reduction step:ρi = σ2

i+1/σ2i

Variable Φk: the number of times that node k has been selectedfrom getPair()


Some definitions

We measure the speed of convergence by measuring populationmean and variance at round i:

σ2i =

1

N

N∑k=1

(Ai[k]− µi)2 where µi =1

N

N∑k=1

Ai[k]

Elementary variance reduction step:ρi = σ2

i+1/σ2i

Variable Φk: the number of times that node k has been selectedfrom getPair()

2019-09-16

DS - Epidemics 2Aggregation

Analytical resultsSome definitions

Unbiased estimator of the population variance:

σ2i =

1

N − 1

N∑k=1

(Ai[k]− µi)2

The term N − 1 is called Bessel’s correction


Analytical results

Theorem: Convergence

If1 The indexes returned by each call to getPair() are

uncorrelated;2 the random variables Φk are identically distributed;3 Φ is a random variable with this common distribution;4 after (p, q) is returned by getPair(), the number of times p andq will be selected by the remaining calls to getPair() hasidentical distribution

then:E[σ2

i+1] = E[2−Φ]σ2i


Analytical results

Theorem: Convergence

If1 The indexes returned by each call to getPair() are

uncorrelated;2 the random variables Φk are identically distributed;3 Φ is a random variable with this common distribution;4 after (p, q) is returned by getPair(), the number of times p andq will be selected by the remaining calls to getPair() hasidentical distribution

then:E[σ2

i+1] = E[2−Φ]σ2i20

19-09-16


Analytical resultsAnalytical results

Explanation: p and q are selected in an independent way


Analytical results

Optimal case: E(2−Φ) = E(2−2) = 1/4

getPair() implements perfect matchingno corresponding distributed protocol

Random case: E(2−Φ) = 1/e

getPair() implements random global samplingA local corresponding protocol exists

Aggregation protocol: E(2−Φ) = 1/(2√e) = 0.3032 . . .

Scalability: results independent of NEfficiency: convergence is very fast



Convergence factor

0.28

0.3

0.32

0.34

0.36

0.38

0.4

0.42

0.44

5 10 15 20 25 30

variance r

eduction

cycle

getPair_rand, completegetPair_rand, 20-reg. random

getPair_seq, completegetPair_seq, 20-reg. random



Convergence factor

0.28

0.3

0.32

0.34

0.36

0.38

0.4

0.42

0.44

100 1000 10000 100000

variance r

eduction

network size

getPair_rand, completegetPair_rand, 20-reg. random

getPair_seq, completegetPair_seq, 20-reg. random


Aggregation Evaluation

Different topologies

Theoretical results are based on the assumption that theunderlying overlay is sufficiently random

What about other topologies?Our aggregation scheme can be applied to generic connectedtopologiesSmall-world, scale-free, newscast, random, completeConvergence factor depends on randomness




0.3

0.4

0.5

0.6

0.7

0.8

100 1000 10000 100000 1e+06

Con

verg

ence

Fac

tor

Network Size

W-S (β = 0.00)W-S (β = 0.25)W-S (β = 0.50)W-S (β = 0.75)

NewscastScale-Free

RandomComplete




1e-16

1e-14

1e-12

1e-10

1e-08

1e-06

0.0001

0.01

1

0 5 10 15 20 25 30 35 40 45 50

Var

ianc

e R

educ

tion

Cycles

W-S (β=0.00)W-S (β=0.25)W-S (β=0.50)W-S (β=0.75)

NewscastScale-free

RandomComplete



Adaptivity

The generic protocol is not adaptiveDynamism of the networkVariability of values

Periodical restarting mechanismAt each node:

The protocol is terminatedThe current estimate is returned as the aggregation outputThe current values are used to re-initialize the estimatesAggregation starts again with fresh initial values



Dynamic membership

TerminationRun protocol for a predefined number of cycles λλ depends on:

required accuracy of the outputthe convergence factor that can be achieved

RestartingDivide run in consecutive epochs of length ∆Start a new instance of the protocol in each epoch

Node joiningReceives from a node already in the network:

Next epoch identifierThe time until the start of the next epoch

To guarantee convergence:Joining node is not allowed to participate in the current epoch



Synchronization

The protocol described so far:Assumes synchronized epochs and cyclesRequires synchronized clocks / communication

This is not realistic:Clocks may driftCommunication incurs unpredictable delays

Complete synchronization is not neededIt is sufficient that the time between the first/ last node starting toparticipate in an epoch is bounded



Cost analysis

If the overlay is sufficiently random:exchanges per round = 1 + Φ, where Φ has Poisson distributionwith average 1

Round length δ defines trade-off between overhead andconvergence

δ = 1sOverhead = 2 · 2 · (28 + 4) = 128B/S

N. of rounds λ defines trade-off between accuracy and convergence:

E(σ2λ)/E(σ2

0) = ρλ

If ε the desired accuracy, then λ ≥ logρ ε



Evaluation

Underlying topology is based on Newscast

The count protocol is usedMore sensitive to failures

Some parameters:Network size is 100.000Partial view size in Newscast is 30Epoch length is 30 cyclesNumber of experiments is 50



Sudden death

0.5

1

1.5

2

2.5

3

3.5

4

4.5

0 5 10 15 20

Estim

ate

d S

ize (

/10

5)

Cycle

Experiments


Sudden death

0.5

1

1.5

2

2.5

3

3.5

4

4.5

0 5 10 15 20

Estim

ate

d S

ize

(/1

05)

Cycle

Experiments

2019-09-16


EvaluationSudden death

Network size estimation where 50% of the nodes crash suddenly. The x-axisrepresents the cycle of an epoch at which the “sudden death” occurs.


Churn

0.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

2.6

0 500 1000 1500 2000 2500

Estim

ate

d S

ize (

/10

5)

Nodes Substituted per Cycle

Experiments


Churn

0.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

2.6

0 500 1000 1500 2000 2500

Estim

ate

d S

ize

(/1

05)

Nodes Substituted per Cycle

Experiments

2019-09-16


EvaluationChurn

Network size estimation in a network of constant size subject to churn. Thex-axis is the churn rate which corresponds to the number of nodes that crashat each cycle and are substituted by the same number of new nodes.


Gnutella trace

4000

4100

4200

4300

4400

4500

4600

4700

4800

4900

470 480 490 500 510 520 530 540 550

Netw

ork

Siz

e

Epoch

Estimated SizeActual Size


Gnutella trace

4000

4100

4200

4300

4400

4500

4600

4700

4800

4900

470 480 490 500 510 520 530 540 550

Ne

two

rk S

ize

Epoch

Estimated SizeActual Size

2019-09-16


EvaluationGnutella trace

Network size estimation in the presence of churn according to a Gnutellatrace.


Link failure probability

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.2 0.4 0.6 0.8 1

Converg

ence F

acto

r

Pd

ExperimentsAverage Rate

Upper Bound on Convergence Factor


Link failure probability

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.2 0.4 0.6 0.8 1

Co

nve

rge

nce

Fa

cto

r

Pd

ExperimentsAverage Rate

Upper Bound on Convergence Factor2019-09-16


EvaluationLink failure probability

Convergence factor as a function of link failure probability.


Message losses

102

103

104

105

106

107

108

109

0 0.1 0.2 0.3 0.4

Estim

ate

d S

ize

Fraction of Messages Lost

Experiments


Message losses

102

103

104

105

106

107

108

109

0 0.1 0.2 0.3 0.4

Estim

ate

d S

ize

Fraction of Messages Lost

Experiments

2019-09-16


EvaluationMessage losses

Network size estimation as a function of lost messages. The length of thebars illustrate the distance between the minimal and maximal estimated sizeover the set of nodes within a single experiment.


Multiple executions

To improve accuracy in the case of failures:Multiple concurrent instances of the protocol may be runMedian value taken as result

SimulationsVariable number of instancesWith node failures: 1000 nodes substituted per cycleWith message omissions 20% of messages lost



Multiple executions – churn

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25

1.3

0 5 10 15 20 25 30 35 40 45 50

Estim

ate

d S

ize (

/10

5)

Number of Aggregation Instances

Experiments


Multiple executions – churn

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25

1.3

0 5 10 15 20 25 30 35 40 45 50

Estim

ate

d S

ize

(/1

05)


Experiments

2019-09-16


EvaluationMultiple executions – churn

Network size estimation with multiple instance. Network size is 100 000, 1000nodes crash at the beginning of each cycle. The length of the bars correspondto the distance between the minimal and maximal estimates over the set ofall nodes within a single experiment.


Multiple executions – message losses

0

0.5

1

1.5

2

2.5

3

3.5

0 5 10 15 20 25 30 35 40 45 50

Estim

ate

d S

ize (

/10

5)


Experiments


Multiple executions – message losses

0

0.5

1

1.5

2

2.5

3

3.5

0 5 10 15 20 25 30 35 40 45 50

Estim

ate

d S

ize

(/1

05)


Experiments

2019-09-16


EvaluationMultiple executions – message losses

Network size estimation as a function of concurrent protocol instances. 20%of messages are lost. The length of the bars correspond to the distancebetween the minimal and maximal estimates over the set of all nodes withina single experiment.


Planet-lab

PlanetLab is a global research network that supports thedevelopment of new network servicesPlanetLab consists of 1353 nodes at 717 sites (April 2016)



Planet-lab

2000

2500

3000

3500

4000

4500

5000

5500

6000

6500

7000

0 5 10 15 20 25 30 35 40 45 50

Netw

ork

siz

e

Epoch

Estimated sizeActual size


Planet-lab

2000

2500

3000

3500

4000

4500

5000

5500

6000

6500

7000

0 5 10 15 20 25 30 35 40 45 50

Ne

two

rk s

ize

Epoch

Estimated sizeActual size

2019-09-16


EvaluationPlanet-lab

The estimated size and the actual size of a network oscillating between 2500and 6000 nodes (approximately). Standard deviation of estimated size isdisplayed using vertical bars.

Aggregation Most frequent items

Most frequent items

System model

A collection P of nodes of size N = |P |Each node p ∈ P stores a local multiset Ip of items taken froma universe I of size m = |I|

Problem statement

Identify the k most frequent items, i.e. the k distinct items thatappear more frequently in the distributed multiset given by I =⋃

p∈P Ip

Bibliography

J. Sacha and A. Montresor. Identifying frequent items in distributed data sets.Computing, 95(4):289–307, 2013.http://www.disi.unitn.it/~montreso/pubs/papers/computing13.pdf




Formal model

Global frequency

Global absolute frequency : F (i) =∑p∈P

Fp(i)

Global relative frequency : F̂ (i) = F (i)/M

Note: M = |I|

Most frequent items

Let j be the k-th item in the sequence of all items ordered by decrea-sing global frequency. We want to identify the top-k most frequentitems:

MF = {i : F (i) ≥ F (j)}



Formal model

Absolutely frequent items

To identify absolutely frequent items (AF ) whose global absolutefrequency is larger than the absolute threshold f :

AF = {i : F (i) ≥ f}

Relatively frequent items

To identify relatively frequent items (RF ) whose global relativefrequency is larger than the relative threshold φ:

RF = {i : F̂ (i) ≥ φ}



Solution

Naive protocol

Compute the average global frequency F (i)/N of all itemsSort items by decreasing frequencyIdentify the item j ranked k-th in such ordered sequenceReturn the items whose average frequency is larger or equalthan F (j)/N



FreqMF protocol (1)

FreqMF algorithm executed by node pupon initialization do

Map estp ← new Map()foreach i ∈ Ip do

estp[i]← Fp(i)

function getTop (k)return extract(estp, k)

repeat every ∆ time unitsNode q ← random(P )send 〈request, extract(estp, s)〉 to q



FreqMF protocol (2)

FreqMF algorithm executed by node pupon receive〈request, reqq〉 from q do

Map repp ← new Map()

foreach i ∈ reqq doreal δi ← 1

2(reqq[i]− estp[i])

estp[i]← estp[i] + δirepp[i]← δi

send 〈reply, repp〉 to q

upon receive〈reply, repq〉 doforeach i ∈ repq do

estp[i]← estp[i]− repq[i]



FreqAF and FreqRF protocols

How can you implement the FreqAF protocol?How can you implement the FreqRF protocol?



FreqMF evaluation

0

10

20

30

40

50

60

70

80

90

10 15 20 25 30 35 40 45 50

Co

nve

rge

nce

tim

e (

Ro

un

ds)

s

N=102, M=10

3

N=102, M=10

4

N=102, M=10

5

N=102, M=10

6

N=103, M=10

3

N=103, M=10

4

N=103, M=10

5

N=103, M=10

6

N=104, M=10

3

N=104, M=10

4

N=104, M=10

5

N=104, M=10

6



FreqMF evaluation

0

50

100

150

200

250

300

350

400

450

500

10 15 20 25 30 35 40 45 50

To

tal co

mm

un

ica

tio

n o

ve

rhe

ad

(It

em

s)

sAlberto Montresor (UniTN) DS - Epidemics 2 2018/10/28 86 / 88


FreqMF evaluation

0

100

200

300

400

500

600

700

800

900

1000

10 15 20 25 30 35 40 45 50

Sto

rag

e (

Ite

ms)

sAlberto Montresor (UniTN) DS - Epidemics 2 2018/10/28 87 / 88


FreqMF evaluation

0

0.5

1

1.5

2

2.5

0 5 10 15 20 25 30

MF

req

Co

nv.

tim

e /

Na

ive

Co

nv.

tim

e

K

M=103

M=104

M=105

M=106


Reading Material

M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, and M. van Steen.

Gossip-based peer sampling.ACM Trans. Comput. Syst., 25(3), 2007.http://www.disi.unitn.it/~montreso/ds/papers/PeerSampling.pdf

M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in largedynamic networks.ACM Trans. Comput. Syst., 23(1):219–252, Aug. 2005.http://www.cs.unibo.it/bison/publications/aggregation-tocs.pdf



Distributed Algorithms EpidemicProtocols:...

Documents

Transcript of Distributed Algorithms EpidemicProtocols:...