
Surprising Performance with Trivial Algorithms

Lecture 1: Introduction to Gossip-based and Epidemic Algorithms

Alea Meeting, Munich, February 2016

Benjamin Doerr, LIX, École Polytechnique, Paris-Saclay

Slides at http://people.mpi-inf.mpg.de/~doerr/alea1.pdf (or search for my homepage and append “alea1.pdf”)





Gossip-based algorithms ≈ epidemic algorithms: a particular class of distributed algorithms

Gossip-based: the main form of communication is that nodes call random neighbors
– highly robust
– works well even when the topology of the network is unknown, changing, …
– typical applications: wireless sensor networks, mobile ad-hoc networks

Epidemic: information or activity spreads in an epidemic-like manner
– very fast and efficient
– strong connections between epidemic algorithms and other epidemic processes (spread of rumors, diseases or malware; forming of opinions)

Alea'16, Epidemic Algorithms, Lecture 1 2

Topic of this Mini-Course: Gossip-Based & Epidemic Algorithms

Examples

Epidemic algorithms:
– Communicating updates in replicated database systems: nodes of the network regularly call random others and exchange updates.
– Communication in wireless sensor networks: since the network topology is unknown and unreliable, the only way to send data from A to B is to send it to random neighbors and make them do the same.
– Peer-to-peer exchange of large data sets: gain fairness by choosing random paths.

Epidemic processes:
– How do opinions form in (electronic) social networks? Viral marketing.
– Mathematical epidemiology: model the spread of diseases and analyze possible countermeasures.
– Malware/viruses spreading in computer networks.


Power of gossiping/epidemics: Toy example phone chains

A first epidemic/gossip-based algorithm: randomized rumor spreading

Analysis of randomized rumor spreading in complete networks


Topics Today

Phone chain problem:
– there is a group of 𝑛 people
– one of them knows a piece of information (“rumor”)
– this rumor shall be communicated to all others
– how does the group achieve this?

In computer science language: what is a good protocol to communicate a piece of information from one person to all?


Part 1: Toy Example: Phone Chains

Traditional phone chain:
– the set of people has a cyclic order
– when you get called (or you are the first informed person), you call your successor in the cyclic order

Properties of this chain:
– efficient with respect to total work (number of phone calls): 𝑛 − 1 calls in total, which is best possible when 𝑛 − 1 people are to become informed
– fair: everyone makes and receives at most one call
– very slow: 𝑛 − 1 rounds of communication
– not robust: if one call fails (e.g., the person is lazy), then all subsequent people remain uninformed


Traditional Phone Chain

Hypercube phone chain:
– assume that we have $n = 2^d$ people numbered from $0$ to $2^d - 1$; think of bit-strings of length $d$
– in round $i \in [1..d]$, each informed node $x$ calls node $x \oplus 2^{i-1}$ (bitwise XOR), that is, the bit-string obtained from $x$ by flipping the bit of value $2^{i-1}$

Properties of the hypercube chain:
– work-efficient: $n - 1$ calls in total
– relatively fair: nodes perform between $0$ and $d = \log_2 n$ calls (the nodes informed in the last round make no call)
– very fast: $d = \log_2 n$ rounds of communication (this is best possible, since the number of informed nodes can at most double in each round)
– “the power of epidemics”: making everyone join the process gives a great speed-up!
– not very robust: if a node stops working in round $i$, then $2^{d-i}$ nodes remain uninformed

Hypercube Phone Chain
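The hypercube chain above can be checked with a few lines of Python. This is a minimal sketch, assuming nodes are the integers 0 to 2^d − 1 and the rumor starts at node 0 unless stated otherwise; the function name `hypercube_phone_chain` and its return format are illustrative choices, not part of the lecture.

```python
def hypercube_phone_chain(d, start=0):
    """Simulate the hypercube phone chain on n = 2**d nodes.

    Returns (rounds, total_calls, calls_per_node), where calls_per_node[x]
    is the number of calls made by node x.
    """
    n = 1 << d
    informed = {start}
    calls_per_node = [0] * n
    for i in range(1, d + 1):
        # Every node informed at the start of round i flips the bit of value 2**(i-1).
        for x in list(informed):
            target = x ^ (1 << (i - 1))
            calls_per_node[x] += 1
            informed.add(target)
    assert len(informed) == n  # everyone is informed after d rounds
    return d, sum(calls_per_node), calls_per_node


rounds, total, per_node = hypercube_phone_chain(4)
print(rounds, total, min(per_node), max(per_node))
```

For d = 4 the total number of calls is n − 1 = 15, the starting node makes d calls, and the nodes informed in the last round make none.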

Randomized phone chain (in the spirit of Paul Erdős):
– in each round, each node that knows the rumor calls a random node

First observation: It is not clear when the protocol ends. This is a non-trivial problem, but it can be solved. For the moment, let us ignore this and only measure the performance up to the point when everyone is informed.

Properties of the randomized chain (to be proven later):
– moderately work-efficient: $\approx n \ln n$ calls in total until all are informed
– relatively fair in terms of work: $\Theta(\log n)$ calls per node until all are informed
– quite fast: with probability $1 - o(1)$, after $(1 + o(1))(\log_2 n + \ln n)$ rounds of communication everyone is informed
– very robust: if each call gets lost with 50% chance (unnoticed by the caller), the time increases only by roughly 82%
– the power of randomization: get robustness (almost) for free!


Randomized Phone Chain
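To get a feeling for the speed and robustness claims above, here is a small simulation sketch. It assumes a complete network in which a call goes to a uniformly random node (possibly the caller itself) and is lost independently with probability `loss_prob`; the function name, the parameters and the seed are illustrative assumptions.

```python
import math
import random


def randomized_phone_chain(n, loss_prob=0.0, rng=None):
    """Rounds until all n nodes are informed on the complete graph.

    Each node informed at the start of a round calls a uniformly random
    node; every call is lost independently with probability loss_prob.
    """
    rng = rng or random.Random()
    informed = [False] * n
    informed[0] = True
    count, rounds = 1, 0
    while count < n:
        rounds += 1
        newly = set()
        # By symmetry only the number of callers matters, not who they are.
        for _ in range(count):
            if rng.random() < loss_prob:
                continue  # call lost, unnoticed by the caller
            target = rng.randrange(n)
            if not informed[target]:
                newly.add(target)
        for v in newly:
            informed[v] = True
        count += len(newly)
    return rounds


rng = random.Random(2016)
n = 1 << 10
reference = math.log2(n) + math.log(n)
avg = sum(randomized_phone_chain(n, 0.0, rng) for _ in range(20)) / 20
avg_lossy = sum(randomized_phone_chain(n, 0.5, rng) for _ in range(20)) / 20
print(f"log2 n + ln n = {reference:.1f}, no loss: {avg:.1f}, 50% loss: {avg_lossy:.1f}")
```

For such small n the lower-order terms are still visible, so the averages should only be read as a rough comparison with log2 n + ln n.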

The randomized phone chain is a communication protocol / epidemic algorithm / gossip-based algorithm that works in any connected network:

Randomized rumor spreading
– a round-based process/protocol/algorithm
– starts with one node informed
– in each round, each informed node calls a random neighbor, which becomes informed if it was not already

Protocol assumptions:
– the nodes have some method of synchronization (“rounds”)
– we assume that nodes can call neighbors chosen uniformly at random
– no interference or other problems if a node is called by several nodes in one round (since we have only one rumor and only informed nodes call, we can simply assume that it accepts an arbitrary incoming call)


Part 2: The Randomized Rumor Spreading (RRS) Protocol
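The protocol can be sketched for an arbitrary connected network as follows. This is an illustration under assumptions: the graph is given as a dictionary of adjacency lists, and the names `randomized_rumor_spreading`, `adj` and `max_rounds` are hypothetical.

```python
import random


def randomized_rumor_spreading(adj, start, rng=None, max_rounds=10**6):
    """Run randomized rumor spreading on a connected graph.

    adj maps every node to a non-empty list of its neighbors.
    Returns the number of rounds until all nodes are informed.
    """
    rng = rng or random.Random()
    informed = {start}
    rounds = 0
    while len(informed) < len(adj):
        if rounds >= max_rounds:
            raise RuntimeError("not all nodes informed; is the graph connected?")
        rounds += 1
        # All calls of a round are made by the nodes informed at its start.
        called = {rng.choice(adj[u]) for u in informed}
        informed |= called
    return rounds


# A cycle on 8 nodes: the rumor needs at least 4 rounds to reach the opposite node.
cycle = {v: [(v - 1) % 8, (v + 1) % 8] for v in range(8)}
print(randomized_rumor_spreading(cycle, start=0, rng=random.Random(1)))
```

Collecting the called nodes in a set mirrors the assumption that a node called several times in one round simply accepts one of the calls.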


Some Known Results

Randomized rumor spreading (RRS) is fast: $\Theta(\log n)$ rounds suffice to inform all nodes with probability $1 - 1/n$ in…
– complete graphs [FriezeGrimmett85]
– hypercubes [FeigePelegRaghavanUpfal90]
– random graphs $G_{n,p}$, $p \ge (1 + \varepsilon)(\log n)/n$ [FPRU90]
– random regular graphs [FountoulakisPanagiotou13]

Rumor spreading is robust: If calls fail independently with probability $p$…
– then for any graph the rumor spreading time (roughly) increases at most by a factor of $6/(1 - p)$ [ElsässerSauerwald09]
– in complete graphs: the time increases by a factor of less than $1/(1 - p)$, e.g., by a factor of 1.82 when calls fail with probability ½ [DoerrHuberLevavi13]


Show that randomized rumor spreading in complete networks with probability $1 - o(1)$ informs all nodes in $(1 + o(1))(\log_2 n + \ln n)$ rounds:
– get an intuitive understanding of how this process works and why we have the runtime of $\approx \log_2 n + \ln n$
– make this intuition precise (prove things)

Main insight beyond understanding this process: quite elementary arguments (when combined cleverly) lead us pretty far


Part 3: Let’s Prove Something

Markov process: To analyze what happens in round 𝑡 we only need to know which nodes are informed at the start of the round (not how they became informed).

Symmetry: Only the number of informed nodes counts.

Domination: Extra informed nodes can only speed up the process.

All this implies that we can split the process into phases:
– the time to go to at least 𝑥 informed nodes is at most the time to go to at least 𝑦 informed nodes plus the time to go from exactly 𝑦 informed nodes to at least 𝑥 informed nodes


Very Elementary Observations

The rumor spreading process can be split into 4 phases:

1st phase (up to $o(\sqrt{n})$ informed nodes): True doubling. With high probability, all calls in one round reach a “new” node, that is, the number of informed nodes doubles in these rounds. “Birthday paradox.”

2nd phase (up to $o(n)$ informed nodes): Exponential growth. Each call still has a probability of $1 - o(1)$ of reaching a “new” node. Hence the number of informed nodes almost doubles, that is, increases by a factor of $(2 - o(1))$ in each of these rounds.

3rd phase: a short connecting phase between the exponential growth of the informed nodes and the exponential shrinking of the uninformed nodes.

4th phase (from $n - o(n)$ informed nodes on): Exponential shrinking of the uninformed nodes by a factor of $(e - o(1))$ per round. “Coupon collector.”


Overview: RRS in Complete Graphs

Assumption: Nodes call a random node, including themselves:
– only slows down things (but not significantly)
– adds symmetry
– eases writing: “$1/n$” instead of “$1/(n-1)$”

Write $\log n := \log_2 n$

Plan: Show that each phase does what we want with probability $1 - o(1)$
– for didactic reasons: analyze the 1st, 4th, 3rd, and then the 2nd phase.


Notation

Lemma: If $k = o(\sqrt{n})$, then with probability $1 - o(1)$ the first $k$ calls of the process (no matter in how many rounds they take place) all reach a new node.

Proof 1: Simply compute the probability.
$(1 - 1/n) \cdot (1 - 2/n) \cdot \ldots \cdot (1 - k/n) \ge 1 - \sum_{i=1}^{k} i/n = 1 - (k + 1)k/(2n)$

Proof 2: Estimate wisely.
– For each of the first $k$ calls, at most $k$ nodes are informed when it takes place. Hence each call has a failure probability of at most $k/n$.
– Union bound: the probability that some call fails is at most the sum (over all calls) of their failure probabilities, which is at most $k \cdot k/n = k^2/n$.

In both cases, the total failure probability is $o(1)$ if $k = o(\sqrt{n})$.


Phase 1: True Doubling

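[The “$\ge$” in Proof 1 is the (generalized) Bernoulli inequality: $\prod_i (1 - a_i) \ge 1 - \sum_i a_i$ for $a_i \in [0, 1]$.]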
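The two proofs of the true-doubling lemma can be compared numerically. The following sketch computes the exact success probability and both lower bounds; the function names are illustrative, and the values of n and k are arbitrary example choices.

```python
def prob_first_k_calls_all_new(n, k):
    """Exact probability that each of the first k calls reaches an uninformed node.

    If all earlier calls succeeded, exactly j nodes are informed before the
    j-th call, so that call succeeds with probability 1 - j/n.
    """
    p = 1.0
    for j in range(1, k + 1):
        p *= 1.0 - j / n
    return p


def bound_proof1(n, k):
    """Lower bound from Proof 1: 1 - (k + 1) k / (2 n)."""
    return 1.0 - k * (k + 1) / (2 * n)


def bound_proof2(n, k):
    """Lower bound from Proof 2 (union bound): 1 - k^2 / n."""
    return 1.0 - k * k / n


n = 10**6
for k in (10, 100, 1000):
    print(k, prob_first_k_calls_all_new(n, k), bound_proof1(n, k), bound_proof2(n, k))
```

With n = 10^6 the bounds are tight for k well below √n = 1000 and become useless around k = √n, which matches the threshold in the lemma.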

Corollary (Phase 1): Let $T \le (0.5 - o(1)) \log n$. With probability $1 - o(1)$, after the first $T$ rounds $2^T$ nodes are informed.

Proof: Consider the first $2^T - 1$ calls of the process.
– By the previous lemma, they all inform a “new” node (apart from a failure probability of $o(1)$).
– A simple induction shows that the number of informed nodes doubles in each of the first $T$ rounds (in the non-failure case).


True Doubling (2)

Imagine the year has $n$ days and you have $k$ people, each having their birthday on a random day. What is the chance that at least two have the same birthday?
– $o(1)$, if $k = o(\sqrt{n})$
– $1 - o(1)$, if $k = \omega(\sqrt{n})$
– Surprisingly high “in practice”: 23 people are enough to have a 50% chance of a double birthday when $n = 365$

Proof: Estimate the probability from Proof 1 of the true-doubling lemma.

Note: It is no rocket science to compute these probabilities. But it is good to have this “rule” that things are decided at $\sqrt{n}$ in the back of your mind (so that you don’t have to calculate, but know immediately what to roughly expect).


Excursus: Birthday Paradox
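The “23 people” claim is easy to verify directly. A minimal sketch, assuming independent and uniformly distributed birthdays; the function name is an illustrative choice.

```python
def birthday_collision_probability(k, n=365):
    """Probability that among k people with uniform random birthdays
    (year of n days) at least two share a birthday."""
    all_distinct = 1.0
    for j in range(k):
        # The (j+1)-th person must avoid the j birthdays already taken.
        all_distinct *= (n - j) / n
    return 1.0 - all_distinct


for k in (10, 22, 23, 40):
    print(k, round(birthday_collision_probability(k), 4))
```

The probability crosses 50% between k = 22 and k = 23, that is, at a group size of the order of √365 ≈ 19 rather than of the order of 365.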

Lemma: If at some time there are $i = n - o(n)$ nodes informed, then with probability $1 - o(1)$, after $T = (n/i) \ln n = (1 + o(1)) \ln n$ rounds all nodes are informed.

Proof:
– The probability that a fixed uninformed node remains uninformed for $T$ rounds is at most
  $(1 - 1/n)^{iT} \le \exp(-iT/n) \le 1/n$
  [note the $1 + r \le e^r$ argument, valid for all real numbers $r$]
– Union bound: The probability that one of the $o(n)$ uninformed nodes remains uninformed for $T$ rounds is at most $o(n) \cdot (1/n) = o(1)$

Note: Resembles the coupon collector problem (next slide)


Phase 4: Exponential Shrinking
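As a worked instance of the shrinking-phase lemma, the sketch below plugs in concrete numbers. The values n = 10^5 and i = 0.9 n are arbitrary example choices, T is rounded up to an integer number of rounds, and the function names are illustrative.

```python
import math


def phase4_rounds(n, i):
    """T = ceil((n / i) * ln n), the number of rounds used in the lemma."""
    return math.ceil(n / i * math.log(n))


def prob_fixed_node_missed(n, i, rounds):
    """Bound (1 - 1/n)**(i * rounds): a fixed node is missed by i callers
    in each of the given rounds (nodes informed later are ignored)."""
    return (1.0 - 1.0 / n) ** (i * rounds)


n, i = 10**5, 90_000
T = phase4_rounds(n, i)
miss = prob_fixed_node_missed(n, i, T)
print(T, miss, (n - i) * miss)  # rounds, per-node bound, union bound
```

The per-node bound is below 1/n, so the union bound over the n − i uninformed nodes is at most (n − i)/n, which tends to 0 when i = n − o(n).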

Coupon collector process:
– You start with no coupons
– In each round, you get a coupon having a random type out of $n$ different types
– Question: How long does it take until you have at least one coupon of each type?

Theorem: The coupon collector time $T$ satisfies
– $E[T] = n(1 + 1/2 + 1/3 + \cdots + 1/n) = n H_n = n(\ln n + \Theta(1))$
– $\Pr[T \ge (1 + \varepsilon) n \ln n] \le n^{-\varepsilon}$
– $\Pr[T < (1 - \varepsilon)(n - 1) \ln n] \le \exp(-n^{\varepsilon})$

Note: another useful building block in randomized algorithms


Excursus: Coupon Collector
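The expectation formula can be compared with a direct simulation. A minimal sketch with illustrative function names; the seed and the sample size are arbitrary.

```python
import math
import random


def expected_coupon_time(n):
    """E[T] = n * H_n, where H_n is the n-th harmonic number."""
    return n * sum(1.0 / j for j in range(1, n + 1))


def simulate_coupon_collector(n, rng):
    """Number of uniform random draws until all n coupon types have been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


rng = random.Random(7)
n = 200
average = sum(simulate_coupon_collector(n, rng) for _ in range(200)) / 200
print(expected_coupon_time(n), n * math.log(n), average)
```

The gap between n H_n and n ln n is linear in n (about 0.577 n), in line with the Θ(1) term inside the parentheses.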

Lemma: Let $f, g = o(1)$. Assume that we have $i = nf$ nodes informed. Let $T = 1/(fg)$. Then with probability $1 - o(1)$, after $T$ rounds at least $n(1 - g)$ nodes are informed.

Proof:
– The probability that an uninformed node remains uninformed during these $T$ rounds is at most $(1 - 1/n)^{nfT} \le \exp(-fT) = \exp(-1/g)$.
– Thus the expected number of uninformed nodes is at most $(n - i) \exp(-1/g) \le n \exp(-1/g)$,
– and the probability that we have more than $ng$ uninformed nodes is at most $n \exp(-1/g) / (ng) = \exp(-1/g)/g = o(1)$ by Markov’s inequality.

Comment: Same argument as in the shrinking phase. The loss from ignoring nodes informed during this phase is massive, but the phase is very short.


Phase 3: Short Connecting Phase

Reminder: What is missing is the analysis of how we go from, say, $n^{0.4}$ informed nodes to, say, $n / \log\log n$ informed nodes.
– very roughly, the number of informed nodes should double, but not fully

Two reasons why we do not have exact doubling:
– a node may call a node that was informed at the beginning of the round
  – same failure probability for all nodes
– collisions: two nodes may call the same uninformed node
  – this might be more tricky to analyze, because it involves more than one node


Phase 2: Almost Doubling

Let us analyze one round of rumor spreading starting with 𝑖 informed nodes.

– We pretend that in this round the 𝑖 informed nodes do their calls one after the other (in an arbitrary order).
– We call node 𝑘 successful if it calls a node which at that moment is uninformed.
– Number of newly informed nodes = number of successful calls

Comment: All we do here is symmetry breaking: if a node is called by several nodes, then we declare one of them as successful.


Solving the Collisions

Indicator random variable for successful calls:

$X_k := 1$ if the $k$th node calls a node uninformed at that time, and $X_k := 0$ otherwise

The number of newly informed nodes is $X := \sum_{k=1}^{i} X_k$

The expected number of newly informed nodes is

$E[X] = E\big[\sum_{k=1}^{i} X_k\big] = \sum_{k=1}^{i} E[X_k] \ge \sum_{k=1}^{i} \frac{n - i - (k - 1)}{n} \ge i\left(1 - \frac{3i}{2n}\right)$

Note: $E[X_k] = \Pr[X_k = 1] \ge \frac{n - i - (k - 1)}{n}$

General counting trick: Write things as a sum of (not necessarily independent) random variables and use “linearity of expectation”


Counting Random Things
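The lower bound on the expectation can be checked against the exact value. Note that the sketch below uses a different decomposition than the slide: it sums over the uninformed nodes (each is hit by at least one of the i calls with probability 1 − (1 − 1/n)^i) instead of over the calls. Calls are assumed to go to a uniformly random node including the caller, as in the notation slide; the function names are illustrative.

```python
import random


def expected_newly_informed(n, i):
    """Exact expected number of newly informed nodes in one round with i
    informed nodes, by linearity of expectation over the n - i uninformed nodes."""
    return (n - i) * (1.0 - (1.0 - 1.0 / n) ** i)


def lower_bound_from_slide(n, i):
    """The bound i * (1 - 3 i / (2 n)) derived above."""
    return i * (1.0 - 1.5 * i / n)


def simulate_round(n, i, rng):
    """One round with nodes 0..i-1 informed; returns the number of newly informed nodes."""
    hit = {rng.randrange(n) for _ in range(i)}
    return sum(1 for v in hit if v >= i)


n = 10**4
rng = random.Random(1)
for i in (10, 100, 1000, 5000):
    average = sum(simulate_round(n, i, rng) for _ in range(200)) / 200
    print(i, lower_bound_from_slide(n, i), expected_newly_informed(n, i), average)
```

The bound is almost exact while i is small compared to n and becomes loose as i approaches a constant fraction of n, which is why phase 2 is only run up to o(n) informed nodes.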

Lemma [previous slide]: A round starting with $i$ informed nodes ends with an expected number of at least $2i\left(1 - \frac{3i}{4n}\right)$ informed nodes.

Assume that this was not an expectation, but the truth with prob. 1.

Phase 2a: Going from $i$ to $i_1 = n/(\log n)^2$ informed nodes.
– In each round, the number of informed nodes multiplies by at least $2(1 - (\log n)^{-2})$ [see Lemma, skipped a $3/4$ for convenience]
– If we did this for $T_1 = \log(n/i)$ rounds, we would have $i \cdot \big(2(1 - (\log n)^{-2})\big)^{T_1} \ge n(1 - T_1 (\log n)^{-2}) \ge i_1$ informed nodes, hence we need at most $T_1$ rounds to go to $i_1$ informed nodes
– Bernoulli inequality: $(1 - x)^t \ge 1 - xt$

Phase 2b: Going from $i_1$ to $i_2 := n/(\log\log n)^2$ informed nodes
– same argument with (roughly) all $\log n$ replaced by $\log\log n$: $T_2 = 2 \log\log n$ rounds do the job


If the Expectation Was the Truth
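The “expectation = truth” argument can be replayed numerically by iterating the recurrence from the lemma. In this sketch the starting point n^0.4, the target n/(log log n)^2 and the round budget log n^0.6 + 2 log log n are taken from the slides; n = 2^40 is an arbitrary example value and the function name is illustrative.

```python
import math


def rounds_if_expectation_were_truth(n, start, target):
    """Iterate i -> 2 i (1 - 3 i / (4 n)) until at least `target` nodes are informed."""
    if target > n / 2:
        raise ValueError("the recurrence is only meant for targets of o(n) nodes")
    i, rounds = float(start), 0
    while i < target:
        i = 2.0 * i * (1.0 - 3.0 * i / (4.0 * n))
        rounds += 1
    return rounds


n = 2.0**40
log_n = math.log2(n)
start = n**0.4                       # end of the true-doubling phase
target = n / math.log2(log_n) ** 2   # i_2 = n / (log log n)^2
budget = math.log2(n**0.6) + 2 * math.log2(log_n)
print(rounds_if_expectation_were_truth(n, start, target), "rounds used, budget", round(budget, 1))
```

The recurrence needs fewer rounds than the budget from phases 2a and 2b, since the budget deliberately uses crude estimates.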

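$X_k := 1$ if the $k$th call informs a node uninformed at that time, and $X_k := 0$ otherwise

$\Pr[X_k = 1] \ge \frac{n - i - (k - 1)}{n}$

Number of newly informed nodes: $X := \sum_{k=1}^{i} X_k$

$E[X] = E\big[\sum_{k=1}^{i} X_k\big] = \sum_{k=1}^{i} E[X_k] \ge \sum_{k=1}^{i} \frac{n - i - (k - 1)}{n} \ge i(1 - 1.5\, i/n)$

Let’s compute (estimate) the variance:

$\mathrm{Var}[X] = \mathrm{Var}\big[\sum_{k=1}^{i} X_k\big] = \sum_{k=1}^{i} \mathrm{Var}[X_k] + 2 \sum_{k=1}^{i} \sum_{l=k+1}^{i} \mathrm{Cov}[X_k, X_l]$

– $\mathrm{Var}[X_k] = \Pr[X_k = 1] \Pr[X_k = 0] \le 2i/n$
– $\mathrm{Cov}[X_k, X_l] = \Pr[X_k = X_l = 1] - \Pr[X_k = 1] \Pr[X_l = 1] < 0$
– hence $\mathrm{Var}[X] \le 2i^2/n$

Chebyshev: $\Pr[X \le E[X] - i^{0.75}] \le \mathrm{Var}[X]/(i^{0.75})^2 \le 2 i^{0.5} n^{-1} \le n^{-0.5}$ since $i = o(n)$

Since $E[X] \ge 0.5 i$, we have $\Pr[X \le E[X](1 - 2 i^{-0.25})] \le n^{-0.5}$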


Chebyshev’s Inequality

Lemma: Consider a round starting with $i$ informed nodes. Let $X$ denote the number of informed nodes after the round.
– $E[X] \ge 2i\left(1 - \frac{3i}{4n}\right)$
– $\Pr[X \le E[X](1 - 2 i^{-0.25})] \le n^{-0.5}$

Union bound: With probability $1 - \Theta(\log n / \sqrt{n})$, we have $X \ge 2i\left(1 - \frac{3i}{4n}\right)(1 - 2 i^{-0.25})$ in those of the first $2 \log n$ rounds that start with $o(n)$ informed nodes


Chebyshev + Union Bound

Lemma: With probability $1 - \Theta(\log n / \sqrt{n})$, we have $X \ge 2i\left(1 - \frac{3i}{4n}\right)(1 - 2 i^{-0.25})$ in those of the first $2 \log n$ rounds that start with $o(n)$ informed nodes

In this good case, the “if the expectation was the truth” proof still works
– only modification: in each round our gain is smaller by a factor of at most $(1 - 2 i^{-0.25})$.
– Since we only do a logarithmic number of rounds and have $i \ge n^{0.4}$, the total loss (compared to “expectation = truth”) is $(1 - 2 n^{-0.1})^{\Theta(\log n)} \ge 1 - o(1)$.
– recall that up to $i = n^{0.4}$, we have perfect doubling with high probability

Theorem (Phase 2): With probability $1 - o(1)$, $\log n^{0.6} + 2 \log\log n$ rounds are enough to go from $n^{0.4}$ to $n/\log\log n$ informed nodes.


Deviations from Expectation Can (Essentially) Be Ignored

Strong concentration: A random variable $X$ is said to be strongly concentrated (around its mean) when $X$ is close to $E[X]$, that is, when the deviation probabilities $\Pr[|X - E[X]| \ge \lambda]$ are small.

Application 1: If the concentration is strong enough, we can (almost) pretend that our random variables are not random, but just equal their expectation.
– example: the previous analysis of phase 2

Application 2: strong concentration ⇒ good union bounds
– example: coupon collector with $n$ coupons
– $T_k$: time until coupon $k$ is obtained
– $E[T_k] = n$
– $\Pr[T_k \ge \lambda n] \le (1 - 1/n)^{\lambda n} \le \exp(-\lambda)$
– union bound: all coupons are obtained after $\approx n \ln n$ rounds (take $\lambda$ slightly above $\ln n$; e.g., $\lambda = (1 + \varepsilon) \ln n$ gives a failure probability of at most $n^{-\varepsilon}$)
  $\Pr[\exists k: T_k \ge \lambda n] \le \sum_{k} \Pr[T_k \ge \lambda n] \le n \exp(-\lambda)$


Why Did This Work: Concentration!

Proving Strong Concentration

How can we show that some random variable $X$ is, with high probability, not too far from $E[X]$ (“strong concentration”)?

Elementary tools:
– direct arguments: example coupon collector
– Markov’s inequality: $\Pr[X \ge \lambda E[X]] \le 1/\lambda$ for non-negative $X$.
  – relatively weak, only bounds for the “upper tail”
– Chebyshev’s inequality: $\Pr[|X - E[X]| \ge \lambda \sqrt{\mathrm{Var}[X]}] \le 1/\lambda^2$.
  – equivalent: $\Pr[|X - E[X]| \ge \mu] \le \mathrm{Var}[X]/\mu^2$

Less elementary: Chernoff bounds… (next slide)


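Theorem: Let $X_1, \ldots, X_n$ be independent random variables taking values in $[0,1]$. Let $X = X_1 + \cdots + X_n$.
– For $\delta \in [0,1)$, $\Pr[X \le (1 - \delta) E[X]] \le \left(\left(\frac{1}{1-\delta}\right)^{1-\delta} e^{-\delta}\right)^{E[X]}$.
– For $\delta \ge 0$, $\Pr[X \ge (1 + \delta) E[X]] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{E[X]}$.

Corollaries for $\delta \in [0,1]$:
– $\Pr[X \le (1 - \delta) E[X]] \le \exp\left(-\frac{\delta^2 E[X]}{2}\right)$.
– $\Pr[X \ge (1 + \delta) E[X]] \le \exp\left(-\frac{\delta^2 E[X]}{3}\right)$.

Additive bound: For $\lambda \ge 0$, $\Pr[X \ge E[X] + \lambda] \le \exp\left(-\frac{2\lambda^2}{n}\right)$. Same for $X \le E[X] - \lambda$.

Exponentially small deviation probabilities!!!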

Proving Concentration: Chernoff Bounds
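To see how these bounds compare with the truth, the sketch below evaluates the multiplicative corollary and the additive bound against an exact binomial lower tail. The binomial example (n = 1000 fair coin flips) and the function names are illustrative assumptions; `math.comb` requires Python 3.8 or newer, and the exact sum is only practical for moderate n.

```python
import math


def binomial_lower_tail(n, p, threshold):
    """Exact Pr[X <= threshold] for X ~ Bin(n, p)."""
    top = math.floor(threshold)
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(top + 1))


def chernoff_lower_tail_bound(mean, delta):
    """Corollary: Pr[X <= (1 - delta) E[X]] <= exp(-delta^2 E[X] / 2)."""
    return math.exp(-delta * delta * mean / 2)


def additive_bound(n, lam):
    """Additive bound: Pr[X <= E[X] - lam] <= exp(-2 lam^2 / n)."""
    return math.exp(-2 * lam * lam / n)


n, p, delta = 1000, 0.5, 0.1
mean = n * p
print(binomial_lower_tail(n, p, (1 - delta) * mean),
      chernoff_lower_tail_bound(mean, delta),
      additive_bound(n, delta * mean))
```

In this example the additive bound is the sharper of the two, and both are valid upper bounds on the exact tail.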

Theorem (McDiarmid, Azuma, Hoeffding…): Let $X_1, \ldots, X_m$ be independent random variables taking values in some sets $A_1, \ldots, A_m$. Let $f: A_1 \times \cdots \times A_m \to \mathbb{R}$. Assume that $|f(a) - f(b)| \le c_i$ whenever $a, b$ differ only in the $i$th component. Then we have the additive tail bounds

$\Pr[f(X_1, \ldots, X_m) \le E[f(X_1, \ldots, X_m)] - \lambda] \le \exp\left(-2\lambda^2 / \sum_{i=1}^{m} c_i^2\right)$,

$\Pr[f(X_1, \ldots, X_m) \ge E[f(X_1, \ldots, X_m)] + \lambda] \le \exp\left(-2\lambda^2 / \sum_{i=1}^{m} c_i^2\right)$.

Example: Application to phase 2:
– independent random variables: the random calls
– influence of one call on the number of informed nodes $\le 1$
– apply Azuma with $m = i$ and $c_k = 1$: $\Pr[X \le E[X] - i^{0.75}] \le \exp(-2 i^{0.5})$

Proving Concentration: Method of Bounded Differences

Split the process into 4 phases that all reach their target with probability at least $1 - o(1)$
– Phase 1: Perfect doubling up to $o(\sqrt{n})$ informed nodes [birthday paradox]
– Phase 2: Almost doubling [expectation and concentration]
– Phase 1+2: Obtain $i_2 = n/(\log\log n)^2$ informed nodes within $T_{12} = \log n + 2 \log\log n$ rounds
– Phase 3: From $i_2$ informed nodes to $i_2$ uninformed nodes in time $T_3 = (\log\log n)^4$
– Phase 4: Exponential shrinking of the uninformed nodes by a factor of at least $e(1 - o(1))$; $T_4 = (1 + o(1)) \ln n$ rounds suffice for this phase [coupon collector]

Result: With prob. $1 - o(1)$, after $T_{12} + T_3 + T_4 = (1 + o(1))(\log n + \ln n)$ rounds all nodes are informed.


Summary: Analysis of Rumor Spreading in Complete Networks

Strong result for a simple algorithm: Randomized rumor spreading informs the complete graph on $n$ vertices with probability $1 - o(1)$ in only $(1 + o(1))(\log_2 n + \ln n)$ rounds

Simple methods cleverly used:
– coupon collector
– birthday paradox
– union bound
– Markov’s inequality
– Chebyshev’s inequality
– $1 + x \le e^x$ for all real numbers $x$, useful for $|x|$ small


Conclusion (almost)


Denote by $S_n$ the time randomized rumor spreading takes to inform all nodes of a complete graph on $n$ vertices.

Frieze&Grimmett’85: With probability $1 - o(1)$, $S_n = \log_2 n + \ln n \pm o(\log n)$.
– the result shown in this lecture with a different proof

Pittel’87: Let $h = \omega(1)$ be arbitrary. Then with probability $1 - o(1)$, $S_n = \log_2 n + \ln n \pm o(h(n))$.

D., Künnemann’14:
– $S_n$ is dominated by $\lceil \log_2 n \rceil + \frac{1}{n} \mathrm{CouponColl}(n) + 2.6 + \mathrm{Geom}(1 - o(1))$.
– $S_n$ is subdominated by $\lfloor \log_2 n \rfloor + \frac{1}{n} \mathrm{CouponColl}(n/2 \to n) - 1$.
– $\log_2 n + \ln n - 1.116 \le E[S_n] \le \log_2 n + \ln n + 2.765 + o(1)$.


More Results: Rumor Spreading in Complete Graphs

Consider the following rumor spreading process in a complete network with $n$ nodes having public node labels $1, \ldots, n$. Everything is as in randomized rumor spreading, except that each node $k$, in the first round after becoming informed, does not call a random node but calls node $k + 1 \pmod n$.

Theorem [D, Fouz ‘11]:
– after $\log n + O((\log n)^{1/2})$ rounds, all nodes are informed w…
– termination criterion: if nodes stop calling after having called $r$ times a node that was already informed, then
  – with prob. 1, all nodes become informed eventually
  – with prob. 1, not more than $(r + 1) n$ calls are made in total
  – with prob. $1 - o(1)$, all nodes become informed within $\log n + O\left(\frac{\log n}{r} + (\log n)^{1/2}\right)$ rounds


Even Better Rumor Spreading

Thank you

We frequently use so-called Landau symbols (big-Oh notation) to roughly describe runtimes or other quantities. You should know (some of) them from earlier courses, but maybe not all (e.g., $o(\cdot)$, $\omega(\cdot)$, …).

Simple definition of $O(\cdot)$ (sufficient for most purposes in computer science):
– Let $X \subseteq \mathbb{R}$ be unbounded (arbitrarily large numbers exist in $X$).
– Let $f, g: X \to \mathbb{R}$ be real functions.
– Definition: We write $f = O(g)$ if and only if there are $M, x_0 \in \mathbb{R}$ such that $|f(x)| \le M |g(x)|$ for all $x \ge x_0$.

Warning: This “=” is not symmetric. A better notation (used by some, but not many) would be $f \in O(g)$. Then $O(g)$ would be the class of all functions having asymptotically not larger growth than $g$.


Appendix: Reminder Big-Oh Notation (Landau Symbols)

Idea: This big-Oh notation makes a statement on the growth behavior of $f$ without caring about constant factors and lower-order terms
– $0.01 n^2 = O(n^2)$ and $100 n^2 = O(n^2)$
– $n^2 - 1000n = O(n^2)$ and $n^2 + 1000n = O(n^2)$

Friends of big-Oh:
– $f = \Omega(g) :\Longleftrightarrow g = O(f)$
– $f = \Theta(g) :\Longleftrightarrow f = O(g) \wedge f = \Omega(g)$
– $f = o(g) :\Longleftrightarrow \forall \varepsilon > 0\ \exists x_0 \in \mathbb{R}\ \forall x \ge x_0: |f(x)| \le \varepsilon |g(x)|$
– $f = \omega(g) :\Longleftrightarrow g = o(f)$

Notation: We use this also without writing down $g$ explicitly. E.g.,
– $f = O(n^2)$
– With probability $1 - o(1)$, …


Big-Oh Notation (2)
