Page 1: Topics in Algorithms and Data Science Random Graphs (2 part)ce.sharif.edu/courses/94-95/1/ce797-1/resources/... · Topics in Algorithms and Data Science Random Graphs (2nd part) Omid

Topics in Algorithms and Data Science

Random Graphs (2nd part)

Omid Etesami

Phase transitions for CNF-SAT

Phase transitions for other random structures

• We already saw phase transitions for random graphs

• Other random structures, like Boolean formulas in conjunctive normal form (CNF), also have phase transitions

Random k-CNF formula

• n variables

• m clauses

• k literals per clause (k constant)

• literal = variable or negation

• each clause is chosen independently and uniformly at random from the set of possible clauses.

• Unsatisfiability is an increasing property, so it has a phase transition.

Satisfiability conjecture

• Conjecture. There is a constant r_k such that m = r_k n is a sharp threshold for satisfiability.

The conjecture was recently proved for large k by Ding, Sly, and Sun.

Upper bound on rk

• Let m = cn.

• Each truth assignment satisfies the CNF with probability (1 − 2^{-k})^{cn}.

• The probability that the CNF is satisfiable is at most 2^n (1 − 2^{-k})^{cn}.

• Thus r_k ≤ 2^k ln 2.
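The bound above can be checked numerically. The sketch below assumes only the union-bound expression 2^n (1 − 2^{-k})^{cn}; the function names are illustrative and not from the slides. It solves ln 2 + c ln(1 − 2^{-k}) = 0 for c and compares the result with the simpler bound 2^k ln 2, which follows from ln(1 − x) ≤ −x.

```python
import math

def first_moment_bound(k: int) -> float:
    """Clause density c at which 2^n (1 - 2^-k)^(cn) stops growing with n,
    i.e. the solution of ln 2 + c * ln(1 - 2^-k) = 0."""
    return math.log(2) / -math.log(1 - 2.0 ** -k)

def simple_bound(k: int) -> float:
    """The looser closed form 2^k ln 2 quoted on the slide."""
    return 2 ** k * math.log(2)

for k in (3, 4, 5):
    # The exact root is always slightly below 2^k ln 2.
    print(k, round(first_moment_bound(k), 3), round(simple_bound(k), 3))
```

For k = 3 the exact root is a little below the 2^3 ln 2 value, so the slide's bound is safe but not tight.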

[Figure: 3-SAT solution space (height represents # of unsatisfied constraints)]

Lower bound on rk

• Lower bound more difficult. 2nd moment method doesn’t work.

• We focus on k = 3.

• The Smallest Clause (SC) heuristic finds a satisfying assignment almost surely when m = cn for a constant c < 2/3. Thus r_3 ≥ 2/3.

Smallest Clause (SC) heuristic

While not all clauses are satisfied:

    assign true to a random literal in a random smallest-length clause;

    delete satisfied clauses; delete falsified literals from the remaining clauses.

If a 0-length clause is ever found, we have failed.
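The heuristic can be sketched in a few lines of Python. This is a minimal illustration rather than the lecturer's code: the literal encoding (+v for a variable, −v for its negation), the formula generator, and all function names are assumptions made here.

```python
import random

def random_kcnf(n, m, k=3, rng=random):
    """m clauses, each on k distinct variables from 1..n with random signs.
    A literal is +v (variable v) or -v (its negation)."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n + 1), k)]
            for _ in range(m)]

def smallest_clause(clauses, rng=random):
    """Run the SC heuristic. Returns {variable: bool} for the variables it
    set, or None if a 0-length clause appears (failure)."""
    clauses = [list(c) for c in clauses]
    assignment = {}
    while clauses:
        shortest = min(len(c) for c in clauses)
        if shortest == 0:
            return None
        clause = rng.choice([c for c in clauses if len(c) == shortest])
        lit = rng.choice(clause)
        assignment[abs(lit)] = lit > 0
        remaining = []
        for c in clauses:
            if lit in c:
                continue                                    # clause satisfied: delete it
            remaining.append([x for x in c if x != -lit])   # drop the falsified literal
        clauses = remaining
    return assignment

def satisfies(clauses, assignment):
    """Variables the heuristic never set may take any value; True is used here."""
    return all(any(assignment.get(abs(l), True) == (l > 0) for l in c)
               for c in clauses)
```

A typical experiment would call `smallest_clause(random_kcnf(n, int(c * n)))` for several values of c and record how often the result is not None.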

Queue of 1-literal and 2-literal clauses

• While queue is not empty, a member of the queue is satisfied.

• Setting a literal to true may add other clauses to the queue.

• We will show that while the queue is non-empty, the arrival rate is less than the departure rate.

Principle of deferred decisions

• We pretend that we do not know the literals appearing in each clause.

• During the algorithm, we only know the size of each clause.

Queue arrival rate

• When the t’th literal is assigned a value, each 3-literal clause is added to the queue with probability 3/(2(n-t+1)).

• (With the same probability, the clause is satisfied.)

• Therefore, the average # of clauses added to the queue at each step is at most 3(cn – t + 1)/(2(n-t+1)) = 1 – Ω(1).
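To see why the bound is 1 − Ω(1) when c < 2/3, one can evaluate it over all steps. The sketch below only evaluates the expression from the slide; the function name and the sample values n = 1000, c = 0.6 are illustrative assumptions.

```python
def arrival_rate_bound(n, c, t):
    """Upper bound from the slide on the expected # of clauses
    entering the queue at step t: 3(cn - t + 1) / (2(n - t + 1))."""
    return 3 * (c * n - t + 1) / (2 * (n - t + 1))

n, c = 1000, 0.6
# For c < 1 the bound decreases in t, so its maximum is at t = 1, where it equals 3c/2.
worst = max(arrival_rate_bound(n, c, t) for t in range(1, n + 1))
print(worst)
```

Since 3c/2 < 1 exactly when c < 2/3, the arrival rate stays a constant below the departure rate of one clause per step.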

The waiting time is O(lg n)

Thm. The # of steps any clause remains in the queue is Ω(lg n) with probability at most 1/n^3.

The probability that the queue is empty just before step t and then remains non-empty in steps t, t + 1, …, t + s − 1 is at most exp(−Ω(s)) by the multiplicative Chernoff bound: the # of arrivals must be at least s while the mean # of arrivals is s(1 − Ω(1)).

(We upper-bound the # of arrivals by a sum of independent Bernoulli random variables.)

There are only n choices for t. Therefore, for a suitable choice of s_0 = Θ(lg n), every non-empty episode has length at most s_0 with probability 1 − 1/n^3.

The probability that setting a literal in the i’th clause makes the j’th clause false is o(1/n^2)

If this trouble happens, then

• one of the i’th and j’th clauses is added to the queue at some step t,

• the j’th clause consists of 1 literal when the trouble happens,

• by the SC rule, the i’th clause also consists of 1 literal when its literal is assigned,

• with probability 1 − 1/n^3 the waiting time for both clauses is O(lg n).

If a_1, a_2, … is the sequence of literals that would be set to true (if clauses i and j didn’t exist), then 4 of the literals in these two clauses are negations of literals among a_t, a_{t+1}, …, a_{t’} for t’ = t + O(lg n).

This happens with probability O((ln^4 n)/n^4) times the # of choices for t. Since there are at most n choices for t, the total is O((ln^4 n)/n^3) = o(1/n^2).

Since there are O(n^2) pairs of clauses, the algorithm fails with probability o(1) by the union bound.

Nonuniform models of random graphs

Nonuniform models

• Fix a degree distribution: there are f(d) vertices of degree d

• Choose a random graph among all graphs with this degree distribution

• Edges are no longer independent

Degree distribution: vertex perspective vs edge perspective

• Consider a graph where half of the vertices have degree 1 and half have degree 2

• A random vertex is equally likely to be of degree 1 or 2

• A random endpoint of a random edge is twice as likely to be of degree 2

• In many algorithms, we traverse a random edge to reach an endpoint: the probability of reaching a vertex of degree i is then proportional to i λ_i, where λ_i is the fraction of vertices of degree i
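The change of perspective is a one-line reweighting. The sketch below reproduces the degree-1/degree-2 example from the slide; the function name and the dictionary representation of λ are assumptions made for illustration.

```python
from fractions import Fraction

def edge_perspective(lam):
    """lam maps degree i -> fraction of vertices with degree i (vertex view).
    Returns the distribution of the degree of a random endpoint of a random
    edge, which weights degree i by i * lam[i]."""
    total = sum(i * p for i, p in lam.items())
    return {i: i * p / total for i, p in lam.items()}

lam = {1: Fraction(1, 2), 2: Fraction(1, 2)}
edge_view = edge_perspective(lam)
# Degree 1 gets weight 1/3 and degree 2 gets weight 2/3.
print(edge_view)
```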

Giant component in random graphs with given degree distribution

[Molloy, Reed] There will be a giant component iff Σ_i i(i − 2) λ_i > 0

• Intuition: Consider BFS (branching process) from a fixed vertex.

• After the first level, a vertex of degree i has exactly i − 1 children.

• The branching process has probability of extinction < 1 iff the expected # of children E[i − 1] > 1, or in other words E[i − 2] > 0.

• In calculating the expectation, the probability of degree i is from the edge perspective (and not the vertex perspective). Thus it is proportional to i λ_i.
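Taking E[i − 2] > 0 under the edge perspective gives the condition Σ_i i(i − 2) λ_i > 0, which is easy to evaluate for a given degree distribution. A minimal sketch, with hypothetical function names and two made-up example distributions:

```python
def molloy_reed_sum(lam):
    """Sum of i * (i - 2) * lam[i]; a positive value predicts a giant component."""
    return sum(i * (i - 2) * p for i, p in lam.items())

def expected_children(lam):
    """Expected # of children (i - 1) of a vertex reached by following a
    random edge, i.e. with degree i weighted by i * lam[i]."""
    total = sum(i * p for i, p in lam.items())
    return sum((i - 1) * i * p for i, p in lam.items()) / total

paths_and_leaves = {1: 0.5, 2: 0.5}   # half degree 1, half degree 2
leaves_and_hubs = {1: 0.5, 3: 0.5}    # half degree 1, half degree 3

print(molloy_reed_sum(paths_and_leaves), expected_children(paths_and_leaves))
print(molloy_reed_sum(leaves_and_hubs), expected_children(leaves_and_hubs))
```

The first distribution gives a negative sum (fewer than one child on average), the second a positive sum (more than one child on average), so the two tests agree.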

Example: G(n, p=1/n)

Poisson degree distribution

If vertices have a Poisson degree distribution with mean d, then a random endpoint of a random edge has degree distribution 1 + Poisson(d).
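This identity can be verified term by term: weighting the Poisson probability of degree i by i and dividing by the mean d gives the Poisson probability of i − 1. A small numerical check, with illustrative function names and an arbitrary mean d = 2.5:

```python
import math

def poisson_pmf(i, d):
    """P(X = i) for X ~ Poisson(d)."""
    return math.exp(-d) * d ** i / math.factorial(i)

def edge_perspective_poisson(i, d):
    """Probability that a random endpoint of a random edge has degree i:
    i * P(degree = i), normalised by the mean degree d."""
    return i * poisson_pmf(i, d) / d

d = 2.5
for i in range(1, 6):
    # The two columns agree: degree i from the edge view matches 1 + Poisson(d).
    print(i, edge_perspective_poisson(i, d), poisson_pmf(i - 1, d))
```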

Growth model without preferential attachment

Growing graphs

• Vertices and edges are added over time.

• Preferential attachment = selecting endpoints for a new edge with probability proportional to degrees

• Without preferential attachment = selecting endpoints for a new edge uniformly at random from the set of existing vertices

[Figure: growth with preferential attachment]

Basic growth model without preferential attachment

• Start with zero vertices and zero edges

• At each time t, add a new vertex

• With probability δ, join two random vertices by an edge

The resulting graph may become a multigraph.

But since there are t^2 pairs of vertices and O(t) existing edges, a multiple edge or self-loop happens at each step with small probability, and we ignore these cases.

[Figure: a new vertex and a new edge being added]

# vertices of each degree

Let d_k(t) be the expected # of vertices of degree k at time t.

degree distribution

Let d_k(t) = p_k t in the limit as t tends to infinity.

The result is a geometric distribution which, like the Poisson degree distribution of Erdős–Rényi graphs, falls off exponentially fast, unlike the power law of preferential attachment.
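The geometric shape can be observed in a direct simulation of the growth model. The sketch below is an illustration under stated assumptions: the two endpoints are sampled as distinct vertices (so self-loops are excluded, multiple edges are allowed), and the comparison formula p_k = (1/(1 + 2δ)) (2δ/(1 + 2δ))^k is what one gets by solving the steady-state recurrence p_0 = 1 − 2δ p_0, p_k = 2δ p_{k−1} − 2δ p_k; that recurrence is not shown in the transcript.

```python
import random
from collections import Counter

def grow_graph(steps, delta, rng):
    """Degree list after `steps` rounds: add a vertex, then with
    probability delta join two distinct uniformly random vertices."""
    degree = []
    for _ in range(steps):
        degree.append(0)                                   # new vertex
        if len(degree) >= 2 and rng.random() < delta:
            u, v = rng.sample(range(len(degree)), 2)       # new edge
            degree[u] += 1
            degree[v] += 1
    return degree

def degree_fractions(degree):
    """Empirical fraction of vertices with each degree."""
    counts = Counter(degree)
    return {k: counts[k] / len(degree) for k in sorted(counts)}

def geometric_prediction(k, delta):
    """Assumed limiting fraction p_k from the steady-state recurrence."""
    q = 2 * delta / (1 + 2 * delta)
    return (1 - q) * q ** k

fractions = degree_fractions(grow_graph(20000, 0.5, random.Random(1)))
for k in range(4):
    print(k, fractions.get(k, 0.0), geometric_prediction(k, 0.5))
```

With δ = 0.5 the prediction halves from one degree to the next, and the simulated fractions should track it closely for small k.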

# components of each finite size

Let n_k(t) be the expected # of components of size k at time t

• A randomly picked component is of size k with probability proportional to n_k(t)

• A randomly picked vertex is in a component of size k with probability k n_k(t)/t

[Figure: components of size 4 and 2]

Recurrence relation for n_k(t)

• We use expectations instead of actual # of components of each size?!

• We ignore edges falling inside components since we are interested in small component sizes.

[Figure: a new edge joining a component of j vertices to a component of k − j vertices]

Recurrence relation for a_k = n_k(t) / t

Phase transition for non-finite components

Size of non-finite components below critical threshold

Summary of phase transition

Comparison with a static random graph having the same degree distribution

• Could you explain why giant components appear for smaller δ in the grown model?

Why is δ = 1/4 the threshold for static model?

Growth model with preferential attachment

Description of the model

• Begin with an empty graph

• At each time step, add a new vertex and, with probability δ, attach the new vertex to an existing vertex selected at random with probability proportional to its degree

Obviously the graph has no cycles.

Degree of vertex i at time t

Let d_i(t) be the degree of vertex i at time t

Thus d_i(t) = a t^{1/2}.

Since d_i(i) = δ,

we have d_i(t) = δ (t/i)^{1/2}.
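The step that leads to the square-root growth is not spelled out in the transcript. The following is a sketch of one standard way to obtain it, under two assumptions: the sum of all degrees at time t is about 2δt (one edge is added with probability δ per step), and t is treated as a continuous variable.

```latex
\begin{align*}
\frac{\partial d_i(t)}{\partial t}
  &= \delta \cdot \frac{d_i(t)}{2\delta t} = \frac{d_i(t)}{2t}
  && \text{(an edge arrives w.p. } \delta \text{ and picks } i \text{ w.p. } d_i(t)/(2\delta t)\text{)} \\
\ln d_i(t) &= \tfrac{1}{2}\ln t + \text{const}
  \;\Longrightarrow\; d_i(t) = a\, t^{1/2} \\
d_i(i) = \delta
  &\;\Longrightarrow\; a = \delta\, i^{-1/2}
  \;\Longrightarrow\; d_i(t) = \delta \sqrt{t/i}
\end{align*}
```

The initial condition reflects that vertex i arrives at time i with expected degree δ.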

Power-law degree distribution

Vertex number tδ^2/d^2 has degree d.

Therefore, the # of vertices of degree d is |∂/∂d (tδ^2/d^2)| = 2tδ^2/d^3.

In other words, the probability of degree d is 2δ^2/d^3.
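The counting step can be checked against the deterministic approximation d_i(t) = δ (t/i)^{1/2}. The sketch below counts the vertices whose approximate degree falls in [d, d + 1) and compares the count with tδ^2/d^2 − tδ^2/(d + 1)^2, whose leading term for large d is 2tδ^2/d^3. The function names and the sample values t = 100000, δ = 1, d = 10 are illustrative assumptions.

```python
def count_with_degree_between(t, delta, d_low, d_high):
    """# of vertices i in 1..t whose approximate degree
    delta * sqrt(t / i) lies in [d_low, d_high)."""
    return sum(1 for i in range(1, t + 1)
               if d_low <= delta * (t / i) ** 0.5 < d_high)

def predicted_count(t, delta, d_low, d_high):
    """Difference of the vertex numbers t*delta^2/d^2 at the two endpoints."""
    return t * delta ** 2 * (1 / d_low ** 2 - 1 / d_high ** 2)

t, delta, d = 100_000, 1.0, 10
print(count_with_degree_between(t, delta, d, d + 1))
print(predicted_count(t, delta, d, d + 1))
print(2 * t * delta ** 2 / d ** 3)   # density approximation from the slide
```

The first two numbers agree up to rounding; the third is the same quantity to first order in 1/d.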

Small world graphs

Milgram’s experiment

• Ask someone in Nebraska to send a letter to someone in Massachusetts, given the target's address and occupation

• At each step, send the letter to someone you know on a “first name” basis who is closer to the target

• In successful experiments, it took 5 or 6 steps

• Called “six degrees of separation”

The Kleinberg model for random graphs

• n × n grid with local and global edges

• From each vertex u, there is a long-distance edge to a vertex v

• Vertex v is chosen with probability proportional to d(u,v)^{-r}, where d is the Manhattan distance.
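Choosing the long-distance contact is a weighted random draw. A minimal sketch, assuming grid vertices are (x, y) pairs with 0 ≤ x, y < n; the function names are hypothetical.

```python
import random

def manhattan(u, v):
    """Manhattan (grid) distance between two points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def long_range_contact(u, n, r, rng):
    """Sample v != u on the n x n grid with probability
    proportional to d(u, v)^(-r)."""
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [manhattan(u, v) ** -r for v in others]
    return rng.choices(others, weights=weights, k=1)[0]

rng = random.Random(0)
print(long_range_contact((2, 2), 5, 2, rng))
```

With r = 0 every other vertex is equally likely; larger r concentrates the contact near u.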

Normalization factor

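• Let c_r(u) = Σ_{v ≠ u} d(u,v)^{-r}.

• # nodes at distance k from u is at most 4k.

• # nodes at distance k from u is at least k for k ≤ n/2.

• We have Σ_{k=1}^{n/2} k · k^{-r} ≤ c_r(u) ≤ Σ_{k=1}^{2n} 4k · k^{-r}. Hence:

• c_r(u) = Θ(1) when r > 2.

• c_r(u) = Θ(lg n) when r = 2.

• c_r(u) = Ω(n^{2-r}) when r < 2.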
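The three regimes can be observed by computing the normalization factor directly on small grids. The sketch below assumes c_r(u) is the sum of d(u, v)^{-r} over all v ≠ u; the function name and grid sizes are illustrative.

```python
def normalization(u, n, r):
    """c_r(u): sum over all v != u on the n x n grid of d(u, v)^(-r),
    where d is the Manhattan distance."""
    return sum((abs(u[0] - x) + abs(u[1] - y)) ** -r
               for x in range(n) for y in range(n) if (x, y) != u)

for n in (16, 32, 64):
    centre = (n // 2, n // 2)
    # r = 3 stays bounded, r = 2 grows slowly, r = 1 grows roughly like n.
    print(n, normalization(centre, n, 3), normalization(centre, n, 2),
          normalization(centre, n, 1))
```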

No short (polylogarithmic) paths exist when r > 2.

• Bound the expected # of edges connecting vertices at distance ≥ d*.

• Thus, with high probability there is no edge connecting vertices at distance at least d* for some d* = n^{1-Ω(1)}.

• Since many pairs of vertices are at distance Ω(n) from each other, the shortest path between these pairs has length at least n^{Ω(1)}.

[Figure: a pair of vertices at distance Ω(n)]

Local algorithm when r = 2

The algorithm is local and greedy:

At each step, follow the edge that takes us closest to the target.
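Greedy routing on a small instance of the model can be sketched as follows. This is an illustration under assumptions made here: vertices are (x, y) pairs, each vertex has its four grid neighbours plus one long-distance edge drawn with probability proportional to d^{-r}, and ties are broken by Python's `min`. All names are hypothetical.

```python
import random

def manhattan(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def build_long_edges(n, r, rng):
    """One long-distance edge per vertex, chosen with weight d(u, v)^(-r)."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    long_edge = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        long_edge[u] = rng.choices(others, weights=weights, k=1)[0]
    return long_edge

def greedy_route(source, target, n, long_edge):
    """# of steps taken when always moving to the neighbour
    (grid or long-distance) closest to the target."""
    current, steps = source, 0
    while current != target:
        x, y = current
        candidates = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        candidates.append(long_edge[current])
        # Some grid neighbour is always strictly closer, so the loop terminates.
        current = min(candidates, key=lambda v: manhattan(v, target))
        steps += 1
    return steps

edges = build_long_edges(10, 2, random.Random(0))
print(greedy_route((0, 0), (9, 9), 10, edges))
```

The step count never exceeds the initial Manhattan distance; long-distance edges can only shorten the route.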

[Figure: greedy routing toward the target w]

Analysis of the algorithm

Claim: with high probability, for any pair of vertices u, t, within O(ln^2 n) steps the distance from u to t decreases by half.

Proof: If the distance between u and t is k, there are Θ(k^2) vertices at distance ≤ k/2 from t.

All these vertices are at distance Θ(k) from u.

Thus, with probability Θ(k^2 k^{-r}/c_r(u)) = Θ(1/ln n), there is an edge from u to a vertex at distance ≤ k/2 from t.

Repeating this process for Θ(ln^2 n) steps, by independence of the edges, we fail with probability o(1/n^4).

Since there are n^4 pairs, by the union bound, we succeed for every pair (u, t) with high probability.

Local algorithm when r = 2 takes polylogarithmic steps

Since the distance is at most 2n in the beginning and halves every O(ln^2 n) steps, we reach the target within O(ln^3 n) steps.

No local algorithm finds polylogarithmic paths when r < 2

Take u and t at distance ≥ n^δ.

We show that any local algorithm with high probability takes ≥ n^δ steps to go from u to t, for a small constant δ > 0.

Otherwise, the algorithm must use an edge that takes us to a point at distance < n^δ from t.

At each step, this happens with probability ≤ O(n^{2δ}/c_r) = O(n^{-2+r+2δ}), since in a local algorithm we cannot plan on the outgoing edges of vertices we haven’t yet visited.

Since we must find such an edge in the first n^δ steps, we can find it only with probability O(n^{-2+r+3δ}), which is o(1) for small δ.

A local algorithm for finding short paths does not exist, despite the existence of short paths.

Proof that logarithmic paths exist when r = 0

• We show the diameter is O(lg n) in a way similar to the proof for Erdős–Rényi graphs.

• Partition the grid into 3×3 squares: now there are 9 non-local edges going out of each square.

• There are Ω(lg n) squares within distance Θ(lg^{1/2} n) of any square.

• W.h.p. the number of non-local neighbors of these squares is at least twice the number of these squares, since 9 > 2 and by the Chernoff bound.

Proof that logarithmic paths exist when r = 0 (continued)

• Similarly, one can show that while half of the squares have not been visited, the number of squares newly visited at each level is at least twice the number at the previous level

(since # outgoing edges × fraction of remaining squares ≥ 9 × 1/2 > 2)

• Therefore more than half of the squares can be reached with O(lg n) edges from any vertex.

• Any two sets consisting of more than half the squares have nonempty intersection. Q.E.D.