Locality Sensitive Distributed Computing Exercise Set 2 David Peleg Weizmann Institute.


Locality Sensitive Distributed Computing

Exercise Set 2

David Peleg

Weizmann Institute


Basic partition construction algorithm

Simple distributed implementation for Algorithm BasicPart

Single “thread” of computation (single locus of activity at any given moment)


Basic partition construction algorithm

Components

• ClusterCons: Procedure for constructing a cluster around a chosen center v
• NextCtr: Procedure for selecting the next center v around which to grow a cluster
• RepEdge: Procedure for selecting a representative inter-cluster edge between any two adjacent clusters


Cluster construction procedure ClusterCons

Goal: Invoked at center v, construct cluster and BFS tree (rooted at v) spanning it

Tool: Variant of Dijkstra's algorithm.


Recall: Dijkstra’s BFS algorithm

[Figure: phase p+1 of the algorithm]


Main changes to Algorithm DistDijk

1. Ignoring covered vertices: The global BFS algorithm sends exploration messages to all neighbors except those known to be in the tree. The new variant also ignores vertices known to belong to previously constructed clusters.

2. Bounding depth: The BFS tree is grown to a limited depth, adding new layers tentatively, based on the halting condition $|\Gamma(S)| < |S| \cdot n^{1/k}$.
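The two changes can be illustrated by a sequential simulation of the layer-by-layer growth. This is only a sketch under stated assumptions: the function name `cluster_cons`, the `covered` set and the example graph are hypothetical, and the halting test is read as "the tree plus the tentative layer is smaller than |S| times n^(1/k)".

```python
def cluster_cons(adj, center, covered, n, k):
    """Grow a cluster around `center`; return (cluster, rejected_layer)."""
    threshold = n ** (1.0 / k)
    cluster = {center}
    frontier = {center}
    while True:
        # Tentative next layer: neighbors of the last layer that are neither
        # in the tree nor in a previously constructed cluster.
        layer = {w for u in frontier for w in adj[u]
                 if w not in cluster and w not in covered}
        # Assumed halting test: |S plus layer| < |S| * n^(1/k).
        # An empty layer also stops the growth.
        if not layer or len(cluster) + len(layer) < len(cluster) * threshold:
            return cluster, layer
        cluster |= layer
        frontier = layer


# Hypothetical 9-vertex example: a star around 0 with a path hanging off vertex 1.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1, 5],
       5: [4, 6], 6: [5, 7], 7: [6, 8], 8: [7]}
cluster, rejected = cluster_cons(adj, center=0, covered=set(), n=9, k=2)
print(sorted(cluster), sorted(rejected))
```

In this example the first layer {1, 2, 3} is accepted and the second layer {4} is rejected, so vertex 4 is left as a candidate for the next center.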


Distributed Implementation

Before deciding to expand tree T by adding the newly discovered layer L, count the # of vertices in L by a convergecast process:

• Leaf $w \in T$: set $Z_w$ = # new children in L.
• Internal vertex: add and upcast counts.
• Root: compare the final count $Z_v$ to the total # of vertices in T (known from the previous phase).
  - If the ratio ≥ $n^{1/k}$, then broadcast the next Pulse message (confirm new layer and start next phase).
  - Otherwise, broadcast message Reject (reject new layer, complete current cluster).

The final broadcast step has 2 more goals:
- mark the cluster by a unique name (e.g., ID of root),
- inform all vertices of the new cluster name.
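The counting step can be sketched as a recursive simulation of the convergecast. The names `convergecast_count` and `root_decision` are hypothetical, and the "ratio" is assumed here to be (|T| + Z_v) / |T|; the slides do not spell it out.

```python
def convergecast_count(children, new_children, v):
    """Z_v: new children found at v plus the counts upcast by v's tree children."""
    own = new_children.get(v, 0)
    return own + sum(convergecast_count(children, new_children, c)
                     for c in children.get(v, []))


def root_decision(tree_size, z_root, n, k):
    # Assumption: the ratio compares the tree with the new layer to the tree alone.
    ratio = (tree_size + z_root) / tree_size
    return "Pulse" if ratio >= n ** (1.0 / k) else "Reject"


# Hypothetical tree: root 0 with children 1, 2; vertex 1 with children 3, 4.
children = {0: [1, 2], 1: [3, 4]}
new_children = {2: 1, 3: 2, 4: 0}   # counts reported by the leaves
z_root = convergecast_count(children, new_children, 0)
print(z_root, root_decision(5, z_root, n=100, k=2))
```

In the distributed setting each vertex only adds the values received from its children and sends one message to its parent; the recursion above mirrors that flow.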


Distributed Implementation (cont)

This information is used to define cluster borders.

I.e., once the cluster is complete, each vertex in it informs all neighbors of its new residence.

Hence, nodes of a cluster under construction know which neighbors already belong to existing clusters.


Center selection procedure NextCtr

Fact: Algorithm's “center of activity” always located at currently constructed cluster C.

Idea: Select as center for next cluster some vertex v adjacent to C (= v from rejected layer)

Implementation: Via convergecast process.

• Leaf: pick an arbitrary neighbor from the rejected layer, upcast to parent.
• Internal node: upcast an arbitrary candidate.


Center selection procedure (NextCtr)

Problem: What if rejected layer is empty?

(It might still be that the entire process is not yet complete: there may be some yet unclustered nodes elsewhere in G)


Center selection procedure (NextCtr)

Solution: Traverse the graph (using the cluster construction procedure within a global search procedure)


Distributed Implementation

Use a DFS algorithm for traversing the tree of constructed clusters:

• Start at originator vertex r0, invoke ClusterCons to construct the first cluster.
• Whenever the rejected layer is nonempty, choose one rejected vertex as the next cluster center.
• Each cluster center marks a parent cluster in the cluster DFS tree, namely, the cluster from which it was selected.
• Once the search cannot progress forward (rejected layer is empty), the DFS backtracks to the previous cluster and looks for a new center among neighboring nodes.
• If no neighbors are available, the DFS process continues backtracking on the cluster DFS tree.
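The DFS over clusters can be sketched sequentially as below. This is a simulation under assumptions, not the distributed protocol itself: `basic_part`, `grow` and `next_center` are hypothetical names, the "arbitrary" choice of the next center is made deterministic by taking the smallest vertex ID, and the halting test is the same assumed reading as in the cluster growth condition.

```python
def basic_part(adj, r0, k):
    """Return (cluster_of, parent): the cluster name of each vertex and the
    parent of each cluster in the cluster DFS tree."""
    threshold = len(adj) ** (1.0 / k)
    cluster_of, parent, members = {}, {}, {}

    def grow(center):
        # Depth-bounded BFS that ignores already covered vertices.
        cluster, frontier = {center}, {center}
        while True:
            layer = {w for u in frontier for w in adj[u]
                     if w not in cluster and w not in cluster_of}
            if not layer or len(cluster) + len(layer) < len(cluster) * threshold:
                return cluster
            cluster |= layer
            frontier = layer

    def next_center(name):
        # Uncovered neighbors of the cluster (its rejected layer, or what is
        # left of it when the DFS backtracks here).
        cands = [w for u in members[name] for w in adj[u] if w not in cluster_of]
        return min(cands) if cands else None

    stack, center, came_from = [], r0, None
    while center is not None:
        members[center] = grow(center)
        for u in members[center]:
            cluster_of[u] = center          # cluster named by its center's ID
        parent[center] = came_from
        stack.append(center)
        center = None
        while stack and center is None:     # move forward, else backtrack
            center = next_center(stack[-1])
            if center is None:
                stack.pop()
            else:
                came_from = stack[-1]
    return cluster_of, parent


# Hypothetical 10-vertex example graph.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1, 5, 9],
       5: [4, 6], 6: [5, 7], 7: [6, 8], 8: [7], 9: [4]}
cluster_of, parent = basic_part(adj, r0=0, k=2)
print(cluster_of, parent)
```

In this run the search advances along the path 4, 5, 6, 7, 8, then backtracks to the cluster of vertex 4 to pick up vertex 9, which illustrates the backtracking rule.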


Inter-cluster edge selection RepEdge

Goal: Select one representative inter-cluster edge between every two adjacent clusters C and C'

E(C,C') = edges connecting C and C'

(known to endpoints in C, as C vertices know the cluster-residence of each neighbor)


Inter-cluster edge selection RepEdge

Representative edge can be selected by convergecast process on all edges of E(C,C').

Requirement: C and C' must select the same edge.

Solution: Using a unique ordering of edges, pick the minimum E(C,C') edge.

Q: Define unique edge order by unique ID's?


Inter-cluster edge selection (RepEdge)

E.g., define the ID-weight of edge e=(v,w), where ID(v) < ID(w), as the pair $\langle ID(v), ID(w) \rangle$, and order ID-weights lexicographically.

This ensures distinct weights and allows consistent selection of inter-cluster edges.
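A small sketch of this ordering, assuming vertex IDs are comparable integers and that each vertex already knows its cluster name; `id_weight`, `rep_edge` and the sample data are hypothetical. Python compares tuples lexicographically, which matches the ordering described above.

```python
def id_weight(v, w):
    """ID-weight of edge (v, w): the pair (smaller ID, larger ID)."""
    return (min(v, w), max(v, w))


def rep_edge(edges, cluster_of, c1, c2):
    """Minimum ID-weight edge between two distinct clusters c1 and c2, or None."""
    between = [id_weight(v, w) for v, w in edges
               if {cluster_of[v], cluster_of[w]} == {c1, c2}]
    return min(between) if between else None


# Hypothetical example: clusters "A" = {1, 2, 3} and "B" = {4, 5, 6}.
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
edges = [(5, 3), (6, 2), (4, 2), (1, 2)]
print(rep_edge(edges, cluster_of, "A", "B"))
```

Because the weight depends only on the two endpoint IDs, both clusters arrive at the same edge regardless of which side runs the selection.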


Inter-cluster edge selection (RepEdge)

Problem: Cluster C must carry out a selection process for every adjacent cluster C' individually

Solution:
• Inform each C vertex of the identities of all clusters adjacent to C by convergecast + broadcast
• Pipeline the individual selection processes


Analysis

$(C_1, C_2, \ldots, C_p)$ = clusters constructed by the algorithm

For cluster $C_i$:

$E_i$ = edges with at least one endpoint in $C_i$

$n_i = |C_i|$, $m_i = |E_i|$, $r_i = Rad(C_i)$


Analysis (cont)

ClusterCons: The depth-bounded Dijkstra procedure constructs $C_i$ and its BFS tree in $O(r_i^2)$ time and $O(n_i r_i + m_i)$ messages.

Q: Prove O(n) bound

Time(ClusterCons) = $\sum_i O(r_i^2) \le \sum_i O(r_i k) \le k \sum_i O(n_i) = O(kn)$

Comm(ClusterCons) = $\sum_i O(n_i r_i + m_i)$

Each edge occurs in ≤ 2 distinct sets $E_i$, hence

Comm(ClusterCons) = $O(nk + |E|)$
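The step from the sum to the closed form can be spelled out as follows, assuming (as in the time bound above) that every cluster radius satisfies $r_i \le k$ and that the clusters partition the vertex set, so the $n_i$ sum to n:

```latex
\sum_i O(n_i r_i + m_i)
  \;\le\; O\Big(k \sum_i n_i\Big) + O\Big(\sum_i m_i\Big)
  \;\le\; O(kn) + O(2|E|)
  \;=\; O(nk + |E|)
```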


Analysis (NextCtr)

The DFS process on the cluster tree is more expensive than plain DFS: visiting cluster $C_i$ and deciding the next step requires $O(r_i)$ time and $O(n_i)$ communication.

[Figure: DFS steps between clusters, and deciding the next step within a cluster]


Analysis (NextCtr)

DFS visits clusters in cluster tree O(p) times

Entire DFS process (not counting Procedure ClusterCons invocations) requires:

• Time(NextCtr) = O(pk) = O(nk)

• Comm(NextCtr) = $O(pn) = O(n^2)$


Analysis (RepEdge)

$s_i$ = # neighboring clusters surrounding $C_i$

Convergecasting the ID of a neighboring cluster C' in $C_i$ costs $O(r_i)$ time and $O(n_i)$ messages

For all $s_i$ neighboring clusters:

• $O(s_i + r_i)$ time (pipelining)
• $O(s_i n_i)$ messages


Analysis (RepEdge)

Pipelined inter-cluster edge selection – similar.

As $s_i \le n$, we get

Time(RepEdge) = $\max_i O(s_i + r_i) = O(n)$

Comm(RepEdge) = $\sum_i O(s_i n_i) = O(n^2)$


Analysis

Thm: Distributed Algorithm BasicPart requires

Time = O(nk)

Comm = $O(n^2)$


Sparse spanners

Example - m-dimensional hypercube:

$H_m = (V_m, E_m)$, $V_m = \{0,1\}^m$, $E_m = \{(x,y) \mid x \text{ and } y \text{ differ in exactly one bit}\}$

$|V_m| = 2^m$, $|E_m| = m \cdot 2^{m-1}$, diameter m

Ex: Prove that for every m ≥ 0, the m-cube has a 3-spanner with # edges ≤ $7 \cdot 2^m$
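The stated vertex count, edge count and diameter can be checked numerically for small m. The sketch below only builds the hypercube and verifies these three quantities; it does not construct a spanner. Vertices are encoded as integers whose bits are the coordinates, and the function names are hypothetical.

```python
from collections import deque


def hypercube(m):
    """Adjacency lists of H_m; x and y are adjacent iff they differ in one bit."""
    return {x: [x ^ (1 << b) for b in range(m)] for x in range(2 ** m)}


def num_edges(adj):
    # Every edge appears in two adjacency lists.
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def diameter(adj):
    """Largest BFS distance over all start vertices (graph assumed connected)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


for m in range(5):
    adj = hypercube(m)
    print(m, len(adj), num_edges(adj), diameter(adj))
```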


Regional Matchings

Locality sensitive tool for distributed match-making


Distributed match making

Paradigm for establishing client-server connection in a distributed system (via specified rendezvous locations in the network)

Ads of server v: written in locations Write(v)

Client u: reads ads in locations Read(u)


Regional Matchings

Requirement: “read” and “write” sets must intersect: for every $v, u \in V$, $Write(v) \cap Read(u) \ne \emptyset$

Client u must find an ad of server v


Regional Matchings (cont)

Distance considerations taken into account: Client u must find an ad of server v only if they are sufficiently close

$\ell$-regional matching: “read” and “write” sets $\mathcal{RW} = \{ Read(v), Write(v) \mid v \in V \}$ s.t. for every $v, u \in V$,

$dist(u,v) \le \ell \;\Rightarrow\; Write(v) \cap Read(u) \ne \emptyset$


Regional Matchings (cont)

Degree parameters:

$\Delta_{write}(\mathcal{RW}) = \max_{v \in V} |Write(v)|$

$\Delta_{read}(\mathcal{RW}) = \max_{v \in V} |Read(v)|$


Regional Matchings (cont)

Radius parameters:

$Str_{write}(\mathcal{RW}) = \max_{u,v \in V} \{ dist(u,v) \mid u \in Write(v) \} / \ell$

$Str_{read}(\mathcal{RW}) = \max_{u,v \in V} \{ dist(u,v) \mid u \in Read(v) \} / \ell$
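The matching requirement and the degree and radius parameters can be computed directly for a small unweighted graph. The sketch below is illustrative only: the function names are hypothetical, the distance bound is called `ell`, distances are hop counts obtained by BFS, and the graph is assumed connected. The example read and write sets are hand-picked, not produced by the construction in these slides.

```python
from collections import deque


def bfs_dist(adj, s):
    """Hop distances from s to every reachable vertex."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_regional_matching(adj, read, write, ell):
    """Check: dist(u, v) <= ell implies Write(v) and Read(u) intersect."""
    for v in adj:
        d = bfs_dist(adj, v)
        for u in adj:
            if d.get(u, float("inf")) <= ell and not (write[v] & read[u]):
                return False
    return True


def degree(sets):
    """Maximum size of a read (or write) set."""
    return max(len(s) for s in sets.values())


def stretch(adj, sets, ell):
    """Maximum distance from a vertex to a member of its set, divided by ell."""
    return max(bfs_dist(adj, v)[u] for v in sets for u in sets[v]) / ell


# Hypothetical example on the path 0-1-2-3: each server writes at itself,
# each client reads at itself and its neighbors.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
write = {v: {v} for v in adj}
read = {v: {v} | set(adj[v]) for v in adj}
print(is_regional_matching(adj, read, write, 1), degree(read), stretch(adj, read, 1))
```

With these sets the requirement holds for distance bound 1 but fails for 2, since a client two hops from a server never reads the server's own location.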


Regional matching construction

[Given graph G and $k, \ell \ge 1$, construct regional matching $\mathcal{RW}_{\ell,k}$]

1. Set $\mathcal{S} \leftarrow \hat{\Gamma}_\ell(V)$ (the $\ell$-neighborhood cover)
2. Build a coarsening cover as in the Max-Deg-Cover Thm
3. Select a center vertex $r_0(T)$ in each cluster T
4. Select for every v a cluster $T_v$ s.t. $\Gamma_\ell(v) \subseteq T_v$
5. Set $Read(v) = \{ r_0(T) \mid v \in T \}$ and $Write(v) = \{ r_0(T_v) \}$

Example (from the figure): v lies in clusters T1, T2, T3 with centers r1, r2, r3, and Tv = T1, so Read(v) = {r1, r2, r3} and Write(v) = {r1}.
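The last steps of the construction (choosing a home cluster and setting the read and write sets) can be sketched as follows. The coarsening cover is taken as given: the clusters and centers in the example are hand-picked for a 6-vertex path and are not the output of the Max-Deg-Cover construction. The names `neighborhood`, `regional_matching` and `ell` are hypothetical.

```python
from collections import deque


def neighborhood(adj, v, ell):
    """All vertices within hop distance ell of v (including v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == ell:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def regional_matching(adj, clusters, centers, ell):
    """Read(v): centers of all clusters containing v.
    Write(v): center of one cluster containing the ell-neighborhood of v."""
    read, write = {}, {}
    for v in adj:
        read[v] = {centers[t] for t, members in clusters.items() if v in members}
        nbhd = neighborhood(adj, v, ell)
        # Raises StopIteration if the given cover does not coarsen the
        # neighborhood cover, i.e. no cluster contains the whole neighborhood.
        home = next(t for t in sorted(clusters) if nbhd <= clusters[t])
        write[v] = {centers[home]}
    return read, write


# Hypothetical example: path 0-1-2-3-4-5 with two overlapping clusters.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
clusters = {"T1": {0, 1, 2, 3}, "T2": {2, 3, 4, 5}}
centers = {"T1": 1, "T2": 4}
read, write = regional_matching(adj, clusters, centers, ell=1)
print(read, write)
```

Vertices 2 and 3 lie in both clusters and therefore read at both centers, while every vertex writes at exactly one center.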


Analysis

Claim: The resulting $\mathcal{RW}_{\ell,k}$ is an $\ell$-regional matching.

Proof: Consider u,v such that $dist(u,v) \le \ell$. Let $T_v$ be the cluster s.t. $Write(v) = \{ r_0(T_v) \}$.

By definition, $u \in \Gamma_\ell(v)$.

Also $\Gamma_\ell(v) \subseteq T_v$

$\Rightarrow u \in T_v$

$\Rightarrow r_0(T_v) \in Read(u)$

$\Rightarrow Read(u) \cap Write(v) \ne \emptyset$


Analysis (cont)

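Thm: For every graph $G(V,E,\omega)$ and $\ell, k \ge 1$, there is an $\ell$-regional matching $\mathcal{RW}_{\ell,k}$ with

$\Delta_{read}(\mathcal{RW}_{\ell,k}) \le 2k \cdot n^{1/k}$

$\Delta_{write}(\mathcal{RW}_{\ell,k}) = 1$

$Str_{read}(\mathcal{RW}_{\ell,k}) \le 2k+1$

$Str_{write}(\mathcal{RW}_{\ell,k}) \le 2k+1$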


Analysis (cont)

Taking k=log n we get

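Corollary: For every graph $G(V,E,\omega)$ and $\ell \ge 1$, there is an $\ell$-regional matching $\mathcal{RW}_\ell$ with

• $\Delta_{read}(\mathcal{RW}_\ell) = O(\log n)$

• $\Delta_{write}(\mathcal{RW}_\ell) = 1$

• $Str_{read}(\mathcal{RW}_\ell) = O(\log n)$

• $Str_{write}(\mathcal{RW}_\ell) = O(\log n)$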
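A short check of the parameter choice, assuming the logarithm is taken base 2 and substituting into the theorem's bounds of $2k \cdot n^{1/k}$ on the read degree and $2k+1$ on both stretch parameters:

```latex
k = \log_2 n \;\Rightarrow\; n^{1/k} = 2^{\log_2 n / \log_2 n} = 2

2k \cdot n^{1/k} = 4 \log_2 n = O(\log n), \qquad 2k + 1 = 2 \log_2 n + 1 = O(\log n)
```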