Linear Sketching and Applications to Distributed Computation
Cameron Musco
November 7, 2014

Transcript of Linear Sketching and Applications to Distributed Computation.


Overview

Linear Sketches
- Johnson-Lindenstrauss Lemma
- Heavy-Hitters

Applications
- k-Means Clustering
- Spectral Sparsification


Linear Sketching

- Randomly choose Π ∼ D.


- Oblivious: Π chosen independently of x.
- Composable: Π(x + y) = Πx + Πy.


Linear Sketching

- Streaming algorithms with polylog(n) space.
- Frequency moments, heavy hitters, entropy estimation.

[Figure: a stream of update vectors x₀, x₁, ..., xₙ arriving one at a time; only the sketch of the running sum is stored.]

Πx₀ → Π(x₀ + x₁) → ... → Π(x₀ + x₁ + ... + xₙ)
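The update picture above can be checked directly: maintaining only the m-dimensional sketch while updates arrive gives exactly the sketch of the running sum. A minimal numpy sketch (the dimensions and number of updates are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 50

# Oblivious sketching matrix: chosen before seeing any of the data.
Pi = rng.standard_normal((m, n)) / np.sqrt(m)

# A stream of updates x_0, x_1, ..., maintained only as an m-dim sketch.
updates = [rng.standard_normal(n) for _ in range(20)]
sketch = np.zeros(m)
for x_t in updates:
    sketch += Pi @ x_t   # Pi(x_0 + ... + x_t) = Pi x_0 + ... + Pi x_t

# Composability: sketching the total equals summing the sketches.
total = np.sum(updates, axis=0)
assert np.allclose(sketch, Pi @ total)
```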


Linear Sketching

Two Main Tools

- Johnson-Lindenstrauss sketches (randomized dimensionality reduction, subspace embeddings, etc.)
- Heavy-Hitters sketches (sparse recovery, compressive sensing, ℓp sampling, graph sketching, etc.)


Johnson-Lindenstrauss Lemma

- Low dimensional embedding: n → m = O(log(1/δ)/ε²)
- ‖Πx‖₂² ≈_ε ‖x‖₂²

Π is (1/√m) times an m × n matrix of i.i.d. N(0, 1) entries, so each coordinate of Πx is distributed as (1/√m) · N(0, x₁² + ... + xₙ²) = (1/√m) · N(0, ‖x‖₂²), and

‖Πx‖₂² = (1/m) ∑_{i=1}^m N(0, ‖x‖₂²)² ≈_ε ‖x‖₂²


Johnson-Lindenstrauss Lemma

That's it! Just basic statistics.

- Sparse constructions.
- ±1 entries can replace Gaussians.
- Small sketch representation, i.e., a small random seed (otherwise storing Π takes more space than storing x).
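A quick numerical check of the lemma, using the dense Gaussian construction from the slides above (the constants in m are loose and chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)

# m = O(log(1/delta)/eps^2) rows; constants picked loosely for a demo.
eps = 0.1
m = int(8 * np.log(100) / eps**2)

# Entries i.i.d. N(0, 1), scaled by 1/sqrt(m).
Pi = rng.standard_normal((m, n)) / np.sqrt(m)

# Norm is preserved up to a (1 +/- eps)-type factor.
ratio = np.linalg.norm(Pi @ x) ** 2 / np.linalg.norm(x) ** 2
assert 1 - 2 * eps < ratio < 1 + 2 * eps
```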


Heavy-Hitters

- Count sketch, sparse recovery, ℓp sampling, point query, graph sketching, sparse Fourier transform
- Simple idea: hashing


Heavy-Hitters

- Random signs to deal with negative entries
- Repeat many times, 'decode' heavy buckets

[Figure: each repetition applies a {0, ±1} matrix to x, hashing the n coordinates into buckets c₁, ..., c_{1/ε} with random signs.]


Heavy-Hitters

[Figure: the bucket vectors from the repetitions; e.g. coordinate 2 lands in buckets c₁, c₅, c₂₀, ....]

- h₁(2) = 1, h₂(2) = 5, h₃(2) = 20, h₄(2) = 1/ε → recover x₂ when x₂²/‖x‖₂² ≥ ε, i.e., when x₂ is a heavy hitter.


Heavy-Hitters

- Just a random encoding
- polylog(n) space to recover entries carrying a 1/log(n) fraction of the total norm
- Basically the best we could hope for.
- Random scalings give ℓp sampling.
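A minimal count-sketch in the spirit of these slides, assuming the standard encode/median-decode scheme; the parameter names (reps, buckets) and the helper estimate are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, buckets = 10_000, 7, 50   # buckets ~ 1/eps, reps ~ O(log n)

# Random hash functions and signs define the (implicit) sketching matrix.
h = rng.integers(0, buckets, size=(reps, n))   # bucket assignments
s = rng.choice([-1, 1], size=(reps, n))        # random signs

x = np.zeros(n)
x[42] = 10.0                                   # one heavy coordinate
x += rng.standard_normal(n) * 0.01             # small noise elsewhere

# Encode: each repetition is a signed bucket histogram, a linear map of x.
C = np.zeros((reps, buckets))
for r in range(reps):
    np.add.at(C[r], h[r], s[r] * x)

# Decode coordinate i: median over repetitions of its signed bucket count.
def estimate(i):
    return np.median([s[r, i] * C[r, h[r, i]] for r in range(reps)])

# The heavy coordinate is recovered up to the noise in its buckets.
assert abs(estimate(42) - x[42]) < 1.0
```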


Application 1: k-means Clustering

- Assign points to k clusters
- k is fixed
- Minimize distance to centroids: ∑_{i=1}^n ‖aᵢ − μ_{C(i)}‖₂²


Application 1: k-means Clustering
Lloyd's algorithm, aka the 'k-means algorithm'

- Initialize random clusters
- Compute centroids
- Assign each point to the closest centroid
- Repeat
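The steps above can be sketched directly (a plain illustrative implementation, not the talk's code; the data and k below are made up for the demo):

```python
import numpy as np

def lloyd(A, k, iters=20, seed=0):
    """A plain run of Lloyd's algorithm (the 'k-means algorithm')."""
    rng = np.random.default_rng(seed)
    # Initialize: pick k random data points as starting centroids.
    centroids = A[rng.choice(len(A), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its closest centroid.
        dists = np.linalg.norm(A[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        centroids = np.array([A[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

# Two well-separated blobs; Lloyd's algorithm should recover them.
rng = np.random.default_rng(1)
A = np.vstack([rng.standard_normal((50, 2)),
               rng.standard_normal((50, 2)) + 10.0])
centroids, labels = lloyd(A, k=2)
cost = np.sum((A - centroids[labels]) ** 2)
```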


Application 1: k-means Clustering

What if data is distributed?


Application 1: k-means Clustering

- At each iteration, each server sends out its new local centroids
- Adding them together gives the new global centroids
- O(sdk) communication per iteration
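One standard way to realize this exchange (a sketch under the assumption that each server sends per-cluster coordinate sums and counts, from which the global centroids follow exactly; the function names are illustrative):

```python
import numpy as np

def local_stats(A_local, labels_local, k):
    # Each server sends, per cluster, its local coordinate sum and point
    # count: O(dk) numbers, independent of how many points it holds.
    sums = np.array([A_local[labels_local == j].sum(axis=0) for j in range(k)])
    counts = np.array([(labels_local == j).sum() for j in range(k)])
    return sums, counts

def merge(stats):
    # Adding the s local summaries gives the exact global centroids.
    total_sums = sum(s for s, _ in stats)
    total_counts = sum(c for _, c in stats)
    return total_sums / total_counts[:, None]

# Data split across s = 3 servers, with a current global assignment.
rng = np.random.default_rng(0)
A = rng.standard_normal((90, 5))
labels = rng.integers(0, 4, size=90)
parts = np.array_split(np.arange(90), 3)
centroids = merge([local_stats(A[p], labels[p], 4) for p in parts])
```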


Application 1: k-means Clustering

Can we do better?

- Balcan et al. 2013: O((kd + sk)d)
- Locally computable O(kd + sk)-sized coreset.
- All data is aggregated and k-means performed on a single server.


Can we do even better?

- O((sd + sk)d) → O((sd′ + sk)d′) for d′ ≪ d.
- Liang et al. 2013, Balcan et al. 2014: O((sk + sk)k + sdk)
- CEMMP '14: improved O(k/ε²) to ⌈k/ε⌉.


Application 1: k-means Clustering

Can we do better?

- O(sdk) is inherent in communicating O(k) singular vectors of dimension d to s servers.
- Apply Johnson-Lindenstrauss!
- Goal: minimize distance to centroids: ∑_{i=1}^n ‖aᵢ − μ_{C(i)}‖₂²
- Equivalently, all pairs distances: ∑_{i=1}^k (1/|Cᵢ|) ∑_{(j,k)∈Cᵢ} ‖aⱼ − aₖ‖₂²
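The two objectives can be checked to agree numerically; this is the standard identity that within-cluster variance equals the average pairwise squared distance over each cluster (data and k below are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 4))
labels = rng.integers(0, 3, size=30)

# Centroid form of the k-means cost.
cost_centroid = sum(((A[labels == c] - A[labels == c].mean(axis=0)) ** 2).sum()
                    for c in range(3))

# Pairwise form: per cluster, (1/|C|) times the sum of squared distances
# over unordered pairs (ordered pairs double-count, hence the 2).
cost_pairs = 0.0
for c in range(3):
    P = A[labels == c]
    D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    cost_pairs += D2.sum() / (2 * len(P))

assert np.isclose(cost_centroid, cost_pairs)
```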


Application 1: k-means Clustering

- ‖aᵢ − aⱼ‖₂² ≈_ε ‖(aᵢ − aⱼ)Π‖₂² = ‖aᵢΠ − aⱼΠ‖₂²
- O(n²) distance vectors, so set failure probability δ = 1/(100n²).
- Π needs O(log(1/δ)/ε²) = O(log n/ε²) dimensions.


Application 1: k-means Clustering

Immediately distributes - just need to share the randomness specifying Π.
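For example (an illustrative sketch): since Π is determined by a small random seed, every server can regenerate the same Π locally, so Π itself never has to be communicated.

```python
import numpy as np

def sketching_matrix(seed, m, d):
    # Each server rebuilds the same Pi from a small shared random seed.
    return np.random.default_rng(seed).standard_normal((m, d)) / np.sqrt(m)

# Two servers, same seed -> bit-identical sketching matrices.
Pi_server1 = sketching_matrix(1234, 60, 500)
Pi_server2 = sketching_matrix(1234, 60, 500)
assert np.array_equal(Pi_server1, Pi_server2)
```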


Application 1: k-means Clustering

Our Paper [Cohen Elder Musco Musco Persu 14]

- Show that Π only needs to have O(k/ε²) columns.
- Almost completely removes the dependence on input size!
- O(k³ + sk² + log d) - the log d gets swallowed in the word size.


Application 1: k-means Clustering

Highest Level Idea for how this works

- Show that the cost of projecting the columns of AΠ to any k-dimensional subspace approximates the cost of projecting A to that subspace.
- Note that k-means can actually be viewed as a column projection problem.
- k-means clustering is 'constrained' PCA.
- Lots of applications aside from k-means clustering.


Application 1: k-means Clustering

Open Questions

- (9 + ε)-approximation with only O(log k) dimensions! What is the right answer?
- We use O(kd + sk)-sized coresets black-box and reduce d. Can we use our linear algebraic understanding to improve coreset constructions? It feels like we should be able to.
- These algorithms should be practical. I think testing them out would be useful - for both k-means and PCA.
- Other problems (spectral clustering, SVM, what do people actually do?)


Application 2: Spectral Sparsification

General Idea

- Approximate a dense graph with a much sparser graph.
- Reduce O(n²) edges → O(n log n) edges.


Application 2: Spectral Sparsification

Cut Sparsification (Benczur, Karger ’96)

- Preserve every cut value to within a (1 ± ε) factor.

Applications: Minimum cut, sparsest cut, etc.


Graph Sparsification

Cut Sparsification (Benczur, Karger ’96)

- Let B ∈ R^{(n choose 2) × n} be the vertex-edge incidence matrix for a graph G.
- Let x ∈ {0, 1}ⁿ be an "indicator vector" for some cut.

‖Bx‖₂² = cut value

      v1  v2  v3  v4
e12    1  -1   0   0
e13    1   0  -1   0
e14    0   0   0   0
e23    0   1  -1   0
e24    0   1   0  -1
e34    0   0   0   0

B × x = Bx with x = (1, 1, 0, 0)ᵀ gives Bx = (0, 1, 0, 1, 1, 0)ᵀ: the nonzero entries are exactly the edges crossing the cut {v1, v2}.
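The example on this slide can be checked numerically (a small illustrative script; edge rows are ordered as in the table, with all-zero rows for the absent edges e14 and e34):

```python
import numpy as np

# Vertex-edge incidence matrix for the slide's 4-vertex example graph.
edges = {"e12": (0, 1), "e13": (0, 2), "e14": None,
         "e23": (1, 2), "e24": (1, 3), "e34": None}
B = np.zeros((6, 4))
for row, uv in enumerate(edges.values()):
    if uv is not None:
        u, v = uv
        B[row, u], B[row, v] = 1, -1

x = np.array([1, 1, 0, 0])       # indicator of the cut {v1, v2}
Bx = B @ x                       # nonzero exactly on edges crossing the cut
assert np.dot(Bx, Bx) == 3       # ||Bx||^2 = number of crossing edges
```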


Graph Sparsification

Cut Sparsification (Benczur, Karger '96)

So, ‖Bx‖₂² = cut value.

Goal: Find some B̃ such that, for all x ∈ {0, 1}ⁿ,

(1 − ε)‖Bx‖₂² ≤ ‖B̃x‖₂² ≤ (1 + ε)‖Bx‖₂²

- xᵀB̃ᵀB̃x ≈ xᵀBᵀBx.
- L = BᵀB is the graph Laplacian.


Graph Sparsification

Spectral Sparsification (Spielman, Teng '04)

Goal: Find some B̃ such that, for all x ∈ Rⁿ (not just {0, 1}ⁿ),

(1 − ε)‖Bx‖₂² ≤ ‖B̃x‖₂² ≤ (1 + ε)‖Bx‖₂²

Applications: anything cut sparsifiers can do, Laplacian system solves, computing effective resistances, spectral clustering, calculating random walk properties, etc.


Graph Sparsification

How are sparsifiers constructed?

Randomly sample edges (i.e. rows from B):


Graph Sparsification

How are sparsifiers constructed?

Sampling probabilities:

- Connectivity for cut sparsifiers [Benczur, Karger '96], [Fung, Hariharan, Harvey, Panigrahi '11].
- Effective resistances (i.e., statistical leverage scores) for spectral sparsifiers [Spielman, Srivastava '08].

Actually oversample: by (effective resistance) × O(log n). Gives sparsifiers with O(n log n) edges - reducing from O(n²).
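A small illustrative sketch of effective-resistance (leverage-score) oversampling on a dense random graph; the constant C and the dense pseudoinverse are for demonstration only, not how a fast sparsifier would compute these:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
# Dense random graph: incidence matrix B with one row per edge.
edges = [(u, v) for u in range(n) for v in range(u + 1, n)
         if rng.random() < 0.5]
B = np.zeros((len(edges), n))
for r, (u, v) in enumerate(edges):
    B[r, u], B[r, v] = 1, -1

L = B.T @ B                  # graph Laplacian
Lp = np.linalg.pinv(L)       # pseudoinverse (fine for a tiny demo graph)

# Effective resistance of edge e is b_e^T L^+ b_e; these are the leverage
# scores of B's rows and sum to n - 1 on a connected graph.
R = np.einsum('ei,ij,ej->e', B, Lp, B)
assert np.isclose(R.sum(), n - 1)

# Oversample: keep edge e with probability ~ min(1, C * R_e * log n),
# reweighting kept rows by 1/sqrt(p_e) so the sparsifier is unbiased.
C = 2.0
p = np.minimum(1.0, C * R * np.log(n))
keep = rng.random(len(edges)) < p
B_sparse = B[keep] / np.sqrt(p[keep])[:, None]
```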

Page 81:

Application 2: Spectral Sparsification

Highest Level Idea Of Our Approach

Page 82:

Application 2: Spectral Sparsification

Why?

I Semi-streaming model with insertions and deletions

I Near optimal oblivious graph compression

I Distributed Graph Computations

Page 83:

Application 2: Spectral Sparsification

Distributed Graph Computation

I Trinity, Pregel, Giraph

Page 87:

Application 2: Spectral Sparsification

I Naive to share my data: O(|Vi| n)

I With sketching: O(|Vi| log^c n)

Page 88:

Application 2: Spectral Sparsification

Alternatives to Sketching?

I Simulate message passing algorithms over the nodes – this is what’s done in practice.

Page 89:

Application 2: Spectral Sparsification

Alternatives to Sketching?

I Koutis ’14 gives a distributed algorithm for spectral sparsification

I Iteratively computes O(log n) spanners (alternatively, low stretch trees) to upper bound effective resistances and sample edges.

I Combinatorial and local

Page 90:

Application 2: Spectral Sparsification

I Cost per spanner: O(log² n) rounds, O(m log n) messages, O(log n) message size.

I If simulating, each server sends O(δ(Vi) log n) per round.

I O(δ(Vi) log n) beats our bound of O(|Vi| log n) iff δ(Vi) ≤ |Vi|

I But in that case, just keep all your outgoing edges and sparsify locally! At worst this adds n edges to the final sparsifier.

Page 91:

Application 2: Spectral Sparsification

Moral of That Story?

I I’m not sure.

I Sparsifiers are very strong. Could we do better for other problems?

I Can we reduce communication of simulated distributed protocols using sparsifiers?

I What other things can sketches be applied to? Biggest open question is distances – spanners, etc.

Page 92:

Sketching a Sparsifier

We are still going to sample by effective resistance.

I Treat graph as resistor network, each edge has resistance 1.

I Flow 1 unit of current from node i to j and measure voltage drop between the nodes.

Page 95:

Sketching a Sparsifier

Using standard V = IR equations:

If xe is the incidence vector of edge e = (i, j) – entry +1 at node i, −1 at node j, 0 elsewhere, e.g. xe = (1, 0, 0, −1, 0)⊤ – then e’s effective resistance is τe = xe⊤ L⁻¹ xe. (Throughout, L⁻¹ denotes the pseudoinverse, since the Laplacian L is singular.)

Page 98:

Sketching a Sparsifier

Effective resistance of edge e is τe = xe⊤ L⁻¹ xe.

Alternatively, τe is the eth entry in the vector:

B L⁻¹ xe

AND

τe = xe⊤ L⁻¹ xe = xe⊤ (L⁻¹)⊤ B⊤ B L⁻¹ xe = ‖B L⁻¹ xe‖₂²
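Both identities are easy to check numerically; a minimal sketch of mine (numpy, with the pseudoinverse standing in for L⁻¹) on a triangle graph, where each edge is a unit resistor in parallel with a two-hop path, so τe = 1·2/(1+2) = 2/3:

```python
import numpy as np

# Triangle graph on 3 nodes; every edge has effective resistance 2/3.
edges = [(0, 1), (1, 2), (2, 0)]
n = 3
B = np.zeros((len(edges), n))
for row, (i, j) in enumerate(edges):
    B[row, i], B[row, j] = 1.0, -1.0
L_pinv = np.linalg.pinv(B.T @ B)   # pseudoinverse of the Laplacian

x_e = B[0]                          # incidence vector of edge (0, 1)
tau = x_e @ L_pinv @ x_e            # tau_e = x_e^T L^+ x_e
v = B @ L_pinv @ x_e                # the vector B L^+ x_e

assert abs(tau - 2 / 3) < 1e-8      # matches the series/parallel calculation
assert abs(v[0] - tau) < 1e-8       # the e-th entry of B L^+ x_e is tau_e
assert abs(v @ v - tau) < 1e-8      # ||B L^+ x_e||^2 is also tau_e
```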

Page 101:

Sketching a Sparsifier

We just need two more ingredients to recover the heavy entries of B L⁻¹ xe:

ℓ₂ Heavy Hitters [GLPS10]:

I Sketch a poly(n)-dimensional vector in polylog(n) space.

I Extract any element whose square is an O(1/ log n) fraction of the vector’s squared norm.

Coarse Sparsifier:

I L̃ such that x⊤ L̃ x = (1 ± constant) x⊤ L x
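To see what an ℓ₂ heavy-hitters sketch does, here is a toy CountSketch-style stand-in (not the actual [GLPS10] data structure; the table and bucket counts are illustrative): several hash tables of signed bucket sums, with each coordinate estimated as the median of its signed buckets. The sketch is linear, so it composes under updates.

```python
import random
import statistics

class HeavyHitters:
    """Toy CountSketch: `tables` hash tables of `buckets` signed counters."""
    def __init__(self, n, tables=9, buckets=64, seed=0):
        rng = random.Random(seed)
        self.tables = tables
        self.bucket_of = [[rng.randrange(buckets) for _ in range(n)] for _ in range(tables)]
        self.sign_of = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(tables)]
        self.counts = [[0.0] * buckets for _ in range(tables)]

    def update(self, i, delta):
        # Linear in the input, so sketch(x + y) = sketch(x) + sketch(y).
        for t in range(self.tables):
            self.counts[t][self.bucket_of[t][i]] += self.sign_of[t][i] * delta

    def estimate(self, i):
        # Median over tables of the signed bucket value containing i.
        return statistics.median(self.sign_of[t][i] * self.counts[t][self.bucket_of[t][i]]
                                 for t in range(self.tables))

n = 1000
x = [0.01 * ((i % 7) - 3) for i in range(n)]   # low-norm background noise
x[42] = 10.0                                    # one heavy coordinate
sk = HeavyHitters(n)
for i, xi in enumerate(x):
    sk.update(i, xi)

norm_sq = sum(v * v for v in x)
heavy = [i for i in range(n) if sk.estimate(i) ** 2 > 0.5 * norm_sq]
```

Only coordinate 42 carries a constant fraction of the squared norm, so it alone should survive the threshold.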

Page 104:

Sketching a Sparsifier

Putting it all together:

1. Sketch ΠB in n log^c n space, where Π is the heavy-hitters sketch.

2. Compute ΠB L⁻¹.

3. For every possible edge e, compute ΠB L⁻¹ xe.

4. Extract heavy hitters from the vector; check if the eth entry is one of them.

(B L⁻¹ xe)(e)² / ‖B L⁻¹ xe‖₂² ≈ τe² / τe = τe

So, as long as τe > O(1/ log n), we will recover the edge!
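A toy check of this recovery criterion, with the sketch replaced by an exact computation of B L⁻¹ xe (the graph and the 0.3 threshold are my illustrative choices, standing in for the O(1/ log n) cutoff): two K8 cliques joined by a single bridge. Clique edges have τe = 2/8 = 0.25, while the bridge is a cut edge with τe = 1, so only the bridge is heavy.

```python
import numpy as np

# Two K8 cliques on {0..7} and {8..15}, joined by the bridge edge (7, 8).
edges = [(i, j) for i in range(8) for j in range(i + 1, 8)]
edges += [(i, j) for i in range(8, 16) for j in range(i + 1, 16)]
edges.append((7, 8))
n = 16
B = np.zeros((len(edges), n))
for row, (i, j) in enumerate(edges):
    B[row, i], B[row, j] = 1.0, -1.0
L_pinv = np.linalg.pinv(B.T @ B)

threshold = 0.3   # stands in for the O(1/log n) heavy-hitter threshold
recovered = []
for row in range(len(edges)):
    v = B @ L_pinv @ B[row]                   # the vector B L^+ x_e
    if v[row] ** 2 / (v @ v) > threshold:     # ratio ~ tau_e^2 / tau_e = tau_e
        recovered.append(edges[row])
```

Only the high-resistance bridge edge should pass the test.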

Page 109:

Sketching a Sparsifier

How about edges with lower effective resistance? Sketch:

B L⁻¹ xe

Page 114:

Sketching a Sparsifier

How about edges with lower effective resistance? Subsample the edges (rows of B) at geometric rates and sketch each level:

I First level: recovers τe > 1/ log n, sampled with probability 1.

I Second level: τe > 1/(2 log n), sampled with probability 1/2.

I Third level: τe > 1/(4 log n), sampled with probability 1/4.

I Fourth level: τe > 1/(8 log n), sampled with probability 1/8.

I ...

So, we can sample every edge by (effective resistance) × O(log n).
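A quick arithmetic sanity check of that claim (my own sketch, with illustrative τ values): the first level at which an edge becomes recoverable gives it a sampling probability within a factor of 2 of τe · log n.

```python
import math

n = 1024
log_n = math.log2(n)   # 10

def sampling_level(tau):
    # Smallest level i whose threshold 1/(2^i log n) is below tau;
    # an edge surviving to level i is present with probability 2^-i.
    i = 0
    while tau <= 1 / (2 ** i * log_n):
        i += 1
    return i

for tau in [0.3, 0.04, 0.01, 0.001]:
    i = sampling_level(tau)
    p = 2.0 ** (-i)
    if i > 0:
        # p tracks (effective resistance) * log n up to a factor of 2
        assert tau * log_n / 2 <= p < tau * log_n
```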

Page 121:

Sparsifier Chain

Final Piece [Li, Miller, Peng ’12]

I We needed a constant error spectral sparsifier to get our (1 ± ε) sparsifier.
