Linear Sketching and Applications to Distributed Computation

Cameron Musco

November 7, 2014

Overview

Linear Sketches

- Johnson-Lindenstrauss Lemma
- Heavy-Hitters

Applications

- k-Means Clustering
- Spectral Sparsification

Linear Sketching

- Randomly choose Π ∼ D

Linear Sketching

- Oblivious: Π chosen independently of x.
- Composable: Π(x + y) = Πx + Πy.

Linear Sketching

- Streaming algorithms with polylog(n) space.
- Frequency moments, heavy hitters, entropy estimation.

As the updates x0, x1, ..., xn stream by, maintain only the sketch:

Πx0 → Π(x0 + x1) → ... → Π(x0 + x1 + ... + xn)
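As a quick illustration of that composability, here is a minimal numpy sketch (the dense Gaussian Π is only a stand-in for the structured sketches discussed below; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 200                              # ambient dimension, sketch dimension

Pi = rng.standard_normal((m, n)) / np.sqrt(m)   # random linear sketch

# A stream of updates x_0, x_1, ..., x_T (insertions and deletions are both fine).
updates = [rng.standard_normal(n) for _ in range(50)]

# Maintain only the m-dimensional sketch, never the full vector.
sketch = np.zeros(m)
for x_t in updates:
    sketch += Pi @ x_t                          # linearity: Pi(x + y) = Pi x + Pi y

x_total = np.sum(updates, axis=0)
print(np.allclose(sketch, Pi @ x_total))        # True: same as sketching the aggregate
```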

Linear Sketching

Two Main Tools

- Johnson-Lindenstrauss sketches (randomized dimensionality reduction, subspace embeddings, etc.)
- Heavy-hitters sketches (sparse recovery, compressive sensing, ℓp sampling, graph sketching, etc.)

Johnson-Lindenstrauss Lemma

- Low-dimensional embedding: n → m = O(log(1/δ)/ε²)
- ‖Πx‖₂² ≈ε ‖x‖₂².

$$\frac{1}{\sqrt{m}}\begin{bmatrix} \mathcal{N}(0,1) & \cdots & \mathcal{N}(0,1) \\ \vdots & & \vdots \\ \mathcal{N}(0,1) & \cdots & \mathcal{N}(0,1) \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{\sqrt{m}}\begin{bmatrix} \mathcal{N}(0,\ x_1^2 + \cdots + x_n^2) \\ \vdots \\ \mathcal{N}(0,\ x_1^2 + \cdots + x_n^2) \end{bmatrix}$$

Johnson-Lindenstrauss Lemma

- Low-dimensional embedding: n → m = O(log(1/δ)/ε²)
- ‖Πx‖₂² ≈ε ‖x‖₂².

$$\frac{1}{\sqrt{m}}\begin{bmatrix} \mathcal{N}(0,1) & \cdots & \mathcal{N}(0,1) \\ \vdots & & \vdots \\ \mathcal{N}(0,1) & \cdots & \mathcal{N}(0,1) \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{\sqrt{m}}\begin{bmatrix} \mathcal{N}(0,\ \|x\|_2^2) \\ \vdots \\ \mathcal{N}(0,\ \|x\|_2^2) \end{bmatrix}$$

$$\|\Pi x\|_2^2 = \frac{1}{m}\sum_{i=1}^{m} \mathcal{N}(0,\ \|x\|_2^2)^2 \approx_\epsilon \|x\|_2^2$$
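A minimal numpy check of this norm-preservation guarantee (the constant in m is illustrative, not the tight one):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, delta = 10_000, 0.2, 0.01
m = int(np.ceil(4 * np.log(1 / delta) / eps**2))   # m = O(log(1/delta)/eps^2)

x = rng.standard_normal(n)
Pi = rng.standard_normal((m, n)) / np.sqrt(m)      # dense Gaussian JL matrix

ratio = np.linalg.norm(Pi @ x) ** 2 / np.linalg.norm(x) ** 2
print(m, ratio)                                    # ratio lands in roughly [1 - eps, 1 + eps]
```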

Johnson-Lindenstrauss Lemma

That's it! Just basic statistics.

- Sparse constructions.
- ±1 entries can replace the Gaussians.
- Small sketch representation, i.e. a small random seed (otherwise storing Π takes more space than storing x).

Heavy-Hitters

- Count sketch, sparse recovery, ℓp sampling, point query, graph sketching, sparse Fourier transform
- Simple idea: hashing

Heavy-Hitters

- Random signs to deal with negative entries
- Repeat many times, 'decode' heavy buckets

Each repetition hashes the n coordinates into 1/ε buckets with random signs:

$$\begin{bmatrix} 1 & 0 & 0 & \cdots & -1 & 0 \\ \vdots & & & & & \vdots \\ 0 & -1 & 0 & \cdots & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} c_1 \\ \vdots \\ c_{1/\epsilon} \end{bmatrix}$$

A second, independent hashing gives another bucket vector $(\ldots, c_5, \ldots, c_{1/\epsilon})^\top$, and so on.

Heavy-Hitters

- Random signs to deal with negative entries
- Repeat many times, 'decode' heavy buckets

The repetitions give bucket vectors $(c_1, \ldots, c_{1/\epsilon})$, $(\ldots, c_5, \ldots, c_{1/\epsilon})$, $(\ldots, c_{20}, \ldots, c_{1/\epsilon})$, ...

- h1(2) = 1, h2(2) = 5, h3(2) = 20, h4(2) = 1/ε → recover x2 whenever $\frac{x_2^2}{\|x\|_2^2} \ge \epsilon$, i.e. whenever its square is at least an ε fraction of the squared norm (one bucket's worth, since each repetition has 1/ε buckets).
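A minimal count-sketch in numpy along these lines (bucket and repetition counts are illustrative; real implementations derive the hashes and signs from a short seed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_buckets, n_reps = 10_000, 64, 9          # ~1/eps buckets, a few repetitions

# Hash functions and random signs (stored explicitly here for simplicity).
buckets = rng.integers(0, n_buckets, size=(n_reps, n))
signs = rng.choice([-1.0, 1.0], size=(n_reps, n))

def sketch(x):
    """Count sketch of x: n_reps rows of n_buckets signed bucket sums."""
    C = np.zeros((n_reps, n_buckets))
    for r in range(n_reps):
        np.add.at(C[r], buckets[r], signs[r] * x)
    return C

def estimate(C, i):
    """Median-across-repetitions estimate of x_i from its buckets."""
    return np.median([signs[r, i] * C[r, buckets[r, i]] for r in range(n_reps)])

# A vector with a couple of heavy coordinates buried in noise.
x = rng.standard_normal(n)
x[7] += 200.0
x[42] -= 150.0

C = sketch(x)
for i in [7, 42, 100]:
    # Heavy entries come back approximately; small ones are lost in bucket noise.
    print(i, round(x[i], 2), round(estimate(C, i), 2))
```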

Heavy-Hitters

- Just a random encoding
- polylog(n) space to recover entries holding a 1/log(n) fraction of the total (squared) norm
- Basically the best we could hope for.
- Random scalings give ℓp sampling.

Application 1: k-means Clustering

- Assign points to k clusters
- k is fixed
- Minimize distance to centroids: $\sum_{i=1}^{n} \|a_i - \mu_{C(i)}\|_2^2$

Application 1: k-means Clustering
Lloyd's algorithm, aka the 'k-means algorithm'

- Initialize random clusters
- Compute centroids
- Assign each point to the closest centroid
- Repeat
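A compact numpy version of Lloyd's iterations, for concreteness (plain k-means with random initialization; no k-means++ seeding or convergence test):

```python
import numpy as np

def lloyd(A, k, n_iters=25, seed=0):
    """Lloyd's algorithm on the rows of A (n points in d dimensions)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    centroids = A[rng.choice(n, size=k, replace=False)]        # random initialization
    for _ in range(n_iters):
        # Assign each point to its closest centroid.
        dists = ((A[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids (keep the old centroid if a cluster empties out).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = A[labels == j].mean(axis=0)
    cost = ((A - centroids[labels]) ** 2).sum()
    return labels, centroids, cost

A = np.random.default_rng(1).standard_normal((500, 20))
labels, centroids, cost = lloyd(A, k=5)
print(cost)
```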

Application 1: k-means Clustering

What if data is distributed?

Application 1: k-means Clustering

- At each iteration, each server sends out its new local centroids
- Adding them together gives the new global centroids
- O(sdk) communication per iteration
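One standard way to realize the "add the local centroids together" step is for each server to send per-cluster sums and counts (i.e., count-weighted centroids); the sketch below assumes that convention, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
s, d, k = 4, 10, 3                               # servers, dimension, clusters
local_data = [rng.standard_normal((200, d)) for _ in range(s)]
centroids = rng.standard_normal((k, d))          # current global centroids

def local_message(A, centroids):
    """Each server assigns its points and sends k weighted sums + k counts: O(dk) numbers."""
    dists = ((A[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros_like(centroids)
    counts = np.zeros(centroids.shape[0])
    for j in range(centroids.shape[0]):
        sums[j] = A[labels == j].sum(axis=0)
        counts[j] = (labels == j).sum()
    return sums, counts

# The coordinator adds the s messages together -> new global centroids.
msgs = [local_message(A, centroids) for A in local_data]
total_sums = sum(m[0] for m in msgs)
total_counts = sum(m[1] for m in msgs)
new_centroids = total_sums / np.maximum(total_counts, 1)[:, None]
print(new_centroids.shape)                       # (k, d), after O(s * d * k) total communication
```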

Application 1: k-means Clustering

Can we do better?

- Balcan et al. 2013: O((kd + sk)d)
- Locally computable O(kd + sk)-sized coreset.
- All data is aggregated and k-means is performed on a single server.

Can we do even better?

- O((kd + sk)d) → O((kd′ + sk)d′) for d′ ≪ d.
- Liang et al. 2013, Balcan et al. 2014: O((kd′ + sk)d′ + sdk) with d′ = O(k/ε²), via distributed PCA.
- CEMMP '14: improved O(k/ε²) to ⌈k/ε⌉.

Application 1: k-means Clustering

Can we do better?

- O(sdk) is inherent in communicating O(k) singular vectors of dimension d to s servers.
- Apply Johnson-Lindenstrauss!
- Goal: minimize distance to centroids: $\sum_{i=1}^{n} \|a_i - \mu_{C(i)}\|_2^2$
- Equivalently, all pairwise distances within clusters: $\sum_{i=1}^{k} \frac{1}{|C_i|} \sum_{(j,l) \in C_i} \|a_j - a_l\|_2^2$

Application 1: k-means Clustering

- $\|a_i - a_j\|_2^2 \approx_\epsilon \|(a_i - a_j)\Pi\|_2^2 = \|a_i\Pi - a_j\Pi\|_2^2$
- O(n²) distance vectors, so set the failure probability to δ = 1/(100n²).
- Π needs O(log(1/δ)/ε²) = O(log n/ε²) dimensions

Application 1: k-means Clustering

Immediately distributes: we just need to share the randomness specifying Π.
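A minimal sketch of that idea, assuming every server derives the same Π from a shared seed (names and sizes here are illustrative):

```python
import numpy as np

SHARED_SEED = 1234                  # the only thing servers need to agree on
d, d_prime = 1000, 60               # original and reduced dimension (illustrative)

def project(A_local, seed=SHARED_SEED):
    """Every server builds the same Pi from the shared seed and projects its own rows."""
    rng = np.random.default_rng(seed)
    Pi = rng.standard_normal((d, d_prime)) / np.sqrt(d_prime)
    return A_local @ Pi             # pairwise distances preserved up to (1 +/- eps)

rng = np.random.default_rng(0)
servers = [rng.standard_normal((500, d)) for _ in range(4)]
projected = [project(A) for A in servers]        # done locally, no data exchanged

# Any distributed k-means protocol (e.g. the coreset approach above) now runs
# on d_prime-dimensional points instead of d-dimensional ones.
print(projected[0].shape)           # (500, d_prime)
```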

Application 1: k-means Clustering

Our Paper [Cohen, Elder, Musco, Musco, Persu '14]

- Show that Π only needs O(k/ε²) columns
- Almost completely removes the dependence on input size!
- O(k³ + sk² + log d) communication; the log d gets swallowed in the word size.

Application 1: k-means Clustering

Highest-Level Idea for how this works

- Show that the cost of projecting the columns of AΠ onto any k-dimensional subspace approximates the cost of projecting the columns of A onto that subspace.
- Note that k-means can actually be viewed as a column projection problem (see the formulation below).
- k-means clustering is 'constrained' PCA.
- Lots of applications aside from k-means clustering.
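To make the 'constrained PCA' view concrete, here is the standard reformulation (notation assumed here: A ∈ R^{n×d} has the points as rows, and X_C is the n×k normalized cluster-indicator matrix with X_C(i, j) = 1/√|C_j| when point i is in cluster j, so X_C X_C^⊤ A replaces each row of A by its cluster centroid):

$$\sum_{i=1}^{n} \|a_i - \mu_{C(i)}\|_2^2 \;=\; \|A - X_C X_C^\top A\|_F^2$$

So k-means minimizes $\|A - X X^\top A\|_F^2$ over the constrained set of rank-k cluster-indicator projections, while unconstrained minimization over all rank-k orthonormal X is exactly PCA / best rank-k approximation.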

Application 1: k-means Clustering

Open Questions

- (9 + ε)-approximation with only O(log k) dimensions! What is the right answer?
- We use O(kd + sk)-sized coresets as a black box and just reduce d. Can we use our linear-algebraic understanding to improve the coreset constructions themselves? It feels like we should be able to.
- These algorithms should be practical. I think testing them out would be useful, for both k-means and PCA.
- Other problems (spectral clustering, SVM, what do people actually do?)

Application 2: Spectral Sparsification

General Idea

- Approximate a dense graph with a much sparser graph.
- Reduce O(n²) edges → O(n log n) edges

Application 2: Spectral Sparsification

Cut Sparsification (Benczur, Karger '96)

- Preserve every cut value to within a (1 ± ε) factor

Applications: minimum cut, sparsest cut, etc.

Graph Sparsification

Cut Sparsification (Benczur, Karger ’96)

- Let $B \in \mathbb{R}^{\binom{n}{2} \times n}$ be the vertex-edge incidence matrix for a graph G.
- Let $x \in \{0,1\}^n$ be an "indicator vector" for some cut.

$\|Bx\|_2^2$ = cut value

Example (4 vertices, edges e12, e13, e23, e24; non-edges get all-zero rows):

          v1   v2   v3   v4
    e12    1   -1    0    0
    e13    1    0   -1    0
    e14    0    0    0    0
    e23    0    1   -1    0
    e24    0    1    0   -1
    e34    0    0    0    0

With $x = (1, 1, 0, 0)^\top$ (cutting {v1, v2} from {v3, v4}), $Bx = (0, 1, 0, 1, 1, 0)^\top$ and $\|Bx\|_2^2 = 3$, the number of edges crossing the cut.
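A quick numpy check of this identity on the example above (nothing beyond the slide's own numbers):

```python
import numpy as np

# Vertex-edge incidence matrix for the 4-vertex example
# (rows ordered e12, e13, e14, e23, e24, e34; non-edges are all-zero rows).
B = np.array([
    [1, -1,  0,  0],   # e12
    [1,  0, -1,  0],   # e13
    [0,  0,  0,  0],   # e14 (not in the graph)
    [0,  1, -1,  0],   # e23
    [0,  1,  0, -1],   # e24
    [0,  0,  0,  0],   # e34 (not in the graph)
], dtype=float)

x = np.array([1, 1, 0, 0], dtype=float)     # cut {v1, v2} vs {v3, v4}

print(B @ x)                                # [0. 1. 0. 1. 1. 0.]
print(np.linalg.norm(B @ x) ** 2)           # 3.0 = number of crossing edges

# Same quantity through the Laplacian L = B^T B:
L = B.T @ B
print(x @ L @ x)                            # 3.0
```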

Graph Sparsification

Cut Sparsification (Benczur, Karger '96). So, $\|Bx\|_2^2$ = cut value.

Goal: find some $\tilde{B}$ such that, for all $x \in \{0,1\}^n$,

$$(1 - \epsilon)\|Bx\|_2^2 \le \|\tilde{B}x\|_2^2 \le (1 + \epsilon)\|Bx\|_2^2$$

- $x^\top B^\top B x \approx x^\top \tilde{B}^\top \tilde{B} x$.
- $L = B^\top B$ is the graph Laplacian.

Graph Sparsification

Spectral Sparsification (Spielman, Teng '04)

Goal: find some $\tilde{B}$ such that, for all $x \in \mathbb{R}^n$ (not just $\{0,1\}^n$),

$$(1 - \epsilon)\|Bx\|_2^2 \le \|\tilde{B}x\|_2^2 \le (1 + \epsilon)\|Bx\|_2^2$$

Applications: anything cut sparsifiers can do, Laplacian linear system solves, computing effective resistances, spectral clustering, calculating random walk properties, etc.

Graph Sparsification

How are sparsifiers constructed?

Randomly sample edges (i.e., rows from B).

Graph Sparsification

How are sparsifiers constructed?

Sampling probabilities:

- Connectivity for cut sparsifiers [Benczur, Karger '96], [Fung, Hariharan, Harvey, Panigrahi '11].
- Effective resistances (i.e., statistical leverage scores) for spectral sparsifiers [Spielman, Srivastava '08].

Actually oversample: by (effective resistance) × O(log n). Gives sparsifiers with O(n log n) edges, reducing from O(n²).
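A minimal numpy sketch of this sampling rule (illustrative only: it computes exact effective resistances via the Laplacian pseudoinverse, which a real algorithm would only approximate, and the oversampling constant is kept small so the size reduction is visible):

```python
import numpy as np

def incidence(n, edges):
    """Edge-vertex incidence matrix B: one (+1, -1) row per edge."""
    B = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        B[r, i], B[r, j] = 1.0, -1.0
    return B

rng = np.random.default_rng(0)
n = 200
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.5]

B = incidence(n, edges)
L = B.T @ B
Lpinv = np.linalg.pinv(L)

# Effective resistance of each edge: tau_e = b_e^T L^+ b_e.
taus = np.sum((B @ Lpinv) * B, axis=1)

# Oversample: keep edge e with prob. ~ tau_e * C log n, reweight kept rows by 1/sqrt(p_e).
C_over = 2.0                                # theory wants a larger constant (~1/eps^2)
p = np.minimum(1.0, C_over * np.log(n) * taus)
keep = rng.random(len(edges)) < p
B_tilde = B[keep] / np.sqrt(p[keep])[:, None]
L_tilde = B_tilde.T @ B_tilde

x = rng.standard_normal(n)
print(len(edges), int(keep.sum()))          # dense vs. sparsified edge count
print((x @ L_tilde @ x) / (x @ L @ x))      # should be close to 1
```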

Application 2: Spectral Sparsification

Highest Level Idea Of Our Approach

Application 2: Spectral Sparsification

Why?

- Semi-streaming model with insertions and deletions
- Near-optimal oblivious graph compression
- Distributed graph computations

Application 2: Spectral Sparsification
Distributed Graph Computation

- Trinity, Pregel, Giraph

Application 2: Spectral Sparsification

- Naive sharing of my data: O(|Vi| n)
- With sketching: O(|Vi| log^c n)

Application 2: Spectral Sparsification
Alternatives to Sketching?

- Simulate message-passing algorithms over the nodes; this is what's done in practice.

Application 2: Spectral Sparsification

Alternatives to Sketching?

- Koutis '14 gives a distributed algorithm for spectral sparsification
- Iteratively computes O(log n) spanners (alternatively, low-stretch trees) to upper bound effective resistances and sample edges.
- Combinatorial and local

Application 2: Spectral Sparsification

- Cost per spanner: O(log² n) rounds, O(m log n) messages, O(log n) message size.
- If simulating, each server sends O(δ(Vi) log n) per round.
- O(δ(Vi) log n) beats our bound of O(|Vi| log n) iff δ(Vi) ≤ |Vi|.
- But in that case, just keep all your outgoing edges and sparsify locally! At worst this adds n edges to the final sparsifier.

Application 2: Spectral Sparsification

Moral of That Story?

- I'm not sure.
- Sparsifiers are very strong. Could we do better for other problems?
- Can we reduce the communication of simulated distributed protocols using sparsifiers?
- What other things can sketches be applied to? The biggest open question is distances: spanners, etc.

Sketching a Sparsifier

We are still going to sample by effective resistance.

- Treat the graph as a resistor network where each edge has resistance 1.
- Flow 1 unit of current from node i to node j and measure the voltage drop between them.

Sketching a Sparsifier

Using the standard V = IR equations:

If $x_e$ has +1 at one endpoint of e and −1 at the other (e.g. $x_e = (1, 0, 0, -1, 0)^\top$), then e's effective resistance is $\tau_e = x_e^\top L^{-1} x_e$.

Sketching a Sparsifier

Effective resistance of edge e is $\tau_e = x_e^\top L^{-1} x_e$.

Alternatively, $\tau_e$ is the eth entry of the vector $BL^{-1}x_e$, AND

$$\tau_e = x_e^\top L^{-1} x_e = x_e^\top (L^{-1})^\top B^\top B L^{-1} x_e = \|BL^{-1}x_e\|_2^2$$
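A quick numpy check of these identities (using the pseudoinverse in place of $L^{-1}$, since the Laplacian is singular; the random graph is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.3]

B = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    B[r, i], B[r, j] = 1.0, -1.0

L = B.T @ B
Lpinv = np.linalg.pinv(L)                      # pseudoinverse stands in for L^{-1}

e = 0                                          # pick an edge; x_e is the matching row of B
x_e = B[e]

tau_quadratic = x_e @ Lpinv @ x_e              # x_e^T L^+ x_e
tau_entry = (B @ Lpinv @ x_e)[e]               # e-th entry of B L^+ x_e
tau_norm = np.linalg.norm(B @ Lpinv @ x_e)**2  # ||B L^+ x_e||_2^2

print(tau_quadratic, tau_entry, tau_norm)      # all three agree
```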

Sketching a Sparsifier

We just need two more ingredients for $BL^{-1}x_e$:

ℓ2 Heavy Hitters [GLPS10]:

- Sketch a poly(n)-length vector in polylog(n) space.
- Extract any element whose square is an O(1/log n) fraction of the vector's squared norm.

Coarse Sparsifier:

- $\tilde{L}$ such that $x^\top \tilde{L} x = (1 \pm \text{constant})\, x^\top L x$

Sketching a Sparsifier

Putting it all together: recovering edges from $BL^{-1}x_e$

1. Sketch (Π_heavy-hitters)B in n log^c n space.
2. Compute (Π_heavy-hitters)BL^{-1}.
3. For every possible edge e, compute (Π_heavy-hitters)BL^{-1}x_e.
4. Extract the heavy hitters from that vector and check whether the eth entry is one of them.

$$\frac{\big(BL^{-1}x_e(e)\big)^2}{\|BL^{-1}x_e\|_2^2} \approx \frac{\tau_e^2}{\tau_e} = \tau_e$$

So, as long as $\tau_e \ge \Omega(1/\log n)$, we will recover the edge!
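A stripped-down numpy illustration of why the high-resistance edges are exactly the ℓ2 heavy hitters of these vectors (it skips the actual sketching and works with $BL^{+}x_e$ directly, so it is a sanity check of the ratio above rather than the space-efficient algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.2]

B = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    B[r, i], B[r, j] = 1.0, -1.0

Lpinv = np.linalg.pinv(B.T @ B)
M = B @ Lpinv                       # the real algorithm only ever stores a sketch Pi @ B

threshold = 1.0 / np.log(n)
recovered = []
for e in range(len(edges)):
    v = M @ B[e]                    # v = B L^+ x_e
    ratio = v[e] ** 2 / (v @ v)     # fraction of squared norm on the e-th entry (= tau_e)
    if ratio > threshold:
        recovered.append(e)

taus = np.sum(M * B, axis=1)        # exact effective resistances, for comparison
print(len(recovered), int(np.sum(taus > threshold)))   # the two counts should agree
```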

Sketching a Sparsifier

How about edges with lower effective resistance? Sketch $BL^{-1}x_e$ on subsampled copies of the graph.

Sketching a Sparsifier

How about edges with lower effective resistance in $BL^{-1}x_e$?

- First level: τe > 1/log n, with probability 1.
- Second level: τe > 1/(2 log n), with probability 1/2.
- Third level: τe > 1/(4 log n), with probability 1/4.
- Fourth level: τe > 1/(8 log n), with probability 1/8.
- ...

So, we can sample every edge by (effective resistance) × O(log n).
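A back-of-the-envelope version of that claim (not the actual algorithm: it simply assigns each edge the level at which its scaled-up resistance would cross the ~1/log n heavy-hitter threshold, under the rough assumption that resistances grow by a factor of 2^t after subsampling at rate 2^-t):

```python
import numpy as np

log_n = np.log(1000)                                # illustrative n
taus = np.array([0.5, 0.1, 0.02, 0.004, 0.0008])    # effective resistances of some edges

# Level t keeps each edge with probability 2^-t; edge e becomes "heavy" at the first
# level where tau_e * 2^t exceeds ~1/log n, i.e. t >= -log2(tau_e * log n).
levels = np.maximum(0, np.ceil(-np.log2(taus * log_n))).astype(int)
recovery_prob = 0.5 ** levels                       # chance the edge survives to its level

print(levels)
print(recovery_prob)                                # within a factor of 2 of tau_e * log n
print(np.minimum(1.0, taus * log_n))
```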

Sparsifier Chain

Final Piece [Li, Miller, Peng '12]

- We needed a constant-error spectral sparsifier to get our (1 ± ε) sparsifier.
