Linear Sketching and Applications to Distributed Computation
Cameron Musco
November 7, 2014
Overview
Linear Sketches
I Johnson-Lindenstrauss Lemma
I Heavy-Hitters
Applications
I k-Means Clustering
I Spectral sparsification
Linear Sketching
I Randomly choose Π ∼ D and maintain the sketch Πx in place of x.
Linear Sketching
I Oblivious: Π chosen independently of x.
I Composable: Π(x + y) = Πx + Πy.
Linear Sketching
I Streaming algorithms with polylog(n) space.
I Frequency moments, heavy hitters, entropy estimation

(Figure: a vector receiving a stream of coordinate updates.)

Πx0 → Π(x0 + x1) → ... → Π(x0 + x1 + ... + xn)
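The update rule above is just linearity: the sketch of the running sum is the running sum of the sketches. A minimal numpy illustration (a dense Gaussian Π here for simplicity; real streaming implementations store only a small random seed, not Π itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 50
Pi = rng.standard_normal((m, n)) / np.sqrt(m)  # sketch matrix, chosen once up front

# A stream of updates x_0, x_1, ..., x_t to the underlying vector.
updates = [rng.standard_normal(n) for _ in range(20)]

sketch = np.zeros(m)
for x_i in updates:
    sketch += Pi @ x_i  # maintain Pi(x_0 + ... + x_i) by linearity

# The maintained sketch equals the sketch of the final accumulated vector.
x_final = np.sum(updates, axis=0)
assert np.allclose(sketch, Pi @ x_final)
```

Only the m-dimensional sketch is ever stored, never the n-dimensional vector.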
Linear Sketching
Two Main Tools
I Johnson-Lindenstrauss sketches (randomized dimensionality reduction, subspace embedding, etc.)
I Heavy-Hitters sketches (sparse recovery, compressive sensing, ℓp sampling, graph sketching, etc.)
Johnson-Lindenstrauss Lemma
I Low Dimensional Embedding. n → m = O(log(1/δ)/ε²)
I ‖Πx‖₂² ≈ε ‖x‖₂².

Π is (1/√m) times a matrix of i.i.d. N(0,1) entries, so each entry of Πx is distributed as (1/√m)·N(0, x₁² + ... + xₙ²) = (1/√m)·N(0, ‖x‖₂²), and

‖Πx‖₂² = (1/m) Σᵢ₌₁ᵐ N(0, ‖x‖₂²)² ≈ε ‖x‖₂²
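A quick numerical check of the statement above (the dimension m and the accuracy here are illustrative choices, not the lemma's tight constants):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
x = rng.standard_normal(n)

# Pi = (1/sqrt(m)) * matrix of i.i.d. N(0,1) entries.
eps, m = 0.1, 5000
Pi = rng.standard_normal((m, n)) / np.sqrt(m)

# Each entry of Pi @ x is (1/sqrt(m)) * N(0, ||x||_2^2), so the squared
# norm averages m such squares and concentrates around ||x||_2^2.
ratio = np.linalg.norm(Pi @ x) ** 2 / np.linalg.norm(x) ** 2
assert abs(ratio - 1) < eps
```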
Johnson-Lindenstrauss Lemma
That's it! - basic statistics
I Sparse constructions.
I ±1 entries replace Gaussians.
I Small sketch representation - i.e. a small random seed (otherwise storing Π takes more space than storing x).
Heavy-Hitters
I Count sketch, sparse recovery, ℓp sampling, point query, graph sketching, sparse Fourier transform
I Simple idea: Hashing
Heavy-Hitters
I Random signs to deal with negative entries
I Repeat many times, 'decode' heavy buckets

Each repetition hashes the coordinates of x into 1/ε buckets, each coordinate with a random ±1 sign:

  [ 1  0  0 ... −1  0 ]            [   c1    ]
  [        ...        ]  ×  x  =   [   ...   ]
  [ 0 −1  0 ...  0  1 ]            [ c_{1/ε} ]

I Example decode for coordinate 2: h1(2) = 1, h2(2) = 5, h3(2) = 20, h4(2) = 1/ε → recover x₂ whenever x₂²/‖x‖₂² ≥ ε
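A toy count-sketch in numpy illustrating the hash-and-sign idea (the bucket count, number of repetitions, and recovery threshold below are illustrative parameters, not the slide's exact ones):

```python
import numpy as np

rng = np.random.default_rng(1)
n, b, t = 10_000, 1000, 9       # n coords, b buckets per table, t repetitions
x = rng.standard_normal(n)      # background mass
x[7] = 100.0                    # one heavy coordinate

h = rng.integers(0, b, size=(t, n))       # bucket assignment per repetition
s = rng.choice([-1.0, 1.0], size=(t, n))  # random signs

C = np.zeros((t, b))
for j in range(t):
    np.add.at(C[j], h[j], s[j] * x)  # each bucket is a signed sum (linear in x)

# 'Decode': point-query coordinate 7 as the median over repetitions of its
# sign-corrected bucket value.
est = np.median(s[:, 7] * C[np.arange(t), h[:, 7]])
assert abs(est - x[7]) < 10.0
```

Because every bucket is a linear function of x, the sketch composes under addition exactly as the earlier slides require.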
Heavy-Hitters
I Just random encoding
I polylog(n) space to recover entries holding a 1/log(n) fraction of the total norm
I Basically the best we could hope for.
I Random scalings give ℓp sampling.
Application 1: k-means Clustering
I Assign points to k clusters
I k is fixed
I Minimize distance to centroids: Σᵢ₌₁ⁿ ‖aᵢ − μ_{C(i)}‖₂²
Application 1: k-means Clustering
Lloyd's algorithm aka the 'k-means algorithm'
I Initialize random clusters
I Compute centroids
I Assign each point to closest centroid
I Repeat
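A compact numpy implementation of the loop above (random-point initialization and the toy two-blob data are illustrative choices):

```python
import numpy as np

def lloyd(A, k, iters=50, seed=0):
    """Lloyd's algorithm: repeat (assign to closest centroid, recompute means)."""
    rng = np.random.default_rng(seed)
    centroids = A[rng.choice(len(A), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its closest centroid.
        dists = np.linalg.norm(A[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = A[labels == c].mean(axis=0)
    return labels, centroids

# Two well-separated blobs: Lloyd's should put each blob in its own cluster.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
labels, _ = lloyd(A, k=2)
assert len(set(labels[:50])) == 1 and len(set(labels[50:])) == 1
assert labels[0] != labels[-1]
```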
Application 1: k-means Clustering
What if data is distributed?
Application 1: k-means Clustering
I At each iteration each server sends out new local centroids
I Adding them together gives the new global centroids.
I O(sdk) communication per iteration
Application 1: k-means Clustering
Can we do better?
I Balcan et al. 2013 - O((kd + sk)d)
I Locally computable O(kd + sk) sized coreset.
I All data is aggregated and k-means performed on a single server.
Can we do even better?
I O((sd + sk)d) → O((sd′ + sk)d′) for d′ ≪ d.
I Liang et al. 2013, Balcan et al. 2014 - O((sk + sk)k + sdk)
I CEMMP '14: improved O(k/ε²) to ⌈k/ε⌉.
Application 1: k-means Clustering
Can we do better?
I O(sdk) inherent in communicating O(k) singular vectors of dimension d to s servers.
I Apply Johnson-Lindenstrauss!
I Goal: Minimize distance to centroids: Σᵢ₌₁ⁿ ‖aᵢ − μ_{C(i)}‖₂²
I Equivalently, all-pairs distances: Σᵢ₌₁ᵏ (1/|Cᵢ|) Σ_{(j,l)∈Cᵢ} ‖aⱼ − a_l‖₂²
Application 1: k-means Clustering
I ‖aᵢ − aⱼ‖₂² ≈ε ‖(aᵢ − aⱼ)Π‖₂² = ‖aᵢΠ − aⱼΠ‖₂²
I O(n²) distance vectors, so set failure probability δ = 1/(100·n²).
I Π needs O(log(1/δ)/ε²) = O(log n/ε²) dimensions
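The union-bound argument above, checked numerically (the point counts and the constant inside m are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 5000
A = rng.standard_normal((n, d))  # n points as rows of A

# O(n^2) difference vectors, so m = O(log n / eps^2) columns suffice.
eps = 0.2
m = int(np.ceil(16 * np.log(n) / eps**2))
Pi = rng.standard_normal((d, m)) / np.sqrt(m)
B = A @ Pi                       # each point a_i is sketched to a_i Pi

worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        true = np.sum((A[i] - A[j]) ** 2)
        approx = np.sum((B[i] - B[j]) ** 2)
        worst = max(worst, abs(approx / true - 1))
assert worst < eps  # all O(n^2) pairwise distances preserved at once
```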
Application 1: k-means Clustering
Immediately distributes - just need to share the randomness specifying Π.
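Sharing the randomness can be as simple as sharing a PRNG seed; each server then regenerates the same Π locally. A sketch of that idea (the seeded-generator approach is an illustrative stand-in for the hash-based derandomizations used in theory):

```python
import numpy as np

def sketch_matrix(seed, d, m):
    # Every server rebuilds the identical Pi from the shared seed, so only
    # the seed - not the d x m matrix - is ever communicated.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d, m)) / np.sqrt(m)

d, m, shared_seed = 1000, 100, 1234
Pi_server1 = sketch_matrix(shared_seed, d, m)
Pi_server2 = sketch_matrix(shared_seed, d, m)
assert np.array_equal(Pi_server1, Pi_server2)

# Composability: locally sketched data adds up to the global sketch.
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
assert np.allclose(x1 @ Pi_server1 + x2 @ Pi_server2, (x1 + x2) @ Pi_server1)
```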
Application 1: k-means Clustering
Our Paper [Cohen Elder Musco Musco Persu 14]
I Show that Π only needs to have O(k/ε²) columns
I Almost completely removes dependence on input size!
I O(k³ + sk² + log d) - the log d gets swallowed in the word size.
Application 1: k-means Clustering
Highest Level Idea for How This Works
I Show that the cost of projecting the columns of AΠ to any k-dimensional subspace approximates the cost of projecting A to that subspace.
I Note that k-means can actually be viewed as a column projection problem.
I k-means clustering is 'constrained' PCA
I Lots of applications aside from k-means clustering.
Application 1: k-means Clustering
Open Questions
I (9 + ε)-approximation with only O(log k) dimensions! What is the right answer?
I We use O(kd + sk) sized coresets as a black box and reduce d. Can we use our linear algebraic understanding to improve coreset constructions? It feels like we should be able to.
I These algorithms should be practical. I think testing them out would be useful - for both k-means and PCA.
I Other problems (spectral clustering, SVM, what do people actually do?)
Application 2: Spectral Sparsification
General Idea
I Approximate a dense graph with a much sparser graph.
I Reduce O(n2) edges → O(n log n) edges
Application 2: Spectral Sparsification
Cut Sparsification (Benczur, Karger ’96)
I Preserve every cut value to within (1± ε) factor
Applications: Minimum cut, sparsest cut, etc.
Graph Sparsification
Cut Sparsification (Benczur, Karger '96)
I Let B ∈ R^{(n choose 2)×n} be the vertex-edge incidence matrix for a graph G.
I Let x ∈ {0,1}ⁿ be an "indicator vector" for some cut.

‖Bx‖₂² = cut value

        v1  v2  v3  v4
  e12 [  1  -1   0   0 ]
  e13 [  1   0  -1   0 ]
  e14 [  0   0   0   0 ]
  e23 [  0   1  -1   0 ]
  e24 [  0   1   0  -1 ]
  e34 [  0   0   0   0 ]
           B             ×  x = (1, 1, 0, 0)⊤  gives  Bx = (0, 1, 0, 1, 1, 0)⊤
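The example above, verified in numpy (the 4-node graph and the cut {v1, v2} are exactly the ones in the incidence matrix):

```python
import numpy as np

# Incidence matrix B: one row per potential edge; edges present in the
# graph (e12, e13, e23, e24) get a +1/-1 row, absent edges a zero row.
B = np.array([
    [1, -1,  0,  0],   # e12
    [1,  0, -1,  0],   # e13
    [0,  0,  0,  0],   # e14 (not in graph)
    [0,  1, -1,  0],   # e23
    [0,  1,  0, -1],   # e24
    [0,  0,  0,  0],   # e34 (not in graph)
])

x = np.array([1, 1, 0, 0])         # indicator of the cut side {v1, v2}
assert (B @ x).tolist() == [0, 1, 0, 1, 1, 0]
assert np.sum((B @ x) ** 2) == 3   # edges e13, e23, e24 cross the cut
```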
Graph Sparsification
Cut Sparsification (Benczur, Karger '96). So, ‖Bx‖₂² = cut value.

Goal: Find some B̃ such that, for all x ∈ {0,1}ⁿ,

(1 − ε)‖Bx‖₂² ≤ ‖B̃x‖₂² ≤ (1 + ε)‖Bx‖₂²

I x⊤B̃⊤B̃x ≈ x⊤B⊤Bx.
I L = B⊤B is the graph Laplacian.
Graph Sparsification
Spectral Sparsification (Spielman, Teng '04)

Goal: Find some B̃ such that, for all x ∈ Rⁿ (not just {0,1}ⁿ),

(1 − ε)‖Bx‖₂² ≤ ‖B̃x‖₂² ≤ (1 + ε)‖Bx‖₂²

Applications: Anything cut sparsifiers can do, Laplacian system solves, computing effective resistances, spectral clustering, calculating random walk properties, etc.
Graph Sparsification
How are sparsifiers constructed?
Randomly sample edges (i.e., rows of B):
Graph Sparsification
How are sparsifiers constructed?
Sampling probabilities:
I Connectivity for cut sparsifiers [Benczur, Karger '96], [Fung, Hariharan, Harvey, Panigrahi '11].
I Effective resistances (i.e., statistical leverage scores) for spectral sparsifiers [Spielman, Srivastava '08].
Actually oversample: by (effective resistance) × O(log n). Gives sparsifiers with O(n log n) edges - reducing from O(n²).
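A small numpy sketch of the effective-resistance sampling probabilities (the slides write L⁻¹; since L is singular this is the pseudoinverse on the Laplacian's range, and the complete graph below is an illustrative example):

```python
import numpy as np

n = 8
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]  # complete graph
B = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    B[r, i], B[r, j] = 1.0, -1.0

L = B.T @ B                      # graph Laplacian
Lpinv = np.linalg.pinv(L)        # pseudoinverse (L is singular)

# Effective resistance / leverage score of each edge: tau_e = x_e^T L^+ x_e.
taus = np.array([row @ Lpinv @ row for row in B])

# Sanity checks: each tau_e lies in (0, 1], and for a connected graph the
# effective resistances sum to n - 1.
assert np.all(taus > 0) and np.all(taus <= 1 + 1e-9)
assert np.isclose(taus.sum(), n - 1)
assert np.allclose(taus, 2 / n)  # by symmetry of the complete graph
```

Sampling edge e with probability proportional to tau_e (oversampled by O(log n)) is exactly the rule the slide describes.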
Application 2: Spectral Sparsification
Highest Level Idea Of Our Approach
Application 2: Spectral Sparsification
Why?
I Semi-streaming model with insertions and deletions
I Near optimal oblivious graph compression
I Distributed Graph Computations
Application 2: Spectral Sparsification
Distributed Graph Computation
I Trinity, Pregel, Giraph
Application 2: Spectral Sparsification
I Naive to share my data: O(|Vᵢ| · n)
I With sketching: O(|Vᵢ| · logᶜ n)
Application 2: Spectral Sparsification
Alternatives to Sketching?
I Simulate message passing algorithms over the nodes - this is what's done in practice.
Application 2: Spectral Sparsification
Alternatives to Sketching?
I Koutis '14 gives a distributed algorithm for spectral sparsification
I Iteratively computes O(log n) spanners (alternatively, low stretch trees) to upper bound effective resistances and sample edges.
I Combinatorial and local
Application 2: Spectral Sparsification
I Cost per spanner: O(log² n) rounds, O(m log n) messages, O(log n) message size.
I If simulating, each server sends O(δ(Vᵢ) log n) per round.
I O(δ(Vᵢ) log n) beats our bound of O(|Vᵢ| log n) iff δ(Vᵢ) ≤ |Vᵢ|
I But in that case, just keep all your outgoing edges and sparsify locally! At worst this adds n edges to the final sparsifier.
Application 2: Spectral Sparsification
Moral of That Story?
I I'm not sure.
I Sparsifiers are very strong. Could we do better for other problems?
I Can we reduce communication of simulated distributed protocols using sparsifiers?
I What other things can sketches be applied to? The biggest open question is distances - spanners, etc.
Sketching a Sparsifier
We are still going to sample by effective resistance.
I Treat the graph as a resistor network, where each edge has resistance 1.
I Flow 1 unit of current from node i to j and measure the voltage drop between the nodes.
Sketching a Sparsifier
Using standard V = IR equations:

If x_e = (1, 0, 0, −1, 0)⊤, e's effective resistance is τ_e = x_e⊤L⁻¹x_e.
Sketching a Sparsifier
Effective resistance of edge e is τ_e = x_e⊤L⁻¹x_e.

Alternatively, τ_e is the eth entry in the vector BL⁻¹x_e, AND

τ_e = x_e⊤L⁻¹x_e = x_e⊤(L⁻¹)⊤B⊤BL⁻¹x_e = ‖BL⁻¹x_e‖₂²
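Both identities are easy to check numerically on a small graph (an illustrative 4-node path, where every edge is a bridge so τ_e = 1; L⁻¹ is taken as the pseudoinverse):

```python
import numpy as np

# Incidence matrix of the path v1 - v2 - v3 - v4.
B = np.array([
    [1., -1.,  0.,  0.],
    [0.,  1., -1.,  0.],
    [0.,  0.,  1., -1.],
])
Lpinv = np.linalg.pinv(B.T @ B)

e = 1                      # the middle edge; x_e is its row of B
x_e = B[e]
v = B @ Lpinv @ x_e        # the vector B L^+ x_e

tau = x_e @ Lpinv @ x_e
assert np.isclose(v[e], tau)    # tau_e is the eth entry of B L^+ x_e
assert np.isclose(v @ v, tau)   # and ||B L^+ x_e||_2^2 = tau_e
assert np.isclose(tau, 1.0)     # a bridge has effective resistance 1
```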
Sketching a Sparsifier
We just need two more ingredients to sketch BL⁻¹x_e:

ℓ2 Heavy Hitters [GLPS10]:
I Sketch a poly(n)-length vector in polylog(n) space.
I Extract any element whose square is an O(1/log n) fraction of the vector's squared norm.

Coarse Sparsifier:
I L̃ such that x⊤L̃x = (1 ± constant)·x⊤Lx
Sketching a Sparsifier
Putting it all together: BL⁻¹x_e
1. Sketch (Π_heavy hitters)·B in n·logᶜ n space.
2. Compute (Π_heavy hitters)·BL⁻¹.
3. For every possible edge e, compute (Π_heavy hitters)·BL⁻¹x_e.
4. Extract heavy hitters from the vector, and check whether the eth entry is one of them.

(BL⁻¹x_e(e))² / ‖BL⁻¹x_e‖₂² ≈ τ_e² / τ_e = τ_e

So, as long as τ_e > O(1/ log n), we will recover the edge!
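A toy end-to-end run of the decoding step (a tiny dumbbell graph, a single count-sketch table standing in for a full heavy-hitters sketch, and the exact pseudoinverse standing in for a coarse sparsifier - all illustrative simplifications):

```python
import numpy as np

rng = np.random.default_rng(7)

# Dumbbell: triangles {0,1,2} and {3,4,5} joined by the bridge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
B = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    B[r, i], B[r, j] = 1.0, -1.0
Lpinv = np.linalg.pinv(B.T @ B)

# One count-sketch table over the edge space: hash each edge to a bucket
# with a random sign.
m = 40
h = rng.integers(0, m, size=len(edges))
s = rng.choice([-1.0, 1.0], size=len(edges))
Pi = np.zeros((m, len(edges)))
Pi[h, np.arange(len(edges))] = s

stored = Pi @ B @ Lpinv           # steps 1-2: only this m x n matrix is kept

# Step 3: query the bridge edge e and estimate the eth entry of B L^+ x_e.
e = 3
v_sketched = stored @ B[e]        # = Pi (B L^+ x_e)
est = s[e] * v_sketched[h[e]]     # read e's bucket, undo its sign

tau = B[e] @ Lpinv @ B[e]         # true value: the bridge has tau_e = 1
assert abs(est - tau) < 0.1
assert np.isclose(tau, 1.0)
```

The bridge carries all the energy of B L⁺ x_e, so its bucket value recovers τ_e almost exactly; low-resistance edges need the subsampling levels described next.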
Sketching a Sparsifier
How about edges with lower effective resistance? Sketch: BL⁻¹x_e
Sketching a Sparsifier
How about edges with lower effective resistance? Recover them from subsampled sketches of BL⁻¹x_e:
I First level: τ_e > 1/log n with probability 1.
I Second level: τ_e > 1/(2 log n) with probability 1/2.
I Third level: τ_e > 1/(4 log n) with probability 1/4.
I Fourth level: τ_e > 1/(8 log n) with probability 1/8.
I ...
So, we can sample every edge by (effective resistance) × O(log n).
Sparsifier Chain
Final Piece [Li, Miller, Peng '12]
I We needed a constant error spectral sparsifier to get our (1 ± ε) sparsifier.