Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph...
Transcript of Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph...
![Page 1: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/1.jpg)
Parallel Algorithms for Geometric Graph Problems
Grigory Yaroslavtsev 361 Levine
http://grigory.us
STOC 2014, joint work with Alexandr Andoni, Krzysztof Onak and Aleksandar Nikolov.
![Page 2: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/2.jpg)
Theory Seminar: Fall 14
โข Fridays 12pmโ1pm
โข 512 Levine / 612 Levine
โ Homepage: http://theory.cis.upenn.edu/seminar/
โ Google Calendar: see link above
โข Announcements:
โ Theory list: http://lists.seas.upenn.edu/mailman/listinfo/theory-group
![Page 3: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/3.jpg)
โThe Big Bang Data Theoryโ
What should TCS say about big data?
โข Usually:
โ Running time: (almost) linear, sublinear, โฆ
โ Space: linear, sublinear, โฆ
โ Approximation: 1 + ๐ , best possible, โฆ
โ Randomness: as little as possible, โฆ
โข Special focus today: round complexity
![Page 4: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/4.jpg)
Round Complexity
Information-theoretic measure of performance
โข Tools from information theory (Shannonโ48)
โข Unconditional results (lower bounds)
Example today:
โข Approximating Geometric Graph Problems
![Page 5: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/5.jpg)
Approximation in Graphs
1930-50s: Given a graph and an optimization problemโฆ
Transportation Problem: Tolstoi [1930]
Minimum Cut (RAND): Harris and Ross [1955] (declassified, 1999)
![Page 6: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/6.jpg)
Approximation in Graphs
1960s: Single processor, main memory (IBM 360)
![Page 7: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/7.jpg)
Approximation in Graphs
1970s: NP-complete problem โ hard to solve exactly in time polynomial in the input size
โBlack Bookโ
![Page 8: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/8.jpg)
Approximation in Graphs
Approximate with multiplicative error ๐ถ on the worst-case graph ๐บ:
๐๐๐ฅ๐บ ๐ด๐๐๐๐๐๐กโ๐(๐บ)
๐๐๐ก๐๐๐ข๐(๐บ)โค ๐ถ
Generic methods:
โข Linear programming
โข Semidefinite programming
โข Hierarchies of linear and semidefinite programs
โข Sum-of-squares hierarchies
โข โฆ
![Page 9: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/9.jpg)
The New: Approximating Geometric Problems in Parallel Models
1930-70s to 2014
![Page 10: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/10.jpg)
The New: Approximating Geometric Problems in Parallel Models
Geometric graph (implicit):
Euclidean distances between n points in โ๐
Already have solutions for old NP-hard problems (Traveling Salesman, Steiner Tree, etc.)
โข Minimum Spanning Tree (clustering, vision)
โข Minimum Cost Bichromatic Matching (vision)
![Page 11: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/11.jpg)
Polynomial time (โeasyโ)
โข Minimum Spanning Tree
โข Earth-Mover Distance =
Min Weight Bi-chromatic Matching
NP-hard (โhardโ)
โข Steiner Tree
โข Traveling Salesman
โข Clustering (k-medians, facility location, etc.)
Geometric Graph Problems
Combinatorial problems on graphs in โ๐
Arora-Mitchell-style โDivide and Conquerโ, easy to implement in Massively Parallel Computational Models
Need new theory!
![Page 12: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/12.jpg)
MST: Single Linkage Clustering โข *Zahnโ71+ Clustering via MST (Single-linkage):
k clusters: remove ๐ โ ๐ longest edges from MST
โข Maximizes minimum intercluster distance
[Kleinberg, Tardos]
![Page 13: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/13.jpg)
Earth-Mover Distance
โข Computer vision: compare two pictures of moving objects (stars, MRI scans)
![Page 14: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/14.jpg)
Computational Model โข Input: n points in a d-dimensional space (d constant)
โข ๐ด machines, space ๐บ on each (๐บ = ๐๐ผ , 0 < ๐ผ < 1 )
โ Constant overhead in total space: ๐ด โ ๐บ = ๐(๐)
โข Output: solution to a geometric problem (size O(๐))
โ Doesnโt fit on a single machine (๐บ โช ๐)
๐ด machines S space
๐๐ง๐ฉ๐ฎ๐ญ: ๐ points โ โ ๐๐ฎ๐ญ๐ฉ๐ฎ๐ญ: ๐ ๐๐ง๐ ๐ถ(๐)
![Page 15: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/15.jpg)
๐ด machines S space
Computational Model โข Computation/Communication in ๐น rounds:
โ Every machine performs a near-linear time computation => Total running time ๐(๐๐+๐(๐)๐น)
โ Every machine sends/receives at most ๐บ bits of information => Total communication ๐(๐๐น).
Goal: Minimize ๐น. Our work: ๐น = constant.
๐ถ(๐บ๐+๐(๐)) time
โค ๐บ bits
![Page 16: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/16.jpg)
MapReduce-style computations
What I wonโt discuss today โข PRAMs (shared memory, multiple processors) (see
e.g. [Karloff, Suri, Vassilvitskiiโ10+) โ Computing XOR requires ฮฉ (log ๐) rounds in CRCW PRAM โ Can be done in ๐(log๐ ๐) rounds of MapReduce
โข Pregel-style systems, Distributed Hash Tables (see e.g. Ashish Goelโs class notes and papers)
โข Lower-level implementation details (see e.g. Rajaraman-Leskovec-Ullman book)
![Page 17: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/17.jpg)
Models of parallel computation โข Bulk-Synchronous Parallel Model (BSP) [Valiant,90]
Pro: Most general, generalizes all other models
Con: Many parameters, hard to design algorithms
โข Massive Parallel Computation [Feldman-Muthukrishnan-Sidiropoulos-Stein-Svitkinaโ07, Karloff-Suri-Vassilvitskiiโ10, Goodrich-Sitchinava-Zhangโ11, ..., Beame, Koutris, Suciuโ13]
Pros:
โข Inspired by modern systems (Hadoop, MapReduce, Dryad, โฆ )
โข Few parameters, simple to design algorithms
โข New algorithmic ideas, robust to the exact model specification
โข # Rounds is an information-theoretic measure => can prove unconditional lower bounds
โข Between linear sketching and streaming with sorting
![Page 18: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/18.jpg)
Previous work
โข Dense graphs vs. sparse graphs
โ Dense: ๐บ โซ ๐ (or ๐บ โซ solution size)
โFilteringโ (Output fits on a single machine) [Karloff, Suri Vassilvitskii, SODAโ10; Ene, Im, Moseley, KDDโ11; Lattanzi, Moseley, Suri, Vassilvitskii, SPAAโ11; Suri, Vassilvitskii, WWWโ11+
โ Sparse: ๐บ โช ๐ (or ๐บ โช solution size)
Sparse graph problems appear hard (Big open question: (s,t)-connectivity in o(log ๐) rounds?)
VS.
![Page 19: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/19.jpg)
Large geometric graphs โข Graph algorithms: Dense graphs vs. sparse graphs
โ Dense: ๐บ โซ ๐.
โ Sparse: ๐บ โช ๐.
โข Our setting: โ Dense graphs, sparsely represented: O(n) space
โ Output doesnโt fit on one machine (๐บ โช ๐)
โข Today: (1 + ๐)-approximate MST โ ๐ = 2 (easy to generalize)
โ ๐น = log๐บ ๐= O(1) rounds (๐บ = ๐๐(๐))
![Page 20: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/20.jpg)
๐(log ๐)-MST in ๐ = ๐(log ๐) rounds
โข Assume points have integer coordinates 0,โฆ , ฮ , where ฮ = ๐ ๐๐ .
Impose an ๐(log ๐)-depth quadtree Bottom-up: For each cell in the quadtree
โ compute optimum MSTs in subcells โ Use only one representative from each cell on the next level
Wrong representative: O(1)-approximation per level
![Page 21: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/21.jpg)
Wrong representative: O(1)-approximation per level
๐๐ณ-nets โข ๐๐ณ-net for a cell C with side length ๐ณ:
Collection S of vertices in C, every vertex is at distance <= ๐๐ณ from some vertex in S. (Fact: Can efficiently compute ๐-net of size ๐
1
๐2 )
Bottom-up: For each cell in the quadtree โ Compute optimum MSTs in subcells โ Use ๐๐ณ-net from each cell on the next level
โข Idea: Pay only O(๐๐ณ) for an edge cut by cell with side ๐ณ โข Randomly shift the quadtree:
Pr ๐๐ข๐ก ๐๐๐๐ ๐๐ ๐๐๐๐๐ก๐ โ ๐๐ฆ ๐ณ โผ โ/๐ณ โ charge errors
๐ณ ๐ณ ๐๐ณ
![Page 22: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/22.jpg)
Randomly shifted quadtree โข Top cell shifted by a random vector in 0, ๐ณ 2
Impose a randomly shifted quadtree (top cell length ๐๐ซ)
Bottom-up: For each cell in the quadtree
โ Compute optimum MSTs in subcells
โ Use ๐๐ณ-net from each cell on the next level
Pay 5 instead of 4 Pr[๐๐๐ ๐๐ฎ๐ญ] = ๐(1)
2
1
๐๐๐ ๐๐ฎ๐ญ
![Page 23: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/23.jpg)
1 + ๐ -MST in ๐ = ๐(log ๐) rounds โข Idea: Only use short edges inside the cells
Impose a randomly shifted quadtree (top cell length ๐๐ซ
๐ )
Bottom-up: For each node (cell) in the quadtree
โ compute optimum Minimum Spanning Forests in subcells, using edges of length โค ๐๐ณ
โ Use only ๐๐๐ณ-net from each cell on the next level
Sketch of analysis (๐ปโ = optimum MST): ๐ผ[Extra cost] = ๐ผ[ Pr ๐ ๐๐ ๐๐ข๐ก ๐๐ฆ ๐๐๐๐ ๐ค๐๐ก๐ ๐ ๐๐๐ ๐ณ โ ๐๐ณ๐โ๐ปโ ]
โค ๐ log ๐ ๐ ๐
๐โ๐ปโ
=
๐ log ๐ โ ๐๐๐ ๐ก(๐ปโ)
2
1
Pr[๐๐๐ ๐๐ฎ๐ญ] = ๐ถ(๐)
๐ณ = ๐(๐
๐)
![Page 24: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/24.jpg)
1 + ๐ -MST in ๐ = ๐(1) rounds
โข ๐(log ๐) rounds => O(log๐บ ๐) = O(1) rounds
โ Flatten the tree: ( ๐ ร ๐)-grids instead of (2x2) grids at each level.
Impose a randomly shifted ( ๐ ร ๐)-tree
Bottom-up: For each node (cell) in the tree
โ compute optimum MSTs in subcells via edges of length โค ๐๐ณ
โ Use only ๐๐๐ณ-net from each cell on the next level
โ ๐ = ๐ฮฉ(1)
![Page 25: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/25.jpg)
1 + ๐ -MST in ๐ = ๐(1) rounds
Theorem: Let ๐ = # levels in a random tree P ๐ผ๐ท ๐๐๐ โค 1 + ๐ ๐๐๐ ๐๐๐
Proof (sketch): โข ๐ซ๐ท(๐ข, ๐ฃ) = cell length, which first partitions (๐ข, ๐ฃ)
โข New weights: ๐๐ท ๐ข, ๐ฃ = ๐ข โ ๐ฃ2+ ๐๐ซ๐ท ๐ข, ๐ฃ
๐ข โ ๐ฃ
2โค ๐ผ๐ท[๐๐ท ๐ข, ๐ฃ ] โค 1 + ๐ ๐๐๐ ๐ข โ ๐ฃ
2
โข Our algorithm implements Kruskal for weights ๐๐ท
๐ข ๐ฃ
๐ซ๐ท ๐ข, ๐ฃ
![Page 26: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/26.jpg)
โSolve-And-Sketchโ Framework
(1 + ๐)-MST:
โ โLoad balancingโ: partition the tree into parts of the same size
โ Almost linear time: Approximate Nearest Neighbor data structure *Indykโ99+
โ Dependence on dimension d (size of ๐-net is ๐๐
๐
๐ )
โ Generalizes to bounded doubling dimension
โ Basic version is teachable (Jelani Nelsonโs ``Big Dataโโ class at Harvard)
โ Implementation in progressโฆ
![Page 27: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/27.jpg)
โSolve-And-Sketchโ Framework
(1 + ๐)-Earth-Mover Distance, Transportation Cost
โข No simple โdivide-and-conquerโ Arora-Mitchell-style algorithm (unlike for general matching)
โข Only recently sequential 1 + ๐ -apprxoimation in
๐๐ ๐ log๐ 1 ๐ time [Sharathkumar, Agarwal โ12]
Our approach (convex sketching):
โข Switch to the flow-based version
โข In every cell, send the flow to the closest net-point until we can connect the net points
![Page 28: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/28.jpg)
โSolve-And-Sketchโ Framework
Convex sketching the cost function for ๐ net points
โข ๐น:โ๐โ1 โ โ = the cost of routing fixed amounts of flow through the net points
โข Function ๐นโ = ๐น + โnormalizationโ is monotone, convex and Lipschitz, (1 + ๐)-approximates ๐น
โข We can (1 + ๐)-sketch it using a lower convex hull
![Page 29: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ Already have solutions for old NP-hard](https://reader034.fdocuments.in/reader034/viewer/2022042605/5f58fd07fd833b4893483984/html5/thumbnails/29.jpg)
Thank you! http://grigory.us
Open problems:
โข Exetension to high dimensions? โ Probably no, reduce from connectivity => conditional
lower bound โถ ฮฉ log ๐ rounds for MST in โโ๐
โ The difficult setting is ๐ = ฮ(log๐) (can do JL)
โข Streaming alg for EMD and Transporation Cost?
โข Our work: โ First near-linear time algorithm for Transportation
Cost
โ Is it possible to reconstruct the solution itself?