Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph...

29
Parallel Algorithms for Geometric Graph Problems Grigory Yaroslavtsev 361 Levine http://grigory.us STOC 2014, joint work with Alexandr Andoni, Krzysztof Onak and Aleksandar Nikolov.

Transcript of Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph...

Page 1: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Parallel Algorithms for Geometric Graph Problems

Grigory Yaroslavtsev 361 Levine

http://grigory.us

STOC 2014, joint work with Alexandr Andoni, Krzysztof Onak and Aleksandar Nikolov.

Page 2: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Theory Seminar: Fall 14

โ€ข Fridays 12pmโˆ’1pm

โ€ข 512 Levine / 612 Levine

โ€“ Homepage: http://theory.cis.upenn.edu/seminar/

โ€“ Google Calendar: see link above

โ€ข Announcements:

โ€“ Theory list: http://lists.seas.upenn.edu/mailman/listinfo/theory-group

Page 3: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

โ€œThe Big Bang Data Theoryโ€

What should TCS say about big data?

โ€ข Usually:

โ€“ Running time: (almost) linear, sublinear, โ€ฆ

โ€“ Space: linear, sublinear, โ€ฆ

โ€“ Approximation: 1 + ๐œ– , best possible, โ€ฆ

โ€“ Randomness: as little as possible, โ€ฆ

โ€ข Special focus today: round complexity

Page 4: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Round Complexity

Information-theoretic measure of performance

โ€ข Tools from information theory (Shannonโ€™48)

โ€ข Unconditional results (lower bounds)

Example today:

โ€ข Approximating Geometric Graph Problems

Page 5: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Approximation in Graphs

1930-50s: Given a graph and an optimization problemโ€ฆ

Transportation Problem: Tolstoi [1930]

Minimum Cut (RAND): Harris and Ross [1955] (declassified, 1999)

Page 6: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Approximation in Graphs

1960s: Single processor, main memory (IBM 360)

Page 7: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Approximation in Graphs

1970s: NP-complete problem โ€“ hard to solve exactly in time polynomial in the input size

โ€œBlack Bookโ€

Page 8: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Approximation in Graphs

Approximate with multiplicative error ๐œถ on the worst-case graph ๐บ:

๐‘š๐‘Ž๐‘ฅ๐บ ๐ด๐‘™๐‘”๐‘œ๐‘Ÿ๐‘–๐‘กโ„Ž๐‘š(๐บ)

๐‘‚๐‘๐‘ก๐‘–๐‘š๐‘ข๐‘š(๐บ)โ‰ค ๐œถ

Generic methods:

โ€ข Linear programming

โ€ข Semidefinite programming

โ€ข Hierarchies of linear and semidefinite programs

โ€ข Sum-of-squares hierarchies

โ€ข โ€ฆ

Page 9: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

The New: Approximating Geometric Problems in Parallel Models

1930-70s to 2014

Page 10: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

The New: Approximating Geometric Problems in Parallel Models

Geometric graph (implicit):

Euclidean distances between n points in โ„๐’…

Already have solutions for old NP-hard problems (Traveling Salesman, Steiner Tree, etc.)

โ€ข Minimum Spanning Tree (clustering, vision)

โ€ข Minimum Cost Bichromatic Matching (vision)

Page 11: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Polynomial time (โ€œeasyโ€)

โ€ข Minimum Spanning Tree

โ€ข Earth-Mover Distance =

Min Weight Bi-chromatic Matching

NP-hard (โ€œhardโ€)

โ€ข Steiner Tree

โ€ข Traveling Salesman

โ€ข Clustering (k-medians, facility location, etc.)

Geometric Graph Problems

Combinatorial problems on graphs in โ„๐’…

Arora-Mitchell-style โ€œDivide and Conquerโ€, easy to implement in Massively Parallel Computational Models

Need new theory!

Page 12: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

MST: Single Linkage Clustering โ€ข *Zahnโ€™71+ Clustering via MST (Single-linkage):

k clusters: remove ๐’Œ โˆ’ ๐Ÿ longest edges from MST

โ€ข Maximizes minimum intercluster distance

[Kleinberg, Tardos]

Page 13: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Earth-Mover Distance

โ€ข Computer vision: compare two pictures of moving objects (stars, MRI scans)

Page 14: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Computational Model โ€ข Input: n points in a d-dimensional space (d constant)

โ€ข ๐‘ด machines, space ๐‘บ on each (๐‘บ = ๐’๐›ผ , 0 < ๐›ผ < 1 )

โ€“ Constant overhead in total space: ๐‘ด โ‹… ๐‘บ = ๐‘‚(๐’)

โ€ข Output: solution to a geometric problem (size O(๐’))

โ€“ Doesnโ€™t fit on a single machine (๐‘บ โ‰ช ๐’)

๐‘ด machines S space

๐ˆ๐ง๐ฉ๐ฎ๐ญ: ๐’ points โ‡’ โ‡’ ๐Ž๐ฎ๐ญ๐ฉ๐ฎ๐ญ: ๐‘ ๐‘–๐‘ง๐‘’ ๐‘ถ(๐’)

Page 15: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

๐‘ด machines S space

Computational Model โ€ข Computation/Communication in ๐‘น rounds:

โ€“ Every machine performs a near-linear time computation => Total running time ๐‘‚(๐’๐Ÿ+๐’(๐Ÿ)๐‘น)

โ€“ Every machine sends/receives at most ๐‘บ bits of information => Total communication ๐‘‚(๐’๐‘น).

Goal: Minimize ๐‘น. Our work: ๐‘น = constant.

๐‘ถ(๐‘บ๐Ÿ+๐’(๐Ÿ)) time

โ‰ค ๐‘บ bits

Page 16: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

MapReduce-style computations

What I wonโ€™t discuss today โ€ข PRAMs (shared memory, multiple processors) (see

e.g. [Karloff, Suri, Vassilvitskiiโ€˜10+) โ€“ Computing XOR requires ฮฉ (log ๐‘›) rounds in CRCW PRAM โ€“ Can be done in ๐‘‚(log๐’” ๐‘›) rounds of MapReduce

โ€ข Pregel-style systems, Distributed Hash Tables (see e.g. Ashish Goelโ€™s class notes and papers)

โ€ข Lower-level implementation details (see e.g. Rajaraman-Leskovec-Ullman book)

Page 17: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Models of parallel computation โ€ข Bulk-Synchronous Parallel Model (BSP) [Valiant,90]

Pro: Most general, generalizes all other models

Con: Many parameters, hard to design algorithms

โ€ข Massive Parallel Computation [Feldman-Muthukrishnan-Sidiropoulos-Stein-Svitkinaโ€™07, Karloff-Suri-Vassilvitskiiโ€™10, Goodrich-Sitchinava-Zhangโ€™11, ..., Beame, Koutris, Suciuโ€™13]

Pros:

โ€ข Inspired by modern systems (Hadoop, MapReduce, Dryad, โ€ฆ )

โ€ข Few parameters, simple to design algorithms

โ€ข New algorithmic ideas, robust to the exact model specification

โ€ข # Rounds is an information-theoretic measure => can prove unconditional lower bounds

โ€ข Between linear sketching and streaming with sorting

Page 18: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Previous work

โ€ข Dense graphs vs. sparse graphs

โ€“ Dense: ๐‘บ โ‰ซ ๐’ (or ๐‘บ โ‰ซ solution size)

โ€œFilteringโ€ (Output fits on a single machine) [Karloff, Suri Vassilvitskii, SODAโ€™10; Ene, Im, Moseley, KDDโ€™11; Lattanzi, Moseley, Suri, Vassilvitskii, SPAAโ€™11; Suri, Vassilvitskii, WWWโ€™11+

โ€“ Sparse: ๐‘บ โ‰ช ๐’ (or ๐‘บ โ‰ช solution size)

Sparse graph problems appear hard (Big open question: (s,t)-connectivity in o(log ๐‘›) rounds?)

VS.

Page 19: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Large geometric graphs โ€ข Graph algorithms: Dense graphs vs. sparse graphs

โ€“ Dense: ๐‘บ โ‰ซ ๐’.

โ€“ Sparse: ๐‘บ โ‰ช ๐’.

โ€ข Our setting: โ€“ Dense graphs, sparsely represented: O(n) space

โ€“ Output doesnโ€™t fit on one machine (๐‘บ โ‰ช ๐’)

โ€ข Today: (1 + ๐œ–)-approximate MST โ€“ ๐’… = 2 (easy to generalize)

โ€“ ๐‘น = log๐‘บ ๐’= O(1) rounds (๐‘บ = ๐’๐›€(๐Ÿ))

Page 20: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

๐‘‚(log ๐‘›)-MST in ๐‘… = ๐‘‚(log ๐‘›) rounds

โ€ข Assume points have integer coordinates 0,โ€ฆ , ฮ” , where ฮ” = ๐‘‚ ๐’๐Ÿ .

Impose an ๐‘‚(log ๐’)-depth quadtree Bottom-up: For each cell in the quadtree

โ€“ compute optimum MSTs in subcells โ€“ Use only one representative from each cell on the next level

Wrong representative: O(1)-approximation per level

Page 21: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Wrong representative: O(1)-approximation per level

๐๐‘ณ-nets โ€ข ๐๐‘ณ-net for a cell C with side length ๐‘ณ:

Collection S of vertices in C, every vertex is at distance <= ๐๐‘ณ from some vertex in S. (Fact: Can efficiently compute ๐-net of size ๐‘‚

1

๐2 )

Bottom-up: For each cell in the quadtree โ€“ Compute optimum MSTs in subcells โ€“ Use ๐๐‘ณ-net from each cell on the next level

โ€ข Idea: Pay only O(๐๐‘ณ) for an edge cut by cell with side ๐‘ณ โ€ข Randomly shift the quadtree:

Pr ๐‘๐‘ข๐‘ก ๐‘’๐‘‘๐‘”๐‘’ ๐‘œ๐‘“ ๐‘™๐‘’๐‘›๐‘”๐‘ก๐‘• โ„“ ๐‘๐‘ฆ ๐‘ณ โˆผ โ„“/๐‘ณ โ€“ charge errors

๐‘ณ ๐‘ณ ๐œ–๐‘ณ

Page 22: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Randomly shifted quadtree โ€ข Top cell shifted by a random vector in 0, ๐‘ณ 2

Impose a randomly shifted quadtree (top cell length ๐Ÿ๐šซ)

Bottom-up: For each cell in the quadtree

โ€“ Compute optimum MSTs in subcells

โ€“ Use ๐๐‘ณ-net from each cell on the next level

Pay 5 instead of 4 Pr[๐๐š๐ ๐‚๐ฎ๐ญ] = ๐›€(1)

2

1

๐๐š๐ ๐‚๐ฎ๐ญ

Page 23: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

1 + ๐ -MST in ๐‘ = ๐‘‚(log ๐‘›) rounds โ€ข Idea: Only use short edges inside the cells

Impose a randomly shifted quadtree (top cell length ๐Ÿ๐šซ

๐ )

Bottom-up: For each node (cell) in the quadtree

โ€“ compute optimum Minimum Spanning Forests in subcells, using edges of length โ‰ค ๐๐‘ณ

โ€“ Use only ๐๐Ÿ๐‘ณ-net from each cell on the next level

Sketch of analysis (๐‘ปโˆ— = optimum MST): ๐”ผ[Extra cost] = ๐”ผ[ Pr ๐’† ๐‘–๐‘  ๐‘๐‘ข๐‘ก ๐‘๐‘ฆ ๐‘๐‘’๐‘™๐‘™ ๐‘ค๐‘–๐‘ก๐‘• ๐‘ ๐‘–๐‘‘๐‘’ ๐‘ณ โ‹… ๐๐‘ณ๐’†โˆˆ๐‘ปโˆ— ]

โ‰ค ๐ log ๐’ ๐‘‘ ๐’†

๐’†โˆˆ๐‘ปโˆ—

=

๐ log ๐’ โ‹… ๐‘๐‘œ๐‘ ๐‘ก(๐‘ปโˆ—)

2

1

Pr[๐๐š๐ ๐‚๐ฎ๐ญ] = ๐‘ถ(๐)

๐‘ณ = ๐›€(๐Ÿ

๐)

Page 24: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

1 + ๐ -MST in ๐‘ = ๐‘‚(1) rounds

โ€ข ๐‘‚(log ๐’) rounds => O(log๐‘บ ๐’) = O(1) rounds

โ€“ Flatten the tree: ( ๐’Ž ร— ๐’Ž)-grids instead of (2x2) grids at each level.

Impose a randomly shifted ( ๐’Ž ร— ๐’Ž)-tree

Bottom-up: For each node (cell) in the tree

โ€“ compute optimum MSTs in subcells via edges of length โ‰ค ๐๐‘ณ

โ€“ Use only ๐๐Ÿ๐‘ณ-net from each cell on the next level

โ‡’ ๐’Ž = ๐’ฮฉ(1)

Page 25: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

1 + ๐ -MST in ๐‘ = ๐‘‚(1) rounds

Theorem: Let ๐’ = # levels in a random tree P ๐”ผ๐‘ท ๐€๐‹๐† โ‰ค 1 + ๐‘‚ ๐๐’๐’… ๐Ž๐๐“

Proof (sketch): โ€ข ๐šซ๐‘ท(๐‘ข, ๐‘ฃ) = cell length, which first partitions (๐‘ข, ๐‘ฃ)

โ€ข New weights: ๐’˜๐‘ท ๐‘ข, ๐‘ฃ = ๐‘ข โˆ’ ๐‘ฃ2+ ๐๐šซ๐‘ท ๐‘ข, ๐‘ฃ

๐‘ข โˆ’ ๐‘ฃ

2โ‰ค ๐”ผ๐‘ท[๐’˜๐‘ท ๐‘ข, ๐‘ฃ ] โ‰ค 1 + ๐‘‚ ๐๐’๐’… ๐‘ข โˆ’ ๐‘ฃ

2

โ€ข Our algorithm implements Kruskal for weights ๐’˜๐‘ท

๐‘ข ๐‘ฃ

๐šซ๐‘ท ๐‘ข, ๐‘ฃ

Page 26: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

โ€œSolve-And-Sketchโ€ Framework

(1 + ๐œ–)-MST:

โ€“ โ€œLoad balancingโ€: partition the tree into parts of the same size

โ€“ Almost linear time: Approximate Nearest Neighbor data structure *Indykโ€™99+

โ€“ Dependence on dimension d (size of ๐-net is ๐‘‚๐’…

๐

๐’…)

โ€“ Generalizes to bounded doubling dimension

โ€“ Basic version is teachable (Jelani Nelsonโ€™s ``Big Dataโ€™โ€™ class at Harvard)

โ€“ Implementation in progressโ€ฆ

Page 27: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

โ€œSolve-And-Sketchโ€ Framework

(1 + ๐œ–)-Earth-Mover Distance, Transportation Cost

โ€ข No simple โ€œdivide-and-conquerโ€ Arora-Mitchell-style algorithm (unlike for general matching)

โ€ข Only recently sequential 1 + ๐œ– -apprxoimation in

๐‘‚๐œ– ๐’ log๐‘‚ 1 ๐’ time [Sharathkumar, Agarwal โ€˜12]

Our approach (convex sketching):

โ€ข Switch to the flow-based version

โ€ข In every cell, send the flow to the closest net-point until we can connect the net points

Page 28: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

โ€œSolve-And-Sketchโ€ Framework

Convex sketching the cost function for ๐‰ net points

โ€ข ๐น:โ„๐‰โˆ’1 โ†’ โ„ = the cost of routing fixed amounts of flow through the net points

โ€ข Function ๐นโ€™ = ๐น + โ€œnormalizationโ€ is monotone, convex and Lipschitz, (1 + ๐)-approximates ๐น

โ€ข We can (1 + ๐)-sketch it using a lower convex hull

Page 29: Parallel Algorithms for Geometric Graph ProblemsProblems in Parallel Models Geometric graph (implicit): Euclidean distances between n points in โ„ Already have solutions for old NP-hard

Thank you! http://grigory.us

Open problems:

โ€ข Exetension to high dimensions? โ€“ Probably no, reduce from connectivity => conditional

lower bound โˆถ ฮฉ log ๐‘› rounds for MST in โ„“โˆž๐‘›

โ€“ The difficult setting is ๐‘‘ = ฮ˜(log๐’) (can do JL)

โ€ข Streaming alg for EMD and Transporation Cost?

โ€ข Our work: โ€“ First near-linear time algorithm for Transportation

Cost

โ€“ Is it possible to reconstruct the solution itself?