Filtering: a method for solving graph problems in map reduce

Filtering: A Method for Solving Graph Problems in MapReduce. Benjamin Moseley (UIUC), Silvio Lattanzi (Google), Siddharth Suri (Yahoo!), Sergei Vassilvitskii (Yahoo!). Large Scale Distributed Computation, Jan 2012. Tuesday, January 17, 2012.

Transcript of Filtering: a method for solving graph problems in map reduce

Page 1: Filtering: a method for solving graph problems in map reduce

Filtering: A Method for Solving Graph Problems in MapReduce

Benjamin Moseley (UIUC), Silvio Lattanzi (Google), Siddharth Suri (Yahoo!), Sergei Vassilvitskii (Yahoo!)

Large Scale Distributed Computation, Jan 2012

Tuesday, January 17, 2012

Page 2: Filtering: a method for solving graph problems in map reduce

Overview

• Introduction to MapReduce model

• Our settings

• Our results

• Open questions

Page 3: Filtering: a method for solving graph problems in map reduce

Introduction to the MapReduce model

Page 4: Filtering: a method for solving graph problems in map reduce

XXL Data

• Huge amount of data

• Main problem is to analyze information quickly

• New tools

• Suitable efficient algorithms

Page 6: Filtering: a method for solving graph problems in map reduce

MapReduce

• MapReduce is the platform of choice for processing massive data

• Data are represented as tuples <key, value>

• The mapper decides how data is distributed

• The reducer performs non-trivial computation locally

[Figure: INPUT, then MAP PHASE, then REDUCE PHASE]
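
The map and reduce phases described above can be simulated in a few lines. This is a minimal single-process sketch rather than a distributed implementation; the names `map_reduce_round`, `degree_mapper`, and `degree_reducer` are made up for illustration and do not come from the slides.

```python
from collections import defaultdict


def map_reduce_round(records, mapper, reducer):
    """Run one map -> shuffle -> reduce round over <key, value> pairs."""
    groups = defaultdict(list)
    for key, value in records:
        # The mapper decides how data is distributed by choosing output keys.
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for out_key, values in groups.items():
        # Each reducer only sees the values that share its key.
        output.extend(reducer(out_key, values))
    return output


def degree_mapper(edge_id, edge):
    # Emit one <vertex, 1> pair per endpoint of the edge.
    u, v = edge
    return [(u, 1), (v, 1)]


def degree_reducer(vertex, ones):
    # Local computation: sum the contributions for one vertex.
    return [(vertex, sum(ones))]


edges = [(0, ("a", "b")), (1, ("a", "c")), (2, ("b", "c")), (3, ("c", "d"))]
degrees = dict(map_reduce_round(edges, degree_mapper, degree_reducer))
```

Here the input records are keyed by a hypothetical edge id, and a single round computes vertex degrees.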

Page 10: Filtering: a method for solving graph problems in map reduce

How can we model MapReduce?

PRAM

• No limit on the number of processors

• Memory is uniformly accessible from any processor

• No limit on the memory available

Page 11: Filtering: a method for solving graph problems in map reduce

How can we model MapReduce?

STREAMING

• Just one processor

• There is a limited amount of memory

• No parallelization

Page 12: Filtering: a method for solving graph problems in map reduce

MapReduce model

[Karloff, Suri and Vassilvitskii]

• $N$ is the input size and $\epsilon > 0$ is some fixed constant

• Fewer than $N^{1-\epsilon}$ machines

• Less than $N^{1-\epsilon}$ memory on each machine

$MRC^i$: problems that can be solved in $O(\log^i N)$ rounds

Page 16: Filtering: a method for solving graph problems in map reduce

Combining map and reduce phase

• Mapper and Reducer work only on a subgraph

• Keep time in each round polynomial

• Time is constrained by the number of rounds

Page 19: Filtering: a method for solving graph problems in map reduce

Algorithmic challenges

• No machine can see the entire input

• No communication between machines during each phase

• Total memory is $N^{2-2\epsilon}$
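
The total-memory figure follows from multiplying the bound on the number of machines by the bound on memory per machine. The short sketch below works through that arithmetic; the function name `mrc_budget` and the sample values of the input size and $\epsilon$ are assumptions chosen only for illustration.

```python
def mrc_budget(input_size, eps):
    """Upper bounds in the model: machines, memory per machine, total memory."""
    machines = input_size ** (1 - eps)
    memory_per_machine = input_size ** (1 - eps)
    # Total memory is the product, i.e. N^(2 - 2*eps).
    total_memory = machines * memory_per_machine
    return machines, memory_per_machine, total_memory


# Illustrative values only: N = 10^6 and eps = 0.5.
machines, memory, total = mrc_budget(10**6, 0.5)
```

With these sample values each bound is about a thousand, and the total is about the input size itself, which shows how tight the model becomes as $\epsilon$ grows.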

Page 22: Filtering: a method for solving graph problems in map reduce

MapReduce vs MUD algorithms

• In the MUD framework, each reducer operates on a stream of data.

• In MUD, each reducer is restricted to using only polylogarithmic space.

Page 23: Filtering: a method for solving graph problems in map reduce

Our settings

• We study the Karloff, Suri and Vassilvitskii model

• We focus on the class $MRC^0$

Page 26: Filtering: a method for solving graph problems in map reduce

Our settings

• We assume we work with dense graphs: $m = n^{1+c}$ for some constant $c > 0$

• Empirical evidence that social networks are dense graphs [Leskovec, Kleinberg and Faloutsos]

Page 28: Filtering: a method for solving graph problems in map reduce

Dense graph motivation

[Leskovec, Kleinberg and Faloutsos]

• They study 9 different social networks

• They show that several graphs from different domains have $n^{1+c}$ edges

• The lowest value of $c$ found is .08, and four graphs have $c > .5$

Page 31: Filtering: a method for solving graph problems in map reduce

Our results

Page 32: Filtering: a method for solving graph problems in map reduce

Results

Constant-round algorithms for

• Maximal matching

• Minimum cut

• 8-approx for maximum weighted matching

• 2-approx for vertex cover

• 3/2-approx for edge cover

Page 33: Filtering: a method for solving graph problems in map reduce

Notation

• $G = (V, E)$: input graph

• $n$: number of nodes

• $m$: number of edges

• $\eta$: memory available on each machine

• $N$: input size

Page 34: Filtering: a method for solving graph problems in map reduce

Filtering

• Part of the input is dropped or filtered in the first stage, in parallel

• Next, some computation is performed on the filtered input

• Finally, some patchwork is done to ensure a proper solution

Page 37: Filtering: a method for solving graph problems in map reduce

Filtering

[Figure: FILTER, then COMPUTE SOLUTION, then PATCHWORK ON REST OF THE GRAPH]

Page 42: Filtering: a method for solving graph problems in map reduce

Warm-up: Compute the minimum spanning tree

[Figure: PARTITION EDGES: the $m = n^{1+c}$ edges are split into $n^{c/2}$ parts of $n^{1+c/2}$ edges each; COMPUTE MST on each part; SECOND ROUND on the $n^{1+c/2}$ edges that remain]
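
The two-round warm-up can be sketched on a single process as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the random assignment of edges to parts, the use of Kruskal's algorithm on each part, and the names `kruskal` and `filtered_mst` are choices made here. An edge dropped in round 1 is the heaviest edge on some cycle inside its part, so it cannot belong to the minimum spanning tree of the whole graph.

```python
import random


def kruskal(num_vertices, edges):
    """Return a minimum spanning forest of `edges`, given as (weight, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append((weight, u, v))
    return forest


def filtered_mst(num_vertices, edges, num_parts, seed=0):
    rng = random.Random(seed)
    parts = [[] for _ in range(num_parts)]
    for edge in edges:
        parts[rng.randrange(num_parts)].append(edge)
    # Round 1: each "machine" keeps only the spanning forest of its own part.
    survivors = [e for part in parts for e in kruskal(num_vertices, part)]
    # Round 2: a single machine finishes on the surviving edges.
    return kruskal(num_vertices, survivors)


# Complete graph on 30 vertices with distinct random weights.
rng = random.Random(1)
pairs = [(u, v) for u in range(30) for v in range(u + 1, 30)]
weights = rng.sample(range(10_000), len(pairs))
edges = [(w, u, v) for w, (u, v) in zip(weights, pairs)]
direct = kruskal(30, edges)
filtered = filtered_mst(30, edges, num_parts=5)
```

Because the weights are distinct, the minimum spanning tree is unique, so the filtered result can be compared edge by edge with a direct computation.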

Page 48: Filtering: a method for solving graph problems in map reduce

Show that the algorithm is in $MRC^0$

• The algorithm is correct

• No edge in the final solution is discarded in a partial solution

• The algorithm runs in two rounds

• No more than $O(n^{c/2})$ machines are used

• No machine uses memory greater than $O(n^{1+c/2})$
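
The machine and memory bounds can be checked with a short calculation, assuming (as in the warm-up) that the edges are split into $n^{c/2}$ parts and that each part returns a spanning forest with at most $n-1$ edges:

```latex
\[
\underbrace{n^{c/2}}_{\text{parts}} \cdot \underbrace{n^{1+c/2}}_{\text{edges per part}}
  = n^{1+c} = m,
\qquad
\underbrace{n^{c/2}}_{\text{forests}} \cdot (n-1) < n^{1+c/2}.
\]
```

The first identity gives the memory per machine in round one, and the second shows that all surviving edges fit on one machine in round two.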

Page 53: Filtering: a method for solving graph problems in map reduce

Maximal matching

• The algorithmic difficulty is that each machine can only see the edges assigned to it

• Is a partitioning-based algorithm feasible?

Page 55: Filtering: a method for solving graph problems in map reduce

Partitioning algorithm

[Figure: PARTITIONING, then COMBINE MATCHINGS]

It is impossible to build a maximal matching using only red edges!!

Page 59: Filtering: a method for solving graph problems in map reduce

Algorithmic insight

• Consider any subset $E'$ of the edges

• Let $M'$ be a maximal matching on $G[E']$

• The unmatched vertices form an independent set in $G[E']$
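
A tiny example makes the insight concrete. In this hedged sketch the greedy routine, the edge lists, and all names are invented for illustration: after a maximal matching on the subset, no subset edge joins two unmatched vertices, although an edge of the full graph still may.

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping every edge whose endpoints are both free."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched


# Hypothetical subset E' and one extra edge that is only in the full graph.
subset_edges = [(0, 1), (0, 2), (0, 3), (4, 5)]
all_edges = subset_edges + [(2, 3)]

matching, matched = greedy_maximal_matching(subset_edges)
# Subset edges with both endpoints unmatched: none can remain.
leftover_in_subset = [e for e in subset_edges
                      if e[0] not in matched and e[1] not in matched]
# Edges of the full graph between unmatched vertices: work for a later step.
leftover_in_graph = [e for e in all_edges
                     if e[0] not in matched and e[1] not in matched]
```

Vertices 2 and 3 stay unmatched and are independent within the subset, so only the extra edge between them is left to handle.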

Page 62: Filtering: a method for solving graph problems in map reduce

Algorithmic insight

• We pick each edge with probability $p$

• Find a matching on the sample, and then find a matching on the unmatched vertices

• For dense portions of the graph, many edges should be sampled

• Sparse portions of the graph are small and can be placed on a single machine

Page 66: Filtering: a method for solving graph problems in map reduce

Algorithm

• Sample each edge independently with probability $p = \frac{10 \log n}{n^{c/2}}$

• Find a matching on the sample

• Consider the induced subgraph on the unmatched vertices

• Find a matching on this graph and output the union

Page 70: Filtering: a method for solving graph problems in map reduce

Algorithm

[Figure: SAMPLE, then FIRST MATCHING, then MATCHING ON UNMATCHED NODES, then UNION]
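
The four steps can be written as a single-process sketch. The greedy matching routine, the optional `p` override, and the function names are assumptions made for this illustration; on small graphs the formula for the sampling probability exceeds 1, so the sketch clamps it and the example passes a smaller value by hand.

```python
import math
import random


def greedy_maximal_matching(edges):
    """Scan edges once, keeping every edge whose endpoints are both free."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched


def sampled_maximal_matching(num_vertices, edges, c, p=None, seed=0):
    rng = random.Random(seed)
    if p is None:
        # p = 10 log n / n^(c/2), clamped because it exceeds 1 for small n.
        p = min(1.0, 10 * math.log(num_vertices) / num_vertices ** (c / 2))
    # Sample each edge independently with probability p.
    sample = [e for e in edges if rng.random() < p]
    # Find a (maximal) matching on the sample.
    first, matched = greedy_maximal_matching(sample)
    # Keep only the edges induced on the unmatched vertices.
    rest = [(u, v) for u, v in edges if u not in matched and v not in matched]
    # Match what is left and output the union.
    second, _ = greedy_maximal_matching(rest)
    return first + second


def is_maximal_matching(edges, matching):
    used = [x for e in matching for x in e]
    covered = set(used)
    disjoint = len(used) == len(covered)
    return disjoint and all(u in covered or v in covered for u, v in edges)


# Complete graph on 60 vertices, with a hand-picked sampling probability.
edges = [(u, v) for u in range(60) for v in range(u + 1, 60)]
matching = sampled_maximal_matching(60, edges, c=1.0, p=0.1)
```

Whatever the random sample turns out to be, the union is a maximal matching, because the last step sees every edge between vertices left unmatched by the first.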

Page 75: Filtering: a method for solving graph problems in map reduce

Correctness

• Consider the last step of the algorithm

• All unmatched vertices are placed on a machine

• Every unmatched vertex is either matched at the last step or has only matched neighbors

Page 78: Filtering: a method for solving graph problems in map reduce

Bounding the rounds

Three rounds:

• Sample the edges

• Find a matching on a single machine for the sampled edges

• Find a matching for the unmatched vertices

Page 81: Filtering: a method for solving graph problems in map reduce

Bounding the memory

• Each edge was sampled with probability $p = \frac{10 \log n}{n^{c/2}}$

• Using Chernoff, the sampled graph has size $O(n^{1+c/2})$ with high probability

• Can we bound the size of the induced subgraph on the unmatched vertices?

Page 84: Filtering: a method for solving graph problems in map reduce

Bounding the memory

• For a fixed induced subgraph with at least $n^{1+c/2}$ edges, the probability that none of its edges is sampled is:

$\left(1 - \frac{10 \log n}{n^{c/2}}\right)^{n^{1+c/2}} \le \exp(-10 n \log n)$

• Union bounding over all $2^n$ induced subgraphs shows that at least one edge is sampled from every dense subgraph with probability

$1 - \exp(-10 \log n) \ge 1 - \frac{1}{n^{10}}$
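
The intermediate steps can be spelled out as follows, assuming natural logarithms and $n \ge 2$, and using the standard inequality $1 - x \le e^{-x}$:

```latex
\begin{align*}
\left(1 - \frac{10 \log n}{n^{c/2}}\right)^{n^{1+c/2}}
  &\le \exp\!\left(-\frac{10 \log n}{n^{c/2}} \cdot n^{1+c/2}\right)
   = \exp(-10\, n \log n), \\
2^{n} \cdot \exp(-10\, n \log n)
  &\le \exp(n - 10\, n \log n)
   \le \exp(-10 \log n)
   = \frac{1}{n^{10}}.
\end{align*}
```

The last inequality holds because $n \le 10\,(n-1) \log n$ for every $n \ge 2$. So with probability at least $1 - 1/n^{10}$ every induced subgraph with at least $n^{1+c/2}$ edges has a sampled edge, and the subgraph induced on the unmatched vertices, which contains no sampled edge, must have fewer than $n^{1+c/2}$ edges.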

Page 86: Filtering: a method for solving graph problems in map reduce

Using less memory

• Can we run the algorithm with less than $O(n^{1+c/2})$ memory?

• We can amplify our sampling technique!

Page 88: Filtering: a method for solving graph problems in map reduce

Using less memory

• Sample as many edges as fit in memory

• Find a matching

• If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices

• Otherwise, recurse on the unmatched nodes

• With $n^{1+\epsilon}$ memory, each iteration removes a factor of $n^{\epsilon}$ of the edges, resulting in $O(c/\epsilon)$ rounds
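
The recursion can be sketched as a loop. This is an illustration under assumptions, not the authors' procedure: it draws a fixed-size uniform sample of `memory_budget` edges instead of sampling each edge independently, it runs in one process, and the name `low_memory_maximal_matching` is made up. The budget must be at least 1.

```python
import random


def low_memory_maximal_matching(edges, memory_budget, seed=0):
    """Repeat sample -> match -> filter until the leftover edges fit in memory."""
    rng = random.Random(seed)
    matched = set()
    matching = []
    remaining = list(edges)
    rounds = 0
    while remaining:
        rounds += 1
        if len(remaining) <= memory_budget:
            batch = remaining  # everything fits on a single machine
        else:
            batch = rng.sample(remaining, memory_budget)
        # Greedily extend the matching using only the edges in memory.
        for u, v in batch:
            if u not in matched and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
        # Filter: keep only edges between vertices that are still unmatched.
        remaining = [(u, v) for u, v in remaining
                     if u not in matched and v not in matched]
    return matching, rounds


# Complete graph on 40 vertices (780 edges) with room for only 50 edges.
edges = [(u, v) for u in range(40) for v in range(u + 1, 40)]
matching, rounds = low_memory_maximal_matching(edges, memory_budget=50)
```

Each pass matches at least one edge, so the loop always terminates, and the final pass sees every remaining edge, which makes the result maximal.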

Page 93: Filtering: a method for solving graph problems in map reduce

Parallel computation power

• The maximal matching algorithm does not use parallelization

• We use a single machine in every step

• When is parallelization used?

Page 96: Filtering: a method for solving graph problems in map reduce

Maximum weighted matching

[Figure: an example graph with edge weights from 1 to 8. SPLIT THE GRAPH, compute MATCHINGS on the parts, then SCAN THE EDGES SEQUENTIALLY]

Page 102: Filtering: a method for solving graph problems in map reduce

Bounding the rounds and memory

Four rounds:

• Splitting the graph and running the maximal matching: 3 rounds and $O(n^{1+c/2})$ memory

• Computing the final solution: 1 round and $O(n \log n)$ memory
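
The split, match, and scan steps can be sketched as below. Several details are assumptions made for this illustration and are not stated on the slides: edges are grouped into geometric weight classes $[2^i, 2^{i+1})$, weights are positive integers, each class uses a greedy maximal matching, and the scan goes from the heaviest class to the lightest. All function names are hypothetical.

```python
from collections import defaultdict


def greedy_maximal_matching(edges):
    """Greedy maximal matching on (u, v, weight) triples; weights are ignored."""
    matched = set()
    matching = []
    for u, v, w in edges:
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching


def layered_weighted_matching(weighted_edges):
    # Split: class i holds the edges with weight in [2^i, 2^(i+1)).
    classes = defaultdict(list)
    for u, v, w in weighted_edges:
        classes[w.bit_length() - 1].append((u, v, w))
    # Match: one maximal matching per class (these could run in parallel).
    layer_matchings = {i: greedy_maximal_matching(es) for i, es in classes.items()}
    # Scan: go through the classes from heaviest to lightest, keeping
    # every edge that does not conflict with an edge already chosen.
    used = set()
    result = []
    for i in sorted(layer_matchings, reverse=True):
        for u, v, w in layer_matchings[i]:
            if u not in used and v not in used:
                result.append((u, v, w))
                used.update((u, v))
    return result


# A path a-b-c-d whose middle edge is the heaviest.
path = [("a", "b", 3), ("b", "c", 8), ("c", "d", 3)]
result = layered_weighted_matching(path)
```

On this path the heavy middle edge is chosen first and blocks both lighter edges, one on each side of it.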

Page 103: Filtering: a method for solving graph problems in map reduce

Approximation guarantee (sketch)

• An edge in the solution can block at most 2 edges in each subgraph of smaller weight

• We lose a factor of 2 because we do not consider the weights

[Figure: example matchings with edge weights]

Page 105: Filtering: a method for solving graph problems in map reduce

Other algorithms

Based on the same intuition:

• 2-approx for vertex cover

• 3/2-approx for edge cover

Page 106: Filtering: a method for solving graph problems in map reduce

Minimum cut

• Partitioning does not work, because we lose structural information

• Sampling does not seem to work either

• We can use the first steps of Karger's algorithm as a filtering technique

• The random choices made in the early rounds succeed with high probability, whereas the later rounds have a much lower probability of success

Page 110: Filtering: a method for solving graph problems in map reduce

Minimum cut

[Figure: $n^{\delta_1}$, RANDOM WEIGHT; COMPRESS EDGES WITH SMALL WEIGHT; COMPUTE MINIMUM CUT]

From Karger, at least one graph will contain a min cut w.h.p.

We will find the minimum cut w.h.p.
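
One way to picture the filtering step is to give every edge a random weight, contract edges in increasing weight order until few supernodes remain, and then solve the small contracted graph exactly. The sketch below does this in a single process. The stopping rule `target`, the number of `trials`, the brute-force cut search (usable only on tiny graphs), and all names are assumptions made for illustration; the input graph is assumed to be connected and `target` at least 2.

```python
import itertools
import random


def contract(num_vertices, edges, target, rng):
    """Contract edges in random-weight order until `target` supernodes remain."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    # Sorting by a fresh random key gives each edge a random weight.
    for u, v in sorted(edges, key=lambda e: rng.random()):
        if components <= target:
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return [find(x) for x in range(num_vertices)]


def brute_force_min_cut(labels, edges):
    """Exact minimum cut of the contracted multigraph (tiny inputs only)."""
    nodes = sorted(set(labels))
    best = None
    # nodes[0] stays on one side; try every nonempty set for the other side.
    for size in range(1, len(nodes)):
        for side in itertools.combinations(nodes[1:], size):
            side = set(side)
            crossing = sum((labels[u] in side) != (labels[v] in side)
                           for u, v in edges)
            best = crossing if best is None else min(best, crossing)
    return best


def filtered_min_cut(num_vertices, edges, target, trials, seed=0):
    rng = random.Random(seed)
    return min(brute_force_min_cut(contract(num_vertices, edges, target, rng), edges)
               for _ in range(trials))


# Two 4-cliques joined by a single bridge edge.
clique_a = [(u, v) for u in range(4) for v in range(u + 1, 4)]
clique_b = [(u, v) for u in range(4, 8) for v in range(u + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]
exact = filtered_min_cut(8, edges, target=8, trials=1)   # no contraction at all
estimate = filtered_min_cut(8, edges, target=4, trials=20)
```

Every cut of a contracted graph is also a cut of the original graph, so the estimate can never fall below the true minimum; it matches the minimum whenever at least one trial avoids contracting an edge of a minimum cut.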

Page 116: Filtering: a method for solving graph problems in map reduce

Empirical result (matching)

[Figure: runtime (minutes) versus sample probability (p), comparing the parallel algorithm with streaming]

Page 117: Filtering: a method for solving graph problems in map reduce

Open problems

Page 118: Filtering: a method for solving graph problems in map reduce

Open problems 1

• Maximum matching

• Shortest path

• Dynamic programming

Page 119: Filtering: a method for solving graph problems in map reduce

Open problems 2

• Algorithms for sparse graphs

• Do connected components require more than 2 rounds?

• Lower bounds

Page 120: Filtering: a method for solving graph problems in map reduce

Thank you!