science.uu.nl project csg - A GPU Algorithm for Greedy Graph …bisse101/Software/match... · 2015....

Post on 02-Oct-2020

0 views 0 download

Transcript of science.uu.nl project csg - A GPU Algorithm for Greedy Graph …bisse101/Software/match... · 2015....

A GPU Algorithm for Greedy Graph Matching

B. O. Fagginger Auer R. H. Bisseling

Utrecht University

September 29, 2011

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 1 / 27

Outline

1 Introduction

2 CPU matching

3 GPU matching

4 Implementation

5 Results

6 Conclusion

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 2 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications in

I minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications in

I minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications in

I minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications inI minimising wireless network power consumption,

I Travelling salesman problem heuristics,I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,

I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,

I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Introduction

We will discuss generating greedy graph matchings on the GPU.

Graph matching ≈ a pairing of neighbouring vertices within a graph.

Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .

Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27

Graph Matching

A graph is a pair G = (V ,E ) with vertices V and edges E .

All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .

A matching is a collection M ⊆ E of edges that are disjoint.

We will view matchings as a map π : V → N such that

π(v) = π(w) ⇐⇒ {v ,w} ∈ M.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27

Graph Matching

A graph is a pair G = (V ,E ) with vertices V and edges E .

All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .

A matching is a collection M ⊆ E of edges that are disjoint.

We will view matchings as a map π : V → N such that

π(v) = π(w) ⇐⇒ {v ,w} ∈ M.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27

Graph Matching

A graph is a pair G = (V ,E ) with vertices V and edges E .

All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .

A matching is a collection M ⊆ E of edges that are disjoint.

We will view matchings as a map π : V → N such that

π(v) = π(w) ⇐⇒ {v ,w} ∈ M.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27

Graph Matching

A graph is a pair G = (V ,E ) with vertices V and edges E .

All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .

A matching is a collection M ⊆ E of edges that are disjoint.

We will view matchings as a map π : V → N such that

π(v) = π(w) ⇐⇒ {v ,w} ∈ M.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27

Maximal Matching

A matching is maximal if we cannot enlarge it further by addinganother edge to it.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 5 / 27

Maximum Matching

A matching is maximum if it possesses the largest possible number ofedges, compared to all other matchings.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 6 / 27

Graph Matching

If the edges are provided with weights ω : E → R>0, finding amatching M which maximises

ω(M) =∑e∈M

ω(e),

is called weighted matching.

Greedy matching provides us with maximal matchings, but notnecessarily of maximum possible weight or maximum number ofvertices/edges.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 7 / 27

Graph Matching

If the edges are provided with weights ω : E → R>0, finding amatching M which maximises

ω(M) =∑e∈M

ω(e),

is called weighted matching.

Greedy matching provides us with maximal matchings, but notnecessarily of maximum possible weight or maximum number ofvertices/edges.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 7 / 27

CPU matching

We will now look at a serial greedy algorithm which generates amaximal matching.

In random order, vertices v ∈ V select and match neighboursone-by-one.

Here, we can pick

I the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27

CPU matching

We will now look at a serial greedy algorithm which generates amaximal matching.

In random order, vertices v ∈ V select and match neighboursone-by-one.

Here, we can pick

I the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27

CPU matching

We will now look at a serial greedy algorithm which generates amaximal matching.

In random order, vertices v ∈ V select and match neighboursone-by-one.

Here, we can pick

I the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27

CPU matching

We will now look at a serial greedy algorithm which generates amaximal matching.

In random order, vertices v ∈ V select and match neighboursone-by-one.

Here, we can pickI the first available neighbour w of v (random matching),

I the neighbour w for which ω({v ,w}) is maximal (weighted matching).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27

CPU matching

We will now look at a serial greedy algorithm which generates amaximal matching.

In random order, vertices v ∈ V select and match neighboursone-by-one.

Here, we can pickI the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27

CPU matching

9

8

65

73

1

4

2

We will create a random matching for this graph.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Consider the vertices one-by-one.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Select unmatched neighbour. . .

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

. . . and match.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Skip matched vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Skip already matched neighbours.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Skip already matched neighbours.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Skip already matched neighbours.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

Keep matching until we have treated all vertices.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

CPU matching

9

8

65

73

1

4

2

We have obtained a maximal matching (also maximum in this case).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27

Problematic parallelism

Directly extending this to a parallel algorithm is problematic.

Disjoint edges requirement leads to serialisation.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 10 / 27

Problematic parallelism

Directly extending this to a parallel algorithm is problematic.

Disjoint edges requirement leads to serialisation.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 10 / 27

Problematic parallelism

9

8

65

73

1

4

2

Suppose we match vertices simultaneously.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 11 / 27

Problematic parallelism

9

8

65

73

1

4

2

Vertices find an unmatched neighbour. . .

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 11 / 27

Problematic parallelism

9

8

65

73

1

4

2

. . . but generate an invalid matching.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 11 / 27

GPU matching

To solve this we create two groups of vertices: blue and red.

Blue vertices propose.

Red vertices respond.

Proposals that were responded to are matched.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27

GPU matching

To solve this we create two groups of vertices: blue and red.

Blue vertices propose.

Red vertices respond.

Proposals that were responded to are matched.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27

GPU matching

To solve this we create two groups of vertices: blue and red.

Blue vertices propose.

Red vertices respond.

Proposals that were responded to are matched.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27

GPU matching

To solve this we create two groups of vertices: blue and red.

Blue vertices propose.

Red vertices respond.

Proposals that were responded to are matched.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27

GPU implementation

The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.

We create one thread for each vertex in V .

Each vertex v ∈ V only updates

I its colour/matching value π(v);I and its proposal/response value σ(v).

Both π and σ are stored in 1D arrays in global memory.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27

GPU implementation

The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.

We create one thread for each vertex in V .

Each vertex v ∈ V only updates

I its colour/matching value π(v);I and its proposal/response value σ(v).

Both π and σ are stored in 1D arrays in global memory.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27

GPU implementation

The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.

We create one thread for each vertex in V .

Each vertex v ∈ V only updatesI its colour/matching value π(v);I and its proposal/response value σ(v).

Both π and σ are stored in 1D arrays in global memory.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27

GPU implementation

The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.

We create one thread for each vertex in V .

Each vertex v ∈ V only updatesI its colour/matching value π(v);I and its proposal/response value σ(v).

Both π and σ are stored in 1D arrays in global memory.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π - - - - - - - - -σ - - - - - - - - -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b r r b b r b b rσ - - - - - - - - -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b r r b b r b b rσ 3 - - 3 6 - 3 2 -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespond

Match

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b r r b b r b b rσ 3 8 7 3 6 5 3 2 -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b 2 3 b 5 5 3 2 rσ 3 8 7 3 6 5 3 2 -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π r 2 3 r 5 5 3 2 bσ 3 8 7 3 6 5 3 2 -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π r 2 3 r 5 5 3 2 bσ - - - - - - - - d

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespond

Match

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π r 2 3 r 5 5 3 2 bσ - - - - - - - - d

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π r 2 3 r 5 5 3 2 dσ - - - - - - - - d

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b 2 3 r 5 5 3 2 dσ - - - - - - - - d

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b 2 3 r 5 5 3 2 dσ 4 - - - - - - - -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespond

Match

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π b 2 3 r 5 5 3 2 dσ 4 - - 1 - - - - -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

GPU matching

ColourProposeRespondMatch

9

8

65

73

1

4

2

1 2 3 4 5 6 7 8 9

π 1 2 3 1 5 5 3 2 dσ 4 - - 1 - - - - -

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27

Matching saturation

0

20

40

60

80

100

0 5 10 15 20Mat

ched

ver

tices

/tota

l nr.

of v

ertic

es (

%)

Number of iterations

Saturation of matching size

ecology2 (1,997,996)ecology1 (1,998,000)

G3_circuit (3,037,674)thermal2 (3,676,134)

kkt_power (6,482,320)af_shell9 (8,542,010)

ldoor (22,785,136)af_shell10 (25,582,130)

audikw1 (38,354,076)nlpkkt120 (46,651,696)

cage15 (47,022,346)

Fraction of matched vertices as function of the number of iterations.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 15 / 27

Colouring vertices

To colour vertices v ∈ V , we use for a fixed p ∈ [0, 1]

colour(v) =

{blue with probability p,red with probability 1− p.

(1)

How to choose p? Maximise the number of matched vertices.

For a large random graphs, the expected fraction of matched verticescan be approximated by (independent of edge density)

2 (1− p)(

1− e−p

1−p

). (2)

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 16 / 27

Colouring vertices

To colour vertices v ∈ V , we use for a fixed p ∈ [0, 1]

colour(v) =

{blue with probability p,red with probability 1− p.

(1)

How to choose p? Maximise the number of matched vertices.

For a large random graphs, the expected fraction of matched verticescan be approximated by (independent of edge density)

2 (1− p)(

1− e−p

1−p

). (2)

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 16 / 27

Colouring vertices

To colour vertices v ∈ V , we use for a fixed p ∈ [0, 1]

colour(v) =

{blue with probability p,red with probability 1− p.

(1)

How to choose p? Maximise the number of matched vertices.

For a large random graphs, the expected fraction of matched verticescan be approximated by (independent of edge density)

2 (1− p)(

1− e−p

1−p

). (2)

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 16 / 27

Choosing p

0

20

40

60

80

100

0 20 40 60 80 100Fra

ctio

n of

max

imum

val

ue (

%)

Fraction of vertices that are blue (%)

Influence of relative blue/red group size

Matching weightMatching sizeMatching time

0

20

40

60

80

100

0 20 40 60 80 100Fra

ctio

n of

mat

ched

ver

tices

(%

)

Fraction of vertices that are blue (%)

Influence of relative blue/red group size

ObservedEquation (2)

Equation (2): we should choose p ≈ 0.53406.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 17 / 27

Results

Created an implementation on the GPU using CUDA and on the CPUusing TBB.

We consider both random and weighted matching.

Vertex orderings are randomised and results are averaged over 32randomisations.

Time only pertains to matching, not I/O or randomisation.

Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.

Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27

Results

Created an implementation on the GPU using CUDA and on the CPUusing TBB.

We consider both random and weighted matching.

Vertex orderings are randomised and results are averaged over 32randomisations.

Time only pertains to matching, not I/O or randomisation.

Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.

Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27

Results

Created an implementation on the GPU using CUDA and on the CPUusing TBB.

We consider both random and weighted matching.

Vertex orderings are randomised and results are averaged over 32randomisations.

Time only pertains to matching, not I/O or randomisation.

Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.

Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27

Results

Created an implementation on the GPU using CUDA and on the CPUusing TBB.

We consider both random and weighted matching.

Vertex orderings are randomised and results are averaged over 32randomisations.

Time only pertains to matching, not I/O or randomisation.

Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.

Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27

Results

Created an implementation on the GPU using CUDA and on the CPUusing TBB.

We consider both random and weighted matching.

Vertex orderings are randomised and results are averaged over 32randomisations.

Time only pertains to matching, not I/O or randomisation.

Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.

Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27

Results

Created an implementation on the GPU using CUDA and on the CPUusing TBB.

We consider both random and weighted matching.

Vertex orderings are randomised and results are averaged over 32randomisations.

Time only pertains to matching, not I/O or randomisation.

Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.

Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27

Results (scaling)

10

20

30

40

50

60 70 80 90

100

1 2 4 8 16

Rel

ativ

e m

atch

ing

time

(%)

Number of CPU threads

Matching time scaling

ecology2 (1,997,996)ecology1 (1,998,000)

G3_circuit (3,037,674)thermal2 (3,676,134)

kkt_power (6,482,320)af_shell9 (8,542,010)

ldoor (22,785,136)af_shell10 (25,582,130)

audikw1 (38,354,076)nlpkkt120 (46,651,696)

cage15 (47,022,346)ideal scaling

Scaling of TBB implementation (8 physical cores + hyperthreading).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 19 / 27

Results (vs. local random matching)

80

85

90

95

100

105

110

115

120

101 102 103 104 105 106 107 108

Mat

chin

g si

ze r

el. t

o A

lg. 1

(%

)

Number of graph edges

Matching size for random parallel matching (vs. Alg. 1)

CUDATBB

0

1

2

3

4

5

6

7

101 102 103 104 105 106 107 108S

peed

up r

el. t

o A

lg. 1

Number of graph edges

Speedup for random parallel matching (vs. Alg. 1)

CUDATBB

Matching size and speedup for parallel vs. serial local random matching.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 20 / 27

Results (vs. local weighted matching)

0

50

100

150

200

250

101 102 103 104 105 106 107 108

Mat

chin

g w

eigh

t rel

. to

Alg

. 1 (

%)

Number of graph edges

Matching weight for weighted parallel matching (vs. Alg. 1)

CUDATBB

0

1

2

3

4

5

6

101 102 103 104 105 106 107 108S

peed

up r

el. t

o A

lg. 1

Number of graph edges

Speedup for weighted parallel matching (vs. Alg. 1)

CUDATBB

Matching weight and speedup for parallel vs. serial local weightedmatching.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 21 / 27

Results (vs. global weighted matching)

0

50

100

150

200

250

101 102 103 104 105 106 107 108

Mat

chin

g w

eigh

t rel

. to

Alg

. 2 (

%)

Number of graph edges

Matching weight for weighted parallel matching (vs. Alg. 2)

CUDATBB

0

5

10

15

20

25

30

35

40

101 102 103 104 105 106 107 108S

peed

up r

el. t

o A

lg. 2

Number of graph edges

Speedup for weighted parallel matching (vs. Alg. 2)

CUDATBB

Matching weight and speedup for parallel local vs. serial global weightedmatching.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 22 / 27

Conclusion

We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.

The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.

The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.

Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.

We look forward to employ this algorithm in (hyper)graph coarsening.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27

Conclusion

We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.

The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.

The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.

Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.

We look forward to employ this algorithm in (hyper)graph coarsening.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27

Conclusion

We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.

The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.

The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.

Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.

We look forward to employ this algorithm in (hyper)graph coarsening.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27

Conclusion

We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.

The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.

The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.

Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.

We look forward to employ this algorithm in (hyper)graph coarsening.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27

Conclusion

We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.

The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.

The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.

Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.

We look forward to employ this algorithm in (hyper)graph coarsening.

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27

Questions

∃ any questions?

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 24 / 27

Choosing p

We should maximise the relative number of matched vertices eachround.

The number of matched vertices equals twice the number of redvertices that receive at least one proposal: maximise 2N

|V | , where

N := number of red vertices receiving at least one proposal.

For a random graph with n vertices, we can approximate(independent of edge density)

limn→∞

2E (N(n))

n≈ 2 (1− p)

(1− e−

p1−p

). (3)

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 25 / 27

Choosing p

We should maximise the relative number of matched vertices eachround.

The number of matched vertices equals twice the number of redvertices that receive at least one proposal: maximise 2N

|V | , where

N := number of red vertices receiving at least one proposal.

For a random graph with n vertices, we can approximate(independent of edge density)

limn→∞

2E (N(n))

n≈ 2 (1− p)

(1− e−

p1−p

). (3)

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 25 / 27

Choosing p

We should maximise the relative number of matched vertices eachround.

The number of matched vertices equals twice the number of redvertices that receive at least one proposal: maximise 2N

|V | , where

N := number of red vertices receiving at least one proposal.

For a random graph with n vertices, we can approximate(independent of edge density)

limn→∞

2E (N(n))

n≈ 2 (1− p)

(1− e−

p1−p

). (3)

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 25 / 27

Choosing p

Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by

∑v∈V

P(π(v) = red)P(v is proposed to | π(v) = red)

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(w proposes to v | π(v) = red))

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(π(w) = blue)P({v ,w} ∈ E )

nr. of red neighb. of w

)≈ n (1− p)

(1−

(1− p d

1 + (1− p) (d (n − 1)− 1)

)n−1).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27

Choosing p

Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V

P(π(v) = red)P(v is proposed to | π(v) = red)

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(w proposes to v | π(v) = red))

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(π(w) = blue)P({v ,w} ∈ E )

nr. of red neighb. of w

)≈ n (1− p)

(1−

(1− p d

1 + (1− p) (d (n − 1)− 1)

)n−1).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27

Choosing p

Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V

P(π(v) = red)P(v is proposed to | π(v) = red)

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(w proposes to v | π(v) = red))

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(π(w) = blue)P({v ,w} ∈ E )

nr. of red neighb. of w

)≈ n (1− p)

(1−

(1− p d

1 + (1− p) (d (n − 1)− 1)

)n−1).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27

Choosing p

Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V

P(π(v) = red)P(v is proposed to | π(v) = red)

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(w proposes to v | π(v) = red))

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(π(w) = blue)P({v ,w} ∈ E )

nr. of red neighb. of w

)

≈ n (1− p)

(1−

(1− p d

1 + (1− p) (d (n − 1)− 1)

)n−1).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27

Choosing p

Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V

P(π(v) = red)P(v is proposed to | π(v) = red)

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(w proposes to v | π(v) = red))

=∑v∈V

P(π(v) = red)

1−∏

w∈V\{v}

(1− P(π(w) = blue)P({v ,w} ∈ E )

nr. of red neighb. of w

)≈ n (1− p)

(1−

(1− p d

1 + (1− p) (d (n − 1)− 1)

)n−1).

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27

NVIDIA visual profiler

Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 27 / 27