A GPU Algorithm for Greedy Graph Matching
B. O. Fagginger Auer R. H. Bisseling
Utrecht University
September 29, 2011
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 1 / 27
Outline
1 Introduction
2 CPU matching
3 GPU matching
4 Implementation
5 Results
6 Conclusion
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 2 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications in
I minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications in
I minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications in
I minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications inI minimising wireless network power consumption,
I Travelling salesman problem heuristics,I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,
I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,
I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Introduction
We will discuss generating greedy graph matchings on the GPU.
Graph matching ≈ a pairing of neighbouring vertices within a graph.
Matching has applications inI minimising wireless network power consumption,I Travelling salesman problem heuristics,I organ donation,I . . .
Our primary interest is graph coarsening, where we contract matchedvertices to obtain a coarser version of the original graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 3 / 27
Graph Matching
A graph is a pair G = (V ,E ) with vertices V and edges E .
All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .
A matching is a collection M ⊆ E of edges that are disjoint.
We will view matchings as a map π : V → N such that
π(v) = π(w) ⇐⇒ {v ,w} ∈ M.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27
Graph Matching
A graph is a pair G = (V ,E ) with vertices V and edges E .
All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .
A matching is a collection M ⊆ E of edges that are disjoint.
We will view matchings as a map π : V → N such that
π(v) = π(w) ⇐⇒ {v ,w} ∈ M.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27
Graph Matching
A graph is a pair G = (V ,E ) with vertices V and edges E .
All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .
A matching is a collection M ⊆ E of edges that are disjoint.
We will view matchings as a map π : V → N such that
π(v) = π(w) ⇐⇒ {v ,w} ∈ M.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27
Graph Matching
A graph is a pair G = (V ,E ) with vertices V and edges E .
All edges e ∈ E are of the form e = {v ,w} for vertices v ,w ∈ V .
A matching is a collection M ⊆ E of edges that are disjoint.
We will view matchings as a map π : V → N such that
π(v) = π(w) ⇐⇒ {v ,w} ∈ M.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 4 / 27
Maximal Matching
A matching is maximal if we cannot enlarge it further by addinganother edge to it.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 5 / 27
Maximum Matching
A matching is maximum if it possesses the largest possible number ofedges, compared to all other matchings.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 6 / 27
Graph Matching
If the edges are provided with weights ω : E → R>0, finding amatching M which maximises
ω(M) =∑e∈M
ω(e),
is called weighted matching.
Greedy matching provides us with maximal matchings, but notnecessarily of maximum possible weight or maximum number ofvertices/edges.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 7 / 27
Graph Matching
If the edges are provided with weights ω : E → R>0, finding amatching M which maximises
ω(M) =∑e∈M
ω(e),
is called weighted matching.
Greedy matching provides us with maximal matchings, but notnecessarily of maximum possible weight or maximum number ofvertices/edges.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 7 / 27
CPU matching
We will now look at a serial greedy algorithm which generates amaximal matching.
In random order, vertices v ∈ V select and match neighboursone-by-one.
Here, we can pick
I the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27
CPU matching
We will now look at a serial greedy algorithm which generates amaximal matching.
In random order, vertices v ∈ V select and match neighboursone-by-one.
Here, we can pick
I the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27
CPU matching
We will now look at a serial greedy algorithm which generates amaximal matching.
In random order, vertices v ∈ V select and match neighboursone-by-one.
Here, we can pick
I the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27
CPU matching
We will now look at a serial greedy algorithm which generates amaximal matching.
In random order, vertices v ∈ V select and match neighboursone-by-one.
Here, we can pickI the first available neighbour w of v (random matching),
I the neighbour w for which ω({v ,w}) is maximal (weighted matching).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27
CPU matching
We will now look at a serial greedy algorithm which generates amaximal matching.
In random order, vertices v ∈ V select and match neighboursone-by-one.
Here, we can pickI the first available neighbour w of v (random matching),I the neighbour w for which ω({v ,w}) is maximal (weighted matching).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 8 / 27
CPU matching
9
8
65
73
1
4
2
We will create a random matching for this graph.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Consider the vertices one-by-one.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Select unmatched neighbour. . .
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
. . . and match.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Skip matched vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Skip already matched neighbours.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Skip already matched neighbours.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Skip already matched neighbours.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
Keep matching until we have treated all vertices.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
CPU matching
9
8
65
73
1
4
2
We have obtained a maximal matching (also maximum in this case).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 9 / 27
Problematic parallelism
Directly extending this to a parallel algorithm is problematic.
Disjoint edges requirement leads to serialisation.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 10 / 27
Problematic parallelism
Directly extending this to a parallel algorithm is problematic.
Disjoint edges requirement leads to serialisation.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 10 / 27
Problematic parallelism
9
8
65
73
1
4
2
Suppose we match vertices simultaneously.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 11 / 27
Problematic parallelism
9
8
65
73
1
4
2
Vertices find an unmatched neighbour. . .
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 11 / 27
Problematic parallelism
9
8
65
73
1
4
2
. . . but generate an invalid matching.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 11 / 27
GPU matching
To solve this we create two groups of vertices: blue and red.
Blue vertices propose.
Red vertices respond.
Proposals that were responded to are matched.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27
GPU matching
To solve this we create two groups of vertices: blue and red.
Blue vertices propose.
Red vertices respond.
Proposals that were responded to are matched.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27
GPU matching
To solve this we create two groups of vertices: blue and red.
Blue vertices propose.
Red vertices respond.
Proposals that were responded to are matched.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27
GPU matching
To solve this we create two groups of vertices: blue and red.
Blue vertices propose.
Red vertices respond.
Proposals that were responded to are matched.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 12 / 27
GPU implementation
The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.
We create one thread for each vertex in V .
Each vertex v ∈ V only updates
I its colour/matching value π(v);I and its proposal/response value σ(v).
Both π and σ are stored in 1D arrays in global memory.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27
GPU implementation
The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.
We create one thread for each vertex in V .
Each vertex v ∈ V only updates
I its colour/matching value π(v);I and its proposal/response value σ(v).
Both π and σ are stored in 1D arrays in global memory.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27
GPU implementation
The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.
We create one thread for each vertex in V .
Each vertex v ∈ V only updatesI its colour/matching value π(v);I and its proposal/response value σ(v).
Both π and σ are stored in 1D arrays in global memory.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27
GPU implementation
The graph (neighbour ranges, indices, and weights) is stored as atriplet of 1D textures on the GPU.
We create one thread for each vertex in V .
Each vertex v ∈ V only updatesI its colour/matching value π(v);I and its proposal/response value σ(v).
Both π and σ are stored in 1D arrays in global memory.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 13 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π - - - - - - - - -σ - - - - - - - - -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b r r b b r b b rσ - - - - - - - - -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b r r b b r b b rσ 3 - - 3 6 - 3 2 -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespond
Match
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b r r b b r b b rσ 3 8 7 3 6 5 3 2 -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b 2 3 b 5 5 3 2 rσ 3 8 7 3 6 5 3 2 -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π r 2 3 r 5 5 3 2 bσ 3 8 7 3 6 5 3 2 -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π r 2 3 r 5 5 3 2 bσ - - - - - - - - d
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespond
Match
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π r 2 3 r 5 5 3 2 bσ - - - - - - - - d
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π r 2 3 r 5 5 3 2 dσ - - - - - - - - d
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b 2 3 r 5 5 3 2 dσ - - - - - - - - d
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b 2 3 r 5 5 3 2 dσ 4 - - - - - - - -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespond
Match
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π b 2 3 r 5 5 3 2 dσ 4 - - 1 - - - - -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
GPU matching
ColourProposeRespondMatch
9
8
65
73
1
4
2
1 2 3 4 5 6 7 8 9
π 1 2 3 1 5 5 3 2 dσ 4 - - 1 - - - - -
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 14 / 27
Matching saturation
0
20
40
60
80
100
0 5 10 15 20Mat
ched
ver
tices
/tota
l nr.
of v
ertic
es (
%)
Number of iterations
Saturation of matching size
ecology2 (1,997,996)ecology1 (1,998,000)
G3_circuit (3,037,674)thermal2 (3,676,134)
kkt_power (6,482,320)af_shell9 (8,542,010)
ldoor (22,785,136)af_shell10 (25,582,130)
audikw1 (38,354,076)nlpkkt120 (46,651,696)
cage15 (47,022,346)
Fraction of matched vertices as function of the number of iterations.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 15 / 27
Colouring vertices
To colour vertices v ∈ V , we use for a fixed p ∈ [0, 1]
colour(v) =
{blue with probability p,red with probability 1− p.
(1)
How to choose p? Maximise the number of matched vertices.
For a large random graphs, the expected fraction of matched verticescan be approximated by (independent of edge density)
2 (1− p)(
1− e−p
1−p
). (2)
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 16 / 27
Colouring vertices
To colour vertices v ∈ V , we use for a fixed p ∈ [0, 1]
colour(v) =
{blue with probability p,red with probability 1− p.
(1)
How to choose p? Maximise the number of matched vertices.
For a large random graphs, the expected fraction of matched verticescan be approximated by (independent of edge density)
2 (1− p)(
1− e−p
1−p
). (2)
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 16 / 27
Colouring vertices
To colour vertices v ∈ V , we use for a fixed p ∈ [0, 1]
colour(v) =
{blue with probability p,red with probability 1− p.
(1)
How to choose p? Maximise the number of matched vertices.
For a large random graphs, the expected fraction of matched verticescan be approximated by (independent of edge density)
2 (1− p)(
1− e−p
1−p
). (2)
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 16 / 27
Choosing p
0
20
40
60
80
100
0 20 40 60 80 100Fra
ctio
n of
max
imum
val
ue (
%)
Fraction of vertices that are blue (%)
Influence of relative blue/red group size
Matching weightMatching sizeMatching time
0
20
40
60
80
100
0 20 40 60 80 100Fra
ctio
n of
mat
ched
ver
tices
(%
)
Fraction of vertices that are blue (%)
Influence of relative blue/red group size
ObservedEquation (2)
Equation (2): we should choose p ≈ 0.53406.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 17 / 27
Results
Created an implementation on the GPU using CUDA and on the CPUusing TBB.
We consider both random and weighted matching.
Vertex orderings are randomised and results are averaged over 32randomisations.
Time only pertains to matching, not I/O or randomisation.
Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27
Results
Created an implementation on the GPU using CUDA and on the CPUusing TBB.
We consider both random and weighted matching.
Vertex orderings are randomised and results are averaged over 32randomisations.
Time only pertains to matching, not I/O or randomisation.
Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27
Results
Created an implementation on the GPU using CUDA and on the CPUusing TBB.
We consider both random and weighted matching.
Vertex orderings are randomised and results are averaged over 32randomisations.
Time only pertains to matching, not I/O or randomisation.
Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27
Results
Created an implementation on the GPU using CUDA and on the CPUusing TBB.
We consider both random and weighted matching.
Vertex orderings are randomised and results are averaged over 32randomisations.
Time only pertains to matching, not I/O or randomisation.
Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27
Results
Created an implementation on the GPU using CUDA and on the CPUusing TBB.
We consider both random and weighted matching.
Vertex orderings are randomised and results are averaged over 32randomisations.
Time only pertains to matching, not I/O or randomisation.
Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27
Results
Created an implementation on the GPU using CUDA and on the CPUusing TBB.
We consider both random and weighted matching.
Vertex orderings are randomised and results are averaged over 32randomisations.
Time only pertains to matching, not I/O or randomisation.
Test set: ongoing 10th DIMACS challenge on graph partitioning andUniversity of Florida Sparse Matrix Collection.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA TeslaC2050 (thanks: the Little Green Machine project).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 18 / 27
Results (scaling)
10
20
30
40
50
60 70 80 90
100
1 2 4 8 16
Rel
ativ
e m
atch
ing
time
(%)
Number of CPU threads
Matching time scaling
ecology2 (1,997,996)ecology1 (1,998,000)
G3_circuit (3,037,674)thermal2 (3,676,134)
kkt_power (6,482,320)af_shell9 (8,542,010)
ldoor (22,785,136)af_shell10 (25,582,130)
audikw1 (38,354,076)nlpkkt120 (46,651,696)
cage15 (47,022,346)ideal scaling
Scaling of TBB implementation (8 physical cores + hyperthreading).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 19 / 27
Results (vs. local random matching)
80
85
90
95
100
105
110
115
120
101 102 103 104 105 106 107 108
Mat
chin
g si
ze r
el. t
o A
lg. 1
(%
)
Number of graph edges
Matching size for random parallel matching (vs. Alg. 1)
CUDATBB
0
1
2
3
4
5
6
7
101 102 103 104 105 106 107 108S
peed
up r
el. t
o A
lg. 1
Number of graph edges
Speedup for random parallel matching (vs. Alg. 1)
CUDATBB
Matching size and speedup for parallel vs. serial local random matching.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 20 / 27
Results (vs. local weighted matching)
0
50
100
150
200
250
101 102 103 104 105 106 107 108
Mat
chin
g w
eigh
t rel
. to
Alg
. 1 (
%)
Number of graph edges
Matching weight for weighted parallel matching (vs. Alg. 1)
CUDATBB
0
1
2
3
4
5
6
101 102 103 104 105 106 107 108S
peed
up r
el. t
o A
lg. 1
Number of graph edges
Speedup for weighted parallel matching (vs. Alg. 1)
CUDATBB
Matching weight and speedup for parallel vs. serial local weightedmatching.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 21 / 27
Results (vs. global weighted matching)
0
50
100
150
200
250
101 102 103 104 105 106 107 108
Mat
chin
g w
eigh
t rel
. to
Alg
. 2 (
%)
Number of graph edges
Matching weight for weighted parallel matching (vs. Alg. 2)
CUDATBB
0
5
10
15
20
25
30
35
40
101 102 103 104 105 106 107 108S
peed
up r
el. t
o A
lg. 2
Number of graph edges
Speedup for weighted parallel matching (vs. Alg. 2)
CUDATBB
Matching weight and speedup for parallel local vs. serial global weightedmatching.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 22 / 27
Conclusion
We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.
The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.
The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.
Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.
We look forward to employ this algorithm in (hyper)graph coarsening.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27
Conclusion
We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.
The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.
The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.
Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.
We look forward to employ this algorithm in (hyper)graph coarsening.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27
Conclusion
We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.
The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.
The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.
Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.
We look forward to employ this algorithm in (hyper)graph coarsening.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27
Conclusion
We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.
The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.
The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.
Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.
We look forward to employ this algorithm in (hyper)graph coarsening.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27
Conclusion
We have presented a fine-grain, shared-memory parallel greedy graphalgorithm, suited for GPUs.
The algorithm provides similar quality random matching withspeedups up to 6.8 for large graphs.
The algorithm provides better quality than local weighted matchingswith speedups up to 5.6.
Compared to a global greedy weighted matching algorithm quality isworse, but speedups up to 37 are achieved.
We look forward to employ this algorithm in (hyper)graph coarsening.
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 23 / 27
Questions
∃ any questions?
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 24 / 27
Choosing p
We should maximise the relative number of matched vertices eachround.
The number of matched vertices equals twice the number of redvertices that receive at least one proposal: maximise 2N
|V | , where
N := number of red vertices receiving at least one proposal.
For a random graph with n vertices, we can approximate(independent of edge density)
limn→∞
2E (N(n))
n≈ 2 (1− p)
(1− e−
p1−p
). (3)
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 25 / 27
Choosing p
We should maximise the relative number of matched vertices eachround.
The number of matched vertices equals twice the number of redvertices that receive at least one proposal: maximise 2N
|V | , where
N := number of red vertices receiving at least one proposal.
For a random graph with n vertices, we can approximate(independent of edge density)
limn→∞
2E (N(n))
n≈ 2 (1− p)
(1− e−
p1−p
). (3)
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 25 / 27
Choosing p
We should maximise the relative number of matched vertices eachround.
The number of matched vertices equals twice the number of redvertices that receive at least one proposal: maximise 2N
|V | , where
N := number of red vertices receiving at least one proposal.
For a random graph with n vertices, we can approximate(independent of edge density)
limn→∞
2E (N(n))
n≈ 2 (1− p)
(1− e−
p1−p
). (3)
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 25 / 27
Choosing p
Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by
∑v∈V
P(π(v) = red)P(v is proposed to | π(v) = red)
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(w proposes to v | π(v) = red))
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(π(w) = blue)P({v ,w} ∈ E )
nr. of red neighb. of w
)≈ n (1− p)
(1−
(1− p d
1 + (1− p) (d (n − 1)− 1)
)n−1).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27
Choosing p
Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V
P(π(v) = red)P(v is proposed to | π(v) = red)
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(w proposes to v | π(v) = red))
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(π(w) = blue)P({v ,w} ∈ E )
nr. of red neighb. of w
)≈ n (1− p)
(1−
(1− p d
1 + (1− p) (d (n − 1)− 1)
)n−1).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27
Choosing p
Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V
P(π(v) = red)P(v is proposed to | π(v) = red)
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(w proposes to v | π(v) = red))
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(π(w) = blue)P({v ,w} ∈ E )
nr. of red neighb. of w
)≈ n (1− p)
(1−
(1− p d
1 + (1− p) (d (n − 1)− 1)
)n−1).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27
Choosing p
Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V
P(π(v) = red)P(v is proposed to | π(v) = red)
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(w proposes to v | π(v) = red))
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(π(w) = blue)P({v ,w} ∈ E )
nr. of red neighb. of w
)
≈ n (1− p)
(1−
(1− p d
1 + (1− p) (d (n − 1)− 1)
)n−1).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27
Choosing p
Let G = ({1, . . . , n},E ) with P({v ,w} ∈ E ) = d for d ∈]0, 1]. ThenE (N(n)) is given by∑v∈V
P(π(v) = red)P(v is proposed to | π(v) = red)
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(w proposes to v | π(v) = red))
=∑v∈V
P(π(v) = red)
1−∏
w∈V\{v}
(1− P(π(w) = blue)P({v ,w} ∈ E )
nr. of red neighb. of w
)≈ n (1− p)
(1−
(1− p d
1 + (1− p) (d (n − 1)− 1)
)n−1).
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 26 / 27
NVIDIA visual profiler
Fagginger Auer, Bisseling (UU) GPU Greedy Graph Matching September 29, 2011 27 / 27
Top Related