Parallel Graph Partitioning Using Simulated Annealing
Parallel and Distributed Computing I
Sadik Gokhan Caglar
Graph Partitioning Problem
Given a graph $G = (N, E)$ and an integer $p$
Find subsets $N_1, N_2, \ldots, N_p$ such that
1. $\bigcup_{i=1}^{p} N_i = N$ and $N_i \cap N_j = \emptyset$ for $i \neq j$
2. $W(i) \approx W / p$, $i = 1, 2, \ldots, p$, where $W(i)$ and $W$ are the sums of node weights in $N_i$ and $N$ respectively
3. The cut size is minimized
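The conditions above can be checked mechanically. The following is a minimal sketch in C, assuming the partition is stored as a hypothetical array `part[v]` that gives the subset index of vertex `v` (so condition 1 holds by construction). The names `SketchEdge`, `cut_size`, `subset_weights`, and `part` are illustrative assumptions and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edge record: endpoints are vertex indices in [0, n). */
typedef struct { int u, v; } SketchEdge;

/* Condition 3: count edges whose endpoints lie in different subsets. */
int cut_size(const SketchEdge *edges, size_t m, const int *part)
{
    int cut = 0;
    for (size_t i = 0; i < m; i++)
        if (part[edges[i].u] != part[edges[i].v])
            cut++;
    return cut;
}

/* Condition 2: accumulate W(i), the total node weight of each subset.
 * out must have room for p values. */
void subset_weights(const double *w, const int *part, size_t n,
                    int p, double *out)
{
    assert(p > 0);
    for (int i = 0; i < p; i++)
        out[i] = 0.0;
    for (size_t v = 0; v < n; v++) {
        assert(part[v] >= 0 && part[v] < p);
        out[part[v]] += w[v];
    }
}
```

Comparing each `out[i]` with the total weight divided by `p` gives the balance check; the cut size is the quantity a partitioner tries to minimize.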
A Partitioned Graph
A partitioned graph with edge-cut of seven
Solutions To The Problem
• Geometric Algorithms: Use the geometric coordinates
  – Recursive coordinate (or orthogonal) bisection
  – Recursive circle bisection
• Structural Algorithms:
  – Graph-Walking Algorithms
  – Spectral Algorithms
• Refinement Algorithms:
  – Kernighan-Lin Algorithm
  – Simulated Annealing Algorithm
Solutions To The Problem
Multilevel technique:
• Coarsen
• Partition
• Refinement
Simulated Annealing
Implementation of SA
• Cost: The number of edges whose endpoints are in different sets
• Acceptance: The new cost is less than the old
• Rejection: The new cost is more than the old; a probabilistic calculation based on $e^{\mathrm{cost}/\mathrm{Temp}}$ can turn a rejection into an acceptance
• Equilibrium: The inner loop continues while number of rejections < (10 * vertexsize of the graph * number of sets)
Implementation of SA
• Frozen state: The temperature starts at 1 and is multiplied by the cooling constant 0.95 after each equilibrium; the system is considered frozen at temperature 0.2
currentcost = cost(graph);
printf("The cost of the graph1 is %f \n", currentcost);
while (temp > 0.2)
{
    while (reject < (10 * graph.vertexsize * graph.setsize))
    {
        makenewgraph(graph, &newgraph);
        tempcost = cost(newgraph);
        if (tempcost < currentcost)
        {
            currentcost = tempcost;
            graphfree(&graph);
            graph = newgraph;
        }
        else
        {
            reject++;
            if (tempcost == currentcost)
                prob = e(1, temp);
            else
                prob = e((tempcost - currentcost), temp);
            prob2 = drand48();
            if (prob > prob2)
            {
                currentcost = tempcost;
                graphfree(&graph);
                graph = newgraph;
            }
            else
                graphfree(&newgraph);
        } //1st else
    } //reject
    temp = temp * coolconst;
    reject = 0;
    printf("cooled!!! temp = %f \n", temp);
    printf("currentcost %f\n", currentcost);
}
printf("The cost of the graph2 is %f \n", currentcost);
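For a sense of how long the outer loop runs, the cooling schedule can be counted directly. The sketch below uses a hypothetical helper `cooling_steps` (not from the slides) that repeats the same multiply-and-compare logic as the outer `while` loop.

```c
#include <assert.h>

/* Count how many temperature levels the outer loop visits for a
 * geometric schedule: temp starts at t0 and is multiplied by alpha
 * until it is no longer above t_frozen. */
int cooling_steps(double t0, double alpha, double t_frozen)
{
    assert(alpha > 0.0 && alpha < 1.0 && t_frozen > 0.0);
    int steps = 0;
    for (double temp = t0; temp > t_frozen; temp *= alpha)
        steps++;
    return steps;
}
```

With a start temperature of 1, a cooling constant of 0.95, and a frozen threshold of 0.2, `cooling_steps(1.0, 0.95, 0.2)` returns 32, so the inner equilibrium loop is executed at 32 temperature levels.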
Input File Format
Data Structures
typedef struct Edge
{
int v1;
int v2;
} Edge;
typedef struct Set
{
int size;
int* vertex;
} Set;
typedef struct Graph
{
int vertexsize;
int edgesize;
int setsize;
struct Edge* edgelist;
struct Set* setlist;
} Graph;
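One natural way to generate a neighbouring partition on these structures is to swap two vertices between two sets, which leaves the set sizes unchanged. The sketch below is an assumption about what a move could look like; the slides do not show `makenewgraph`, and `swap_vertices` is a hypothetical helper.

```c
#include <assert.h>

/* Same layout as the Set struct in the slides. */
typedef struct Set
{
    int size;
    int* vertex;
} Set;

/* Swap the vertex at position ia of set a with the vertex at
 * position ib of set b. Set sizes are unchanged, so the partition
 * stays as balanced as it was. (Hypothetical helper.) */
void swap_vertices(Set* a, int ia, Set* b, int ib)
{
    assert(ia >= 0 && ia < a->size);
    assert(ib >= 0 && ib < b->size);
    int tmp = a->vertex[ia];
    a->vertex[ia] = b->vertex[ib];
    b->vertex[ib] = tmp;
}
```

A move of this kind is fully described by two vertices and their two set numbers, so it can be communicated between processes as a handful of numbers instead of a whole graph.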
Parallelization Approach 1
• Problem independent: an attempt to implement a general parallel simulated annealing
• Every process will generate a new graph and calculate the new cost
• The results will be sent to the root process
• The root process will choose the best result and broadcast it.
Parallelization Approach 1
• The array that the root process gathers from each process:
  0 – Acceptance (0 no, 1 yes, 2 probabilistic)
  1 – Cost
  2 – The set number of the first vertex
  3 – The set number of the second vertex
  4 – The first vertex
  5 – The second vertex
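The root's selection step over the gathered records can be sketched as follows. The record layout follows the list above, but everything else is an assumption: the slides do not state the element type (a `double` array is assumed here so that the cost fits alongside the indices), and the policy "lowest cost among accepted proposals" is one plausible reading of "choose the best result". The names `choose_best` and the `F_*` constants are hypothetical.

```c
#include <assert.h>

/* Field offsets within one gathered record (six values per process). */
enum { F_ACCEPT = 0, F_COST = 1, F_SET1 = 2, F_SET2 = 3,
       F_V1 = 4, F_V2 = 5, REC_LEN = 6 };

/* Return the rank whose proposal the root adopts, or -1 if every
 * process reported a rejection. Assumed policy: among proposals
 * flagged 1 (accepted) or 2 (accepted probabilistically), take the
 * one with the lowest cost; ties go to the lowest rank. */
int choose_best(const double *gathered, int nprocs)
{
    assert(nprocs > 0);
    int best = -1;
    for (int r = 0; r < nprocs; r++) {
        const double *rec = gathered + r * REC_LEN;
        if (rec[F_ACCEPT] == 0.0)
            continue;                       /* rejected proposal */
        if (best < 0 || rec[F_COST] < gathered[best * REC_LEN + F_COST])
            best = r;
    }
    return best;
}
```

In an MPI program the `gathered` buffer would be the receive buffer of a gather on the root, with `REC_LEN` values contributed by each rank.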
Parallelization Approach 1
• The array that the root process broadcasts:
  0 – Temperature update
  1 – Change done
  2 – The set number of the first vertex
  3 – The set number of the second vertex
  4 – The first vertex
  5 – The second vertex
  6 – The cost of the new graph
Parallelization Approach 1
• The equilibrium condition changes from
  Number of rejections < (10 * vertexsize of the graph * number of sets)
  to
  Number of rejections < (10 * vertexsize of the graph * number of sets / number of processes)
• The rest of the program is the same; the data is not distributed
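The effect of the changed equilibrium condition is easy to see with numbers. The helper below, `rejection_limit`, is a hypothetical name for the bound used in the inner loop; with one process it reduces to the serial bound.

```c
#include <assert.h>

/* Rejections allowed per temperature level before the inner loop ends.
 * With nprocs == 1 this is the serial limit. */
long rejection_limit(long vertexsize, long setsize, long nprocs)
{
    assert(vertexsize > 0 && setsize > 0 && nprocs > 0);
    return 10 * vertexsize * setsize / nprocs;   /* integer division */
}
```

For a 100-vertex graph split into 2 sets, the serial limit is 2000 rejections per temperature level, and with 4 processes each round of proposals counts against a limit of 500.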
Parallelization Approach 2
• Problem dependent: works only for the graph partitioning problem.
• Most of the work in the graph partitioning problem is calculating the cost of the graph.
• This depends on the number of edges in the graph; the edge array can be scattered to the processes.
• Each process needs only the edges it holds to calculate its partial sum, so the cost calculation is perfectly parallelizable.
Parallelization Approach 2
• After each process calculates its partial sum, an MPI_Reduce with the add operation (MPI_SUM) computes the total sum.
• All the simulated annealing operations are done on the root process; the others only calculate their partial sums.
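The decomposition of the cost calculation can be illustrated without MPI. The sketch below is a sequential stand-in, under the assumption that the edge array is split into contiguous slices and that set membership is available as a hypothetical `part[v]` array; `partial_cut` is the work one process would do on its slice, and the loop in `total_cut` plays the role of the scatter followed by the sum reduction. All names are illustrative.

```c
#include <assert.h>

typedef struct { int v1, v2; } SliceEdge;   /* mirrors the Edge struct */

/* Cut edges within edges[lo, hi): the work one process would do
 * on the slice of the edge array it received. */
int partial_cut(const SliceEdge *edges, int lo, int hi, const int *part)
{
    assert(lo >= 0 && lo <= hi);
    int sum = 0;
    for (int i = lo; i < hi; i++)
        if (part[edges[i].v1] != part[edges[i].v2])
            sum++;
    return sum;
}

/* Sequential stand-in for scatter + reduce: split the m edges into
 * nprocs contiguous slices and add up the partial sums. In an MPI
 * program the addition would be done by MPI_Reduce with MPI_SUM. */
int total_cut(const SliceEdge *edges, int m, const int *part, int nprocs)
{
    assert(nprocs > 0);
    int total = 0;
    for (int r = 0; r < nprocs; r++) {
        int lo = (int)((long)m * r / nprocs);
        int hi = (int)((long)m * (r + 1) / nprocs);
        total += partial_cut(edges, lo, hi, part);
    }
    return total;
}
```

Because each edge is examined by exactly one slice, the total is the same for any number of slices, which is what makes the cost calculation straightforward to distribute.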
ParSA1 16 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
1.27    1
0.87    1.459770115
0.54    2.351851852
0.37    3.432432432
0.29    4.379310345
ParSA1 100 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
16.01   1
8.53    1.876905041
4.6     3.480434783
2.57    6.229571984
1.6     10.00625
ParSA1 300 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
144.66  1
74.52   1.941223833
38.24   3.782949791
20.44   7.077299413
11.2    12.91607143
ParSA1 500 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
483.32  1
223.57  2.16182851
113.85  4.245234958
58.66   8.23934538
30.89   15.64648754
ParSA1 1000 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
1809.9  1
901.92  2.006718999
456.31  3.966382503
231.62  7.814092047
118.97  15.21307893
ParSA2 16 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
0.93    1
1.88    0.494680851
2.33    0.399141631
2.78    0.334532374
ParSA2 100 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
15.17   1
11.82   1.283417936
10.77   1.408542247
10.09   1.503468781
9.88    1.535425101
ParSA2 300 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
134.76  1
92.13   1.462715728
61.67   2.18517918
50.31   2.678592725
45.28   2.97614841
ParSA2 500 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
456.86  1
256.87  1.778565033
163.8   2.789133089
115.21  3.965454388
98.8    4.624089069
ParSA2 1000 Nodes
[Chart: Time and Speedup versus Processors; data labels listed below]
Time     Speedup
1777.57  1
1232.36  1.442411308
677.44   2.623951937
420.12   4.231100638
335.9    5.291961893
ParSA2 10000 Nodes 40000 Edges
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
23.32   1
17.12   1.362149533
11.24   2.074733096
7.47    3.121820616
4.69    4.97228145
3.66    6.371584699
ParSA2 10000 Nodes 80000 Edges
[Chart: Time and Speedup versus Processors; data labels listed below]
Time    Speedup
45.75   1
29.96   1.527036048
18.62   2.457035446
10.76   4.251858736
7.39    6.190798376
4.7     9.734042553
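The speedup labels in the charts are consistent with the usual definition, the one-process time divided by the parallel time. A minimal sketch (the function name `speedup` is an assumption):

```c
#include <assert.h>

/* Speedup of a parallel run relative to the one-process run. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}
```

For the ParSA2 run with 10000 nodes and 80000 edges, `speedup(45.75, 4.7)` gives about 9.734, matching the largest speedup label in that chart. A value below 1, as in the ParSA2 16-node chart, means the parallel run was slower than the one-process run.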