Optimization and Parallelization of FIND Algorithm
![Page 1: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/1.jpg)
Background | Serial FIND (Fast Inverse using Nested Dissection) | Simulation Results | Parallel Methods
Optimization and Parallelization of FIND Algorithm
Song Li, Eric Darve
Institute for Computational and Mathematical Engineering, Stanford
[email protected]
SIAM CSE09, March 4, 2009
Song Li, Eric Darve Optimization and Parallelization of FIND Algorithm
![Page 2: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/2.jpg)
Outline
1 Background
2 Serial FIND (Fast Inverse using Nested Dissection)
3 Simulation Results
4 Parallel Methods
![Page 4: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/4.jpg)
Introduction
- Modeling the current through nano-devices by the Non-Equilibrium Green's Function approach
- System of Schrödinger-Poisson equations
- Best known algorithm (RGF) has running time $O(n_x^3 n_y)$
- Our method (FIND): $O(n_x^2 n_y)$
- Other devices: nanotubes and nanowires
![Page 5: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/5.jpg)
The Math Problem
- What we want: the diagonal of $G^r = A^{-1}$
- What we have: a sparse matrix $A$ from a discretized 2D mesh
- Example: a $4 \times 5$ mesh ($n_x = 4$, $n_y = 5$) gives a $20 \times 20$ matrix $A$
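As a concrete illustration of the 4 × 5 example, the following minimal sketch builds a 20 × 20 matrix from a 5-point stencil and computes the diagonal of its inverse by brute-force dense inversion. The stencil values, the diagonal `shift`, and the function name `mesh_matrix` are assumptions made for illustration; they are not the device matrix used in the talk.

```python
import numpy as np

def mesh_matrix(nx, ny, shift=0.5):
    """Dense matrix for a 5-point stencil on an nx-by-ny mesh.

    The stencil and the diagonal shift are illustrative assumptions,
    not the device matrix from the talk.
    """
    n = nx * ny
    A = np.zeros((n, n))
    for ix in range(nx):
        for iy in range(ny):
            k = ix * ny + iy              # index of mesh node (ix, iy)
            A[k, k] = 4.0 + shift
            if ix + 1 < nx:               # coupling to the neighbor in x
                A[k, k + ny] = A[k + ny, k] = -1.0
            if iy + 1 < ny:               # coupling to the neighbor in y
                A[k, k + 1] = A[k + 1, k] = -1.0
    return A

A = mesh_matrix(nx=4, ny=5)
# Reference answer by dense inversion; FIND targets this same diagonal.
diag_G = np.diag(np.linalg.inv(A))
```

Each row of `A` has at most five nonzeros, which is the local connectivity that the later slides exploit.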
![Page 11: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/11.jpg)
Key Observations
- Last entry in $A^{-1}$ can be obtained through LU factorization: $(A^{-1})_{nn} = (U^{-1})_{nn} = (U_{nn})^{-1}$
- Obtain all the diagonals through multiple factorizations
- Local connectivity ⇒ problem decomposition: partial factorizations feasible
- Proper ordering makes most of them identical: subproblems overlap ⇒ dynamic programming
- Computational cost for all the diagonal entries of the inverse is of the same order as a single LU factorization!
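The first observation can be checked numerically. The sketch below uses a hand-written LU factorization without pivoting, because with row pivoting the identity applies to the permuted matrix rather than to $A$ itself. The helper `lu_no_pivot` and the diagonally dominant test matrix are assumptions chosen so that no pivoting is needed.

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU factorization without pivoting: A = L @ U."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

rng = np.random.default_rng(0)
n = 6
# Strictly diagonally dominant test matrix (an assumption), so every pivot is nonzero.
A = rng.uniform(-1.0, 1.0, (n, n)) + 2 * n * np.eye(n)

L, U = lu_no_pivot(A)
last_from_lu = 1.0 / U[-1, -1]               # (U_nn)^{-1}
last_from_inv = np.linalg.inv(A)[-1, -1]     # (A^{-1})_nn
```

The two values agree because $A^{-1} = U^{-1} L^{-1}$, the last row of $U^{-1}$ is $e_n^T / U_{nn}$, and the last column of the unit lower triangular $L^{-1}$ is $e_n$.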
![Page 16: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/16.jpg)
Overall Structure: Partition Tree
- Order the mesh nodes in a way similar to nested dissection
- Partition the whole mesh and form a tree structure to exploit the subproblem overlap
![Page 17: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/17.jpg)
One Step of Elimination
Gaussian elimination (writing $I$, $B$, $O$ for the inner, boundary, and outer node sets):

$$A^*(B,B) \overset{\text{def}}{=} A(B,B) - A(B,I)\,A(I,I)^{-1}\,A(I,B)$$

$$\begin{pmatrix} A(I,I) & A(I,B) & 0 \\ A(B,I) & A(B,B) & A(B,O) \\ 0 & A(O,B) & A(O,O) \end{pmatrix} \xrightarrow{\text{elimination}} \begin{pmatrix} A(I,I) & A(I,B) & 0 \\ 0 & A^*(B,B) & A(B,O) \\ 0 & A(O,B) & A(O,O) \end{pmatrix}$$

Legend: eliminated node, inner node, boundary node, outer node
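One elimination step is a Schur complement update of the boundary block. The sketch below applies it to a small 1D chain; the function name `eliminate`, the index lists, and the chain matrix are illustrative assumptions rather than anything taken from the slides.

```python
import numpy as np

def eliminate(A, inner, boundary):
    """Return A*(B,B) = A(B,B) - A(B,I) A(I,I)^{-1} A(I,B)."""
    AII = A[np.ix_(inner, inner)]
    AIB = A[np.ix_(inner, boundary)]
    ABI = A[np.ix_(boundary, inner)]
    ABB = A[np.ix_(boundary, boundary)]
    return ABB - ABI @ np.linalg.solve(AII, AIB)

# 1D chain of 6 nodes: inner = {0, 1}, boundary = {2}, outer = {3, 4, 5}.
n = 6
A = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
inner, boundary, outer = [0, 1], [2], [3, 4, 5]

A_star = eliminate(A, inner, boundary)

# Reduced matrix on boundary + outer nodes: only the (B, B) block changes,
# because the outer nodes are not connected to the inner nodes.
rest = boundary + outer
reduced = A[np.ix_(rest, rest)].copy()
reduced[:len(boundary), :len(boundary)] = A_star
```

Inverting `reduced` reproduces the corresponding block of the full inverse, which is why the eliminated nodes can be dropped without losing the entries of interest.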
![Page 18: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/18.jpg)
Two Full Elimination Processes
- Keep partitioning the mesh to get small clusters
- Store results of each partial elimination
- The partial results could be reused

Legend: eliminated node, inner node, boundary node, outer node, target node
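The reuse of stored partial eliminations is easiest to see in a 1D analogue. The sketch below handles a tridiagonal matrix, not the 2D partition tree of FIND: it stores the pivots of one left-to-right and one right-to-left elimination sweep and combines them to obtain every diagonal entry of the inverse. The function name and the test values are assumptions.

```python
import numpy as np

def diag_of_inverse_tridiagonal(a, lower, upper):
    """Diagonal of A^{-1} for tridiagonal A, reusing partial eliminations.

    a[i] = A[i, i], lower[i] = A[i+1, i], upper[i] = A[i, i+1].
    """
    n = len(a)
    left = np.empty(n)     # left[i]: pivot at i after eliminating nodes 0..i-1
    right = np.empty(n)    # right[i]: pivot at i after eliminating nodes i+1..n-1
    left[0] = a[0]
    for i in range(1, n):
        left[i] = a[i] - lower[i - 1] * upper[i - 1] / left[i - 1]
    right[n - 1] = a[n - 1]
    for i in range(n - 2, -1, -1):
        right[i] = a[i] - upper[i] * lower[i] / right[i + 1]
    # Eliminating everything except node i combines both stored sweeps.
    return 1.0 / (left + right - a)

n = 8
a = np.full(n, 2.5)
lower = np.full(n - 1, -1.0)
upper = np.full(n - 1, -1.0)
diag_G = diag_of_inverse_tridiagonal(a, lower, upper)
```

Two sweeps of $O(n)$ work replace $n$ separate full eliminations, which mirrors the claim that all diagonal entries cost about as much as a single factorization.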
![Page 30: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/30.jpg)
Extensions and Optimizations
- $G^< = A^{-1} \Sigma A^{-\dagger}$ has similar sparsity pattern, so our method is applicable as well
- Also for computing off-diagonal entries
- Extra sparsity
  - rewrite the one step elimination (writing $I$ and $B$ for the inner and boundary node sets): $A^*(B,B) \overset{\text{def}}{=} A(B,B) - A(B,I)\,A(I,I)^{-1}\,A(I,B)$
  - these blocks are themselves sparse
  - Exploit to optimize!
- The elimination preserves symmetry and this further reduces cost

Block sparsity pattern:
× r × 0
r × 0 ×
× 0 × ×
0 × × ×
![Page 39: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/39.jpg)
Simulation Device
![Page 40: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/40.jpg)
Running Time Comparison
[Figure: "Running Time Comparison Between FIND and RGF", log-log scale with reference lines. x-axis: n (= Nx = Ny), from 64 to 1024; y-axis: running time (seconds), from 1 to 32768. Curves: FIND and RGF; reference lines: $O(n^3)$ and $O(n^4)$.]
![Page 41: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/41.jpg)
Memory Cost Comparison
- FIND: $O(N \log N)$
- RGF: $O(N^{3/2})$
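To get a feel for what these orders imply, the sketch below computes the predicted growth factors when the mesh side doubles. It assumes $N = n^2$ is the total number of mesh nodes for a square mesh with $n = N_x = N_y$ (the slides do not define $N$), and it ignores constants and lower-order terms, so it says nothing about measured times.

```python
import math

def growth_when_doubling(n):
    """Predicted growth factors when the mesh side n doubles.

    Assumes a square mesh with N = n * n nodes (an assumption;
    the slides do not define N).
    """
    N_small, N_large = n * n, (2 * n) * (2 * n)
    return {
        "find_time": (2 * n) ** 3 / n ** 3,            # O(n^3)
        "rgf_time": (2 * n) ** 4 / n ** 4,             # O(n^4)
        "find_memory": (N_large * math.log(N_large))
                       / (N_small * math.log(N_small)),  # O(N log N)
        "rgf_memory": N_large ** 1.5 / N_small ** 1.5,   # O(N^{3/2})
    }

factors = growth_when_doubling(512)
```

Under these assumptions each doubling of the mesh side widens the time gap between the two methods by a factor of two.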
![Page 43: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/43.jpg)
How to Parallelize?
- Straightforward for leaf clusters
- Top level clusters dominate running time with less degree of parallelism
- Use the idle processors for redundant computations
- More floating point operations but shorter wall clock time
- Works for 1D, 2D, and 3D domains
![Page 44: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/44.jpg)
Problem and Processor Settings
[Figure: grid with one row per processor, P0 to P15, each row showing the 16 clusters]
- 16 processors, 16 clusters in 1D
- One target cluster per processor
- Keep merging all the other clusters until we have them all merged as the complement of the target cluster
- Eliminate the merged complement clusters and compute the inverse
![Page 49: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/49.jpg)
BackgroundSerial FIND (Fast Inverse using Nested Dissection)
Simulation ResultsParallel Methods
Detailed Merging Process
[Figure: grid of clusters labeled by processor, P0 through P15]

- Each processor keeps the complement of its target cluster with respect to the current subdomain
- Start with subdomain of size 2
- Expand to subdomains of size 4
- Some processors are idle
- Use them to prepare for the next subdomain expansion
- Until the subdomain is expanded to the whole domain
- Additional speedup of factor 2
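The subdomain-doubling schedule can be traced with plain index bookkeeping. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes the number of clusters is a power of two, that subdomains are aligned blocks of size 2, 4, 8, ..., and that at each expansion a processor merges the sibling half of the enlarged subdomain into its stored complement. It does no matrix arithmetic and does not model how idle processors are reused. The names `subdomain` and `merging_schedule` are hypothetical.

```python
def subdomain(p, size):
    """Aligned block of `size` consecutive clusters that contains cluster p."""
    start = (p // size) * size
    return range(start, start + size)


def merging_schedule(num_clusters=16):
    """For each processor p, list (subdomain size, clusters newly merged into its complement)."""
    schedule = {}
    for p in range(num_clusters):
        complement = set()  # complement of target cluster p within the current subdomain
        steps = []
        size = 1
        while size < num_clusters:
            size *= 2  # expand the subdomain: 2, 4, 8, ...
            newly_merged = [c for c in subdomain(p, size)
                            if c != p and c not in complement]
            complement.update(newly_merged)
            steps.append((size, newly_merged))
        schedule[p] = steps
    return schedule


schedule = merging_schedule(16)
for size, merged in schedule[5]:
    print(f"P5, subdomain size {size:2d}: merge clusters {merged}")
```

With 16 clusters every processor needs four expansions, and the final complement is every cluster except its own target.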
![Page 59: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/59.jpg)
Communication Pattern
![Page 60: Optimization and Parallelization of FIND Algorithm](https://reader035.fdocuments.in/reader035/viewer/2022071601/613d31f5736caf36b75a7199/html5/thumbnails/60.jpg)
Summary
- Direct method for fast inverse
- Two extensions, two optimizations
- An optimal parallel scheme
- Collaboration with other groups for more applications