Design of parallel algorithms
![Page 1: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/1.jpg)
Design of parallel algorithms
Linear equations
Jari Porras
![Page 2: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/2.jpg)
Linear equations
$a_{0,0}x_0 + \dots + a_{0,n-1}x_{n-1} = b_0$
$\vdots$
$a_{n-1,0}x_0 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$
• $Ax = b$
• Usually solved in 2 stages
  – reduce into an upper triangular system $Ux = y$
  – back-substitution, solving for $x_{n-1}, \dots, x_0$
• Gaussian elimination
![Page 3: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/3.jpg)
Gaussian elimination
![Page 4: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/4.jpg)
Gaussian elimination
![Page 5: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/5.jpg)
Gaussian elimination
• Gaussian elimination requires approximately
  – $n^2/2$ divisions (line 6)
  – $n^3/3 - n^2/2$ subtractions and multiplications (line 12)
• Sequential run time is approximately $2n^3/3$
• How is Gaussian elimination performed in parallel?
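The operation counts above can be checked by instrumenting a plain sequential elimination. The following is a minimal sketch, not the listing the slides refer to: the function name `gaussian_eliminate` and the counters are illustrative, no pivoting is done, and every diagonal element is assumed to be nonzero when it is used as a pivot. Only operations on the matrix itself are counted, not those on the right-hand side.

```python
def gaussian_eliminate(A, b):
    """Reduce Ax = b to a unit upper triangular system Ux = y (no pivoting).

    Returns (U, y, divisions, mult_subs), where the two counters cover
    operations on matrix entries only.
    """
    n = len(A)
    U = [[float(v) for v in row] for row in A]
    y = [float(v) for v in b]
    divisions = 0
    mult_subs = 0  # one multiplication paired with one subtraction
    for k in range(n):
        pivot = U[k][k]  # assumed nonzero in this sketch
        # Division step: normalize the part of row k right of the diagonal.
        for j in range(k + 1, n):
            U[k][j] /= pivot
            divisions += 1
        y[k] /= pivot
        U[k][k] = 1.0
        # Elimination step: clear column k below the diagonal.
        for i in range(k + 1, n):
            factor = U[i][k]
            for j in range(k + 1, n):
                U[i][j] -= factor * U[k][j]
                mult_subs += 1
            y[i] -= factor * y[k]
            U[i][k] = 0.0
    return U, y, divisions, mult_subs
```

For an n x n matrix the counters come out as n(n-1)/2 divisions and (n-1)n(2n-1)/6 multiply-subtract pairs, which agree with the $n^2/2$ and $n^3/3 - n^2/2$ figures up to lower-order terms.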
![Page 6: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/6.jpg)
Parallel Gaussian elimination
• Row/column striping vs. checkerboarding?
• Block vs. cyclic striping?
• Number of processors: p < n, p = n, p > n
• Active processors?
• Required steps?
![Page 7: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/7.jpg)
![Page 8: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/8.jpg)
Analysis
• 1st step (division)
  – kth iteration requires $n - k - 1$ divisions at processor $P_k$
• 2nd step (broadcast)
  – $(t_s + t_w(n - k - 1)) \log n$ time on a hypercube
• 3rd step (elimination)
  – kth iteration requires $n - k - 1$ multiplications and subtractions at all processors $P_i$
• $T_p = \frac{3}{2} n(n-1) + t_s n \log n + \frac{1}{2} t_w n(n-1) \log n$
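The closed form for the parallel run time follows from summing the three per-iteration costs over k = 0, ..., n-1. The sketch below checks that numerically. The function names, the choice of base-2 logarithm, and the unit cost per arithmetic operation are assumptions made for illustration; `ts` and `tw` stand for the startup and per-word transfer times.

```python
import math


def parallel_time_summed(n, ts, tw):
    """Add up the three per-iteration costs for k = 0 .. n-1."""
    log_n = math.log2(n)
    total = 0.0
    for k in range(n):
        m = n - k - 1
        total += m                       # 1st step: divisions at P_k
        total += (ts + tw * m) * log_n   # 2nd step: broadcast on a hypercube
        total += 2 * m                   # 3rd step: multiplications and subtractions
    return total


def parallel_time_closed_form(n, ts, tw):
    """Closed form: 3/2 n(n-1) + ts n log n + 1/2 tw n(n-1) log n."""
    log_n = math.log2(n)
    return 1.5 * n * (n - 1) + ts * n * log_n + 0.5 * tw * n * (n - 1) * log_n
```

Both functions return the same value because the sum of (n - k - 1) over all iterations is n(n-1)/2.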
![Page 9: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/9.jpg)
Analysis
• Not cost-optimal, since $pT_p = \Theta(n^3 \log n)$ while the sequential run time is $\Theta(n^3)$
• What is the main reason?
  – Inefficient parallelization?
  – What could be done?
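The cost can be written out as a short worked step. This assumes one row per processor, that is p = n, which is what the expression for $T_p$ in the preceding analysis appears to describe.

```latex
\begin{align*}
p\,T_p &= n \left( \tfrac{3}{2}\, n(n-1) + t_s\, n \log n
          + \tfrac{1}{2}\, t_w\, n(n-1) \log n \right) \\
       &= \tfrac{3}{2}\, n^2(n-1) + t_s\, n^2 \log n
          + \tfrac{1}{2}\, t_w\, n^2(n-1) \log n \\
       &= \Theta(n^3 \log n),
\qquad \text{whereas } W = \tfrac{2}{3}\, n^3 = \Theta(n^3).
\end{align*}
```

The $\log n$ factor comes entirely from the broadcast term, so the communication step is the part that grows faster than the sequential work.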
![Page 10: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/10.jpg)
![Page 11: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/11.jpg)
![Page 12: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/12.jpg)
Analysis
• Pipelined operation
  – all n steps are executed in parallel
  – last step starts in the nth step and is completed in constant time (changes only the bottom right corner element)
  – hence $\Theta(n)$ steps
  – Each step takes O(n) time
  – Thus parallel run time is $O(n^2)$ and cost is $\Theta(n^3)$
  – Cost-optimal!
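The quadratic run time of the pipelined version can be illustrated with a toy timing model. This is a sketch under simplifying assumptions that are not in the slides: one row per processor, each normalized or updated element costs one time unit, a row moves one processor further in `hop` time units, and forwarding a row costs the intermediate processors nothing. The name `pipelined_makespan` is illustrative.

```python
def pipelined_makespan(n, hop=1):
    """Finish time of pipelined elimination with row i stored on processor P_i."""
    ready = [0] * n  # ready[i]: time at which P_i has finished its work so far
    for k in range(n):
        work = n - k - 1  # elements to the right of the pivot in row k
        # P_k normalizes row k once it has finished all earlier eliminations.
        ready[k] += work
        sent = ready[k]
        for i in range(k + 1, n):
            arrival = sent + (i - k) * hop  # row k travels i - k hops to reach P_i
            start = max(arrival, ready[i])  # P_i may still be busy with row k - 1
            ready[i] = start + work         # P_i eliminates using row k
    return max(ready)
```

In this model the finish time works out to (n - 1)(n + hop), so it grows as $n^2$ rather than $n^2 \log n$, and multiplying by the n processors gives a cost proportional to $n^3$.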
![Page 13: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/13.jpg)
p < n ?
• Block striping
  – several rows per processor
• Does the activity change?
  – Block vs. cyclic striping
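How the activity changes can be seen by counting, for a given iteration k, how many still-active rows (rows below row k) each processor owns under the two mappings. The sketch below assumes p divides n; the function name and the `mapping` argument are illustrative.

```python
def active_rows_per_processor(n, p, k, mapping):
    """Count rows i > k owned by each of p processors in iteration k."""
    rows_per_proc = n // p  # assumes p divides n
    counts = [0] * p
    for i in range(k + 1, n):
        if mapping == "block":
            owner = i // rows_per_proc  # consecutive rows on one processor
        elif mapping == "cyclic":
            owner = i % p               # rows dealt out round-robin
        else:
            raise ValueError("mapping must be 'block' or 'cyclic'")
        counts[owner] += 1
    return counts
```

With n = 8, p = 4 and k = 3, block striping gives `[0, 0, 2, 2]` (two processors are already idle) while cyclic striping gives `[1, 1, 1, 1]`.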
![Page 14: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/14.jpg)
![Page 15: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/15.jpg)
![Page 16: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/16.jpg)
Analysis
• With block striping
  – a processor with all of its rows in the active part performs $(n - k - 1)n/p$ multiplications and subtractions in the kth iteration
  – if the pipelined version is used, the number of arithmetic operations, $2(n-k-1)n/p$, is higher than the number of words communicated, $n-k-1$
  – computation dominates
  – parallel run time is approximately $n^3/p$
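The $n^3/p$ figure can be recovered by summing the per-iteration arithmetic of a processor whose rows all stay active. A minimal sketch, assuming p divides n, unit cost per arithmetic operation, and communication ignored because computation dominates; the function name is illustrative.

```python
def block_striped_compute_time(n, p):
    """Sum 2(n-k-1)n/p over all iterations for a fully active processor."""
    rows_per_proc = n // p  # assumes p divides n
    return sum(2 * (n - k - 1) * rows_per_proc for k in range(n))
```

The sum equals $n^2(n-1)/p$, which approaches $n^3/p$ as n grows. Multiplying by p gives a cost of about $n^3$, a constant factor of 3/2 above the sequential $2n^3/3$.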
![Page 17: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/17.jpg)
Checkerboard partitioning
• Use an n x n mesh
• Same approach as before, but
  – requires two broadcasts (rowwise and columnwise)
  – Analyse the cost-optimality
• How about pipelining?
![Page 18: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/18.jpg)
![Page 19: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/19.jpg)
Pipelined checkerboard
![Page 20: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/20.jpg)
Pipelined checkerboard
![Page 21: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/21.jpg)
$p < n^2$
• Map the matrix onto a $\sqrt{p} \times \sqrt{p}$ mesh by using block checkerboard partitioning
• Remember the effect of active processors!
• Each processor performs $n^2/p$ multiplications and subtractions and communicates $n/\sqrt{p}$ words
  – computation dominates!
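The effect of active processors under block checkerboard partitioning can be made concrete by counting which blocks still contain an element of the active submatrix (rows and columns greater than k). The sketch assumes p is a perfect square and that its square root divides n; the function name is illustrative.

```python
import math


def active_blocks(n, p, k):
    """Number of processors on a sqrt(p) x sqrt(p) mesh that hold active elements."""
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    b = n // q  # each processor holds a b x b block
    # Block row r covers rows r*b .. r*b + b - 1; it is active if its last row is > k.
    active_1d = sum(1 for r in range(q) if r * b + b - 1 > k)
    # The same count applies to block columns, so the active region is square.
    return active_1d * active_1d
```

For n = 8 and p = 16 the count drops from 16 at k = 0 to 9 at k = 1 and 4 at k = 3, so an increasing share of the mesh is idle as the elimination proceeds.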
![Page 22: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/22.jpg)
![Page 23: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/23.jpg)
![Page 24: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/24.jpg)
Partial pivoting
• Basic algorithm fails if any element on the diagonal is zero
• Partial pivoting helps
  – select the row that has the largest element (in absolute value) in the current column and exchange rows
• What is the effect on the partitioning strategy?
• How about pipelining?
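A sequential sketch of elimination with partial pivoting by row exchange is shown below. It is an illustration only: the function name is hypothetical, the result is upper triangular without a normalized diagonal, and nothing here addresses how the pivot search and the exchange map onto processors.

```python
def eliminate_with_partial_pivoting(A, b):
    """Reduce Ax = b to an upper triangular system Ux = y using row exchanges."""
    n = len(A)
    U = [[float(v) for v in row] for row in A]
    y = [float(v) for v in b]
    for k in range(n):
        # Pick the row at or below k with the largest |entry| in column k.
        pivot_row = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[pivot_row][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot_row != k:
            U[k], U[pivot_row] = U[pivot_row], U[k]
            y[k], y[pivot_row] = y[pivot_row], y[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            for j in range(k + 1, n):
                U[i][j] -= factor * U[k][j]
            y[i] -= factor * y[k]
            U[i][k] = 0.0
    return U, y
```

A system such as `[[0, 1], [1, 1]]` with right-hand side `[2, 3]`, whose first diagonal element is zero, is handled by swapping the two rows before elimination.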
![Page 25: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/25.jpg)
Back-substitution
• The second stage of solving linear equations
• Back-substitution is used to determine vector x
• Complexity $\Theta(n^2)$
  – use the partitioning scheme that is suitable for Gaussian elimination
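A minimal sequential sketch of back-substitution follows. It assumes an upper triangular matrix with a nonzero diagonal; the function name is illustrative. The unknowns are computed from the last one upwards.

```python
def back_substitute(U, y):
    """Solve Ux = y for an upper triangular U with a nonzero diagonal."""
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the unknowns that are already solved.
        s = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / U[i][i]
    return x
```

Row i needs n - i - 1 multiplications and subtractions, which sums to n(n-1)/2 over all rows and matches the quadratic complexity stated above.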
![Page 26: Design of parallel algorithms](https://reader036.fdocuments.in/reader036/viewer/2022062321/568137c9550346895d9f66c4/html5/thumbnails/26.jpg)
Back-substitution