Post on 31-Jan-2018
Lecture Notes 8: Gaussian Elimination—Sequential and Basic
Parallel Algorithms
Shantanu Dutt
ECE Dept., UIC
Acknowledgements
• Parallel Implementations of Gaussian Elimination, Vasilije Perović, Western Michigan University
• Parallel Gaussian Elimination, Univ. of Frankfurt
Motivation
Gaussian Elimination
Parallel Implementation
Discussion
Parallel Implementations of Gaussian Elimination
Vasilije Perovic
Western Michigan University
vasilije.perovic@wmich.edu
January 27, 2012
Vasilije Perovic CS 6260: Gaussian Elimination in Parallel
Linear systems of equations
The general form of a linear system of equations is given by

a11 x1 + · · · + a1n xn = b1
a21 x1 + · · · + a2n xn = b2
        ⋮
am1 x1 + · · · + amn xn = bm

where the aij's and bi's are known and we are solving for the xi's.
Ax = b
More compactly, we can rewrite the system of linear equations in the form

Ax = b

where

A = [ a11  a12  . . .  a1n
      a21  a22  . . .  a2n
       ⋮    ⋮    ⋱      ⋮
      am1  am2  . . .  amn ] ,   x = [x1, x2, . . . , xn]ᵀ ,   b = [b1, b2, . . . , bm]ᵀ .
Why study solutions of Ax = b
1 One of the most fundamental problems in the field of scientific computing
2 Arises in many applications:
  • Chemical engineering
  • Interpolation
  • Structural analysis
  • Regression analysis
  • Numerical ODEs and PDEs
General Theory
Partial Pivoting
Sequential Algorithm
Methods for solving Ax = b
1 Direct methods – obtain the exact solution (in real arithmetic) in finitely many operations
  • Gaussian elimination (LU factorization)
  • QR factorization
  • WZ factorization
2 Iterative methods – generate a sequence of approximations that converge in the limit to the solution
  • Jacobi iteration
  • Gauss-Seidel iteration
  • SOR method (successive over-relaxation)
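The direct/iterative distinction can be made concrete with the first iterative method in the list. Below is a minimal sketch of Jacobi iteration (the function name and the small test system are our own), assuming a diagonally dominant matrix so that the iteration converges:

```python
def jacobi(A, b, iters=100):
    """Jacobi iteration: each sweep computes
    x_i <- (b_i - sum_{j != i} a_ij * x_j) / a_ii
    from the previous iterate, so all components are updated together."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Diagonally dominant 2x2 system with exact solution x = [2, 1]:
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 9.0]
x = jacobi(A, b)   # approaches [2.0, 1.0]
```

Unlike a direct method, the answer is only approximate, but the error shrinks geometrically with each sweep here.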
Gaussian Elimination
When solving Ax = b we will assume throughout this presentation that A is non-singular and that A and b are known:

a11 x1 + · · · + a1n xn = b1
a21 x1 + · · · + a2n xn = b2
        ⋮
an1 x1 + · · · + ann xn = bn .
Gaussian Elimination
Assuming that a11 ≠ 0, we first subtract a21/a11 times the first equation from the second equation to eliminate the coefficient of x1 in the second equation, and so on, until the coefficients of x1 in the last n − 1 rows have all been eliminated. This gives the modified system of equations

a11 x1 + a12 x2 + · · · + a1n xn = b1
         a22^(1) x2 + · · · + a2n^(1) xn = b2^(1)
                 ⋮
         an2^(1) x2 + · · · + ann^(1) xn = bn^(1) ,

where

aij^(1) = aij − (ai1/a11) a1j ,   bi^(1) = bi − (ai1/a11) b1 ,   i, j = 2, 3, . . . , n .
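The update formulas above translate directly into code. A minimal sketch of this first elimination step (the function name is ours; it assumes a11 ≠ 0), applied to the 3×3 system used in the worked example later in these notes:

```python
def eliminate_first_column(A, b):
    """Apply a_ij <- a_ij - (a_i1/a_11)*a_1j (and likewise for b) to
    rows 2..n, zeroing the coefficients of x1 below the pivot a_11."""
    n = len(b)
    for i in range(1, n):
        factor = A[i][0] / A[0][0]   # a_i1 / a_11, assumes a_11 != 0
        for j in range(n):
            A[i][j] -= factor * A[0][j]
        b[i] -= factor * b[0]
    return A, b

A = [[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]]
b = [2.0, 3.0, 1.0]
eliminate_first_column(A, b)
# column 1 below the pivot is now zero:
# A = [[4, -9, 2], [0, 0.5, 3], [0, -0.25, 2.5]], b = [2, 2, 1.5]
```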
Gaussian Elimination (Forward Reduction)
Applying the same process to the last n − 1 equations of the modified system eliminates the coefficients of x2 in the last n − 2 equations, and so on, until the entire system has been reduced to the (upper) triangular form

[ a11  a12      · · ·  a1n          [ x1        [ b1
       a22^(1)  · · ·  a2n^(1)   ·    x2    =     b2^(1)
                  ⋱     ⋮             ⋮            ⋮
                       ann^(n−1) ]    xn ]        bn^(n−1) ] .

• The superscripts indicate the number of times the elements had to be changed.
Gaussian Elimination – Example
• Perform the forward reduction on the system given below:

[  4  −9  2      [ x1       [ 2
   2  −4  4   ·    x2   =     3
  −1   2  2 ]      x3 ]       1 ]

We start by writing down the augmented matrix for the given system:

[  4  −9  2 | 2
   2  −4  4 | 3
  −1   2  2 | 1 ]
Gaussian Elimination – Example
We perform appropriate row operations to obtain:

[  4  −9  2 | 2
   2  −4  4 | 3
  −1   2  2 | 1 ]

R2 − (2/4) R1 → R2 ,  R3 − (−1/4) R1 → R3 :

[ 4   −9    2   | 2
  0   0.5   3   | 2
  0  −0.25  2.5 | 1.5 ]

R3 − (−0.25/0.5) R2 → R3 :

[ 4  −9   2 | 2
  0  0.5  3 | 2
  0  0    4 | 2.5 ]
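The reduced system is upper triangular, so the example can be finished by back substitution, solving for the last unknown first. A minimal sketch (the function name is ours):

```python
def back_substitution(U, y):
    """Solve Ux = y for upper-triangular U, last unknown first."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# The reduced system from the example above:
U = [[4.0, -9.0, 2.0], [0.0, 0.5, 3.0], [0.0, 0.0, 4.0]]
y = [2.0, 2.0, 2.5]
print(back_substitution(U, y))  # [0.75, 0.25, 0.625]
```

Substituting x = (0.75, 0.25, 0.625) back into the original equations confirms the solution.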
Gaussian Elimination – Example
Note that the row operations used to eliminate x1 from the second and the third equations are equivalent to multiplying the augmented matrix on the left by

L1 = [  1     0  0
       −0.5   1  0
        0.25  0  1 ] :

L1 · [  4  −9  2 | 2        [ 4   −9    2   | 2
        2  −4  4 | 3     =    0   0.5   3   | 2
       −1   2  2 | 1 ]        0  −0.25  2.5 | 1.5 ]
Gaussian Elimination – Example
The second elimination step is likewise a left multiplication. With

L2 = [ 1  0    0          L1 = [  1     0  0
       0  1    0 ] ,             −0.5   1  0 ] ,
       0  0.5  1                  0.25  0  1

we have

(L2 · L1) · [A|b] = [ 4  −9   2 | 2
                      0  0.5  3 | 2
                      0  0    4 | 2.5 ]

Thus, writing L̃ = L2 · L1, we can write

(L2 · L1) · A = L̃ · A = U
LU factorization
In general, we can think of row operations as multiplication by matrices on the left; thus the i-th elimination step is equivalent to multiplication on the left by

Li = [ 1
         ⋱
           1
          −l(i+1,i)  1
            ⋮           ⋱
          −l(n,i)          1 ] ,

where the l(j,i) are the elimination multipliers in column i.
LU factorization
Continuing in this fashion we obtain

L̃ A x = L̃ b ,   where L̃ = L(n−1) · · · L2 L1 ,

and hence

A = L̃⁻¹ U = L U .
Thus, the Gaussian elimination algorithm for solving Ax = b is mathematically equivalent to the three-step process:

1. Factor A = LU
2. Solve Ly = b (forward substitution)
3. Solve Ux = y (back substitution).

• Order of operations:

n³/3 + n² − n/3 multiplications/divisions,
n³/3 + n²/2 − 5n/6 additions/subtractions.
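The three-step process can be sketched in code as follows (function names are ours; no pivoting, so this assumes the pivots that arise are nonzero):

```python
def lu_factor(A):
    """Factor A = LU without pivoting: L unit lower triangular, U upper."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]        # multiplier l_ik
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def forward_sub(L, b):
    """Step 2: solve Ly = b, first unknown first."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def back_sub(U, y):
    """Step 3: solve Ux = y, last unknown first."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]]
b = [2.0, 3.0, 1.0]
L, U = lu_factor(A)
x = back_sub(U, forward_sub(L, b))   # x = [0.75, 0.25, 0.625]
```

On the running example this reproduces the same U as the forward reduction and the same solution as back substitution.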
Need for partial pivoting
Apply LU factorization without pivoting to

A = [ 0.0001  1
      1       1 ]

in three-decimal-digit floating-point arithmetic.
Need for partial pivoting
A = [ 10⁻⁴  1
      1     1 ]

Solution: L and U are easily obtainable and are given by:

L = [ 1           0
      fl(1/10⁻⁴)  1 ] ,   where fl(1/10⁻⁴) rounds to 10⁴,

U = [ 10⁻⁴  1
      0     fl(1 − 10⁴ · 1) ] ,   where fl(1 − 10⁴ · 1) rounds to −10⁴,

so

LU = [ 1    0     [ 10⁻⁴   1        [ 10⁻⁴  1
       10⁴  1 ] ·   0     −10⁴ ]  =   1     0 ]

but

A = [ 10⁻⁴  1
      1     1 ] .
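The rounding behaviour can be reproduced by simulating three-significant-digit arithmetic. The sketch below is our own toy model (the fl helper rounds to three significant decimal digits; it is not IEEE arithmetic), showing that without pivoting the computed solution of Ax = [1, 2]ᵀ is useless, while one row swap fixes it:

```python
from math import floor, log10

def fl(x, digits=3):
    """Round x to `digits` significant decimal digits (toy floating point)."""
    if x == 0.0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))

def solve_2x2(a, b, pivot):
    """Solve a 2x2 system by LU in 3-digit arithmetic; `pivot` optionally
    swaps rows so the larger first-column entry becomes the pivot."""
    A, rhs = [row[:] for row in a], b[:]
    if pivot and abs(A[1][0]) > abs(A[0][0]):
        A[0], A[1] = A[1], A[0]
        rhs[0], rhs[1] = rhs[1], rhs[0]
    l21 = fl(A[1][0] / A[0][0])
    u22 = fl(A[1][1] - fl(l21 * A[0][1]))
    y2 = fl(rhs[1] - fl(l21 * rhs[0]))      # forward substitution
    x2 = fl(y2 / u22)                        # back substitution
    x1 = fl(fl(rhs[0] - fl(A[0][1] * x2)) / A[0][0])
    return [x1, x2]

A = [[1e-4, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
print(solve_2x2(A, b, pivot=False))  # [0.0, 1.0]  -- x1 is completely wrong
print(solve_2x2(A, b, pivot=True))   # [1.0, 1.0]  -- close to the true solution
```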
Need for partial pivoting
A = [ 10⁻⁴  1        but   LU = [ 10⁻⁴  1
      1     1 ]                   1     0 ]

Remark 1: Note that the original a22 has been entirely "lost" from the computation by subtracting 10⁴ from it. Thus, if we were to use this LU factorization to solve a system, there would be no way to guarantee an accurate answer. This is called numerical instability.

Remark 2: Suppose we attempted to solve the system Ax = [1, 2]ᵀ for x using this LU decomposition. The correct answer is x ≈ [1, 1]ᵀ. Instead, if we were to stick with the three-digit floating-point arithmetic, we would get the answer x̃ = [0, 1]ᵀ, which is completely erroneous.
Sequential Algorithm
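As a sketch, the sequential algorithm combines forward elimination with partial pivoting and back substitution (the function name is ours):

```python
def gauss_solve(A, b):
    """Sequential Gaussian elimination with partial pivoting, followed by
    back substitution.  About n^3/3 multiplications, as counted earlier."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n - 1):
        # Partial pivoting: bring the largest |a_ik|, i >= k, into the pivot row.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

# The running example; expect x = [0.75, 0.25, 0.625]:
sol = gauss_solve([[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]],
                  [2.0, 3.0, 1.0])
```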
Row-Oriented Algorithm
Pipelined Implementation
Row-Oriented Algorithm
1 Determination of the local pivot element,
2 Determination of the global pivot element,
3 Exchange of the pivot row,
4 Distribution of the pivot row,
5 Computation of the elimination factors,
6 Computation of the matrix elements.
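The six steps can be sketched by simulating the p processors sequentially. In this sketch (all names are ours) parts[q] plays the role of processor q's local rows of the augmented matrix, and the global-pivot reduction and pivot-row broadcast become plain Python operations:

```python
def elimination_step(parts, k):
    """One step k of the row-oriented parallel elimination, simulated
    sequentially.  parts[q] is a dict {global row index: row of [A|b]}
    owned by virtual processor q.  Comments mark the six sub-steps."""
    p = len(parts)
    # 1. Local pivot: each processor scans its rows i >= k in column k.
    local = []
    for q in range(p):
        cand = [(abs(r[k]), i, q) for i, r in parts[q].items() if i >= k]
        if cand:
            local.append(max(cand))
    # 2. Global pivot: a reduction over the local candidates.
    _, piv_i, piv_q = max(local)
    # 3. Exchange: swap the pivot row with row k (possibly across processors).
    own_q = next(q for q in range(p) if k in parts[q])
    parts[own_q][k], parts[piv_q][piv_i] = parts[piv_q][piv_i], parts[own_q][k]
    # 4. Distribute: broadcast the pivot row to all processors.
    pivot_row = parts[own_q][k]
    # 5./6. Factors and update: each processor eliminates column k in its rows i > k.
    for q in range(p):
        for i, row in parts[q].items():
            if i > k:
                f = row[k] / pivot_row[k]          # elimination factor
                for j in range(k, len(row)):
                    row[j] -= f * pivot_row[j]
    return parts

# The running example, one row per virtual processor:
M = [[4.0, -9.0, 2.0, 2.0], [2.0, -4.0, 4.0, 3.0], [-1.0, 2.0, 2.0, 1.0]]
parts = [{i: M[i]} for i in range(3)]
for k in range(2):
    elimination_step(parts, k)
```

After both steps the distributed rows form the same upper-triangular augmented matrix as in the sequential example.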
Row-Oriented Algorithm
Remark 1: The computation of the solution vector x in the backward substitution is inherently serial, since the values of xk depend on each other and are computed one after another. Thus, in step k the processor Pq owning row k computes the value of xk and sends it to all other processors by a single broadcast operation.

Remark 2: Note that the data distribution is not quite adequate. For example, suppose we are at the i-th step and that i > m · n/p, where m is a natural number. In that case, processors p0, p1, . . . , p(m−1) are idle, since all their work is done until the back-substitution step at the very end. Hence there is a load-balancing issue.
Row-Oriented Algorithm
Remedy: a block-cyclic row distribution keeps every processor supplied with active rows throughout the elimination, alleviating the load-balancing problem of Remark 2.
Row-Oriented Algorithm
Remark 3: We could also consider a column-oriented data distribution. In that case, at the k-th step the processor that owns the k-th column would have all the data needed to compute the new pivot. On the one hand, this would reduce the communication between processors required by the row-oriented distribution. On the other hand, with column orientation the pivot determination is done entirely serially. Thus, when n ≫ p (which is often the case), the row-oriented data distribution may be more advantageous, since the pivot determination is done in parallel.
Parallel Gaussian Elimination
Gaussian Elimination
We work with the rowwise decomposition:
– Each processor receives n/p rows of the extended matrix (A, b).

We parallelize the individual pivoting steps. In step i:
– Find the maximum |A(j, i)| for i ≤ j ≤ n.
– Swap row j with row i.
– Broadcast row i to the processors holding rows i+1 . . . n.
– Each processor computes the new row_j = row_j − (A(j, i)/A(i, i)) · row_i.
The broadcast of the pivot row takes O(n log P) time per step on a hypercube (or O(nP) for a linear array), for a total communication time of O(n² log P) for a hypercube (or O(n²P) for a linear array).

• Need to independently reduce communication time: because its constants are much larger than those of computation, it can dominate or at least significantly affect the parallel time.
The pipelined implementation is based on the fact that a pivot is picked from every row, except one, exactly once.
Timing on a linear array of P processors (n/P rows per processor; times in elementary-operation units):

Proc. 0: starts @ t = 0; pivot column selection done @ t = n; sends row 1 @ t = n.
Proc. 1: receives row 1 @ t = 2n; row operations done @ t = n²/P + 2n; pivot column selection done @ t = n²/P + 3n; sends its pivot row @ t = 3n + n²/P.
  Note: this is the maximum delay at proc. 1 in sending the next pivot row (for the one it owns; subsequent pivot rows from proc. 0 are sent faster, since the row-operation time is then not needed). Thus the worst-case stage delay is 2n + n²/P.
Proc. P−1: receives row 1 @ t = n + n(P−1); row operations done @ t = n²/P + n + n(P−1); receives row 2 latest @ t = 3n + n²/P + n(P−2) = 2n + n²/P + n(P−1).

So a new set of column-elimination + row operations is done every n + n²/P time units (of which n time units is idling due to communication delay) after an initial wait of n(P−1) time units. When the pivot operation is included for a processor's own rows, computation is done every 2n + n²/P time units (idling is again n time units). This is the worst-case stage delay.

So total time = n(P−1) + (n−1)(2n + n²/P).
Computation time = Θ(n³/P); communication time is only Θ(nP + n²) [initial pipeline-fill time due to communication only, plus a subsequent communication delay of n per stage (causing idling): n(P−1) + (n−1)n time].
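The closed-form total time can be evaluated directly; for n ≫ P the communication share is small, consistent with the Θ(n³/P) computation time dominating. A small sketch (function name and the sample n, P are ours):

```python
def pipelined_time(n, P):
    """Total pipelined time from the analysis above: pipeline fill n(P-1),
    then (n-1) stages of worst-case delay 2n + n^2/P."""
    return n * (P - 1) + (n - 1) * (2 * n + n ** 2 / P)

n, P = 4096, 64
total = pipelined_time(n, P)
comp = n ** 3 / P                    # Theta(n^3/P) computation
comm = n * (P - 1) + (n - 1) * n     # Theta(nP + n^2) communication
print(comm / total)                  # a small fraction when n >> P
```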