Lec 08 Triangular Factorization
Triangular Factorization: 1
LU Factorization
A system of linear equations is given as
  Ax = b
where A is an n x n matrix, b is a vector of length n, and x is an unknown vector of length n; solving the above system involves determining the value of each element of x
A nonsingular matrix A can be expressed as a product of a lower triangular matrix, L, and an upper triangular matrix, U (after a row reordering if a zero pivot is encountered; see Pivoting), such that
  A = LU
and the linear system can be written as
  LUx = b
Using this form, the linear system can be solved as follows:
  Solve Ly = b
  Solve Ux = y
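The two triangular solves can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function names (forward_substitution, back_substitution, solve_lu) and the 3 x 3 factors are assumptions chosen so that the arithmetic is exact.

```python
def forward_substitution(L, b):
    """Solve L y = b for a lower triangular L, working from the first row down."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def back_substitution(U, y):
    """Solve U x = y for an upper triangular U, working from the last row up."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


def solve_lu(L, U, b):
    """Solve (L U) x = b by the two triangular solves."""
    return back_substitution(U, forward_substitution(L, b))


# Illustrative factors of A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [4.0, 3.0, 1.0]]
U = [[2.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
b = [4.0, 10.0, 24.0]
print(solve_lu(L, U, b))  # [1.0, 1.0, 1.0]
```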
CPSC 659 Spring 2011 2011 Vivek Sarin
Triangular Factorization: 2
Gaussian Elimination
Gaussian elimination can be used to solve the linear system with the n x n matrix A and the right-hand side vector b
The first stage converts A to an upper triangular form; identical operations are applied to the right-hand side
This stage overwrites A with $L^{-1}A$ and b with $L^{-1}b$
Gaussian Elimination
  for k = 1:n-1,
    for i = k+1:n,
      m(i) = A(i,k)/A(k,k);              % multipliers (column k of L)
    end;
    for i = k+1:n,
      for j = k+1:n,
        A(i,j) = A(i,j) - m(i)*A(k,j);   % rank-1 update of the active part
      end;
      b(i) = b(i) - m(i)*b(k);           % same operation on the right-hand side
    end;
    for i = k+1:n,
      A(i,k) = m(i);                     % store column k of L below the diagonal
    end;
  end;
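For readers who want to run the elimination stage, the following Python sketch mirrors the loop structure with 0-based indices. The function name gaussian_eliminate and the test matrix are illustrative assumptions; no pivoting is done, so every pivot is assumed to be nonzero.

```python
def gaussian_eliminate(A, b):
    """Forward elimination without pivoting; overwrites A and b in place.

    On return the upper triangle of A holds U, the strict lower triangle
    holds the multipliers (the entries of L), and b holds inv(L) * b.
    Assumes every pivot A[k][k] is nonzero.
    """
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]        # multiplier for row i
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]   # update the active part of row i
            b[i] -= m * b[k]             # identical operation on the right-hand side
            A[i][k] = m                  # keep the multiplier where the zero would go
    return A, b


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
gaussian_eliminate(A, b)
print(A)  # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
print(b)  # [4.0, 2.0, 2.0]
```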
Triangular Factorization: 3
Back Substitution
At the end of Gaussian elimination, the upper triangular part of A, including the diagonal, stores U; the lower triangular part of A, excluding the diagonal, stores L (L is unit lower triangular, i.e., diagonal entries of L are 1); b stores $y = L^{-1}b$
Next, use back substitution to solve Ux = y; b gets overwritten with x
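Back Substitution
  b(n) = b(n)/A(n,n);                    % x(n); needed before the loop starts
  for k = n-1:-1:1,
    for i = 1:k,
      b(i) = b(i) - A(i,k+1)*b(k+1);     % subtract the contribution of x(k+1)
    end;
    b(k) = b(k)/A(k,k);                  % x(k)
  end;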
Complexity
  Stage            Operations    Data
  Compute L & U    2n^3/3        n^2
  Solve Ly = b     n^2           n^2/2
  Solve Ux = y     n^2           n^2/2
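The leading term for computing L and U can be checked by counting the operations in the elimination loops directly. The sketch below is an illustration only; the function name lu_operation_count is an assumption, and the count charges one division per multiplier plus one multiplication and one subtraction per updated entry.

```python
def lu_operation_count(n):
    """Count divisions, multiplications and subtractions in LU without pivoting."""
    ops = 0
    for k in range(1, n):          # steps k = 1 .. n-1
        rows = n - k               # rows k+1 .. n are updated at step k
        ops += rows                # one division per multiplier
        ops += 2 * rows * rows     # one multiply and one subtract per updated entry
    return ops


n = 100
exact = lu_operation_count(n)
estimate = 2 * n**3 / 3
print(exact, exact / estimate)     # the ratio approaches 1 as n grows
```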
Triangular Factorization: 4
Standard LU Factorization
  for k=1:n-1,
    for i=k+1:n,
      m(i) = A(i,k)/A(k,k);
      for j=k+1:n,
        A(i,j) = A(i,j) - m(i)*A(k,j);
      end;
      A(i,k) = m(i);
    end;
  end;
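A runnable Python version of the in-place factorization makes it easy to confirm that the stored factors multiply back to A. The helper names (lu_in_place, unpack_lu, matmul) and the example matrix are assumptions made for this sketch; pivots are assumed nonzero.

```python
def lu_in_place(A):
    """Overwrite A with its LU factors: U on and above the diagonal, L below it."""
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                 # multiplier, stored as L(i,k)
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]   # rank-1 update of the active part
    return A


def unpack_lu(A):
    """Split the packed factors into an explicit unit lower L and upper U."""
    n = len(A)
    L = [[A[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    """Plain triple-loop matrix product for small dense matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A0 = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
L, U = unpack_lu(lu_in_place([row[:] for row in A0]))
print(matmul(L, U) == A0)  # True for this example, where all arithmetic is exact
```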
[Figure: matrix A at step k, showing the (k-1) computed columns of L and column k of A]
Triangular Factorization: 5
Parallel LU with Row Partitioning
Block m consecutive rows together into a single task that is assigned to each processor (p=n/m)
At the kth step, the kth row of U is broadcast to each processor
Processors in charge of matrix rows k+1 thru n compute the kth column of L and update the active part of matrix A with a rank-1 update
[Figure: consecutive row blocks of A assigned to P0, P1, P2, P3; shows the (k-1) computed columns of L and column k of A]
Triangular Factorization: 6
Parallel LU with Row Partitioning
Load balancing
Each processor becomes idle when its last row has been processed
Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to $(n-k)^2$
Concurrency and load balance can be improved by assigning rows to tasks in a cyclic manner; in this case,
$$T_{comp} = \frac{t_c}{p} \sum_{k=1}^{n-1} 2(n-k)^2 \approx \frac{2n^3}{3p}\, t_c$$
$$T_{comm} = \sum_{k=1}^{n-1} \left( t_s + t_b (n-k) \right) \approx n\, t_s + \frac{n^2}{2}\, t_b$$
Communication can be overlapped with computation
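The effect of cyclic assignment on load balance can be illustrated with a small counting model in Python. The model is an assumption made for this sketch, not a result from the slides: row i (0-based) is updated at every step k < i, the update at step k touches n-1-k entries, and p is assumed to divide n.

```python
def row_work(n):
    """Entries updated in each row over the whole factorization (0-based rows)."""
    return [sum(n - 1 - k for k in range(i)) for i in range(n)]


def work_per_processor(n, p, cyclic):
    """Total work per processor for block (cyclic=False) or cyclic row assignment."""
    m = n // p                                   # rows per processor
    totals = [0] * p
    for i, w in enumerate(row_work(n)):
        owner = i % p if cyclic else i // m
        totals[owner] += w
    return totals


block = work_per_processor(8, 4, cyclic=False)
cyclic = work_per_processor(8, 4, cyclic=True)
print(block)   # [7, 31, 47, 55]
print(cyclic)  # [22, 32, 40, 46]
```

In this model the busiest processor does 55 units under block assignment and 46 under cyclic assignment, against an average of 35, so the cyclic layout is closer to balanced even for this tiny case.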
Triangular Factorization: 7
Cyclic Row Partitioning
[Figure: rows of A assigned cyclically to P0, P1, P2, P3, P0, P1, ...; shows the (k-1) computed columns of L and column k of A]
Triangular Factorization: 8
Parallel LU with Column Partitioning
Block m consecutive columns together into a single task that is assigned to each processor (p=n/m)
At the kth step, the processor that owns column k computes the kth column of L, and broadcasts it to all processors
Processors in charge of columns k+1 thru n update the active part of matrix A with a rank-1 update
[Figure: A partitioned into p blocks of n/p (1st, 2nd, ..., pth block) assigned to P0, P1, P2, P3; shows the (k-1) computed columns of L and column k of A]
Triangular Factorization: 9
Parallel LU with Column Partitioning
Load balancing
Each processor becomes idle when its last column has been processed
Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to $(n-k)^2$
Concurrency and load balance can be improved by assigning columns to tasks in a cyclic manner; in this case,
$$T_{comp} = \frac{t_c}{p} \sum_{k=1}^{n-1} 2(n-k)^2 \approx \frac{2n^3}{3p}\, t_c$$
$$T_{comm} = \sum_{k=1}^{n-1} \left( t_s + t_b (n-k) \right) \approx n\, t_s + \frac{n^2}{2}\, t_b$$
Communication can be overlapped with computation
Triangular Factorization: 10
Cyclic Column Partitioning
[Figure: columns of A assigned cyclically to processors; shows the (k-1) computed columns of L and column k of A]
Triangular Factorization: 11
Parallel LU with Submatrix Partitioning
Partition the matrix into submatrices of size m x m, where $m = n/\sqrt{p}$, and assign a submatrix to each processor
At the kth step
  Processors that own column k compute the kth column of L
  The kth column of L is broadcast along the processor row, and the kth row of U is broadcast along the processor column
  Processors in charge of matrix columns k+1 thru n update the active part of matrix A with a rank-1 update
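To make the ownership rule concrete, the Python sketch below maps a matrix entry to a processor on a q x q grid (p = q*q, processors numbered row by row) and lists which processors still hold part of the active submatrix at a given step. The function names and the cyclic variant with block size bs are assumptions for illustration; indices are 0-based and q is assumed to divide n.

```python
def block_owner(i, j, n, q):
    """Owner of entry (i, j) when A is split into a q x q grid of m x m blocks."""
    m = n // q
    return (i // m) * q + (j // m)


def cyclic_owner(i, j, q, bs=1):
    """Owner of entry (i, j) when bs x bs blocks are dealt out cyclically."""
    return ((i // bs) % q) * q + ((j // bs) % q)


def active_processors(n, k, owner):
    """Processors owning at least one entry of the active part A[k+1:, k+1:]."""
    return {owner(i, j) for i in range(k + 1, n) for j in range(k + 1, n)}


n, q, k = 8, 4, 5
blk = active_processors(n, k, lambda i, j: block_owner(i, j, n, q))
cyc = active_processors(n, k, lambda i, j: cyclic_owner(i, j, q))
print(sorted(blk))  # [15]
print(sorted(cyc))  # [10, 11, 14, 15]
```

Late in the factorization only one processor is still busy under the plain block layout, while the cyclic layout keeps four of the sixteen active in this small example.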
Triangular Factorization: 12
Parallel LU with Submatrix Partitioning
[Figure: A partitioned into a 4 x 4 grid of submatrices owned by P0 through P15, numbered row by row; shows the (k-1) computed columns of L and column k of A]
Triangular Factorization: 13
Parallel LU with Submatrix Partitioning
Load balancing
Each processor becomes idle when its last row and column have been processed
Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to $(n-k)^2$
Concurrency and load balance can be improved by assigning smaller submatrices to tasks in a cyclic manner along rows as well as columns
$$T_{comp} = \frac{t_c}{p} \sum_{k=1}^{n-1} 2(n-k)^2 \approx \frac{2n^3}{3p}\, t_c$$
$$T_{comm} = \sum_{k=1}^{n-1} 2\left( t_s + t_b \frac{n-k}{\sqrt{p}} \right) \approx 2n\, t_s + \frac{n^2}{\sqrt{p}}\, t_b$$
Communication can be overlapped with computation
Triangular Factorization: 14
Cyclic Submatrix Partitioning
[Figure: small submatrices of A assigned cyclically along rows and columns, so the 4 x 4 grid of P0 through P15 repeats across the matrix; shows the (k-1) computed columns of L and column k of A]
Triangular Factorization: 15
Pivoting
The standard LU factorization algorithm may not produce very accurate LU factors
Reordering the rows and columns of A before computing the LU factorization improves the stability of the algorithm; of course, such reordering does not change the solution of the linear system
If $P_1$ and $P_2$ are the row and column permutation matrices, we solve
  $P_1 A P_2 z = P_1 b$, where $x = P_2 z$
The LU factorization of the reordered matrix, $P_1 A P_2 = LU$, is used to solve the system as follows:
  $Ly = P_1 b$, $Uz = y$, $x = P_2 z$
Partial Pivoting: only rows are reordered; produces stable LU factors in most cases
Complete Pivoting: both rows and columns are reordered; guaranteed to produce stable LU factors
Pivoting is costly on a parallel computer as it requires communication and disrupts overlap of communication with computation
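A short Python sketch of elimination with partial pivoting shows why row reordering matters even for tiny systems. The function name solve_partial_pivoting and the 2 x 2 example are assumptions for illustration; because only rows are exchanged, the column permutation is the identity and the computed vector is x itself.

```python
def solve_partial_pivoting(A, b):
    """Solve A x = b by Gaussian elimination with row exchanges (partial pivoting)."""
    n = len(A)
    A = [row[:] for row in A]          # work on copies; the inputs stay untouched
    b = b[:]
    for k in range(n - 1):
        # pivot search: row with the largest magnitude entry in column k
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            b[k], b[piv] = b[piv], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n                      # back substitution on the upper triangle
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x


# The (1,1) entry is zero, so elimination without pivoting would divide by zero.
print(solve_partial_pivoting([[0.0, 1.0], [1.0, 1.0]], [1.0, 2.0]))  # [1.0, 1.0]
```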
Triangular Factorization: 16
Partial Pivoting
At the kth step, the row having the element with the largest magnitude in column k is exchanged with the kth row before computing column k of L and performing the rank-1 update
Search for the largest element, i.e., the pivot, can be costly on a parallel computer
Column partitioned approach: pivot search is within the processor owning column k
Row partitioned approach: pivot search is a reduction operation among active processors
Submatrix partitioned approach: pivot search is a reduction operation
among active processors along a column that own column k of the
matrix
Alternative approaches to partial pivoting trade-off stability for
parallelism
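The pivot search as a reduction can be simulated sequentially in Python: each processor finds the best candidate among its own rows, and the candidates are then combined with a max reduction. This is only a sketch under assumptions not stated in the slides: rows are owned cyclically (row i belongs to processor i mod p), indices are 0-based, and the names local_candidate and pivot_by_reduction are hypothetical. In a message-passing code the combine step would typically be a max-location reduction such as MPI_MAXLOC.

```python
from functools import reduce


def local_candidate(rows, column_k):
    """Best (magnitude, row index) pair among the rows one processor owns."""
    return max(((abs(column_k[i]), i) for i in rows), default=(0.0, -1))


def pivot_by_reduction(column_k, k, p):
    """Row index of the pivot in column k, found by a reduction over p processors."""
    n = len(column_k)
    candidates = [
        local_candidate([i for i in range(k, n) if i % p == r], column_k)
        for r in range(p)                      # one local search per processor
    ]
    return reduce(max, candidates)[1]          # combine step: keep the largest magnitude


column = [3.0, -7.0, 2.0, 5.0, -1.0, 6.0]
print(pivot_by_reduction(column, k=1, p=2))    # 1
print(pivot_by_reduction(column, k=2, p=2))    # 5
```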
Triangular Factorization: 17
Alternate Forms of LU Factorization
  for i=2:n,
    for k=1:i-1,
      m(k) = A(i,k)/A(k,k);
      for j=k+1:n,
        A(i,j) = A(i,j) - m(k)*A(k,j);
      end;
      A(i,k) = m(k);
    end;
  end;
[Figure: matrix A at step i, showing the (i-1) rows of U, the (i-1) rows of L, the pivot, row i of L, row i of A, and the active part of matrix A, which is modified by matrix-vector product; legend: Read, Computed, Unchanged]
Triangular Factorization: 19
Symmetric Positive Definite Matrices
Symmetric Positive Definite (SPD) matrices are an important class of matrices that have real, positive eigenvalues, and satisfy the following conditions
  $A = A^T$
  $x^T A x > 0$ for any non-zero vector x
A symmetric matrix can be written as $A = LDL^T$, where
  D is a diagonal matrix
  L is a lower triangular matrix with ones on the diagonal
LU factorization of A gives $U = DL^T$, therefore D = diag(U)
An SPD matrix has D with positive elements, and can be written as $A = R^T R$, where R is an upper triangular matrix such that $R^T = LD^{1/2}$
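The relations U = DL^T and R^T = LD^{1/2} can be checked numerically on a small SPD matrix. The Python sketch below is an illustration only; the function names lu and ldlt_and_r and the 3 x 3 matrix are assumptions, chosen so that all arithmetic is exact.

```python
import math


def lu(A):
    """LU factorization without pivoting; returns explicit L (unit lower) and U."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def ldlt_and_r(A):
    """Return L, the diagonal of D, and R = D^(1/2) L^T for a symmetric positive definite A."""
    L, U = lu(A)
    n = len(A)
    d = [U[i][i] for i in range(n)]            # D = diag(U)
    R = [[math.sqrt(d[i]) * L[j][i] for j in range(n)] for i in range(n)]
    return L, d, R


A = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
L, d, R = ldlt_and_r(A)
print(d)  # [4.0, 4.0, 4.0]
print(R)  # [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
```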
Triangular Factorization: 20
Cholesky Factorization of SPD Matrices
Active part of matrix A is always SPD
Square roots are computed for positive numbers
No pivoting is needed for numerical stability
Only upper triangle of A is accessed; overwritten by R
Complexity = n^3/3 operations, half as many as Gaussian Elimination
Cholesky Factorization
  for k=1:n,
    for i=k+1:n,
      m(i) = A(k,i)/A(k,k);              % A(i,k) = A(k,i) by symmetry
      for j=i:n,
        A(i,j) = A(i,j) - m(i)*A(k,j);   % update the upper triangle of the active part
      end;
      A(k,i) = A(k,i)/sqrt(A(k,k));      % entry (k,i) of R
    end;
    A(k,k) = sqrt(A(k,k));               % diagonal entry of R
  end;
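A Python sketch of the same upper-triangle Cholesky factorization is given below. It is an illustration under assumptions: the function name cholesky_upper is hypothetical, indices are 0-based, and row k is scaled once after its updates rather than entry by entry, which gives the same R.

```python
import math


def cholesky_upper(A):
    """Overwrite the upper triangle of an SPD matrix A with R, where A = R^T R.

    Only entries on or above the diagonal are read or written.
    """
    n = len(A)
    for k in range(n):
        pivot = A[k][k]
        for i in range(k + 1, n):
            m = A[k][i] / pivot                # uses A(i,k) = A(k,i)
            for j in range(i, n):
                A[i][j] -= m * A[k][j]         # update the active upper triangle
        root = math.sqrt(pivot)                # positive for an SPD matrix
        for i in range(k, n):
            A[k][i] /= root                    # row k of R
    return A


A = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
cholesky_upper(A)
print([A[i][i:] for i in range(3)])  # [[2.0, 1.0, 0.0], [2.0, 1.0], [2.0]]
```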
Triangular Factorization: 21
Alternate Forms of Cholesky Factorization
[Figure: alternate form of Cholesky factorization; the only recoverable label is "Column j of R"]
Triangular Factorization: 22
Parallel Cholesky Factorization
[Figure: cyclic submatrix partitioning of A over P0 through P15 for Cholesky factorization, showing the (k-1) rows of R, the pivot, row k of A, and the active part of matrix A, which is modified by rank-1 update; legend: Updated]