Lec 08 Triangular Factorization


  • 8/3/2019 Lec 08 Triangular Factorization


    Triangular Factorization: 1

    LU Factorization

    A system of linear equations is given as

    $$Ax = b$$

    where A is an n×n matrix, b is a vector of length n, and x is an unknown vector of length n; solving the above system involves determining the value of each element of x.

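    A nonsingular matrix A can be expressed as the product of a unit lower triangular matrix L and an upper triangular matrix U,

    $$A = LU,$$

    provided all of its leading principal submatrices are nonsingular (otherwise the rows must first be reordered; see Pivoting). The linear system can then be written as

    $$LUx = b$$

    Using this form, the linear system can be solved in two triangular steps:

    Solve Ly = b for y (forward substitution)

    Solve Ux = y for x (back substitution)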
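    A small worked instance of the two triangular solves, with numbers chosen here for illustration (they are not from the lecture). The only entry of L below the diagonal is the multiplier 6/4 = 1.5:

```latex
\begin{aligned}
A &= \begin{pmatrix} 4 & 3 \\ 6 & 3 \end{pmatrix}
   = \begin{pmatrix} 1 & 0 \\ 1.5 & 1 \end{pmatrix}
     \begin{pmatrix} 4 & 3 \\ 0 & -1.5 \end{pmatrix} = LU,
  \qquad b = \begin{pmatrix} 10 \\ 12 \end{pmatrix} \\
Ly = b &\;\Rightarrow\; y_1 = 10,\quad y_2 = 12 - 1.5 \cdot 10 = -3 \\
Ux = y &\;\Rightarrow\; x_2 = \frac{-3}{-1.5} = 2,\quad x_1 = \frac{10 - 3 \cdot 2}{4} = 1
\end{aligned}
```

    Substituting back, 4(1) + 3(2) = 10 and 6(1) + 3(2) = 12, so x = (1, 2) solves Ax = b.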

    CPSC 659 Spring 2011 2011 Vivek Sarin


    Triangular Factorization: 2

    Gaussian Elimination

    Gaussian elimination can be used to solve the linear system with the n×n matrix A and the right-hand-side vector b.

    The first stage converts A to an upper triangular form; identical operations are applied to the right-hand side.

    This stage overwrites A with $L^{-1}A$ (that is, U) and b with $L^{-1}b$ (that is, y); the multipliers that define L are stored below the diagonal of A.

    Gaussian Elimination

    for k = 1:n-1,
      for i = k+1:n,
        m(i) = A(i,k)/A(k,k);            % multipliers for column k
      end;
      for i = k+1:n,
        for j = k+1:n,
          A(i,j) = A(i,j) - m(i)*A(k,j); % update the active submatrix
        end;
        b(i) = b(i) - m(i)*b(k);         % same operation on b
      end;
      for i = k+1:n,
        A(i,k) = m(i);                   % store column k of L
      end;
    end;
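    A minimal Python sketch of the elimination stage, assuming a dense matrix stored as a list of row lists, 0-based indices, and no pivoting (every pivot A[k][k] must be nonzero). The name `gauss_eliminate` is illustrative, not from the lecture.

```python
def gauss_eliminate(A, b):
    """Elimination stage of Gaussian elimination, in place, no pivoting.

    On return the upper triangle of A (with the diagonal) holds U, the
    strict lower triangle holds the multipliers of L, and b holds y = L^{-1} b.
    """
    n = len(A)
    m = [0.0] * n
    for k in range(n - 1):
        for i in range(k + 1, n):
            m[i] = A[i][k] / A[k][k]          # multipliers for column k
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= m[i] * A[k][j]     # update the active submatrix
            b[i] -= m[i] * b[k]               # same operation on b
        for i in range(k + 1, n):
            A[i][k] = m[i]                    # store column k of L
    return A, b


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
gauss_eliminate(A, b)
```

    For this input, A ends up holding the packed factors [[2, 1, 1], [2, 1, 1], [4, 3, 2]] (multipliers 2, 4, 3 below the diagonal) and b holds y = [4, 2, 2].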


    Triangular Factorization: 3

    Back Substitution

    At the end of Gaussian elimination, the upper triangular part of A, including the diagonal, stores U; the lower triangular part of A, excluding the diagonal, stores L (L is unit lower triangular, i.e., the diagonal entries of L are 1).

    Next, use back substitution to solve Ux = y; b gets overwritten with x.

    Back Substitution

    b(n) = b(n)/A(n,n);                  % x(n) is available immediately
    for k = n-1:-1:1,
      for i = 1:k,
        b(i) = b(i) - A(i,k+1)*b(k+1);   % remove x(k+1) from equations 1..k
      end;
      b(k) = b(k)/A(k,k);
    end;
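    The column-oriented back substitution translates to Python as follows (a sketch with 0-based indices; `back_substitute` is a hypothetical name). The sample input is the packed result of eliminating A = [[2,1,1],[4,3,3],[8,7,9]] with b = [4,10,24], worked out by hand for this illustration.

```python
def back_substitute(A, b):
    """Column-oriented back substitution, in place: solves U x = b, where U
    is the upper triangle of A (with the diagonal). Entries below the
    diagonal are ignored, so A may hold the packed L and U factors."""
    n = len(A)
    b[n - 1] /= A[n - 1][n - 1]             # x_n is available immediately
    for k in range(n - 2, -1, -1):
        for i in range(k + 1):
            b[i] -= A[i][k + 1] * b[k + 1]  # remove x_{k+1} from rows 0..k
        b[k] /= A[k][k]
    return b


LU = [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
x = back_substitute(LU, [4.0, 2.0, 2.0])
```

    The call returns x = [1.0, 1.0, 1.0], which satisfies the original system. Each step touches a column of U, which is why this form pairs naturally with column-oriented storage.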

    Complexity

    Stage            Operations     Data
    Compute L & U    $2n^3/3$       $n^2$
    Solve Ly = b     $n^2$          $n^2/2$
    Solve Ux = y     $n^2$          $n^2/2$
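    To see where the $2n^3/3$ entry comes from, the sketch below counts the divisions, multiplications, and subtractions performed by the elimination loops (right-hand side excluded). Summing over the steps gives exactly $2n^3/3 - n^2/2 - n/6$, so the leading term dominates for large n. The function name is illustrative.

```python
def ge_flop_count(n):
    """Operations performed by the elimination loops on an n x n matrix:
    one division per multiplier, plus one multiply and one subtract per
    updated entry of the active submatrix."""
    flops = 0
    for k in range(n - 1):
        active = n - k - 1                  # size of the active submatrix
        flops += active                     # divisions for the multipliers
        flops += 2 * active * active        # multiply + subtract per entry
    return flops


for n in (10, 100, 1000):
    exact = ge_flop_count(n)
    print(n, exact, exact / (2 * n**3 / 3))  # ratio approaches 1 as n grows
```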


    Triangular Factorization: 4

    Standard LU Factorization

    for k = 1:n-1,
      for i = k+1:n,
        m(i) = A(i,k)/A(k,k);            % entry (i,k) of L
        for j = k+1:n,
          A(i,j) = A(i,j) - m(i)*A(k,j); % update row i of the active part
        end;
        A(i,k) = m(i);                   % store L below the diagonal
      end;
    end;

    [Figure: A at step k, showing the first (k-1) columns of L and column k of A]
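    The same loop can be read as a sequence of rank-1 updates, which is the view the parallel slides build on. This Python sketch (illustrative names, no pivoting) keeps L and U in separate matrices so that the product LU can be checked directly.

```python
def lu_outer_product(A):
    """Right-looking LU without pivoting, written as a sequence of rank-1
    updates. Returns new matrices (L, U); A itself is not modified."""
    n = len(A)
    S = [list(map(float, row)) for row in A]        # working copy
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        U[k][k:] = S[k][k:]                         # row k of U
        for i in range(k + 1, n):
            L[i][k] = S[i][k] / S[k][k]             # column k of L
        for i in range(k + 1, n):                   # rank-1 update:
            for j in range(k + 1, n):               # S -= l_k * u_k^T
                S[i][j] -= L[i][k] * U[k][j]
    return L, U


def matmul(X, Y):
    """Plain triple-loop matrix product, used only to check L U = A."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu_outer_product(A)
```

    Each pass of the outer loop produces one column of L and one row of U, and only the trailing submatrix is changed; this is the unit of work that the partitioned versions distribute.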


    Triangular Factorization: 5

    Parallel LU with Row Partitioning

    Block m consecutive rows together into a single task that is assigned to each processor (p = n/m).

    At the kth step, the kth row of U is broadcast to each processor.

    Processors in charge of matrix rows k+1 through n compute the kth column of L and update the active part of matrix A with a rank-1 update.

    [Figure: block row partitioning of A among P0 through P3, showing the first (k-1) columns of L and column k of A]
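    The row-partitioned scheme can be simulated sequentially to check that distributing the rows does not change the result. This is not message-passing code: each "processor" is just a dictionary of the rows it owns, and the broadcast of row k is modeled as a copy. The helper names and the requirement that p divide n are assumptions of this sketch.

```python
def block_owner(i, n, p):
    """Processor owning row i when the n rows are split into p blocks of
    m = n/p consecutive rows (assumes p divides n)."""
    return i // (n // p)


def lu_row_partitioned(A, p):
    """Sequential simulation of row-partitioned LU without pivoting.
    Returns the packed factors: L below the diagonal, U on and above it."""
    n = len(A)
    # local[q] maps global row index -> row, for the rows processor q owns
    local = [{i: [float(v) for v in A[i]] for i in range(n)
              if block_owner(i, n, p) == q} for q in range(p)]
    for k in range(n - 1):
        # the owner of row k broadcasts it (row k of U) to every processor
        pivot_row = list(local[block_owner(k, n, p)][k])
        for q in range(p):
            for i, row in local[q].items():
                if i > k:                            # active rows only
                    row[k] /= pivot_row[k]           # entry (i, k) of L
                    for j in range(k + 1, n):        # rank-1 update of row i
                        row[j] -= row[k] * pivot_row[j]
    return [local[block_owner(i, n, p)][i] for i in range(n)]


A = [[1, 1, 1, 1], [1, 2, 2, 2], [2, 3, 4, 4], [1, 3, 4, 5]]
packed = lu_row_partitioned(A, p=2)
```

    With p = 2, P0 holds rows 0 and 1 and falls idle after step 1, while P1 keeps working to the end; this is the imbalance discussed next.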


    Triangular Factorization: 6

    Parallel LU with Row Partitioning

    Load Balancing

    Each processor becomes idle when its last row has been processed.

    Work reduces as the algorithm progresses, i.e., the work at the kth step is proportional to $(n-k)^2$.

    Concurrency and load balance can be improved by assigning rows to tasks in a cyclic manner; in this case,

    $$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2n^3}{3p}\, t_c$$

    $$T_{comm} = \sum_{k=1}^{n-1} \left(t_s + t_b\,(n-k)\right) \approx n\, t_s + \frac{n^2}{2}\, t_b$$

    Communication can be overlapped with computation.
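    A small model of the load-balance argument: the sketch below (hypothetical helper `parallel_steps`) charges each step the work of its busiest processor and compares block and cyclic row assignments. It counts only entry updates and ignores communication, so it is a rough illustration rather than a performance model.

```python
def parallel_steps(n, p, owner):
    """Estimated parallel time of row-partitioned LU, in units of one entry
    update: each step k lasts as long as the busiest processor, which must
    update (n-k-1) entries in each of its active rows."""
    total = 0
    for k in range(n - 1):
        work = [0] * p
        for i in range(k + 1, n):               # active rows at step k
            work[owner(i)] += n - k - 1
        total += max(work)
    return total


n, p = 16, 4
block = parallel_steps(n, p, lambda i: i // (n // p))   # rows in blocks
cyclic = parallel_steps(n, p, lambda i: i % p)          # rows dealt cyclically
ideal = sum(j * j for j in range(1, n)) / p             # perfect balance
print(block, cyclic, ideal)
```

    For n = 16 and p = 4 this gives 470 units for the block layout, 356 for the cyclic layout, and 310 for perfect balance: with blocks, the last processor does almost all of the late work.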


    Triangular Factorization: 7

    Cyclic Row Partitioning

    [Figure: cyclic row partitioning, with the rows of A dealt cyclically among P0 through P3; the first (k-1) columns of L and column k of A are marked]


    Triangular Factorization: 8

    Parallel LU with Column Partitioning

    Block m consecutive columns together into a single task that is assigned to each processor (p = n/m).

    At the kth step, the processor that owns column k computes the kth column of L and broadcasts it to all processors.

    Processors in charge of columns k+1 through n update the active part of matrix A with a rank-1 update.

    [Figure: block column partitioning of A among P0 through P3, each owning a block of n/p consecutive columns; the first (k-1) columns of L and column k of A are marked]
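    Column partitioning can be simulated the same way. In this sketch (hypothetical names; p must divide n) each "processor" stores whole columns, the owner of column k forms the multipliers, and the copy `l_k` stands in for the broadcast. Every other processor then updates only its own columns.

```python
def lu_col_partitioned(A, p):
    """Sequential simulation of column-partitioned LU without pivoting.
    Processor q owns columns q*m .. q*m+m-1 with m = n/p (assumes p divides
    n). Returns the packed L and U factors as a list of rows."""
    n = len(A)
    m = n // p

    def owner(j):
        return j // m

    # local[q] maps global column index -> column (list of n entries)
    local = [{j: [float(A[i][j]) for i in range(n)] for j in range(n)
              if owner(j) == q} for q in range(p)]
    for k in range(n - 1):
        col = local[owner(k)][k]
        for i in range(k + 1, n):
            col[i] /= col[k]                  # owner computes column k of L
        l_k = list(col)                       # ... and broadcasts it
        for q in range(p):
            for j, c in local[q].items():
                if j > k:                     # columns k+1..n only
                    for i in range(k + 1, n):
                        c[i] -= l_k[i] * c[k]  # a_ij -= l_ik * u_kj
    return [[local[owner(j)][j][i] for j in range(n)] for i in range(n)]


A = [[1, 1, 1, 1], [1, 2, 2, 2], [2, 3, 4, 4], [1, 3, 4, 5]]
packed = lu_col_partitioned(A, p=2)
```

    Here the pivot column never leaves its owner until the multipliers are ready, which is why pivot search is local in this layout.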


    Triangular Factorization: 9

    Parallel LU with Column Partitioning

    Load Balancing

    Each processor becomes idle when its last column has been processed.

    Work reduces as the algorithm progresses, i.e., the work at the kth step is proportional to $(n-k)^2$.

    Concurrency and load balance can be improved by assigning columns to tasks in a cyclic manner; in this case,

    $$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2n^3}{3p}\, t_c$$

    $$T_{comm} = \sum_{k=1}^{n-1} \left(t_s + t_b\,(n-k)\right) \approx n\, t_s + \frac{n^2}{2}\, t_b$$

    Communication can be overlapped with computation.


    Triangular Factorization: 10

    Cyclic Column Partitioning

    [Figure: cyclic column partitioning; the first (k-1) columns of L and column k of A are marked]


    Triangular Factorization: 11

    Parallel LU with Submatrix Partitioning

    Partition the matrix into submatrices of size m×m, where $m = n/\sqrt{p}$, and assign a submatrix to each processor.

    At the kth step:

    Processors that own column k compute the kth column of L.

    The kth column of L is broadcast along processor rows, and the kth row of U is broadcast along processor columns.

    Processors in charge of matrix rows and columns k+1 through n update the active part of matrix A with a rank-1 update.
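    To see which processors stay busy, the sketch below maps entry (i, j) to a processor in a √p × √p grid, numbered row by row (an assumption of this sketch), for the block layout described here and for the finest cyclic variant (1×1 blocks dealt cyclically). It then reports which processors own part of the active submatrix at step k. All names are hypothetical.

```python
import math


def owner_2d(i, j, n, p, cyclic=False):
    """Rank of the processor owning entry (i, j) on a sqrt(p) x sqrt(p) grid,
    numbered row by row. Block layout: m x m submatrices with m = n/sqrt(p)
    (assumes sqrt(p) divides n). Cyclic layout: entry (i, j) goes to grid
    position (i mod sqrt(p), j mod sqrt(p))."""
    q = math.isqrt(p)
    if cyclic:
        r, c = i % q, j % q
    else:
        m = n // q
        r, c = i // m, j // m
    return r * q + c


def active_processors(k, n, p, cyclic=False):
    """Processors that own part of the active submatrix at step k
    (rows and columns k+1..n-1, 0-based)."""
    return {owner_2d(i, j, n, p, cyclic)
            for i in range(k + 1, n) for j in range(k + 1, n)}


n, p = 16, 16
for k in (0, 7, 11):
    print(k, len(active_processors(k, n, p)),
          len(active_processors(k, n, p, cyclic=True)))
```

    With n = p = 16 and k = 11 (0-based), the block layout leaves only P15 with active entries, while the cyclic layout still keeps all 16 processors busy.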



    Triangular Factorization: 12

    Parallel LU with Submatrix Partitioning

    [Figure: submatrix partitioning of A on a 4×4 processor grid, P0 through P3 in the first block row and P12 through P15 in the last; the first (k-1) columns of L and column k of A are marked]


    Triangular Factorization: 13

    Parallel LU with Submatrix Partitioning

    Load Balancing

    Each processor becomes idle when its last row and column have been processed.

    Work reduces as the algorithm progresses, i.e., the work at the kth step is proportional to $(n-k)^2$.

    Concurrency and load balance can be improved by assigning smaller submatrices to tasks in a cyclic manner along rows as well as columns:

    $$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2n^3}{3p}\, t_c$$

    $$T_{comm} = \sum_{k=1}^{n-1} 2\left(t_s + t_b\,\frac{n-k}{\sqrt{p}}\right) \approx 2n\, t_s + \frac{n^2}{\sqrt{p}}\, t_b$$

    Communication can be overlapped with computation.


    Triangular Factorization: 14

    Cyclic Submatrix Partitioning

    [Figure: cyclic submatrix partitioning, with the 4×4 processor grid P0 through P15 repeated across A; the first (k-1) columns of L and column k of A are marked]


    Triangular Factorization: 15

    Pivoting

    The standard factorization algorithm may not produce very accurate LU factors. Reordering the rows and columns of A before computing the LU factorization improves the stability of the algorithm; of course, such reordering does not change the solution of the linear system.

    If P1 and P2 are the row and column permutation matrices, we solve

    $$(P_1 A P_2)\, z = P_1 b, \qquad x = P_2 z$$

    The LU factorization of the reordered matrix, $P_1 A P_2 = LU$, is used to solve the system as follows:

    $$Ly = P_1 b, \qquad Uz = y, \qquad x = P_2 z$$

    Partial Pivoting: only rows are reordered; produces stable LU factors in most cases.

    Complete Pivoting: both rows and columns are reordered; guaranteed to produce stable LU factors.

    Pivoting is costly on a parallel computer as it requires communication and disrupts the overlap of communication with computation.
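    A minimal sketch of partial pivoting (rows only, so P2 = I) in Python, assuming dense list-of-lists storage; `lu_partial_pivot` and `lu_solve` are illustrative names. The row order is kept as a permutation vector instead of an explicit matrix P1. The test matrix [[0, 1], [1, 1]] has a zero in the (1,1) position, so the standard algorithm would divide by zero, whereas the pivoted version succeeds.

```python
def lu_partial_pivot(A):
    """LU with partial (row) pivoting, in place. Returns (A, perm): A holds
    L (strictly lower part, unit diagonal implied) and U, and row i of the
    factored matrix is row perm[i] of the original, i.e. P1 A = L U."""
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        # pivot: the row with the largest |A[i][k]| among rows k..n-1
        r = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[r][k] == 0:
            raise ValueError("matrix is singular")
        A[k], A[r] = A[r], A[k]               # exchange rows k and r
        perm[k], perm[r] = perm[r], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                # entry (i, k) of L
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # update row i of active part
    return A, perm


def lu_solve(LU, perm, b):
    """Solve Ly = P1 b by forward substitution, then Ux = y by back substitution."""
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]        # apply P1 to b
    for i in range(n):
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


A = [[0.0, 1.0], [1.0, 1.0]]                  # zero pivot without reordering
LU, perm = lu_partial_pivot(A)
x = lu_solve(LU, perm, [1.0, 3.0])
```

    Swapping whole rows also swaps the multipliers already stored to the left of column k, so the packed array stays consistent with the reordered matrix.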


    Triangular Factorization: 16

    Partial Pivoting

    At the kth step, the row having the element with the largest magnitude in column k (on or below the diagonal) is exchanged with the kth row before computing column k of L and performing the rank-1 update.

    Searching for the largest element, i.e., the pivot, can be costly on a parallel computer:

    Column partitioned approach: pivot search is local to the processor owning column k.

    Row partitioned approach: pivot search is a reduction operation among active processors.

    Submatrix partitioned approach: pivot search is a reduction operation among the active processors in the processor column that owns column k of the matrix.

    Alternative approaches to partial pivoting trade off stability for parallelism.
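    In the row-partitioned approach, the pivot search can be sketched as a reduction: each processor finds its best local candidate, and the candidates are combined pairwise. The sketch below simulates this sequentially with `functools.reduce`; in an MPI code the global step would typically be an all-reduce with `MPI_MAXLOC`, but that mapping is an assumption here, not part of the lecture. Ties are broken by the smaller row index, so every processor agrees on the same pivot.

```python
from functools import reduce


def combine(a, b):
    """Reduction operator on (magnitude, row) pairs: keep the larger
    magnitude, and on a tie keep the smaller row index."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] < b[1] else b


def local_candidate(rows, k):
    """Best (|a_ik|, i) among the active rows (i >= k) one processor owns;
    (-1.0, -1) if it owns none, which loses every comparison."""
    cands = [(abs(row[k]), i) for i, row in rows.items() if i >= k]
    return reduce(combine, cands, (-1.0, -1))


def find_pivot(local_rows, k):
    """Pivot row at step k: local searches followed by a global reduction."""
    return reduce(combine, [local_candidate(r, k) for r in local_rows])[1]


A = [[1.0, 2.0, 0.0, 0.0],
     [0.0, 3.0, 1.0, 0.0],
     [0.0, -5.0, 0.0, 1.0],
     [0.0, 5.0, 1.0, 1.0]]
# rows dealt cyclically to two processors: P0 owns rows 0, 2; P1 owns 1, 3
local_rows = [{0: A[0], 2: A[2]}, {1: A[1], 3: A[3]}]
```

    At step 1 both processors report a candidate of magnitude 5 (rows 2 and 3), and the tie-break selects row 2.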



    Triangular Factorization: 17

    Alternate Forms of LU Factorization

    for i = 2:n,
      for k = 1:i-1,
        m(k) = A(i,k)/A(k,k);            % entry (i,k) of L
        for j = k+1:n,
          A(i,j) = A(i,j) - m(k)*A(k,j); % uses the finished row k of U
        end;
        A(i,k) = m(k);
      end;
    end;

    [Figure: row-oriented LU at step i, marking the first (i-1) rows of L and U (read), row i of L and the pivot computed from row i of A, the active part of matrix A, which is modified by a matrix-vector product, and the unchanged rows below]
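    A Python sketch of this row-oriented form (0-based; `lu_row_oriented` is an illustrative name, no pivoting). It produces the same packed factors as the column-by-column loop, but it finishes one row of L and U at a time and never touches the rows below row i.

```python
def lu_row_oriented(A):
    """Row-oriented LU without pivoting, in place. Row i is finished using
    only rows 0..i-1, which already hold rows of U; rows below i are not
    touched at this step."""
    n = len(A)
    for i in range(1, n):
        for k in range(i):
            A[i][k] /= A[k][k]                     # entry (i, k) of L
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]       # uses row k of U
    return A


A = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 2.0, 2.0, 2.0],
     [2.0, 3.0, 4.0, 4.0],
     [1.0, 3.0, 4.0, 5.0]]
packed = lu_row_oriented([row[:] for row in A])
```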


    Triangular Factorization: 19

    Symmetric Positive Definite Matrices

    Symmetric positive definite (SPD) matrices are an important class of matrices that have real, positive eigenvalues and satisfy the following conditions:

    $A = A^T$

    $x^T A x > 0$ for any nonzero vector x

    A symmetric matrix whose LU factorization exists can be written as $A = LDL^T$, where

    D is a diagonal matrix

    L is a lower triangular matrix with ones on the diagonal

    The LU factorization of A gives $U = DL^T$; therefore $D = \mathrm{diag}(U)$.

    An SPD matrix has D with positive elements and can be written as $A = R^T R$, where R is an upper triangular matrix such that $R^T = LD^{1/2}$.
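    A 2×2 check of these relations, with a matrix chosen here for illustration (not from the lecture):

```latex
\begin{aligned}
A &= \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}
   = \begin{pmatrix} 1 & 0 \\ 0.5 & 1 \end{pmatrix}
     \begin{pmatrix} 4 & 2 \\ 0 & 4 \end{pmatrix} = LU,
  \qquad D = \operatorname{diag}(U) = \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} \\
DL^T &= \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}
        \begin{pmatrix} 1 & 0.5 \\ 0 & 1 \end{pmatrix}
      = \begin{pmatrix} 4 & 2 \\ 0 & 4 \end{pmatrix} = U \\
R &= D^{1/2} L^T = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},
  \qquad R^T R = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}
                 \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
               = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix} = A
\end{aligned}
```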



    Triangular Factorization: 20

    Cholesky Factorization of SPD Matrices

    The active part of the matrix is always SPD.

    Square roots are computed for positive numbers only.

    No pivoting is needed for numerical stability.

    Only the upper triangle of A is accessed; it is overwritten by R.

    Complexity: $n^3/3$ operations, half as many as Gaussian elimination.

    Cholesky Factorization

    for k = 1:n,
      for i = k+1:n,
        m(i) = A(k,i)/A(k,k);            % A(i,k) = A(k,i) by symmetry
        for j = i:n,
          A(i,j) = A(i,j) - m(i)*A(k,j); % update upper triangle of active part
        end;
        A(k,i) = A(k,i)/sqrt(A(k,k));    % entry (k,i) of R
      end;
      A(k,k) = sqrt(A(k,k));             % diagonal entry of R
    end;
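    A Python sketch of the Cholesky loop on this slide (0-based; `cholesky_upper` is an illustrative name). Only entries with j ≥ i are read or written, so the strict lower triangle still holds the original entries afterwards and would have to be zeroed to use R as a full matrix. The test matrix is built as $R^T R$ from a known R, an assumption made only so the answer can be checked.

```python
import math


def cholesky_upper(A):
    """In-place Cholesky factorization A = R^T R of an SPD matrix. Only the
    upper triangle (j >= i) is accessed; it is overwritten with R."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            m = A[k][i] / A[k][k]           # A[i][k] == A[k][i] by symmetry
            for j in range(i, n):
                A[i][j] -= m * A[k][j]      # update upper triangle of active part
            A[k][i] /= math.sqrt(A[k][k])   # entry (k, i) of R
        A[k][k] = math.sqrt(A[k][k])        # diagonal entry of R
    return A


A = [[4.0, 2.0, 2.0], [2.0, 10.0, 7.0], [2.0, 7.0, 6.0]]  # = R^T R below
R = cholesky_upper([row[:] for row in A])
expected_R = [[2.0, 1.0, 1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 1.0]]
```

    Within step k, A[k][i] is scaled only after row i has been updated, so every update still sees the unscaled row k, exactly as in the pseudocode.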


    Triangular Factorization: 21

    Alternate Forms of Cholesky Factorization

    [Figure: alternate form of Cholesky factorization, with column j of R marked]


    Triangular Factorization: 22

    Parallel Cholesky Factorization

    [Figure: parallel Cholesky factorization with cyclic submatrix partitioning over P0 through P15; the first (k-1) rows of R, the pivot row k of A, and the active part of A, which is updated by a rank-1 update, are marked]