
CR07 Sparse Matrix Computations / Cours de Matrices Creuses (2010-2011)

Jean-Yves L'Excellent (INRIA) and Bora Uçar (CNRS)

    LIP-ENS Lyon, Team: ROMA

    ([email protected], [email protected])

    Fridays 10h15-12h15

    prepared in collaboration with P. Amestoy (ENSEEIHT-IRIT)

Motivations

Applications with a growing need for computing power: modelling, simulation (rather than experimentation), numerical optimization.

Typically: continuous problem -> discretization (mesh) -> numerical solution algorithm (according to physical laws) -> matrix problem (Ax = b, ...).

Needs:
- more and more accurate models
- more and more complex problems
- applications with critical response-time requirements
- minimization of the computation costs

-> High-performance computers, parallelism
-> Algorithms that take the best advantage of these computers and of the properties of the matrices considered


    Example of preprocessing for Ax = b

Original matrix (A = lhr01) and preprocessed matrix A'(lhr01).

[Spy plots of both nonzero patterns, nz = 18427 each.]

Modified problem: A'x' = b' with A' = Pn P Dr A Q Dc P^t


A few examples in the field of scientific computing

    Constraints of duration: weather forecast


    A few examples in the field of scientific computing

    Cost constraints: wind tunnels, crash simulation, . . .


    Scale Constraints

large scale: climate modelling, pollution, astrophysics
tiny scale: combustion, quantum chemistry


    Contents of the course

Introduction, reminders on graph theory and on numerical linear algebra:
- Introduction to sparse matrices
- Graphs and hypergraphs, trees, some classical algorithms
- Gaussian elimination, LU factorization, fill-in
- Conditioning/sensitivity of a problem and error analysis


    Contents of the course

Sparse Gaussian elimination:
- Sparse matrices and graphs
- Gaussian elimination of sparse matrices
- Ordering and permuting sparse matrices


    Contents of the course

- Maximum (weighted) matching algorithms and their use in sparse linear algebra
- Efficient factorization methods:
  - implementation of sparse direct solvers
  - parallel sparse solvers
  - scheduling of computations to optimize memory usage and/or performance
- Graph and hypergraph partitioning
- Iterative methods
- Current research activities


    Tentative outline

    A. INTRODUCTION

    I. Sparse matrices

II. Graph theory and algorithms
III. Linear algebra basics

    ------------------------------------------------------

    B. SPARSE GAUSSIAN ELIMINATION

    IV. Elimination tree and structure prediction

    V. Fill-reducing ordering methods

    VI. Matching in bipartite graphs

    VII. Factorization: Methods

    VIII. Factorization: Parallelization aspects

    ------------------------------------------------------

    C. SOME OTHER ESSENTIAL SPARSE MATRIX ALGORITHMS

IX. Graph and hypergraph partitioning
X. Iterative methods

    ------------------------------------------------------

    D. CLOSING

    XI. Current research activities

XII. Presentations


    Tentative organization of the course

References and teaching material will be made available at http://graal.ens-lyon.fr/~bucar/CR07/

2+ hours dedicated to exercises (manipulation of sparse matrices and graphs)

Evaluation: mainly based on the study of research articles related to the contents of the course; the students will write a report and give an oral presentation.


    Outline

Introduction to Sparse Matrix Computations
- Motivation and main issues
- Sparse matrices
- Gaussian elimination
- Parallel and high performance computing
- Numerical simulation and sparse matrices
- Direct vs iterative methods
- Conclusion


    A selection of references

Books
- Duff, Erisman and Reid, Direct Methods for Sparse Matrices, Clarendon Press, Oxford, 1986.
- Dongarra, Duff, Sorensen and van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, 1991.
- Davis, Direct Methods for Sparse Linear Systems, SIAM, 2006.
- Saad, Iterative Methods for Sparse Linear Systems, 2nd edition, SIAM, 2004.

Articles
- Gilbert and Liu, Elimination structures for unsymmetric sparse LU factors, SIMAX, 1993.
- Liu, The role of elimination trees in sparse factorization, SIMAX, 1990.
- Heath, Ng and Peyton, Parallel algorithms for sparse linear systems, SIAM Review, 1991.


Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.


    Motivations

Solution of linear systems of equations: a key algorithmic kernel.

Continuous problem -> Discretization -> Solution of a linear system Ax = b

Main parameters:
- numerical properties of the linear system (symmetry, positive definiteness, conditioning, ...)
- size and structure: large (more than 1,000,000 x 1,000,000?), square/rectangular, dense or sparse (structured/unstructured)
- target computer (sequential/parallel/multicore)

Algorithmic choices are critical.


    Motivations for designing efficient algorithms

    Time-critical applications

    Solve larger problems

    Decrease elapsed time (parallelism ?)

    Minimize cost of computations (time, memory)


    Difficulties

Access to data:
- computer: complex memory hierarchy (registers, multilevel cache, main memory (shared or distributed), disk)
- sparse matrix: large, irregular, dynamic data structures
-> Exploit the locality of references to data on the computer (design algorithms providing such locality).

Efficiency (time and memory):
- the number of operations and the memory usage depend very much on the algorithm used and on the numerical and structural properties of the problem
- the algorithm depends on the target computer (vector, scalar, shared, distributed, clusters of Symmetric Multi-Processors (SMP), multicore)

Algorithmic choices are critical.


Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.


    Sparse matrices

Example: the system

    3 x1 + 2 x2        = 5
           2 x2 - 5 x3 = 1
    2 x1        + 3 x3 = 0

can be represented as Ax = b, where

    A = [ 3  2  0 ]      x = [ x1 ]      b = [ 5 ]
        [ 0  2 -5 ]          [ x2 ]          [ 1 ]
        [ 2  0  3 ]          [ x3 ]          [ 0 ]

Sparse matrix: only the nonzeros are stored.


    Sparse matrix ?

Matrix dwt_592.rua (N = 592, NZ = 5104): structural analysis of a submarine.

[Spy plot of the nonzero pattern, nz = 5104.]


    Sparse matrix ?

Matrix from computational fluid dynamics (collaboration Univ. Tel Aviv): a saddle-point problem.

[Spy plot of the nonzero pattern, nz = 43105.]


    Preprocessing sparse matrices

Original matrix (A = lhr01) and preprocessed matrix A'(lhr01).

[Spy plots of both nonzero patterns, nz = 18427 each.]

Modified problem: A'x' = b' with A' = Pn P Dr A Dc Q P^t


    Factorization process

Solution of Ax = b:

A is unsymmetric:
- A is factorized as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix.
- Forward-backward substitution: Ly = b, then Ux = y.

A is symmetric:
- A = LDL^T or LL^T.

A is rectangular, m x n with m >= n, and we seek min over x of || Ax - b ||_2:
- A = QR, where Q is orthogonal (Q^-1 = Q^T) and R is triangular.
- Solve: y = Q^T b, then Rx = y.
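To make the three cases concrete, here is a small Python sketch (not part of the original slides; the SciPy/NumPy calls and the numeric values are illustrative assumptions, reusing the 3x3 matrix of the earlier example for the LU case):

    import numpy as np
    from scipy.sparse import csc_matrix
    from scipy.sparse.linalg import splu

    # Unsymmetric A: sparse LU factorization, then triangular solves
    A = csc_matrix(np.array([[3., 2., 0.],
                             [0., 2., -5.],
                             [2., 0., 3.]]))
    b = np.array([5., 1., 0.])
    x = splu(A).solve(b)              # Ly = b, then Ux = y (with permutations)

    # Symmetric positive definite S: Cholesky S = L L^T (dense here, for brevity)
    S = np.array([[4., 1.], [1., 3.]])
    L = np.linalg.cholesky(S)
    y = np.linalg.solve(L, np.array([1., 2.]))
    xs = np.linalg.solve(L.T, y)

    # Rectangular A (m >= n), least squares via QR
    Am = np.array([[1., 0.], [1., 1.], [1., 2.]])
    bm = np.array([6., 0., 0.])
    Q, R = np.linalg.qr(Am)
    xls = np.linalg.solve(R, Q.T @ bm)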


    Difficulties

Only nonzero values are stored.

The factors L and U have far more nonzeros than A.

Data structures are complex.

Computations are only a small portion of the code (the rest is data manipulation).

Memory size is a limiting factor (-> out-of-core solvers).


    Key numbers:

1 - Small sizes: 500 MB matrix; factors = 5 GB; flops = 100 Gflops.

2 - Example of 2D problem (Lab. Geosciences Azur, Valbonne): complex 2D finite-difference matrix, n = 16 x 10^6, 150 x 10^6 nonzeros. Storage (single precision): 2 GB (12 GB with the factors). Flops: 10 TeraFlops.

3 - Example of 3D problem (EDF, Code_Aster, structural engineering): real finite-element matrix, n = 10^6, nz = 71 x 10^6 nonzeros. Storage: 3.5 x 10^9 entries (28 GB) for the factors, 35 GB total. Flops: 2.1 x 10^13.

4 - Typical performance (MUMPS):
- PC Linux, 1 core (P4, 2 GHz): 1.0 GFlops/s
- Cray T3E (512 procs): speed-up 170, perf. 71 GFlops/s
- AMD Opteron 8431, 24 cores @ 2.4 GHz: 50 GFlops/s (1 core: 7 GFlops/s)


    Typical test problems:

BMW car body: 227,362 unknowns, 5,757,996 nonzeros (MSC.Software).
Size of factors: 51.1 million entries. Number of operations: 44.9 x 10^9.


    Typical test problems:

BMW crankshaft: 148,770 unknowns, 5,396,386 nonzeros (MSC.Software).
Size of factors: 97.2 million entries. Number of operations: 127.9 x 10^9.


    Sources of parallelism

Several levels of parallelism can be exploited:
- At the problem level: the problem can be decomposed into sub-problems (e.g. domain decomposition).
- At the matrix level: sparsity implies independence between computations.
- At the submatrix level: within dense linear algebra computations (parallel BLAS, ...).


    Data structure for sparse matrices

The storage scheme depends on the pattern of the matrix and on the type of access required:
- band or variable-band matrices
- block bordered or block tridiagonal matrices
- general matrix
- row, column or diagonal access


    Data formats for a general sparse matrix A

What needs to be represented:
- Assembled matrices: M x N matrix A with NNZ nonzeros.
- Elemental matrices (unassembled): M x N matrix A with NELT elements.
- Arithmetic: real (4 or 8 bytes) or complex (8 or 16 bytes).
- Symmetric (or Hermitian): store only part of the data.
- Distributed format?
- Duplicate entries and/or out-of-range values?


Classical Data Formats for Assembled Matrices

Example of a 3x3 matrix with NNZ = 5 nonzeros:

    A = [ a11   0    0  ]
        [  0   a22  a23 ]
        [ a31   0   a33 ]

Coordinate format:
    IRN [1:NNZ] = 1 3 2 2 3
    JCN [1:NNZ] = 1 1 2 3 3
    VAL [1:NNZ] = a11 a31 a22 a23 a33

Compressed Sparse Column (CSC) format:
    IRN    [1:NNZ] = 1 3 2 2 3
    VAL    [1:NNZ] = a11 a31 a22 a23 a33
    COLPTR [1:N+1] = 1 3 4 6
    column J is stored in IRN/VAL locations COLPTR(J) ... COLPTR(J+1)-1

Compressed Sparse Row (CSR) format: similar to CSC, but row by row.

Diagonal format (M = N): NDIAG = 3, see below.


Diagonal format (M = N), for the same 3x3 example:

    NDIAG = 3
    IDIAG = -2 0 1

    VAL = [ na   a11   0  ]
          [ na   a22  a23 ]        (na: not accessed)
          [ a31  a33   na ]

    VAL(i,j) corresponds to A(i, i+IDIAG(j))   (for 1 <= i + IDIAG(j) <= N)
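As an illustration (not from the slides; the use of SciPy and the concrete values a11..a33 -> 1..5 are assumptions), the same 3x3 example in coordinate form and its conversion to CSC/CSR:

    import numpy as np
    from scipy.sparse import coo_matrix

    # Coordinate (COO) arrays as on the slide, made 0-based for SciPy
    row = np.array([1, 3, 2, 2, 3]) - 1        # IRN
    col = np.array([1, 1, 2, 3, 3]) - 1        # JCN
    val = np.array([1., 4., 2., 3., 5.])       # a11 a31 a22 a23 a33

    A = coo_matrix((val, (row, col)), shape=(3, 3))
    C = A.tocsc()
    R = A.tocsr()
    print(C.indptr)    # [0 2 3 5]  <-> COLPTR = 1 3 4 6 (1-based on the slide)
    print(C.indices)   # row indices per column, <-> IRN
    print(R.indptr)    # CSR row pointers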

Sparse matrix-vector products Y ← AX

The algorithm depends on the sparse matrix format.

Coordinate format:

    Y(1:M) = 0
    DO k = 1, NNZ
       Y(IRN(k)) = Y(IRN(k)) + VAL(k) * X(JCN(k))
    ENDDO

CSC format:

    Y(1:M) = 0
    DO J = 1, N
       Xj = X(J)
       DO k = COLPTR(J), COLPTR(J+1) - 1
          Y(IRN(k)) = Y(IRN(k)) + VAL(k) * Xj
       ENDDO
    ENDDO

CSR format:

    DO I = 1, M
       Yi = 0
       DO k = ROWPTR(I), ROWPTR(I+1) - 1
          Yi = Yi + VAL(k) * X(JCN(k))
       ENDDO
       Y(I) = Yi
    ENDDO

Diagonal format (VAL(i,j) corresponds to A(i, i+IDIAG(j))):

    Y(1:N) = 0
    DO j = 1, NDIAG
       DO i = max(1, 1 - IDIAG(j)), min(N, N - IDIAG(j))
          Y(i) = Y(i) + VAL(i,j) * X(i + IDIAG(j))
       ENDDO
    ENDDO

Jagged diagonal storage (JDS)

For the same 3x3 example matrix (nonzeros a11, a22, a23, a31, a33):

1. Shift all elements left (similar to CSR) and keep the column indices:
       a11 (1)
       a22 (2)  a23 (3)
       a31 (1)  a33 (3)

2. Sort the rows in decreasing order of their number of nonzeros:
       a22 (2)  a23 (3)
       a31 (1)  a33 (3)
       a11 (1)

3. Store the corresponding row permutation: PERM = 2 3 1

4. Store the jagged diagonals (the columns of step 2):
       VAL     = a22 a31 a11 a23 a33
       COL_IND = 2 1 1 3 3
       COL_PTR = 1 4 6

Pros: manipulates longer vectors than CSR (interesting on vector computers or GPUs).
Cons: extra indirection due to the permutation array.

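A small Python sketch of the matrix-vector product with these JDS arrays (an illustration, not course material; the concrete values a11=1, a22=2, a23=3, a31=4, a33=5 are assumptions):

    import numpy as np

    PERM    = np.array([2, 3, 1])             # sorted-row -> original-row
    VAL     = np.array([2., 4., 1., 3., 5.])  # a22 a31 a11 a23 a33
    COL_IND = np.array([2, 1, 1, 3, 3])
    COL_PTR = np.array([1, 4, 6])             # 1-based, as on the slide

    x = np.array([1., 1., 1.])
    y = np.zeros(3)
    for d in range(len(COL_PTR) - 1):              # loop over jagged diagonals
        start, end = COL_PTR[d] - 1, COL_PTR[d + 1] - 1
        for i in range(end - start):               # i-th sorted row touched
            y[PERM[i] - 1] += VAL[start + i] * x[COL_IND[start + i] - 1]

    # Check against the dense matrix with the same entries
    A = np.array([[1., 0., 0.], [0., 2., 3.], [4., 0., 5.]])
    assert np.allclose(y, A @ x)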

    Example of elemental matrix format


    A = [ -1  2  3  0  0 ]
        [  2  1  1  0  0 ]
        [  1  1  3 -1  3 ]  =  A1 + A2
        [  0  0  1  2 -1 ]
        [  0  0  3  2  1 ]

with element A1 on variables 1, 2, 3 and element A2 on variables 3, 4, 5:

         1 [ -1  2  3 ]            3 [  2 -1  3 ]
    A1 = 2 [  2  1  1 ] ,     A2 = 4 [  1  2 -1 ]
         3 [  1  1  1 ]            5 [  3  2  1 ]


For these two elements:

    N = 5, NELT = 2, NVAR = 6,    A = sum over i = 1..NELT of Ai

    ELTPTR [1:NELT+1] = 1 4 7
    ELTVAR [1:NVAR]   = 1 2 3 3 4 5
    ELTVAL [1:NVAL]   = -1 2 1 2 1 1 3 1 1 2 1 3 -1 2 2 3 -1 1

Remarks:
- NVAR = ELTPTR(NELT+1) - 1
- NVAL = sum of Si^2 (unsymmetric) or sum of Si(Si+1)/2 (symmetric), with Si = ELTPTR(i+1) - ELTPTR(i)
- storage of the elements in ELTVAL: by columns
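A possible assembly sketch in Python (an assumption, not the course's code): building the global sparse matrix from ELTPTR/ELTVAR/ELTVAL with SciPy's COO format, which sums duplicate entries when converting:

    import numpy as np
    from scipy.sparse import coo_matrix

    N = 5
    ELTPTR = [1, 4, 7]
    ELTVAR = [1, 2, 3, 3, 4, 5]
    ELTVAL = [-1, 2, 1, 2, 1, 1, 3, 1, 1,      # element 1, by columns
               2, 1, 3, -1, 2, 2, 3, -1, 1]    # element 2, by columns

    rows, cols, vals = [], [], []
    p = 0
    for e in range(len(ELTPTR) - 1):
        var = ELTVAR[ELTPTR[e] - 1 : ELTPTR[e + 1] - 1]   # variables of element e
        s = len(var)
        for j in range(s):                # column-major storage of the element
            for i in range(s):
                rows.append(var[i] - 1)
                cols.append(var[j] - 1)
                vals.append(ELTVAL[p])
                p += 1

    A = coo_matrix((vals, (rows, cols)), shape=(N, N)).tocsr()
    print(A.toarray())    # matches the assembled 5x5 matrix above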

    File storage: Rutherford-Boeing


Standard ASCII file format: header + data (CSC format). Key xyz:
- x = [rcp]   (real, complex, pattern)
- y = [suhzr] (symmetric, unsymmetric, hermitian, skew symmetric, rectangular)
- z = [ae]    (assembled, elemental)
- examples: M_T1.RSA, SHIP003.RSE

Supplementary files: right-hand sides, solution, permutations, ...

A canonical format was introduced to guarantee a unique representation (order of entries in each column, no duplicates).


Example (beginning of the file for matrix M_T1):

    DNV-Ex 1 : Tubular joint-1999-01-17                                    M_T1
     1733710    9758  492558 1231394       0
    rsa            97578    97578  4925574        0
    (10I8)          (10I8)          (3e26.16)
           1      49      96     142     187     231     274     346     417     487
         556     624     691     763     834     904     973    1041    1108    1180
        1251    1321    1390    1458    1525    1573    1620    1666    1711    1755
        1798    1870    1941    2011    2080    2148    2215    2287    2358    2428
        2497    2565    2632    2704    2775    2845    2914    2982    3049    3115
    ...
           1       2       3       4       5       6       7       8       9      10
          11      12      49      50      51      52      53      54      55      56
          57      58      59      60      67      68      69      70      71      72
         223     224     225     226     227     228     229     230     231     232
         233     234     433     434     435     436     437     438       2       3
           4       5       6       7       8       9      10      11      12      49
          50      51      52      53      54      55      56      57      58      59
    ...
     -0.2624989288237320E+10  0.6622960540857440E+09  0.2362753266740760E+11
      0.3372081648690030E+08 -0.4851430162799610E+08  0.1573652896140010E+08
      0.1704332388419270E+10 -0.7300763190874110E+09 -0.7113520995891850E+10
      0.1813048723097540E+08  0.2955124446119170E+07 -0.2606931100955540E+07
      0.1606040913919180E+07 -0.2377860366909130E+08 -0.1105180386670390E+09
      0.1610636280324100E+08  0.4230082475435230E+07 -0.1951280618776270E+07
      0.4498200951891750E+08  0.2066239484615530E+09  0.3792237438608430E+08
      0.9819999042370710E+08  0.3881169368090200E+08 -0.4624480572242580E+08

    File storage: Matrix-market


    Example

    %%MatrixMarket matrix coordinate real general

    % Comments

    5 5 8

    1 1 1.000e+00

    2 2 1.050e+01

    3 3 1.500e-02

    1 4 6.000e+00

    4 2 2.505e+02

    4 4 -2.800e+02

    4 5 3.332e+01

    5 5 1.200e+01

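Such a file can be read, for instance, with SciPy; the sketch below is an assumption (not part of the slides) that simply re-reads the example above with scipy.io.mmread, which returns a COO matrix for "coordinate" files:

    from io import StringIO
    from scipy.io import mmread

    mm_text = """%%MatrixMarket matrix coordinate real general
    % Comments
    5 5 8
    1 1 1.000e+00
    2 2 1.050e+01
    3 3 1.500e-02
    1 4 6.000e+00
    4 2 2.505e+02
    4 4 -2.800e+02
    4 5 3.332e+01
    5 5 1.200e+01
    """
    A = mmread(StringIO(mm_text)).tocsr()
    print(A.shape, A.nnz)   # (5, 5) 8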

    Examples of sparse matrix collections


- The University of Florida Sparse Matrix Collection: http://www.cise.ufl.edu/research/sparse/matrices/
- Matrix Market: http://math.nist.gov/MatrixMarket/
- Rutherford-Boeing: http://www.cerfacs.fr/algor/Softs/RB/index.html
- TLSE: http://gridtlse.org/


Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.

    Gaussian elimination


A = A(1), b = b(1), A(1) x = b(1):

    [ a11 a12 a13 ] [x1]   [b1]
    [ a21 a22 a23 ] [x2] = [b2]       row 2 <- row 2 - (a21/a11) * row 1
    [ a31 a32 a33 ] [x3]   [b3]       row 3 <- row 3 - (a31/a11) * row 1

A(2) x = b(2):

    [ a11  a12     a13    ] [x1]   [ b1    ]
    [  0   a22(2)  a23(2) ] [x2] = [ b2(2) ]      b2(2)  = b2  - a21 b1  / a11, ...
    [  0   a32(2)  a33(2) ] [x3]   [ b3(2) ]      a32(2) = a32 - a31 a12 / a11, ...

Finally, A(3) x = b(3):

    [ a11  a12     a13    ] [x1]   [ b1    ]
    [  0   a22(2)  a23(2) ] [x2] = [ b2(2) ]      a33(3) = a33(2) - a32(2) a23(2) / a22(2), ...
    [  0    0      a33(3) ] [x3]   [ b3(3) ]

Typical Gaussian elimination step k:

    aij(k+1) = aij(k) - aik(k) akj(k) / akk(k)

    Relation with A = LU factorization


One step of Gaussian elimination can be written A(k+1) = L(k) A(k), where L(k) is the identity except for column k, which contains -l(k+1,k), ..., -l(n,k) below the diagonal, with

    lik = aik(k) / akk(k).

Then A(n) = U = L(n-1) ... L(1) A, which gives A = LU, with

    L = [L(1)]^-1 ... [L(n-1)]^-1 = the unit lower triangular matrix whose entry (i, j), i > j, is l(i,j).

In dense codes, the entries of L and U overwrite the entries of A.

Furthermore, if A is symmetric, A = LDL^T with dkk = akk(k):
A = LU = A^T = U^T L^T implies U (L^T)^-1 = L^-1 U^T = D diagonal,
and U = D L^T, thus A = L (D L^T) = L D L^T.
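A minimal dense Python sketch of these two slides (an assumption, not the course's code): it applies the update aij(k+1) = aij(k) - aik(k) akj(k) / akk(k) while storing the multipliers lik, then checks that A = LU. The 3x3 matrix is the one from the earlier sparse example.

    import numpy as np

    A = np.array([[3., 2., 0.],
                  [0., 2., -5.],
                  [2., 0., 3.]])
    n = A.shape[0]
    U = A.copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # multiplier l_ik
            U[i, k:] -= L[i, k] * U[k, k:]     # eliminate entry (i, k)

    assert np.allclose(L @ U, A)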

    Gaussian elimination and sparsity


Step k of the LU factorization (akk is the pivot):

For i > k, compute lik = aik / akk (= a'ik).
For i > k, j > k:

    a'ij = aij - aik akj / akk      or      a'ij = aij - lik akj

If aik != 0 and akj != 0, then a'ij != 0 in general.
If aij was zero, its new nonzero value must be stored: this is fill-in.

[Figure: at step k, nonzeros in positions (i, k) and (k, j) create a new nonzero (fill-in) in position (i, j), which was zero in the original matrix.]


Idem for Cholesky:

For i > k, compute lik = aik / sqrt(akk) (= a'ik).
For i > k, j > k, j <= i (lower triangular part):

    a'ij = aij - aik ajk / akk      (= aij - lik ljk)

    Example


Original matrix (full first row and column, plus the diagonal):

    x x x x x
    x x
    x   x
    x     x
    x       x

The matrix is full after the first step of elimination.

After reordering the matrix (1st row and column -> last row and column):


    x       x
      x     x
        x   x
          x x
    x x x x x

No fill-in.

Ordering the variables has a strong impact on:
- the fill-in
- the number of operations

Finding an ordering that minimizes fill-in is an NP-hard problem in general (Yannakakis, 1981).
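The effect can be reproduced numerically; the sketch below (an assumption, using SciPy's SuperLU with the natural ordering, i.e. no fill-reducing permutation) builds the arrow pattern with the dense row/column first or last and compares the number of entries in the factors:

    import numpy as np
    from scipy.sparse import csc_matrix
    from scipy.sparse.linalg import splu

    def arrow(n, dense_first):
        A = 4.0 * np.eye(n)                 # diagonally dominant, so no row swaps
        r = 0 if dense_first else n - 1
        A[r, :] = 1.0
        A[:, r] = 1.0
        A[r, r] = 4.0
        return csc_matrix(A)

    for dense_first in (True, False):
        A = arrow(50, dense_first)
        lu = splu(A, permc_spec="NATURAL")
        print(dense_first, A.nnz, lu.L.nnz + lu.U.nnz)
    # dense row/column first  -> factors almost full
    # dense row/column last   -> no fill-in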

Illustration: Reverse Cuthill-McKee on matrix dwt_592.rua


Harwell-Boeing matrix dwt_592.rua: structural computing on a submarine. NZ(LU factors) = 58202.

[Spy plots: original matrix (nz = 5104) and factorized matrix (nz = 58202).]

With the Reverse Cuthill-McKee (RCM) ordering: NZ(LU factors) = 16924.

[Spy plots: permuted matrix (nz = 5104) and factorized permuted matrix (nz = 16924).]


Table: Benefits of sparsity on a matrix of order 2021 with 7353 nonzeros (Dongarra et al., 1991).

    Procedure                       Total storage   Flops          Time (sec.) on CRAY J90
    Full system                     4084 Kwords     5503 x 10^6    34.5
    Sparse system                   71 Kwords       1073 x 10^6    3.4
    Sparse system and reordering    14 Kwords       42 x 10^3      0.9

    Control of numerical stability: numerical pivoting


In dense linear algebra, partial pivoting is commonly used (at each step, the largest entry in the column is selected).

In sparse linear algebra, some flexibility is kept in order to preserve sparsity:

- Partial threshold pivoting: eligible pivots are those that are not too small with respect to the maximum in the column.

      Set of eligible pivots = { r : |ark(k)| >= u * max_i |aik(k)| },   where 0 < u <= 1.

- Among the eligible pivots, one then selects one that better preserves sparsity.
- u is called the threshold parameter (u = 1 corresponds to partial pivoting).
- It restricts the maximum possible growth of aij = aij - aik akj / akk to 1 + 1/u, which is sufficient to preserve numerical stability.
- u ~ 0.1 is often chosen in practice.

For symmetric indefinite problems, 2-by-2 pivots (with threshold) are also used to preserve symmetry and sparsity.
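A tiny sketch of the eligible-pivot test (an illustration only; the column values are arbitrary assumptions):

    import numpy as np

    def eligible_pivots(col, u=0.1):
        """col: entries aik (i >= k) of the current column; returns eligible row offsets."""
        threshold = u * np.max(np.abs(col))
        return np.flatnonzero(np.abs(col) >= threshold)

    col = np.array([0.05, -3.0, 0.4, 2.9])
    print(eligible_pivots(col, u=0.1))   # rows with |aik| >= 0.3 -> [1 2 3]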

    Threshold pivoting and numerical accuracy


Table: Effect of the variation of the threshold parameter u on a 541 x 541 matrix with 4285 nonzeros (Dongarra et al., 1991).

    u        Nonzeros in LU factors   Error
    1.0      16767                    3 x 10^-9
    0.25     14249                    6 x 10^-10
    0.1      13660                    4 x 10^-9
    0.01     15045                    1 x 10^-5
    10^-4    16198                    1 x 10^2
    10^-10   16553                    3 x 10^23

Difficulty: numerical pivoting implies dynamic data structures that cannot be forecast symbolically.

    Three-phase scheme to solve Ax = b


1. Analysis step
   - preprocessing of A (symmetric/unsymmetric orderings, scalings)
   - build the dependency graph (elimination tree, eDAG, ...)

2. Factorization (A = LU, LDL^T, LL^T, QR), with numerical pivoting

3. Solution based on the factored matrices
   - triangular solves: Ly = b, then Ux = y
   - improvement of the solution (iterative refinement), error analysis
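For illustration (an assumption, not the solver used in the course), SciPy's SuperLU wrapper follows the same scheme: ordering and factorization are done once, and the factors are then reused for several right-hand sides:

    import numpy as np
    from scipy.sparse import identity, random as sparse_random
    from scipy.sparse.linalg import splu

    n = 200
    A = (sparse_random(n, n, density=0.02, random_state=0) + 10.0 * identity(n)).tocsc()

    lu = splu(A, permc_spec="COLAMD")   # analysis (ordering) + numerical factorization
    for seed in range(3):               # solution phase, reused for several b
        b = np.random.default_rng(seed).random(n)
        x = lu.solve(b)                 # triangular solves
        print(np.linalg.norm(A @ x - b))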

    Efficient implementation of sparse algorithms


Indirect addressing is often used in sparse calculations, e.g. a sparse SAXPY:

    DO i = 1, m
       A( ind(i) ) = A( ind(i) ) + alpha * w( i )
    ENDDO

Even if manufacturers provide hardware support for indirect addressing, it penalizes performance.

-> Identify dense blocks, or switch to dense calculations as soon as the matrix is not sparse enough.
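The same indirect (scatter) update written with NumPy, as a sketch (the values are illustrative; np.add.at is used because it accumulates correctly when indices repeat, unlike a plain fancy-indexed +=):

    import numpy as np

    A = np.zeros(8)
    ind = np.array([0, 3, 3, 7])         # indirect addressing: target positions
    w = np.array([1.0, 2.0, 3.0, 4.0])
    alpha = 0.5
    np.add.at(A, ind, alpha * w)
    print(A)   # [0.5 0.  0.  2.5 0.  0.  0.  2. ]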

    Effect of switch to dense calculations


Matrix from a 5-point discretization of the Laplacian on a 50 x 50 grid (Dongarra et al., 1991):

    Density for switch to full code   Order of full submatrix   Millions of flops   Time (seconds)
    No switch                         0                         7                   21.8
    1.00                              74                        7                   21.4
    0.80                              190                       8                   15.0
    0.60                              235                       11                  12.5
    0.40                              305                       21                  9.0
    0.20                              422                       50                  5.5
    0.10                              531                       100                 3.7
    0.005                             1420                      1908                6.1

Sparse structure should be exploited if density < 10%.


Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.

    Main processor (r)evolutions


    pipelined functional units

    superscalar processors

    out-of-order execution (ILP)

    larger caches

    evolution of instruction set (CISC, RISC, EPIC, . . . )

    multicores


Why parallel processing?

Computing needs are not met in many disciplines (to solve significant problems).

Uniprocessor performance is close to its physical limits:
- cycle time 0.5 nanosecond -> 8 GFlop/s (with 4 floating-point operations per cycle)
- a 40 TFlop/s computer -> 5000 cores -> massively parallel computers

Not because it is the simplest route, but because it is necessary.

Current power (June 2010, cf. http://www.top500.org): Cray XT5, Oak Ridge Natl Lab: 1.7 PFlop/s, 300 TBytes of memory, 224256 cores.

Some units for high-performance computing


Speed

    1 MFlop/s   1 Megaflop/s   10^6 operations / second
    1 GFlop/s   1 Gigaflop/s   10^9 operations / second
    1 TFlop/s   1 Teraflop/s   10^12 operations / second
    1 PFlop/s   1 Petaflop/s   10^15 operations / second
    1 EFlop/s   1 Exaflop/s    10^18 operations / second

Memory

    1 kB / 1 ko   1 kilobyte   10^3 bytes
    1 MB / 1 Mo   1 Megabyte   10^6 bytes
    1 GB / 1 Go   1 Gigabyte   10^9 bytes
    1 TB / 1 To   1 Terabyte   10^12 bytes
    1 PB / 1 Po   1 Petabyte   10^15 bytes

Performance measures


Number of floating-point operations per second (not MIPS).

Peak performance:
- what appears in the vendors' advertisements
- assumes that all processing units are active
- one is certain not to go faster:

      peak performance = number of functional units / cycle time (sec.)

Real performance:
- usually much lower than the peak (unfortunately)

The ratio (real performance / peak performance) is often low!!


Consider a program P:

1. Sequential processor: 1 scalar unit (1 GFlop/s). Execution time of P: 100 s.
2. Parallel machine with 100 processors: each processor 1 GFlop/s, peak performance 100 GFlop/s.
3. If P is 10% sequential code + 90% parallelized code: execution time of P: 0.9 + 10 = 10.9 s, real performance: 9.2 GFlop/s.
4. Real performance / peak performance = 0.1
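The same arithmetic as a short Python sketch (illustrative only):

    def parallel_time(t_seq, frac_parallel, n_procs):
        """Execution time when a fraction of the work is perfectly parallelized."""
        return t_seq * (1 - frac_parallel) + t_seq * frac_parallel / n_procs

    work_gflop = 100.0                       # 100 s at 1 GFlop/s
    t = parallel_time(100.0, 0.90, 100)      # 10.0 + 0.9 = 10.9 s
    print(t, work_gflop / t)                 # 10.9 s, ~9.2 GFlop/s (peak is 100)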

Moore's law


Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of integrated circuits would double every 24 months.

It also served as a target for manufacturers. Over time it has been distorted:
- 24 months -> 18 months
- number of transistors -> performance

How can we increase computing speed?

Increase the clock frequency, with faster technologies:


We are approaching the limits:
- chip design
- power consumption and heat dissipation
- cooling -> space problems

Miniaturization can continue, but not indefinitely:
- the resistance of the conductors (R = rho l / s) increases, and resistance is responsible for energy dissipation (Joule effect)
- capacitance effects are difficult to control

Remark: 0.5 nanosecond is the time a signal needs to travel along 15 cm of cable.

Cycle time 0.5 nanosecond -> 8 GFlop/s (with 4 floating-point operations per cycle).

The only solution: parallelism


Parallelism: simultaneous execution of several instructions within a program.

Inside a core:
- micro-instructions, pipelined processing, overlapping of instructions executed by distinct computation units
- transparent for the programmer (handled by the compiler or at run time)

Between distinct processors or cores:
- different instruction streams are executed
- synchronizations are either implicit (compiler, automatic parallelization, use of parallel libraries) or explicit (message passing, multithreaded programming, critical sections)

The data access problem


In practice we are often at 10% of peak performance.

Faster processors require faster access to the data: processor organization, memory organization, inter-processor communication.

More complex hardware: pipelines, technology, network, ...
More complex software: compilers, operating systems, programming languages, management of parallelism, tuning and debugging tools, ..., applications.

It becomes harder and harder to program efficiently.

Memory bandwidth problems

Data access is a crucial issue in modern computers:
- processor performance: +60% per year
- DRAM memory: +9% per year
-> the ratio processor performance / memory access time increases by roughly 50% per year!
MFlop/s are easier to obtain than MB/s of memory bandwidth.

Memory hierarchies are more and more complex (but latency increases). The way data are accessed becomes more and more critical:
- minimize cache misses
- minimize memory paging
- locality: improve the ratio of references to local memory over references to remote memory
- reuse, blocking: increase the ratio flops / memory accesses
- manage data transfers by hand? (Cell, GPU)

Example of a memory hierarchy (average access time in cycles, hit/miss, and typical size):

    Registers        < 1
    Cache level #1   1-2 / 8-66            1-128 KB
    Cache level #2   6-15 / 30-200         256 KB - 16 MB
    Main memory      10-100
    Remote memory    500-5000
    Disks            700,000 / 6,000,000   1-10 GB

Memory design for a large number of processors?

How can 500 processors access data stored in a shared memory (technology, interconnection, price)?
Solution at reasonable cost: physically distributed memory (each processor, or group of processors, has its own local memory).

Two options:
- globally addressable local memories: virtual shared memory computers
- explicit data transfers between processors, with message passing

Scalability requires:
- memory bandwidth that grows linearly with processor speed
- communication bandwidth that grows with the number of processors

Cost/performance ratio -> distributed memory and a good cost/performance ratio for the processors.

Multiprocessor architectures

Large number of processors -> physically distributed memory.

    Logical organization \ Physical organization
                  Shared (32 procs max)            Distributed
    Shared        shared-memory multiprocessors    global address space (hard/soft) on top of
                                                   message passing; virtual shared memory
    Distributed   emulation of message passing     message passing
                  (buffers)

Table: Organization of the processors.

Terminology

SMP (Symmetric Multi-Processor) architecture:
- memory shared both physically and logically
- uniform memory access time
- similar, from the application point of view, to multicore architectures (1 core = 1 logical processor)
- but communications are much faster within a multicore (latency < 3 ns, bandwidth > 20 GB/s) than within an SMP (latency ~ 60 ns, bandwidth ~ 2 GB/s)

NUMA (Non-Uniform Memory Access) architecture:
- memory physically distributed and logically shared
- easier to increase the number of processors than with SMP
- access time depends on the location of the data: local accesses are faster than remote accesses
- hardware maintains cache coherence (ccNUMA)

Programming

Programming standards:
- shared logical organization: POSIX threads, OpenMP directives
- distributed logical organization: PVM, MPI, sockets (message passing)
- hybrid programming: MPI + OpenMP

Machines with a million cores? Emerging GPGPU-type architectures? No standard yet!

Evolution of high-performance computing

Rapid evolution of high-performance architectures (SMP, clusters, NUMA, multicores, Cell, GPUs, ...).
Parallelism at several levels.
Memory hierarchies more and more complex.
-> Programming becomes more and more difficult, with software tools and standards that always lag behind.


Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.

Numerical simulation and sparse matrices

General approach in scientific computing:
1. Simulation problem (continuous problem)
2. Application of physical laws (partial differential equations)
3. Discretization, equations in finite dimension
4. Solution of linear systems (Ax = b)
5. (Analysis of the results, possibly questioning the model or the method)

Solution of linear systems = a fundamental algorithmic kernel. Parameters to take into account:
- properties of the system (symmetry, positive definiteness, conditioning, over-determined, ...)
- structure: dense or sparse
- size: several million equations?

Partial differential equations

Modelling of a physical phenomenon.
Differential equations involving: forces, moments, temperatures, velocities, energies, time.
Analytical solutions are rarely available.

Examples of partial differential equations

Find the electric potential phi for a given charge distribution f:

    Laplacian(phi) = f,   i.e.   d2phi/dx2 + d2phi/dy2 + d2phi/dz2 = f(x, y, z)

Heat equation (Fourier's equation):

    d2u/dx2 + d2u/dy2 + d2u/dz2 = (1/kappa) du/dt

with u = u(x, y, z, t) the temperature and kappa the thermal diffusivity of the medium.

Wave propagation equations, Schrodinger equation, Navier-Stokes, ...

Discretization (the step that follows the physical modelling)

The numerical analyst's work:
- construction of a mesh (regular, irregular)
- choice of the solution methods and study of their behaviour
- study of the loss of information due to the passage to finite dimension

Main discretization techniques: finite differences, finite elements, finite volumes.

Discretization with finite differences (1D)

Basic approximation (ok if h is small enough):

    du/dx (x) ~ ( u(x+h) - u(x) ) / h

This results from Taylor's formula:

    u(x+h) = u(x) + h du/dx + (h^2/2) d2u/dx2 + (h^3/6) d3u/dx3 + O(h^4)

Replacing h by -h:

    u(x-h) = u(x) - h du/dx + (h^2/2) d2u/dx2 - (h^3/6) d3u/dx3 + O(h^4)

Thus:

    d2u/dx2 = ( u(x+h) - 2u(x) + u(x-h) ) / h^2 + O(h^2)

Discretization with finite differences (1D)

    d2u/dx2 = ( u(x+h) - 2u(x) + u(x-h) ) / h^2 + O(h^2)

3-point stencil for the centered difference approximation to the second-order derivative:

    [ 1  -2  1 ]

Finite Differences for the Laplacian Operator (2D)

Assuming the same mesh refinement h in the x and y directions:

    Laplacian(u)(x,y) ~ ( u(x-h,y) - 2u(x,y) + u(x+h,y) ) / h^2 + ( u(x,y-h) - 2u(x,y) + u(x,y+h) ) / h^2

    Laplacian(u)(x,y) ~ (1/h^2) ( u(x-h,y) + u(x+h,y) + u(x,y-h) + u(x,y+h) - 4 u(x,y) )

5-point stencils for the centered difference approximation to the Laplacian operator: standard (left) and skewed (right), each with weight -4 at the center and 1 at the four neighbours.


A 27-point stencil is used for 3D geophysical applications (collaboration with Geoscience Azur, S. Operto and J. Virieux).

1D example

Consider the problem:

    -u''(x) = f(x) for x in (0, 1),   u(0) = u(1) = 0

    xi = i h, i = 0, ..., n+1,   f(xi) = fi,   u(xi) = ui,   h = 1/(n+1)

Centered difference approximation:

    -u(i-1) + 2 ui - u(i+1) = h^2 fi      (u0 = u(n+1) = 0)

We obtain a linear system Au = f or, for n = 6:

    (1/h^2) [  2 -1  0  0  0  0 ] [u1]   [f1]
            [ -1  2 -1  0  0  0 ] [u2]   [f2]
            [  0 -1  2 -1  0  0 ] [u3] = [f3]
            [  0  0 -1  2 -1  0 ] [u4]   [f4]
            [  0  0  0 -1  2 -1 ] [u5]   [f5]
            [  0  0  0  0 -1  2 ] [u6]   [f6]
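A short Python sketch (an assumption, not course material) that builds this tridiagonal system with SciPy and checks it against the exact solution u(x) = x(1-x)/2 of -u'' = 1 with u(0) = u(1) = 0:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import spsolve

    n = 6
    h = 1.0 / (n + 1)
    A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr") / h**2
    x = np.arange(1, n + 1) * h
    f = np.ones(n)
    u = spsolve(A, f)
    print(np.max(np.abs(u - x * (1 - x) / 2)))   # ~1e-16: the scheme is exact for this f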

Slightly more complicated (2D)

Consider an elliptic PDE:

    - d/dx ( a(x,y) du/dx ) - d/dy ( b(x,y) du/dy ) + c(x,y) u = g(x,y)   on Omega
      u(x,y) = 0   on the boundary of Omega

with 0 <= x, y <= 1, a(x,y) > 0, b(x,y) > 0, c(x,y) >= 0.

Case of a regular 2D mesh

Discretization step h = 1/(n+1), here n = 4; the unknowns are numbered row by row on the grid.

5-point finite difference scheme:

    d/dx ( a(x,y) du/dx )_ij = [ a(i+1/2,j) (u(i+1,j) - u(i,j)) - a(i-1/2,j) (u(i,j) - u(i-1,j)) ] / h^2 + O(h^2)

and similarly:

    d/dy ( b(x,y) du/dy )_ij = [ b(i,j+1/2) (u(i,j+1) - u(i,j)) - b(i,j-1/2) (u(i,j) - u(i,j-1)) ] / h^2 + O(h^2)

with a(i+1/2,j), b(i,j+1/2), cij, ... known.

With the ordering of unknowns of the example, we obtain a linear system of the form Ax = b, where

    x1 <-> u(1,1) = u( 1/(n+1), 1/(n+1) )
    x2 <-> u(2,1) = u( 2/(n+1), 1/(n+1) )
    x3 <-> u(3,1),  x4 <-> u(4,1),  x5 <-> u(1,2), ...

and A is n^2 by n^2 and b is of size n^2, with the following structure (x: nonzero, 0: explicitly stored zero, .: zero); A on the left and b on the right:

    | x x . . x . . . . . . . . . . . |   |g11|
    | x x x . . x . . . . . . . . . . |   |g21|
    | . x x x . . x . . . . . . . . . |   |g31|
    | . . x x 0 . . x . . . . . . . . |   |g41|
    | x . . 0 x x . . x . . . . . . . |   |g12|
    | . x . . x x x . . x . . . . . . |   |g22|
    | . . x . . x x x . . x . . . . . |   |g32|
    | . . . x . . x x 0 . . x . . . . |   |g42|
    | . . . . x . . 0 x x . . x . . . |   |g13|
    | . . . . . x . . x x x . . x . . |   |g23|
    | . . . . . . x . . x x x . . x . |   |g33|
    | . . . . . . . x . . x x 0 . . x |   |g43|
    | . . . . . . . . x . . 0 x x . . |   |g14|
    | . . . . . . . . . x . . x x x . |   |g24|
    | . . . . . . . . . . x . . x x x |   |g34|
    | . . . . . . . . . . . x . . x x |   |g44|
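For the constant-coefficient case (a = b = 1, c = 0) this matrix is the standard 5-point Laplacian, which the following sketch (an assumption, not course code) builds with Kronecker products; it reproduces the pattern above, except that the explicitly stored zeros are simply not stored:

    from scipy.sparse import diags, identity, kron

    n = 4
    T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))   # 1D second difference
    I = identity(n)
    A = kron(I, T) + kron(T, I)       # n^2 x n^2, same pattern as on the slide
    print(A.shape, A.nnz)             # (16, 16) 64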

Solution of the linear system

Often the most costly part in a numerical simulation code.

Direct methods:
- LU factorization followed by triangular substitutions
- parallelism depends highly on the structure of the matrix

Iterative methods:
- usually rely on sparse matrix-vector products
- an algebraic preconditioner is useful

Evolution in time of a complex phenomenon

Examples: climate modelling, evolution of radioactive waste, ...

Heat equation:

    Laplacian(u)(x, y, z, t) = du(x, y, z, t)/dt,   u(x, y, z, t0) = u0(x, y, z)

Discretization in both space and time (1D case):

Explicit approaches:

    ( u_j^(n+1) - u_j^n ) / ( t_(n+1) - t_n ) = ( u_(j+1)^n - 2 u_j^n + u_(j-1)^n ) / h^2

Implicit approaches:

    ( u_j^(n+1) - u_j^n ) / ( t_(n+1) - t_n ) = ( u_(j+1)^(n+1) - 2 u_j^(n+1) + u_(j-1)^(n+1) ) / h^2

Implicit approaches are preferred (more stable, larger timesteps possible) but are more numerically intensive: a sparse linear system must be solved at each iteration.
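A minimal sketch of one implicit (backward Euler) step in 1D (assumptions: SciPy, Dirichlet boundaries, an illustrative initial profile): each step solves the sparse tridiagonal system (I + dt/h^2 K) u^(n+1) = u^n, and the factorization can be computed once and reused:

    import numpy as np
    from scipy.sparse import diags, identity
    from scipy.sparse.linalg import splu

    n, h, dt = 100, 1.0 / 101, 1e-4
    K = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    M = (identity(n, format="csc") + (dt / h**2) * K).tocsc()
    step = splu(M)                          # factorize once, reuse at every step

    x = np.arange(1, n + 1) * h
    u = np.exp(-100 * (x - 0.5) ** 2)       # initial temperature profile
    for _ in range(50):
        u = step.solve(u)                   # one implicit time step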

Discretization with finite elements

Consider a partial differential equation of the form (Poisson equation):

    -Laplacian(u) = -( d2u/dx2 + d2u/dy2 ) = f  on Omega,   u = 0 on the boundary of Omega

Using Green's formula, one can show that this problem is equivalent to:

    a(u, v) = integral over Omega of f v dx dy,   for all v such that v = 0 on the boundary,

where a(u, v) = integral over Omega of ( du/dx dv/dx + du/dy dv/dy ) dx dy.

Finite element scheme: 1D Poisson equation

    -u'' = -d2u/dx2 = f,   u = 0 on the boundary

Equivalent to:

    a(u, v) = g(v) for all v (with v = 0 on the boundary),

where a(u, v) = integral of u'(x) v'(x) dx and g(v) = integral of f(x) v(x) dx
(in 1D this is similar to integration by parts).

Idea: search for u of the form u = sum over k of alpha_k Phi_k(x), where (Phi_k), k = 1..n, is a basis of functions such that Phi_k is linear on each element Ei and Phi_k(xi) = delta_ik (= 1 if k = i, 0 otherwise).

[Figure: hat functions Phi_(k-1), Phi_k, Phi_(k+1) around node xk, supported on elements Ek and Ek+1.]

Finite element scheme: 1D Poisson equation (continued)

We rewrite a(u, v) = g(v) for all Phi_k:

    a(u, Phi_k) = g(Phi_k) for all k   <=>   sum over i of alpha_i a(Phi_i, Phi_k) = g(Phi_k)

Since a(Phi_i, Phi_k) = integral of Phi_i' Phi_k' = 0 when |i - k| >= 2, the k-th equation (associated with Phi_k) is:

    alpha_(k-1) a(Phi_(k-1), Phi_k) + alpha_k a(Phi_k, Phi_k) + alpha_(k+1) a(Phi_(k+1), Phi_k) = g(Phi_k)

with

    a(Phi_(k-1), Phi_k) = integral over Ek   of Phi_(k-1)' Phi_k'
    a(Phi_(k+1), Phi_k) = integral over Ek+1 of Phi_(k+1)' Phi_k'
    a(Phi_k, Phi_k)     = integral over Ek   of Phi_k' Phi_k'  +  integral over Ek+1 of Phi_k' Phi_k'

Finite element scheme: 1D Poisson equation (continued)

From the point of view of element Ek, we have a 2x2 contribution matrix:

    [ int_Ek Phi_(k-1)' Phi_(k-1)'   int_Ek Phi_(k-1)' Phi_k' ]   [ I_Ek(k-1, k-1)   I_Ek(k-1, k) ]
    [ int_Ek Phi_k' Phi_(k-1)'       int_Ek Phi_k' Phi_k'     ] = [ I_Ek(k, k-1)     I_Ek(k, k)   ]

For the mesh 0 -E1- 1 -E2- 2 -E3- 3 -E4- 4, assembling the element contributions gives:

    [ I_E1(1,1) + I_E2(1,1)   I_E2(1,2)                                  ] [alpha1]   [g(Phi1)]
    [ I_E2(2,1)               I_E2(2,2) + I_E3(2,2)   I_E3(2,3)          ] [alpha2] = [g(Phi2)]
    [                         I_E3(2,3)               I_E3(3,3) + I_E4(3,3) ] [alpha3]   [g(Phi3)]
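A small assembly sketch in Python (an assumption: uniform mesh and hat functions, so each element contributes (1/h) [[1, -1], [-1, 1]]; Dirichlet end nodes are removed at the end):

    import numpy as np
    from scipy.sparse import lil_matrix

    n_el = 5                        # elements E1..E5, interior nodes 1..4
    h = 1.0 / n_el
    Ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])   # element contribution

    K = lil_matrix((n_el + 1, n_el + 1))
    for e in range(n_el):                     # element e connects nodes e and e+1
        for a in range(2):
            for b in range(2):
                K[e + a, e + b] += Ke[a, b]

    K_int = K.tocsr()[1:n_el, 1:n_el]         # keep interior unknowns only
    print(K_int.toarray() * h)                # tridiag(-1, 2, -1), as expected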

Finite element scheme in higher dimension

- Can be used in higher dimensions.
- The mesh can be irregular.
- Phi_i can be a higher-degree polynomial.
- The matrix pattern depends on the mesh connectivity/ordering.

Set of elements (tetrahedra, triangles) to assemble. For an element T with vertices i, j, k, the contribution is

    C(T) = [ a^T(i,i)  a^T(i,j)  a^T(i,k) ]
           [ a^T(j,i)  a^T(j,j)  a^T(j,k) ]
           [ a^T(k,i)  a^T(k,j)  a^T(k,k) ]

Needs for the parallel case:
- assemble the sparse matrix A = sum over k of C(Tk): graph colouring algorithms
- parallelization domain by domain: graph partitioning
- solution of Ax = b: high performance matrix computation kernels

Other example: linear least squares

Mathematical model + approximate measurements -> estimate the parameters of the model.

m experiments + n parameters xi:

    min over x of || Ax - b ||,   with A in R^(m x n), m >= n, the data matrix, b in R^m the vector of observations, and x in R^n the parameters of the model.

Solving the problem:
- decompose A in the form A = QR, with Q orthogonal and R triangular
- || Ax - b || = || Q^T A x - Q^T b || = || Q^T Q R x - Q^T b || = || Rx - Q^T b ||

Problems can be large (meteorological data, ...), sparse or not.
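A dense Python sketch of the QR approach (an assumption; the data values are illustrative), cross-checked against numpy.linalg.lstsq:

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])          # m = 3 experiments, n = 2 parameters
    b = np.array([1.0, 2.0, 2.0])

    Q, R = np.linalg.qr(A)              # reduced QR: Q is 3x2, R is 2x2
    x = np.linalg.solve(R, Q.T @ b)     # min ||Ax - b|| = min ||Rx - Q^T b||

    x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(x, x_ref)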

Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.

Solution of sparse linear systems Ax = b: direct or iterative approaches?

Direct methods:
- very general / robust
- numerical accuracy
- irregular/unstructured problems
- the factorization of the matrix A may be costly (flops/memory)
- the factors can be reused for multiple right-hand sides b
- computing issues: good granularity of computations; several levels of parallelism can be exploited

Iterative methods:
- efficiency depends on convergence, preconditioning, and the numerical properties / structure of A
- rely on an efficient matrix-vector product
- memory effective
- successive right-hand sides are problematic
- computing issues: smaller granularity of computations; often only one level of parallelism

Outline (reminder): Motivation and main issues; Sparse matrices; Gaussian elimination; Parallel and high performance computing; Numerical simulation and sparse matrices; Direct vs iterative methods; Conclusion.

Summary: sparse matrices

- Widely used in engineering and industry
- Irregular data structures
- Strong relations with graph theory
- Efficient algorithms are critical: ordering, sparse Gaussian elimination, sparse matrix-vector multiplication, parallelization

Challenges:
- Modern applications lead to bigger and bigger problems and to different types of matrices and requirements
- Dynamic data structures (numerical pivoting) -> need for dynamic scheduling
- More and more parallelism (evolution of parallel architectures)

    Suggested home reading

- Google page rank: The world's largest matrix computation, Cleve Moler.
- Architecture and Performance Characteristics of Modern High Performance Computers, Georg Hager and Gerhard Wellein, Lecture Notes in Physics.
- Optimization Techniques for Modern High Performance Computers, Georg Hager and Gerhard Wellein, Lecture Notes in Physics.